I am pleased to announce the publication of our latest journal article, “The kiss of death – Unearthing conversations surrounding Chagas disease on YouTube”. This study discusses the motivations that attract social media users to YouTube, their health beliefs about Chagas disease, and how health communication experts can take advantage of various message appeals while conducting health campaigns.
While the world is gripped by the Coronavirus (COVID-19) pandemic, other emerging infectious diseases also remain public health threats having the potential to disrupt daily lives. In recent years, Chagas disease, traditionally endemic in Latin America, especially in rural areas where there is high poverty, has made its way to the United States. It is estimated that at least 300,000 people live with Chagas disease in the United States.
This study employed Uses and Gratifications Theory (UGT), the Health Belief Model (HBM), and a mix of social media analytics techniques to highlight the important role of social media in health communication. YouTube comments surrounding Chagas disease were analyzed. A web-based software called Netlytic was used to capture the comments and conduct text analytics. The sentiment of user comments on each of the five videos selected for this analysis was measured using SentiStrength.
The study found that the YouTube comments on Chagas disease news that elicited active engagement among users were appreciative, sympathetic, emotionally appealing, or entertaining. Eleven percent of YouTube users had personal experiences with the deadly kissing bugs. Lack of public understanding about Chagas disease led 20% of YouTube users to seek additional information on how to diagnose, prevent, and cure Chagas disease after watching the videos. While 24% of the comments were supportive and appreciative of the information disseminated through the videos, 8% were highly critical of them. Unfortunately, 3% of the comments had xenophobic sentiments. More than half of the comments (54.7%) were neutral. In addition, 82% of YouTube comments contained no information about susceptibility to Chagas disease and thus failed to indicate that Chagas disease is also a threat to residents of the United States.
This study highlighted the great potential of YouTube as a tool for health communication. A significant number of YouTube users in this study had low awareness of the effectiveness of the strategies employed to prevent the spread of the kissing bug, as well as of their own susceptibility to Chagas disease. This calls for more sustained awareness-raising activities, since Chagas disease is also a threat to residents of the United States. Sustained health communication campaigns that target policymakers will lead to improvements in the implementation, coverage, access, and quality of health care for Chagas disease patients, including early diagnosis and treatment interventions. Health communication practitioners have been the go-to source for health information, especially for neglected tropical diseases such as Chagas. However, in the current digital age, with its concomitant proliferation of social media platforms, users affected by disease or living within disease-prone environments have turned to social media, including YouTube, to seek as well as share information about diseases. This changed information landscape necessitates the use of YouTube by health communication professionals as a channel for health communication campaigns.
The study was led by SMART Lab team member and PhD student Aggrey Willis Otieno.
Otieno, A. W., Roark, J., Khan, M. L., Pant, S., Grijalva, M. J., & Titsworth, S. (2020). The kiss of death – Unearthing conversations surrounding Chagas disease on YouTube. Cogent Social Sciences, 7(1), 1858561. https://doi.org/10.1080/23311886.2020.1858561
Note: A similar version of this post is also available on the SMART Lab website.
I have recently published a book chapter titled, “Big Data and Entrepreneurship” in the Handbook of Media Management and Business. The chapter helps students, industry professionals, and researchers understand big data analytics and the role of data scientists in media management and entrepreneurship. It also brings to light the opportunities and challenges brought by data and analytics.
Media managers should use the following Action Plan as a guide as they develop a big data strategy.
Hire and train skilled data scientists who understand the company goals.
Never break audience trust for the sake of obtaining data. Be transparent about how you are going to use data and user information to increase revenue. Protect the privacy and security of your audience.
Use big data during almost every media production and post-production process.
Remember to also rely on qualitative and hybrid methodologies to better understand why your audience may be behaving in certain ways.
Big data is best characterized by a huge volume of frequently updated data in various formats, including numeric, textual, and images/videos.
The application of big data has proven a major disruptor in today’s media marketplace, especially in the music, film, and advertising industries.
Media managers are able to use big data to better understand audience behavior and better connect audiences to their products.
Many key challenges still exist in big data, including extracting value from data, the rapid spread of misinformation, and the privacy/security concerns of audiences.
The chapter offers the following conclusion:
Data and analytics lie at the heart of the digital revolution. Capitalizing on data and leveraging the power of analytics for entrepreneurship and various other ventures hinges on a carefully planned and sustained effort. Big data is already an integral element of the overall business strategy for many media organizations, and it is expected to become even more important for managers and businesses of various types and sizes in an increasingly competitive and convergent environment. By the end of 2020, big data volume is expected to surpass 44 trillion gigabytes, or 44 zettabytes (EMC, 2014). This indicates a major challenge in terms of data volume and complexity, but also an opportunity that needs to be seized.
Here we can unequivocally see time and speed as the two integral components of big data. In order to turn big data into something useful for media businesses, analytics must be carried out swiftly so that these data can be efficaciously categorized and structured. Media organizations that do not stay on top of their analysis of these data might occupy a disadvantaged position. Davenport (2014) highlights the potential of big data to provide media organizations with more information about how customers react and behave toward certain products, leading to the proliferation of advertisements, products, and services customized and created for particular customer segments.
While technology allows media organizations to gather more data, more attention should be paid to how entrepreneurs adapt to a big-data environment and how they make sense of and structure big data. Embracing big data analytics requires “considerable imagination, courage, and commitment” as essential entrepreneurial characteristics (Davenport, 2014). Within this context, one can understand the interplay of various and disparate factors that can work together to make the best out of big data. What makes big data appealing to businesses, corporations, and organizations is the notion that big data can reduce costs as well as contribute to the development of new ways to improve data gathering and collection.
While many companies have embraced big data and analytics as part of their strategic mix, a large number still lag behind in full utilization of the data advantage. It is clear that big data analytics enables informed decision making. What is required is the realization of the importance to cultivate a culture that values data. The advent of cloud-based computing has lowered the barriers to the adoption of big data analytics. This has certainly opened up more opportunities especially for small and medium-sized entrepreneurial ventures as they too can embrace analytics technologies to their advantage.
It is worth remembering that in the globally competitive world only the smartest would survive. Being smart implies that companies and organizations are agile, embrace changes, and inculcate newer solutions that help them make informed decisions in a timely manner. Since data is being constantly generated, opportunities also continually expand.
Media scholars are beginning to incorporate big data into their own academic research. Results are being met with a combination of enthusiasm and skepticism. On one end of the spectrum, it is finally possible for the average researcher to work with datasets that are affordable and include a representative sample. On the other end, new waves of research illustrate that data size alone matters far less than what you do in your analysis (Davenport, 2014). Academics must be careful not to rely solely on big data, especially data generated on social media. Careful consideration must be given to which populations are included in and excluded from these measurements. However, as more Ph.D. programs train future data scientists in big data measurements, the results should only improve.
Big data has proven itself as one of the biggest drivers of success in today’s convergent environment. Like most things, we must be cautious: just because something is new does not mean that it is better. The next chapter will explore the best way to merge “new” concepts and trends in media management with more traditional “old” foundations.
You can cite the book chapter as follows:
Khan, M. Laeeq (2020). Big Data and Entrepreneurship. In L. M. Mahoney & T. Tang (Eds.), Handbook of Media Management and Business (Volume 2, pp. 391-406). Rowman & Littlefield. ISBN-13 : 978-1538115305
You can download a copy of the complete book chapter here: Download PDF
Network analysis can simply be understood as the analysis of social networks. The analysis provides a visual representation of different members (or nodes) within a network, and how they connect to each other.
I use Gephi to visualize networks based on data from Twitter and Facebook. I would highly recommend the following book: Analyzing the Social Web by Jennifer Golbeck (available on Amazon), to understand social network analysis.
Here are a few videos that are useful in understanding social network analysis using Gephi.
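Before opening a tool like Gephi, the core idea is easy to see in a few lines of code. The sketch below builds a tiny network of members (nodes) and connections (edges) in plain Python and finds the most connected member; the names and edges are entirely hypothetical, and Gephi performs this kind of computation (and much richer ones) visually.

```python
# A minimal illustration of network analysis: nodes are members,
# edges are connections between them (names here are made up).
edges = [
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Bob", "Carol"),
    ("Carol", "Dave"),
]

# Build an adjacency list, then count each node's degree
# (its number of direct connections in the network).
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

degrees = {node: len(neighbors) for node, neighbors in adjacency.items()}
most_connected = max(degrees, key=degrees.get)
print(most_connected, degrees[most_connected])  # Carol 3
```

Degree is the simplest centrality measure; Gephi also offers betweenness, closeness, and modularity-based community detection on the same node-and-edge structure.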
The following is a tutorial for conducting a quality sentiment analysis of social media data (in this case Twitter). I describe what sentiment analysis is, how it started, and why it is important. I also offer a sentiment analysis process that I believe sums up the technique. I then introduce a valuable tool called SentiStrength. Following data cleaning and analysis, sentiment is visualized.
As of now there isn’t a comprehensive (or even a brief) tutorial for this tool, so the motivation to write this tutorial stems from that shortage. SentiStrength has already been employed by researchers, and findings have been published in a range of scholarly research journals. I am quite confident that you will find this sentiment analysis tutorial beneficial.
What is Sentiment Analysis?
Sentiment analysis is the automated process of understanding opinions and emotions about a given subject from written or spoken language. Sentiment analysis is also known as opinion mining, opinion extraction, sentiment mining, subjectivity analysis, affect analysis, emotion analysis, and review mining.
According to the Merriam-Webster’s Collegiate Dictionary, sentiment is defined as an attitude, thought, or judgment prompted by feeling.
Sentiment analysis presents an active area of research in natural language processing (NLP). NLP is considered a sub-field in artificial intelligence whereby computers are able to interpret and process human language.
How did it all start?
Sentiment analysis has been used across various disciplines. It is believed to have started in computer science; later, management and then the social sciences adopted it. Sentiment analysis has been extensively used in linguistic and machine learning studies.
Large corporations have built their own in-house capabilities (e.g., Microsoft, Google, IBM, SAP, and SAS).
Basic Sentiment Analysis: Classifying the polarity of a given text at the document, sentence, or tweet level as positive, negative, or neutral.
Advanced Sentiment Analysis: Understanding emotional states. For example, happy, angry, and sad.
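Basic polarity classification can be sketched in a few lines. The toy classifier below counts positive and negative words from a small lexicon and labels the text by the sign of the total; the word lists are illustrative only, not SentiStrength's actual lexicon, and real tools add many refinements discussed later in this tutorial.

```python
# A toy lexicon-based polarity classifier (basic sentiment analysis).
# The word sets are illustrative, not any tool's real lexicon.
POSITIVE = {"good", "happy", "great", "fantastic", "wonderful", "nice"}
NEGATIVE = {"terrible", "lazy", "hurt", "bad", "disappointed"}

def classify_polarity(text: str) -> str:
    words = text.lower().split()
    # Net score: count of positive words minus count of negative words.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_polarity("what a wonderful day"))    # positive
print(classify_polarity("the delay was terrible"))  # negative
print(classify_polarity("the plane landed"))        # neutral
```

Advanced sentiment analysis would replace the three polarity labels with finer-grained emotional states such as happy, angry, or sad.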
Why is it important?
Sentiment analysis has attracted interest from researchers, journalists, companies, and governments. Opinions and sentiments are extracted to create structured and actionable knowledge that can be used by a decision maker.
The advent of social media has increased the value of sentiment analysis. Social networks are not only fueling the digital revolution, but also enabling the expression and spread of emotions and opinions through the network.
Leveraging new media requires constant monitoring of information. In the political arena, sentiments can determine election outcomes; businesses carefully guard their brand image, so user sentiment on social media needs to be constantly monitored.
Issues in Sentiment Analysis
The most problematic figures of speech in NLP are irony and sarcasm. Another issue is devising rules to detect implicit sentiment (e.g., expressed through misspellings or exclamation marks).
A sentiment analysis program typically achieves 70% accuracy in classifying sentiment.
Human raters typically only agree about 80% of the time (Ogneva, 2012).
Sentiment Analysis Process
What are you interested in knowing? State the research question. Why does it matter? Who cares?
Identify where you want to study the sentiment. Will it be user-generated content on social media (YouTube comments, tweets on Twitter, Facebook posts, blog posts, etc.)?
Define keywords through which you will get the desired data. Clearly defined search parameters are of vital importance in getting the right kind of data that relates to the initial research questions.
Raw data is full of noise. Data cleaning (especially social media data) requires ample sifting. Spam, fake accounts, data produced by bots, different languages etc. need to be cleaned or removed to create a clean data file.
The clean data file can then be used to run the sentiment analysis.
Once the sentiment analysis is completed, the data needs to be visualized or put in an organized format to make sense of it.
SentiStrength also provides a separate score for each word within a sentence, thereby giving the average sentiment strength of the content (e.g., a tweet).
Psychologists believe that human emotion can be positive and negative at the same time (Norman et al., 2011). These are commonly known as mixed emotions. Inspired by this psychological reasoning, SentiStrength was created to detect both positive and negative sentiment simultaneously.
Emotions are socially constructed (Cornelius, 1996; Fox, 2008).
SentiStrength uses a lexical approach. At its heart is a lexicon of 1,125 words and 1,364 word stems, each with a score for positive or negative sentiment. When one of these matches a word in a text, it suggests the presence of sentiment and its strength. For example, ailing has a score of -3 in the lexicon, so sentences containing this word may have a moderate negative sentiment.
Positive sentiments can include words such as good, happy, great, fantastic, wonderful, lovely, excited, nice, and kind. Negative sentiments can include words such as terrible, lazy, crazy, hurt, bad, and disappointed.
Negation is commonly used when expressing opinions. A positive term that is preceded by a negating word (e.g., not, don’t) has its sentiment flipped by SentiStrength (e.g., “I don’t like it”), whereas negative terms are neutralized (e.g., “I don’t hate you”).
Terms preceded by booster words like very and extremely have their positive or negative sentiment strength increased, whereas quite decreases the sentiment strength of the next word. There are also rules for questions, idioms, spelling correction, and punctuation, as well as rules specific to computer-mediated methods of expressing sentiment.
As part of this, SentiStrength has a list of emoticons, together with sentiment strength scores for them (e.g., smiley faces like =) score +2).
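The lexical rules just described can be sketched in code. The fragment below applies a tiny hypothetical lexicon plus negation (flip positives, neutralize negatives) and booster adjustments; it is a simplified illustration of the approach, not SentiStrength's actual implementation or word list.

```python
# A rough sketch of lexical sentiment rules: per-word scores,
# negation, and boosters. Lexicon and scores are hypothetical,
# not SentiStrength's real data.
LEXICON = {"like": 2, "love": 3, "hate": -3, "ailing": -3}
NEGATORS = {"not", "don't", "dont"}
BOOSTERS = {"very": 1, "extremely": 2, "quite": -1}

def score_words(text: str) -> list:
    """Return one score per word, mimicking a per-word rationale."""
    words = text.lower().split()
    scores = []
    for i, w in enumerate(words):
        s = LEXICON.get(w, 0)
        if s == 0:
            scores.append(0)
            continue
        # A booster immediately before the word strengthens/weakens it.
        if i > 0 and words[i - 1] in BOOSTERS:
            s += BOOSTERS[words[i - 1]] * (1 if s > 0 else -1)
        # A negator before the (possibly boosted) term:
        j = i - 2 if i > 0 and words[i - 1] in BOOSTERS else i - 1
        if j >= 0 and words[j] in NEGATORS:
            s = -s if s > 0 else 0  # flip positives, neutralize negatives
        scores.append(s)
    return scores

print(score_words("i don't like it"))   # [0, 0, -2, 0]
print(score_words("i don't hate you"))  # [0, 0, 0, 0]
```

Note how the negated positive (“don’t like”) becomes negative while the negated negative (“don’t hate”) is merely neutralized, matching the behavior described above.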
SentiStrength is very fast (it can process 14,000 tweets per second on a standard PC), is transparent (it shows how its scores were calculated), and supports other languages (Vural, Cambazoglu, Senkul, & Tokgoz, 2013).
Familiarizing with Twitter Data and Data Cleaning
Before we start the analysis of social media data (in this case tweets), we need to clean the data and bring it in .txt file format so that it can be analyzed for sentiment in SentiStrength.
We will be analyzing the sentiment around the Boeing 737 Max airplane, which caught international news headlines. Twitter data was obtained for this purpose using the keywords “737 Max”.
For this tutorial, download the data file: “737_Max.xls”
Open the Excel file which contains the data for Boeing 737 Max tweets. View the raw datasheet “737_max_Raw”.
The raw data from Twitter is depicted in the screenshot below.
Now click on “Clean_737_Max” worksheet.
You will notice that the data file has been cleaned for (i) retweets, (ii) languages other than English, and (iii) text that made no sense.
For example, Spanish language tweets were deleted using the “filter” function in Excel.
We will export this worksheet and save it as a .txt file.
Following is a screenshot of the datafile in .TXT format.
The clean data file just containing the tweets is ready to be analyzed for sentiment in SentiStrength.
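The Excel-based cleaning above can also be scripted. The sketch below drops retweets and non-English rows and writes one tweet per line to a .txt file ready for SentiStrength; the sample rows and file name are made up for illustration, and a real dataset would be read from the exported spreadsheet instead.

```python
# A sketch of the cleaning steps: drop retweets and non-English rows,
# keep only the tweet text, and write one tweet per line to a .txt
# file for SentiStrength. The sample rows below are hypothetical.
raw_rows = [
    {"text": "RT @news: 737 Max grounded worldwide", "lang": "en"},
    {"text": "El 737 Max sigue en tierra", "lang": "es"},
    {"text": "Hoping the 737 Max fixes are thorough", "lang": "en"},
]

clean = [
    r["text"]
    for r in raw_rows
    if r["lang"] == "en" and not r["text"].startswith("RT ")
]

# One tweet per line, matching the "each line separately" option
# used later in SentiStrength.
with open("clean_737_max.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(clean))

print(len(clean))  # 1
```

Scripting the cleaning makes it repeatable, which matters when a dataset is refreshed with newly collected tweets.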
Fill in the fields above with your name, email, and organization. You will be prompted to save the zip file on your computer. Save it in a new folder on your computer.
Unzip SentiStrength_Data.zip, then start SentiStrength.exe and point to the unzipped SentiStrength_Data folder.
Click on the .exe file and launch SentiStrength. As you will notice, the most recent version is 2.3
Explore the top menus. “Sentiment Strength Analysis” gives a list of options regarding the type of analysis that can be done. For this tutorial, we will be selecting “Analyse All Texts in File (each line separately)”. This is because our data file in .txt format contains all tweets in separate lines.
The following screenshot depicts the “Sentiment Analysis Options” which allows you to choose how you want your analysis to be done.
We can leave the default options selected.
From the “Sentiment Strength Analysis” menu, we will be selecting “Analyse All Texts in File (each line separately)”. You will be prompted to choose the data file. Select the clean data file in .txt format.
SentiStrength will now analyze the data and prompt you to save a file in which the sentiment analysis has been performed (the file name will include “+results”).
This new file is in .txt format and now has to be imported into Excel so that the analysis can be understood.
Excel has a text import wizard which works when you try to open a .txt file.
As you can see above, there is a sentiment column for negative and positive. There is also a column for emotion rationale which provides the sentiment score next to each word in the tweet.
The final step is to visualize the overall sentiment by creating a new worksheet with the two sentiment columns. While selecting the sentiment columns, click on “Insert” and then select a “Column” chart to create a chart.
You can create even better visualizations using Excel. As you can see in the above depiction, a simple column chart gives a general idea about the overall sentiment from this dataset.
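If you prefer to skip the Excel import, the “+results” file can also be summarized in code. The sketch below assumes tab-separated lines of text, positive score, and negative score (with the negative score already signed) and tallies how many tweets are net positive, negative, or neutral; the sample lines are hypothetical output, not real SentiStrength results.

```python
# A sketch of summarizing SentiStrength "+results" output without Excel.
# Assumes tab-separated "text<TAB>positive<TAB>negative" lines, where the
# negative score is already signed; the sample lines are made up.
results = [
    "I love the new 737 Max update\t3\t-1",
    "Terrible handling of the crisis\t1\t-4",
    "The plane landed on schedule\t1\t-1",
]

summary = {"positive": 0, "negative": 0, "neutral": 0}
for line in results:
    _, pos, neg = line.rsplit("\t", 2)  # text may itself contain no tabs here
    net = int(pos) + int(neg)
    if net > 0:
        summary["positive"] += 1
    elif net < 0:
        summary["negative"] += 1
    else:
        summary["neutral"] += 1

print(summary)  # {'positive': 1, 'negative': 1, 'neutral': 1}
```

The resulting counts are exactly what the column chart in Excel displays; from here a plotting library could render the same bars programmatically.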
2019 starts with an important research publication concerning the rise of misinformation and the importance of information verification:
Khan, M. L., & Idris, I. K. (2019). Recognize Misinformation and Verify Before Sharing: A Reasoned Action and Information Literacy Perspective, Behavior & Information Technology, https://doi.org/10.1080/0144929X.2019.1578828.
After a rigorous double-blind review by three reviewers, the paper is now finally published in Behavior and Information Technology journal. The topic is important from the perspective of dealing with irresponsible online sharing. Verifying information is vital against the backdrop of rising fake news and spread of misinformation on the Internet. Facebook, Twitter, and YouTube are popular sites for getting news and information. Sometimes, social media sites are criticized for serving as conduits of misinformation. Surveys and polls reveal heightened mistrust amongst people regarding media and information on social media. It is noticed that there is a great deal of carelessness amongst people who share information on social media. Surprisingly, a majority of people share links on Twitter without even reading them!
The spread of misinformation poses real threats to our societies, with negative consequences in the form of stock price fluctuations, false advertising, health emergencies and crises, and even election outcomes. While social media platforms such as Facebook and Google are seen to be making efforts to tackle fake news and misinformation on their sites, such efforts have often been less than satisfactory. Our research therefore lays emphasis on what individuals can do to tackle misinformation instead of solely relying on social platforms and their fact-checking systems. We argue that users, through their information-sharing behaviors, share responsibility for spreading inaccurate or false information on social media, whether intentionally or unintentionally.
Rooted in the Theory of Planned Behavior (TPB) and information literacy factors, our research model helps us understand the factors that can influence people’s perceived self-efficacy in recognizing misinformation, as well as the factors that determine sharing behavior without verification.
This study is based on the premise that as long as individuals try to distinguish false or inaccurate information from the accurate information, there is a lesser likelihood of them being misled. We believe that individuals lie at the center of any efforts in tackling the spread of misinformation. The abstract of the study is as follows:
Abstract: The menace of misinformation online has gained considerable media attention and plausible solutions for combatting misinformation have often been less than satisfactory. In an environment of ubiquitous online social sharing, we contend that it is the individuals that can play a major role in halting the spread of misinformation. We conducted a survey (n = 396) to illuminate the factors that predict (i) the perceived ability to recognize false information on social media, and (ii) the behavior of sharing of information without verification. A set of regression analyses reveal that the perceived self-efficacy to detect misinformation on social media is predicted by income and level of education, Internet skills of information seeking and verification, and attitude towards information verification. We also found that sharing of information on social media without verification is predicted by Internet experience, Internet skills of information seeking, sharing, and verification, attitude towards information verification, and belief in the reliability of information. Recommendations regarding information literacy, the role of individuals as media gatekeepers who verify social media information, and the importance of independent corroboration are discussed.
I was invited by the Office of the Vice President for Research and Creativity at Ohio University to deliver a talk in Café Conversations discussing “Competing with Data Analytics in a Social World” on Wednesday, Sept. 26, at 5 p.m. in the Front Room at Baker University Center.
Responding to the changing environment, organizations are employing data analytics to analyze big data to understand human behavior in digital spaces and make informed decisions that are aligned with their overall goals. Analyzing and visualizing data is a challenging endeavor. Organizations are struggling to understand what analytics can do and how it can provide a competitive advantage.
While the focus is increasingly shifting from descriptive analytics to more sophisticated autonomous analytics, one issue remains the same: extracting value from data and telling a compelling story. Analytics has become a matter of survival, and proper harnessing of these capabilities can provide an organization a distinct advantage.
I was met with a packed audience who were fully engaged and participated in the analytics trivia. Here is a video of the talk:
The annual Association for Education in Journalism and Mass Communication (AEJMC) conference took place in Washington, D.C., from August 6th through August 9th. This year too, the gathering brought together hundreds of research scholars, educators, and practitioners from around the world. The conference is a go-to place for anyone interested in the latest research in the wider field of Communication.
The AEJMC conference featured a number of high quality presentations, papers and demonstrations. This year, I attended AEJMC to present on a panel discussing the practical, theoretical and ethical challenges and strategies of teaching digital analytics. As Director of the SMART Lab at Ohio University, I shared my experiences regarding ethical challenges of teaching digital analytics. The overall panel comprised the following members:
The Institute for International Journalism (IIJ) in Ohio University’s E.W. Scripps School of Journalism hosts the Study of the U.S. Institute (SUSI) on Journalism and Media program. The program brings together journalism and media scholars from 18 different countries. This year too (SUSI 2018), the scholars attended an insightful and interactive presentation by Dr. Laeeq Khan, Director of SMART Lab, and learnt about our work and research in social data analytics.
The SUSI program aims to “foster a deeper understanding of the roles that journalism and the media play in U.S. society”. The program has been successfully hosted at Ohio University for the last seven years.
The SUSI program is an annual gathering of scholars and media experts from universities and academic institutions from around the world. The program is funded by the U.S. Department of State. What I really like about the program is how it brings together people from around the world and provides them a life-changing American experience. Not only does it connect diverse individuals but also enriches the environment at Ohio University during summer.
Besides spending time at Ohio University, participants also visit various media outlets across the United States. Attending the major journalism and media conference, AEJMC, is included in the overall program. The following countries are represented in the program: Botswana, Burma/Myanmar, Chile, China, Ecuador, Greece, India, Kyrgyz Republic, Lebanon, Malawi, Mongolia, Nepal, Pakistan, Uganda, Ukraine, Vietnam and Zambia.
The Broadcast Education Association (BEA) had its 63rd Annual Convention in Las Vegas, Nevada, from April 07 – 10, 2018. If you haven’t heard about BEA, here is a brief introduction from their website:
“The Broadcast Education Association (BEA) is the premier international academic media organization, driving insights, excellence in media production, and career advancement for educators, students, and professionals. The association’s publications, annual convention, web-based programs, and regional district activities provide opportunities for juried production competition and presentation of current scholarly research related to aspects of the electronic media. These areas include media audiences, economics, law and policy, regulation, news, management, aesthetics, social effects, history, and criticism, among others. BEA is concerned with electronic media curricula, placing an emphasis on interactions among the purposes, developments, and practices of the industry and imparting this information to future professionals. BEA serves as a forum for exposition, analysis and debate of issues of social importance to develop members’ awareness and sensitivity to these issues and to their ramifications, which will ultimately help students develop as more thoughtful practitioners.” [https://www.beaweb.org/]
With colleagues from different universities, I presented a panel titled “Social Media Analytics at Crossroads”.
The pervasive use of social media has led to an increasing realization of the need to measure and assess its impact. Measuring online social activity is commonly seen under the banner of social media analytics, which comprises a set of interdisciplinary techniques and methods to evaluate big data. Social media analytics is at a crossroads because of its evolving nature. Scholars in the business, communication, and informatics domains are active in solving data-related complexities by employing social analytics software. The ability to use such software, which allows data to be gathered, analyzed, and visualized, presents a unique set of challenges and opportunities. It is this interdisciplinary focus that mandates the creation of a new skills-based curriculum that effectively meets the needs of students in disparate academic domains. This panel aims to provide valuable insights into how these unique challenges are being met.
Other Presentations at the panel were as follows:
From the Structuration Theory to Active within Structures: An Integration of Divergent Audience Approaches – Roger Cooper, Ohio University
Communication Research: An Evolving Industry Perspective – Matt Kaiser, Ohio University
Social Media Analytics at Crossroads – Laeeq Khan, Ohio University
Integrating Big Data with Audience Reception Research – L. Meghan Mahoney, West Chester University of Pennsylvania
Measurement in Mass Communication Research: Trends, Opportunities, and Challenges – Tang Tang, Kent State University
Widespread social media use is impacting just about every aspect of life. Users around the globe are employing social media to connect and engage. Businesses, organizations and state institutions have all found social media to be of immense benefit in reaching their audiences effectively.
My team at the SMART Lab in the Scripps College of Communication at Ohio University keenly observes and researches various social media affordances. One such affordance is the use of social media to promote charitable causes.
We concentrated our efforts on looking outside the developed world to understand how users in regions directly impacted by the digital divide (in both access and skills) employ social media such as Twitter.
We gathered online social data from an interesting and unique charity initiative called the Wall of Kindness that gained significant prominence in Pakistan, Afghanistan, and Iran. Our study is titled:
“Communicating on Twitter for Charity: Understanding the Walls of Kindness Initiative in Afghanistan, Iran, and Pakistan”
The study’s abstract is as follows:
This study highlights the important role of social media for charity through an analysis of tweets about the “Wall of Kindness” charity initiative in Afghanistan, Iran, and Pakistan. Using both quantitative and qualitative methods, we employ the theoretical lens of Social Influence to explore how individuals and organizations employed Twitter to promote charitable initiatives. User engagement on Twitter centered on content sharing, identification through hashtags, and imitative behaviors that promoted the Wall of Kindness initiative across countries. Results from the thematic analysis revealed that Twitter users were tweeting about the Wall of Kindness to provide information, encourage donations, inspire others into action, and build an online community. Our content analysis revealed that a majority of the tweets were neutral or supportive of the initiative; users mostly shared textual information, followed by images, videos, and news links, and solicited donations for the Wall of Kindness. Furthermore, media organizations, wall enthusiasts, and journalists were most active in tweeting about the charity initiative. Implications for future research are discussed.