Twitter debuted in 2006 as a web application for microblogging. Everything users are able to do on Twitter—such as send tweets, links, images, audio, or video; repost and retweet; send direct messages to specific individuals or groups; and show approval of existing tweets—provides potential data for researchers hoping to better understand changing attitudes and behavior. Guides and software have been created to help academics, individuals, and companies access Twitter data for their own analysis.
Pakistani researchers note that Twitter has “propelled online communities to flourish by enabling people to create, share and disseminate free-flowing messages and information. This may consist of information on certain facts and opinions generated by online communities and individuals relating to current news, new products, policies, political views, travel and services, etc. The content is of immense importance, especially to business communities to receive feedback on their products from customers. This additionally allows users to read reviews posted by other users of products before buying.”
“Twitter is the reigning champion of Internet-based social network analysis,” researcher Bernie Hogan writes. “Although Twitter does not have the sort of market share or audience of Facebook or Sina Weibo, it does have a very convenient Application Programming Interface (API) for accessing a substantial portion of Twitter’s functionality. Furthermore, those who post on Twitter often speak about current political events, protests and social issues, making the analysis of Twitter data particularly ripe.”
Using Twitter for Sentiment Analysis
In the past 10 years, researchers have sought ways to use social commentary for sentiment or opinion analysis and to automatically generate trends from these huge repositories of online content. Recently, researchers from IBM, Columbia University, and Hamad bin Khalifa University conducted a study on using Twitter data in “detecting whether a piece of text expresses a positive, a negative, or a neutral sentiment; the sentiment can be general or about a specific topic, e.g., a person, a product, or an event.” The types of sentiment analysis that researchers have applied to Twitter and similar platforms include:
- Subjectivity detection: texts are classified as containing expressions of sentiment (subjective) or not (objective).
- Polarity detection: texts are classified as positive or negative overall.
- Sentiment strength detection: texts are classified for the overall strength of positive and/or negative sentiment or for the overall strength of sentiment and its polarity.
- Emotion detection: texts are classified for the predominant emotion (e.g., unhappy, angry), perhaps in addition to its strength, or the degree to which a fixed number of different emotions are evident.
- Aspect-based sentiment analysis: texts are dissected to identify the aspects of a product that are discussed and the sentiments expressed about these aspects.
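To make the first three categories concrete, here is a minimal sketch of subjectivity, polarity, and strength detection. It assumes a tiny hand-made word list (`TOY_LEXICON`) whose words and scores are invented for illustration; it is not drawn from any of the studies cited here, and real systems rely on far larger lexicons or trained models.

```python
import re

# Hypothetical toy lexicon: word -> sentiment score in [-1, 1].
# The words and scores are invented for illustration only.
TOY_LEXICON = {
    "love": 0.9, "great": 0.7, "good": 0.5,
    "bad": -0.5, "awful": -0.8, "hate": -0.9,
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def analyze(text):
    """Return subjectivity, polarity, and strength for one short text."""
    scores = [TOY_LEXICON[t] for t in tokenize(text) if t in TOY_LEXICON]

    # Subjectivity detection: no sentiment words means "objective".
    if not scores:
        return {"subjective": False, "polarity": "neutral", "strength": 0.0}

    # Polarity detection: the sign of the summed word scores.
    total = sum(scores)
    if total > 0:
        polarity = "positive"
    elif total < 0:
        polarity = "negative"
    else:
        polarity = "neutral"

    # Strength detection: mean absolute score of the matched words.
    strength = sum(abs(s) for s in scores) / len(scores)
    return {"subjective": True, "polarity": polarity, "strength": strength}


if __name__ == "__main__":
    print(analyze("I love this great phone"))
    print(analyze("The train leaves at noon"))
```

Emotion detection and aspect-based analysis need more than a word list (emotion categories and a way to identify product aspects, respectively), so they are left out of this sketch.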
However, language—especially in a tight 140-character (now 280-character) environment—is far more complex to analyze than researchers originally hoped. Where space is limited, comments are much more likely to be laced with sarcasm, metaphors, irony, humor, similes, oxymorons, symbolism, jargon, parody, regionalisms, and other techniques that make sentiment analysis, especially machine-based analysis, difficult and imprecise. Add to that the use of emojis and other symbols, and the task becomes even more complex—certainly beyond the capability of today’s technologies.
Researching the way people use communication technologies is not new; researchers studied patterns in the early days of the telephone as well. However, the clear dominance of social media for all types of communication, learning, e-commerce, and recreation has made huge stores of electronic data on the messages, profiles, and behavior of individuals readily available in a way that was never possible before. For many researchers, this treasure trove is key to better understanding how these technologies and applications affect people’s status, opinions, ideas, sense of power, and identity.
How Reliable Are Tweets for Analysis?
Twitter is an unreliable witness to the world’s emotions, according to University of Warwick sociology expert Eric Allen Jensen. Calling this era a “big data gold rush,” he cautions that although the use of such a huge data store is “particularly alluring,” there is no evidence that what is tweeted is a reflection of people’s true feelings.
Jensen notes that “it is important that long-established principles of good social research are not ignored.” When discussing a social media study, he points out something information professionals have always known: “Even if individual words have been reliably scored for this study, the use of a simple dictionary method for categorizing the inherent ‘happiness’ of a tweet is clearly prone to a substantial amount of error (e.g. due to the use of irony, different meanings of words in different contexts, words that negate meaning such as ‘not’, etc.).” Take some of the president’s tweets as an example. They are off-the-cuff and often contradictory, and although they are highly emotional in tenor, we have no standard or comparative way to judge them, since Twitter has existed for such a short time, and there is no data from which to determine the reliability or interpretation of any findings.
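The kind of error Jensen describes in a simple dictionary method can be shown in a few lines. The sketch below is illustrative only: the word scores in `HAPPINESS` and the `NEGATORS` list are hypothetical, and neither function reproduces the method used in the study he critiques.

```python
# Hypothetical word scores, invented for illustration only.
HAPPINESS = {"happy": 1, "great": 1, "sad": -1, "terrible": -1}

# Hypothetical list of words that flip the meaning of the next word.
NEGATORS = {"not", "never", "no"}


def naive_score(text):
    """Sum the dictionary score of each word, ignoring all context."""
    words = text.lower().split()
    return sum(HAPPINESS.get(word, 0) for word in words)


def negation_aware_score(text):
    """Like naive_score, but flip a word's score if a negator precedes it."""
    words = text.lower().split()
    total = 0
    for i, word in enumerate(words):
        value = HAPPINESS.get(word, 0)
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value
        total += value
    return total


if __name__ == "__main__":
    for tweet in ["i am not happy", "oh great another delay"]:
        print(tweet, naive_score(tweet), negation_aware_score(tweet))
```

Under these assumptions, “i am not happy” scores +1 with the naive method and -1 once the negator is taken into account. The second tweet scores +1 either way, even though many readers would take it as sarcastic; that is the sort of irony a word list cannot see.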
“[A]ll of the factors that affect social reality offline also play out online: power, voice, symbolic representation, identity, leadership, struggles over scarce resources and visual representations continue to exert strong influence on the web,” Jensen writes. “This raises complexities that must be addressed before claims about happiness and its causes can be approached using tweets and correlations.”
“About 90 percent of today’s data has been provided during the last two years,” according to Iowa State University researchers writing in November 2017, “and getting insight into this large scale data is not trivial.” A June 2017 study says that there needs to be not only “natural language processing and machine learning to determine the attitude of a writer towards a subject,” but also emotion mining, which balances this with “detecting and classifying [writers’] emotions toward events or topics.” A 2018 study suggests using “fuzzy thesaurus and sentiment replacement” as a way around the current limitations. These researchers from the United Arab Emirates are developing a method that “measures the semantic similarity of tweets with features in the feature space instead of using terms’ presence or frequency feature vectors. Thus, [they] account for the sentiment of the context instead of just counting sentiment words.”
Data, Data Everywhere …
In 2010, the Library of Congress (LC) began archiving every single public tweet ever created. However, the size of this growing dataset quickly outstripped the LC’s ability to keep up. The LC announced in December 2017 that “effective Jan. 1, 2018, [it] will acquire tweets on a selective basis—similar to [their] collections of web sites.” In a white paper, the LC provides details of the new archival programs for tweets, saying that “the tweets collected and archived will be thematic and event-based, including events such as elections, or themes of ongoing national interest, e.g. public policy.” The LC has yet to figure out how to make these items, both the texts and the images that make up much of today’s Twitter communication, accessible and searchable.
Business consultant Barbara Farfan writes, “With 1.3 billion accounts, and 320 million active users, Twitter is not the most popular social media channel in the world, but it is undeniably the most powerful real-time communication tool, with the power to get employees fired, professional sports heroes dethroned, and governments overthrown.” Social media can’t be ignored; however, it is clear that we still have a long way to go before we can understand, analyze, and benefit from this freewheeling communications titan.