Using a narrative style is an effective way to communicate health information both on and off social media. Given the amount of misinformation being spread online and its potential negative effects, it is crucial to investigate how narrative communication style and misinformative health content interact to shape user engagement on social media platforms. To explore this in the context of Twitter, we start with previously annotated health misinformation tweets (n ≈ 15,000) and annotate a subset of the data (n = 3,000) for the presence of narrative style. We then use these manually assigned labels to train text classifiers, experimenting with supervised fine-tuning and in-context learning for automatic narrative detection. We use our best model to label the remaining portion of the dataset, then statistically analyze the relationship between narrative style, misinformation, and user-level features on engagement, finding that narrative use is connected to increased tweet engagement and can, in some cases, lead to increased engagement with misinformation. Finally, we analyze the general categories of language used in narratives and health misinformation in our dataset.
ESEM
ToxiSpanSE: An Explainable Toxicity Detection in
Code Review Comments
Jaydeb Sarker,
Sayma Sultana,
Steven R. Wilson,
and Amiangshu Bosu
In Proceedings of the 17th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
Oct
2023
Background: The existence of toxic conversations in open-source platforms can degrade relationships among software developers and may negatively impact software product quality. To help mitigate this, some initial work has been done to detect toxic comments in the Software Engineering (SE) domain. Aims: Since automatically classifying an entire text as toxic or non-toxic does not help human moderators understand the specific reason(s) for toxicity, we worked to develop an explainable toxicity detector for the SE domain. Method: Our explainable toxicity detector can detect specific spans of toxic content from SE texts, which can help human moderators by automatically highlighting those spans. This toxic span detection model, ToxiSpanSE, is trained on 19,651 code review (CR) comments with labeled toxic spans. Our annotators labeled the toxic spans within 3,757 toxic CR samples. We explored several types of models, including one lexicon-based approach and five different transformer-based encoders. Results: After an extensive evaluation of all models, we found that our fine-tuned RoBERTa model achieved the best score, with 0.88 F1, 0.87 precision, and 0.93 recall for toxic class tokens, providing an explainable toxicity classifier for the SE domain. Conclusion: Since ToxiSpanSE is the first tool to detect toxic spans in the SE domain, it will pave a path to combat toxicity in the SE community.
SemEval
Mr-wallace at SemEval-2023 Task 5: Novel Clickbait Spoiling Algorithm
Using Natural Language Processing
Vineet Saravanan,
and Steven Wilson
In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Jul
2023
Clickbait creates a nuisance in the online experience by luring users toward poor content in order to generate ad revenue. With the use of natural language processing models, we can save users time and reduce the need to follow clickbait links. Task 5 at SemEval-2023 focused on precisely this problem and was broken into two steps: identifying the clickbait spoiler type and then generating the spoiler itself. Our approach involves the use of fine-tuned text classification and question-answering models. Our classification model is able to determine the type of clickbait with 65.3% accuracy, and our question-answering model generated the exact spoiler around 42.5% of the time. Efforts toward solving this task may have an impact by helping to save users’ time and quickly giving insight into what a clickbait article is actually about.
ICWSM
What Are You Anxious About? Examining Subjects of Anxiety during the COVID-19 Pandemic
Lucia Chen,
Steven Wilson,
Daniela Negraia,
and Sophie Lohmann
In Proceedings of the 17th International AAAI Conference on Web and Social Media
Jun
2023
COVID-19 poses disproportionate mental health consequences for the public during different phases of the pandemic. We use a computational approach to capture the specific aspects that trigger an online community’s anxiety about the pandemic and investigate how these aspects change over time. First, we identified nine subjects of anxiety (SOAs) in a sample of Reddit posts (N=86) from r/COVID19_support using thematic analysis. Then, we quantified Reddit users’ anxiety by training algorithms on a manually annotated sample (N=793) to automatically label the SOAs in a larger chronological sample (N=6,535). The nine SOAs align with items in various recently developed pandemic anxiety measurement scales. We observed that Reddit users’ concerns about health risks remained high in the first eight months of the pandemic. These concerns diminished dramatically despite the surge of cases occurring later. In general, users’ language disclosing the SOAs became less intense as the pandemic progressed. However, worries about mental health and the future increased steadily throughout the period covered in this study. People also tended to use more intense language to describe mental health concerns than health risks or death concerns. Our results suggest that this online group’s mental health condition does not necessarily improve despite COVID-19 gradually weakening as a health threat due to appropriate countermeasures. Our system lays the groundwork for population health and epidemiology scholars to examine aspects that provoke pandemic anxiety in a timely fashion.
2022
EvoNLP
Leveraging time-dependent lexical features for offensive language detection
Barbara McGillivray,
Malithi Alahapperuma,
Jonathan Cook,
Chiara Di Bonaventura,
Albert Meroño-Peñuela,
Gareth Tyson,
and Steven Wilson
In Proceedings of the First Workshop on Ever Evolving NLP (EvoNLP)
Dec
2022
We present a study on the integration of time-sensitive information in lexicon-based offensive language detection systems. Our focus is on OffensEval sub-task A, aimed at detecting offensive tweets. We apply a semantic change detection algorithm over a short time span of two years to detect words whose meaning has changed, focusing particularly on those words that acquired or lost an offensive meaning between 2019 and 2020. Using the output of this semantic change detection approach, we train an SVM classifier on the OffensEval 2019 training set. We build on the already competitive SINAI system submitted to OffensEval 2019 by adding new lexical features, including those that capture the change in usage of words and their association with emerging offensive usages. We discuss the challenges, opportunities and limitations of integrating semantic change detection in offensive language detection models. Our work draws attention to an often neglected aspect of offensive language, namely that the meanings of words are constantly evolving and that NLP systems that account for this change can achieve good performance even when not trained on the most recent training data.
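To make the pipeline concrete, here is a minimal sketch of the feature idea, not the SINAI/EvoNLP implementation: it assumes two already-aligned yearly embedding spaces given as word-to-vector dicts, flags words whose vectors diverge across years, and appends the per-tweet fraction of such words as an extra feature next to TF-IDF for a linear SVM. The 0.4 threshold and all names are illustrative.

```python
# Sketch only: time-sensitive lexical features for offensive language detection.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def changed_words(emb_old, emb_new, threshold=0.4):
    # emb_old / emb_new: {word: np.ndarray} from aligned embedding spaces
    shared = emb_old.keys() & emb_new.keys()
    return {w for w in shared if cosine(emb_old[w], emb_new[w]) < threshold}

def change_feature(tweets, changed):
    # Fraction of tokens per tweet whose meaning recently shifted.
    vals = [sum(tok in changed for tok in tweet.lower().split())
            / max(len(tweet.split()), 1) for tweet in tweets]
    return csr_matrix(np.array(vals)[:, None])

def train_classifier(tweets, labels, changed):
    tfidf = TfidfVectorizer(min_df=2)
    features = hstack([tfidf.fit_transform(tweets),
                       change_feature(tweets, changed)])
    return tfidf, LinearSVC().fit(features, labels)
```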
EMNLP-Findings
Sarcasm Detection is Way Too Easy! An Empirical Comparison of Human and Machine Sarcasm Detection
Ibrahim Abu Farha,
Steven Wilson,
Silviu Oprea,
and Walid Magdy
In Findings of the Association for Computational Linguistics: EMNLP 2022
Dec
2022
Recently, author-annotated sarcasm datasets, which focus on intended, rather than perceived sarcasm, have been introduced. Although datasets collected using first-party annotation have important benefits, there is no comparison of human and machine performance on these new datasets. In this paper, we collect new annotations to provide human-level benchmarks for these first-party annotated sarcasm tasks in both English and Arabic, and compare the performance of human annotators to that of state-of-the-art sarcasm detection systems. Our analysis confirms that sarcasm detection is extremely challenging, with individual humans performing close to or slightly worse than the best trained models. With majority voting, however, humans are able to achieve the best results on all tasks. We also perform error analysis, finding that some of the most challenging examples are those that require additional context. We also highlight common features and patterns used to express sarcasm in English and Arabic such as idioms and proverbs. We suggest that to better capture sarcasm, future sarcasm detection datasets and models should focus on representing conversational and cultural context while leveraging world knowledge and common sense.
COLING
SOS: Systematic Offensive Stereotyping Bias in Word Embeddings
Fatma Elsafoury,
Steven R. Wilson,
Stamos Katsigiannis,
and Naeem Ramzan
In Proceedings of the 29th International Conference on Computational Linguistics
Oct
2022
Systematic offensive stereotyping (SOS) in word embeddings could lead to associating marginalised groups with hate speech and profanity, which might lead to blocking and silencing those groups, especially on social media platforms. In this work, we introduce a quantitative measure of the SOS bias, validate it in the most commonly used word embeddings, and investigate whether it explains the performance of different word embeddings on the task of hate speech detection. Results show that SOS bias exists in almost all examined word embeddings and that the proposed SOS bias metric correlates positively with the statistics of published surveys on online extremism. We also show that the proposed metric reveals distinct information compared to established social bias metrics. However, we do not find evidence that SOS bias explains the performance of hate speech detection models based on the different word embeddings.
SocInfo
Don’t Take it Personally: Analyzing Gender and Age Differences in Ratings of Online Humor
J. A. Meaney,
Steven Wilson,
Luis Chiruzzo,
and Walid Magdy
In Proceedings of the 13th International Conference on Social Informatics
Oct
2022
Computational humor detection systems rarely model the subjectivity of humor responses, or consider alternative reactions to humor, namely offense. We analyzed a large dataset of humor and offense ratings by male and female annotators of different age groups. We find that women link these two concepts more strongly than men, and that they tend to give lower humor ratings and higher offense scores. We also find that the correlation between humor and offense increases with age. Although there were no gender or age differences in humor detection, women and older annotators signalled more often than men that they did not understand joke texts. We discuss implications for computational humor detection and downstream tasks.
SocialNLP
A Comparative Study on Word Embeddings and Social NLP Tasks
Fatma Elsafoury,
Steven R. Wilson,
and Naeem Ramzan
In Proceedings of the Tenth International Workshop on Natural Language Processing for Social Media
Jul
2022
In recent years, gray social media platforms, those with a loose moderation policy on cyberbullying, have been attracting more users. Recently, data collected from these types of platforms have been used to pre-train word embeddings (social-media-based), yet these word embeddings have not been investigated for social NLP related tasks. In this paper, we carried out a comparative study between social-media-based and non-social-media-based word embeddings on two social NLP tasks: detecting cyberbullying and measuring social bias. Our results show that using social-media-based word embeddings as input features, rather than non-social-media-based embeddings, leads to better cyberbullying detection performance. We also show that some word embeddings are more useful than others for categorizing offensive words. However, we do not find strong evidence that certain word embeddings will necessarily work best when identifying certain categories of cyberbullying within our datasets. Finally, we show that even though most of the state-of-the-art bias metrics ranked social-media-based word embeddings as the most socially biased, these results remain inconclusive and further research is required.
Narrative
Narrative Detection and Feature Analysis in Online Health Communities
Achyutarama Ganti,
Steven Wilson,
Zexin Ma,
Xinyan Zhao,
and Rong Ma
In Proceedings of the 4th Workshop of Narrative Understanding (WNU2022)
Jul
2022
Narratives have been shown to be an effective way to communicate health risks and promote health behavior change, and given the growing amount of health information being shared on social media, it is crucial to study health-related narratives in social media. However, expert identification of a large number of narrative texts is a time-consuming process, and larger-scale studies on the use of narratives may be enabled through automatic text classification approaches. Prior work has demonstrated that automatic narrative detection is possible, but modern deep learning approaches have not been used for this task in the domain of online health communities. Therefore, in this paper, we explore the use of deep learning methods to automatically classify the presence of narratives in social media posts, finding that they outperform previously proposed approaches. We also find that in many cases, these models generalize well across posts from different health organizations. Finally, in order to better understand the increase in performance achieved by deep learning models, we use feature analysis techniques to explore the features that most contribute to narrative detection for posts in online health communities.
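As a rough illustration of the class of deep learning models explored here (a sketch under assumptions, not the authors' exact setup: the model choice, toy data, and hyperparameters are placeholders), a pre-trained encoder can be fine-tuned for binary narrative detection:

```python
# Sketch: fine-tune a pre-trained encoder to detect narrative style in posts.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

class PostDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

texts = ["When I was first diagnosed, I panicked...", "Screening saves lives."]
labels = [1, 0]  # toy examples: 1 = narrative present, 0 = absent

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
trainer = Trainer(model=model,
                  args=TrainingArguments(output_dir="out", num_train_epochs=3),
                  train_dataset=PostDataset(texts, labels, tokenizer))
trainer.train()
```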
SemEval
SemEval-2022 Task 6: iSarcasmEval, Intended Sarcasm Detection in English and Arabic
Ibrahim Abu Farha,
Silviu Vlad Oprea,
Steven Wilson,
and Walid Magdy
In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Jul
2022
iSarcasmEval is the first shared task to target intended sarcasm detection: the data for this task was provided and labelled by the authors of the texts themselves. Such an approach minimises the downfalls of other methods to collect sarcasm data, which rely on distant supervision or third-party annotations. The shared task contains two languages, English and Arabic, and three subtasks: sarcasm detection, sarcasm category classification, and pairwise sarcasm identification given a sarcastic sentence and its non-sarcastic rephrase. The task received submissions from 60 different teams, with the sarcasm detection task being the most popular. Most of the participating teams utilised pre-trained language models. In this paper, we provide an overview of the task, data, and participating teams.
NLPerspectives
Analyzing the Effects of Annotator Gender Across NLP Tasks
Laura Biester,
Vanita Sharma,
Ashkan Kazemi,
Naihao Deng,
Steven Wilson,
and Rada Mihalcea
In First Workshop on Perspectivist Approaches to NLP
Jun
2022
Recent studies have shown that for subjective annotation tasks, the demographics, lived experiences, and identity of annotators can have a large impact on how items are labeled. We expand on this work, hypothesizing that gender may correlate with differences in annotations for a number of NLP benchmarks, including those that are fairly subjective (e.g., affect in text) and those that are typically considered to be objective (e.g., natural language inference). We develop a robust framework to test for differences in annotation across genders for four benchmark datasets. While our results largely show a lack of statistically significant differences in annotation by males and females for these tasks, the framework can be used to analyze differences in annotation between various other demographic groups in future work. Finally, we note that most datasets are collected without annotator demographics and released only in aggregate form; we call on the community to consider annotator demographics as data is collected, and to release dis-aggregated data to allow for further work analyzing variability among annotators.
WebSci
LIWC-UD: Classifying Online Slang Terms into LIWC Categories
Mohamed Bahgat,
Steven Wilson,
and Walid Magdy
In Proceedings of the 14th International ACM Conference on Web Science
Jun
2022
Linguistic Inquiry and Word Count (LIWC), a popular tool for automated text analysis, relies on an expert-crafted internal dictionary of psychologically relevant words and their corresponding categories. While LIWC’s dictionary covers a significant portion of commonly used words, the continuous evolution of language and the usage of slang in settings such as social media require fixed resources to be frequently updated in order to stay relevant. In this work we present LIWC-UD, an automatically generated extension to LIWC’s dictionary which includes terms defined in Urban Dictionary. While the original LIWC dictionary contains 6,547 unique entries, LIWC-UD consists of 141K unique terms automatically categorized into LIWC categories with high confidence using a BERT classifier. LIWC-UD covers many additional terms that are commonly used on social media platforms like Twitter. We release LIWC-UD publicly to the community as a supplement to the original LIWC lexicon.
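A minimal sketch of the high-confidence categorization step (the checkpoint path, input format, and 0.9 threshold are assumptions, not the released LIWC-UD pipeline): classify each Urban Dictionary definition into a LIWC category and keep the term only when the classifier is confident.

```python
# Sketch: keep only confidently categorized Urban Dictionary terms.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "path/to/liwc-category-classifier")  # hypothetical fine-tuned checkpoint

def categorize(term, definition, threshold=0.9):
    inputs = tokenizer(f"{term}: {definition}",
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1).squeeze()
    confidence, idx = probs.max(dim=0)
    # Return the LIWC category only above the confidence cutoff.
    return model.config.id2label[int(idx)] if confidence >= threshold else None
```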
ACL
Should a Chatbot be Sarcastic? Understanding User Preferences Towards Sarcasm Generation
Silviu Vlad Oprea,
Steven Wilson,
and Walid Magdy
In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
May
2022
Previous sarcasm generation research has focused on how to generate text that people perceive as sarcastic to create more human-like interactions. In this paper, we argue that we should first turn our attention to the question of when sarcasm should be generated, finding that humans consider sarcastic responses inappropriate to many input utterances. Next, we use a theory-driven framework for generating sarcastic responses, which allows us to control the linguistic devices included during generation. For each device, we investigate how much humans associate it with sarcasm, finding that pragmatic insincerity and emotional markers are devices crucial for making sarcasm recognisable.
2021
EMNLP-Demo
Chandler: An Explainable Sarcastic Response Generator
Silviu Oprea,
Steven Wilson,
and Walid Magdy
In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Nov
2021
We introduce Chandler, a system that generates sarcastic responses to a given utterance. Previous sarcasm generators assume that the intended meaning that sarcasm conceals is the opposite of the literal meaning. We argue that this traditional theory of sarcasm provides a grounding that is neither necessary, nor sufficient, for sarcasm to occur. Instead, we ground our generation process on a formal theory that specifies conditions that unambiguously differentiate sarcasm from non-sarcasm. Furthermore, Chandler not only generates sarcastic responses, but also explanations for why each response is sarcastic. This provides accountability, crucial for avoiding miscommunication between humans and conversational agents, particularly considering that sarcastic communication can be offensive. In human evaluation, Chandler achieves comparable or higher sarcasm scores compared to state-of-the-art generators, while generating more diverse responses that are more specific and more coherent with the input.
SemEval
SemEval 2021 Task 7: HaHackathon, Detecting and Rating Humor and Offense
J. A. Meaney,
Steven Wilson,
Luis Chiruzzo,
Adam Lopez,
and Walid Magdy
In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Aug
2021
SemEval 2021 Task 7, HaHackathon, was the first shared task to combine the previously separate domains of humor detection and offense detection. We collected 10,000 texts from Twitter and the Kaggle Short Jokes dataset, and had each annotated for humor and offense by 20 annotators aged 18-70. Our subtasks were binary humor detection, prediction of humor and offense ratings, and a novel controversy task: to predict if the variance in the humor ratings was higher than a specific threshold. The subtasks attracted 36-58 submissions, with most of the participants choosing to use pre-trained language models. Many of the highest performing teams also implemented additional optimization techniques, including task-adaptive training and adversarial training. The results suggest that the participating systems are well suited to humor detection, but that humor controversy is a more challenging task. We discuss which models excel in this task, which auxiliary techniques boost their performance, and analyze the errors which were not captured by the best systems.
SIGIR
Does BERT Pay Attention to Cyberbullying?
Fatma Elsafoury,
Stamos Katsigiannis,
Steven Wilson,
and Naeem Ramzan
In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
Jul
2021
Social media have brought threats like cyberbullying, which can lead to stress, anxiety, depression, and in some severe cases, suicide attempts. Detecting cyberbullying can help to warn/block bullies and provide support to victims. However, very few studies have used self-attention-based language models like BERT for cyberbullying detection, and they typically only report BERT’s performance without examining in depth the reasons for its performance. In this work, we examine the use of BERT for cyberbullying detection on various datasets and attempt to explain its performance by analyzing its attention weights and gradient-based feature importance scores for textual and linguistic features. Our results show that attention weights do not correlate with feature importance scores and thus do not explain the model’s performance. Additionally, they suggest that BERT relies on syntactical biases in the datasets to assign feature importance scores to class-related words rather than cyberbullying-related linguistic features.
2020
WebSci
Analyzing temporal relationships between trending terms on Twitter and Urban Dictionary activity
Steven Wilson,
Walid Magdy,
Barbara McGillivray,
and Gareth Tyson
In Proceedings of the 12th International ACM Conference on Web Science
Jul
2020
As an online, crowd-sourced, open English-language slang dictionary, the Urban Dictionary platform contains a wealth of opinions, jokes, and definitions of terms, phrases, acronyms, and more. However, it is unclear exactly how activity on this platform relates to larger conversations happening elsewhere on the web, such as discussions on larger, more popular social media platforms. In this research, we study the temporal activity trends on Urban Dictionary and provide the first analysis of how this activity relates to content being discussed on a major social network: Twitter. By collecting the whole of Urban Dictionary, as well as a large sample of tweets over seven years, we explore the connections between the words and phrases that are defined and searched for on Urban Dictionary and the content that is talked about on Twitter. Through a series of cross-correlation calculations, we identify cases in which Urban Dictionary activity closely reflects the larger conversation happening on Twitter. Then, we analyze the types of terms that have a stronger connection to discussions on Twitter, finding that Urban Dictionary activity that is positively correlated with Twitter is centered around terms related to memes, popular public figures, and offline events. Finally, we explore the relationship between periods of time when terms are trending on Twitter and the corresponding activity on Urban Dictionary, revealing that new definitions are more likely to be added to Urban Dictionary for terms that are currently trending on Twitter.
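A small sketch of the lagged cross-correlation at the heart of this kind of analysis (illustrative only; the paper's exact windowing and normalization may differ): correlate a term's daily Twitter mention counts with its daily Urban Dictionary activity counts across a range of leads and lags.

```python
# Sketch: Pearson correlation between two daily time series at each lag.
import numpy as np

def cross_correlation(twitter_counts, ud_counts, max_lag=30):
    """Return {lag: r}; positive lag means Urban Dictionary trails Twitter."""
    x = np.asarray(twitter_counts, dtype=float)
    y = np.asarray(ud_counts, dtype=float)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag] if lag else x, y[lag:]
        else:
            a, b = x[-lag:], y[:lag]
        result[lag] = float(np.corrcoef(a, b)[0, 1])
    return result

# Toy usage: find the lag at which the two series align most strongly.
peak = max(cross_correlation([3, 9, 27, 9, 3, 1, 1],
                             [0, 1, 4, 9, 4, 1, 0], max_lag=2).items(),
           key=lambda kv: kv[1])
```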
SemEval
Smash at SemEval-2020 Task 7: Optimizing the Hyperparameters of ERNIE 2.0 for Humor Ranking and Rating
J. A. Meaney,
Steven Wilson,
and Walid Magdy
In Proceedings of the Fourteenth Workshop on Semantic Evaluation
Dec
2020
The use of pre-trained language models such as BERT and ULMFiT has become increasingly popular in shared tasks, due to their powerful language modelling capabilities. Our entry to SemEval uses ERNIE 2.0, a language model which is pre-trained on a large number of tasks to enrich the semantic and syntactic information learned. ERNIE’s knowledge masking pre-training task is a unique method for learning about named entities, and we hypothesise that it may be of use in a dataset which is built on news headlines and which contains many named entities. We optimize the hyperparameters of a regression model and a classification model, and find that the hyperparameters we selected yielded bigger gains in the classification model than in the regression model.
NLP+CSS
Diachronic Embeddings for People in the News
Felix Hennig,
and Steven Wilson
In Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science
Nov
2020
Previous English-language diachronic change models based on word embeddings have typically used single tokens to represent entities, including names of people. This leads to issues with both ambiguity (resulting in one embedding representing several distinct and unrelated people) and unlinked references (leading to several distinct embeddings which represent the same person). In this paper, we show that using named entity recognition and heuristic name linking steps before training a diachronic embedding model leads to more accurate representations of references to people, as compared to the token-only baseline. In a large news corpus of articles from The Guardian, we provide examples of several types of analysis that can be performed using these new embeddings. Further, we show that real world events and context changes can be detected using our proposed model.
NLP+CSS
Emoji and Self-Identity in Twitter Bios
Jinhang Li,
Giorgos Longinos,
Steven Wilson,
and Walid Magdy
In Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science
Nov
2020
Emoji are widely used to express emotions and concepts on social media, and prior work has shown that users’ choice of emoji reflects the way that they wish to present themselves to the world. Emoji usage is typically studied in the context of posts made by users, and this view has provided important insights into phenomena such as emotional expression and self-representation. In addition to making posts, however, social media platforms like Twitter allow for users to provide a short bio, which is an opportunity to briefly describe their account as a whole. In this work, we focus on the use of emoji in these bio statements. We explore the ways in which users include emoji in these self-descriptions, finding different patterns than those observed around emoji usage in tweets. We examine the relationships between emoji used in bios and the content of users’ tweets, showing that the topics and even the average sentiment of tweets varies for users with different emoji in their bios. Lastly, we confirm that homophily effects exist with respect to the types of emoji that are included in bios of users and their followers.
ICWSM
Towards Using Word Embedding Vector Space for Better Cohort Analysis
Mohamed Bahgat,
Steve Wilson,
and Walid Magdy
In Proceedings of the International AAAI Conference on Web and Social Media
Jun
2020
On websites like Reddit, users join communities where they discuss specific topics which cluster them into possible cohorts. The authors within these cohorts have the opportunity to post more openly under the blanket of anonymity, and such openness provides a more accurate signal on the real issues individuals are facing. Some communities contain discussions about mental health struggles such as depression and suicidal ideation. To better understand and analyse these individuals, we propose to exploit properties of word embeddings that group related concepts close to each other in the embedding space. For the posts from each topically situated sub-community, we build a word embedding model and use handcrafted lexicons to identify emotions, values and psycholinguistically relevant concepts. We then extract insights into ways users perceive these concepts by measuring distances between them and references made by users either to themselves, others or other things around them. We show how our proposed approach can extract meaningful signals that go beyond the kinds of analyses performed at the individual word level.
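A small sketch of the distance-based measurement (the lexicon words and gensim hyperparameters here are placeholders, not the paper's settings): train embeddings on one sub-community's posts, then check how close self-references sit to the words of a concept of interest.

```python
# Sketch: per-community embeddings and self-to-concept similarity.
import numpy as np
from gensim.models import Word2Vec

def community_model(tokenized_posts):
    # tokenized_posts: list of token lists from a single subreddit
    return Word2Vec(tokenized_posts, vector_size=100, window=5, min_count=5)

def self_concept_similarity(model, self_refs=("i", "me", "myself"),
                            concept=("hopeless", "alone", "tired")):
    """Mean cosine similarity between self-reference and concept words."""
    sims = [model.wv.similarity(s, c)
            for s in self_refs if s in model.wv
            for c in concept if c in model.wv]
    return float(np.mean(sims)) if sims else float("nan")
```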
Insights
Embedding Structured Dictionary Entries
Steven Wilson,
Walid Magdy,
Barbara McGillivray,
and Gareth Tyson
In Proceedings of the First Workshop on Insights from Negative Results in NLP
Nov
2020
Previous work has shown how to effectively use external resources such as dictionaries to improve English-language word embeddings, either by manipulating the training process or by applying post-hoc adjustments to the embedding space. We experiment with a multi-task learning approach for explicitly incorporating the structured elements of dictionary entries, such as user-assigned tags and usage examples, when learning embeddings for dictionary headwords. Our work generalizes several existing models for learning word embeddings from dictionaries. However, we find that the most effective representations overall are learned by simply training with a skip-gram objective over the concatenated text of all entries in the dictionary, giving no particular focus to the structure of the entries.
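The approach that worked best can be sketched in a few lines with gensim (the entry field names are assumptions about the data structure, not the paper's code): flatten each entry's headword, definition, tags, and usage examples into one text, then train an ordinary skip-gram over the result.

```python
# Sketch: skip-gram over the concatenated text of structured dictionary entries.
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

def entry_to_tokens(entry):
    # entry: dict with hypothetical keys for the structured fields
    text = " ".join([entry["headword"], entry["definition"],
                     " ".join(entry.get("tags", [])),
                     " ".join(entry.get("examples", []))])
    return simple_preprocess(text)

def train_dictionary_embeddings(entries):
    corpus = [entry_to_tokens(e) for e in entries]
    return Word2Vec(corpus, sg=1, vector_size=300, window=5, min_count=5)
```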
LREC
Urban Dictionary Embeddings for Slang NLP Applications
Steven Wilson,
Walid Magdy,
Barbara McGillivray,
Kiran Garimella,
and Gareth Tyson
In Proceedings of the 12th Language Resources and Evaluation Conference
May
2020
The choice of the corpus on which word embeddings are trained can have a sizable effect on the learned representations, the types of analyses that can be performed with them, and their utility as features for machine learning models. To contribute to the existing sets of pre-trained word embeddings, we introduce and release the first set of word embeddings trained on the content of Urban Dictionary, a crowd-sourced dictionary for slang words and phrases. We show that although these embeddings are trained on fewer total tokens (by at least an order of magnitude compared to most popular pre-trained embeddings), they have high performance across a range of common word embedding evaluations, ranging from semantic similarity to word clustering tasks. Further, for some extrinsic tasks such as sentiment analysis and sarcasm detection where we expect to require some knowledge of colloquial language on social media data, initializing classifiers with the Urban Dictionary Embeddings resulted in improved performance compared to initializing with a range of other well-known, pre-trained embeddings that are orders of magnitude larger in size.
LREC
Small Town or Metropolis? Analyzing the Relationship between Population Size and Language
Amy Rechkemmer,
Steven Wilson,
and Rada Mihalcea
In Proceedings of the 12th Language Resources and Evaluation Conference
May
2020
The variance in language used by different cultures has been a topic of study for researchers in linguistics and psychology, but oftentimes language is compared across multiple countries in order to show a difference in culture. As a geographically large country whose citizens are diverse in background and experience, the U.S. also contains cultural differences within its own borders. Using a set of over 2 million posts from distinct Twitter users around the country dating back as far as 2014, we ask the following question: is there a difference in how Americans express themselves online depending on whether they reside in an urban or rural area? We categorize Twitter users as either urban or rural and identify ideas and language that are more commonly expressed in tweets written by one population over the other. We take this further by analyzing how the language from specific cities of the U.S. compares to the language of other cities and by training predictive models to predict whether a user is from an urban or rural area. We publicly release the tweet and user IDs that can be used to reconstruct the dataset for future studies in this direction.
2019
SocInfo
Measuring personal values in cross-cultural user-generated content
Yiting Shen,
Steven Wilson,
and Rada Mihalcea
In International Conference on Social Informatics
Nov
2019
There are several standard methods used to measure personal values, including the Schwartz Values Survey and the World Values Survey. While these tools are based on well-established questionnaires, they are expensive to administer at a large scale and rely on respondents to self-report their values rather than observing what people actually choose to write about. We employ a lexicon-based method that can computationally measure personal values on a large scale. Our approach is not limited to word-counting as we explore and evaluate several alternative approaches to quantifying the usage of value-related themes in a given document. We apply our methodology to a large blog dataset comprised of text written by users from different countries around the world in order to quantify cultural differences in the expression of personal values on blogs. Additionally, we analyze the relationship between the value themes expressed in blog posts and the values measured for some of the same countries using the World Values Survey.
ACL
Predicting Human Activities from User-Generated Content
Steven Wilson,
and Rada Mihalcea
In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Jul
2019
The activities we do are linked to our interests, personality, political preferences, and decisions we make about the future. In this paper, we explore the task of predicting human activities from user-generated content. We collect a dataset containing instances of social media users writing about a range of everyday activities. We then use a state-of-the-art sentence embedding framework tailored to recognize the semantics of human activities and perform an automatic clustering of these activities. We train a neural network model to make predictions about which clusters contain activities that were performed by a given user based on the text of their previous posts and self-description. Additionally, we explore the degree to which incorporating inferred user traits into our model helps with this prediction task.
Thesis
Natural language processing for personal values and human activities
Personal values are theorized to influence thought and decision making patterns, which often manifest themselves in the things that people say and do. We explore the degree to which we can employ computational models to infer people’s values from the text that they write and the everyday activities that they perform. In addition to investigating how personal values are expressed in language, we use natural language processing methods to automatically discover relationships between a person’s values, behaviors, and cultural background. To this end, we show that the automatic analysis of less constrained, open-ended essay questions leads to a model of personal values that is more strongly connected to behaviors than traditional forced-choice value surveys, and that cultural background has a significant influence on these connections. To help measure personal values in textual data, we use a novel crowd-powered sorting algorithm to construct a hierarchical lexicon of words and phrases related to human values. Additionally, we develop semantic representations of human activities that capture a variety of useful dimensions, such as the motivation for which they are typically done. We leverage these representations to build deep neural models that are able to make predictions about a person’s activities based on their observed linguistic patterns and inferred values.
*SEM
Multi-Label Transfer Learning for Multi-Relational Semantic Similarity
Li Zhang,
Steven Wilson,
and Rada Mihalcea
In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)
Jun
2019
Multi-relational semantic similarity datasets define the semantic relations between two short texts in multiple ways, e.g., similarity, relatedness, and so on. Yet, all the systems to date designed to capture such relations target one relation at a time. We propose a multi-label transfer learning approach based on LSTM to make predictions for several relations simultaneously and aggregate the losses to update the parameters. This multi-label regression approach jointly learns the information provided by the multiple relations, rather than treating them as separate tasks. Not only does this approach outperform the single-task approach and the traditional multi-task learning approach, but it also achieves state-of-the-art performance on all but one relation of the Human Activity Phrase dataset.
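A condensed PyTorch sketch of the idea (dimensions and head structure are illustrative, not the paper's exact architecture): one shared LSTM encodes each phrase, a separate linear head scores each relation, and the per-relation losses are summed so a single backward pass updates the shared parameters jointly.

```python
# Sketch: shared encoder with one regression head per semantic relation.
import torch
import torch.nn as nn

class MultiRelationLSTM(nn.Module):
    def __init__(self, vocab_size, n_relations=4, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        # One head per relation: similarity, relatedness, etc.
        self.heads = nn.ModuleList(
            [nn.Linear(2 * hidden, 1) for _ in range(n_relations)])

    def encode(self, token_ids):
        _, (h, _) = self.lstm(self.embed(token_ids))
        return h[-1]  # final hidden state as the phrase representation

    def forward(self, ids_a, ids_b):
        pair = torch.cat([self.encode(ids_a), self.encode(ids_b)], dim=-1)
        return torch.cat([head(pair) for head in self.heads], dim=-1)

def multi_label_loss(preds, targets):
    # targets: (batch, n_relations) human ratings; sum losses over relations
    return sum(nn.functional.mse_loss(preds[:, i], targets[:, i])
               for i in range(targets.shape[1]))
```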
2018
TAC
Building a Flexible Knowledge Graph to Capture Real-World Events
Laura Burdick,
Oana Ignat,
Yiming Zhang,
Rada Mihalcea,
Mingzhe Wang,
Steven Wilson,
Yumou Wei,
and Jia Deng
We introduce a crowd-powered approach for the creation of a lexicon for any theme given a set of seed words that cover a variety of concepts within the theme. Terms are initially sorted by automatically clustering their embeddings and subsequently rearranged by crowd workers in order to create a tree structure. This type of organization captures hierarchical relationships between concepts and allows for a tunable level of specificity when using the lexicon to collect measurements from a piece of text. We use a lexicon expansion method to increase the overall coverage of the produced resource. Using our proposed approach, we create a hierarchical lexicon of personal values and evaluate its internal and external consistency. We release this novel resource to the community as a tool for measuring value content within text corpora.
SocInfo
Text-based detection and understanding of changes in mental health
Yaoyiran Li,
Rada Mihalcea,
and Steven Wilson
In International Conference on Social Informatics
Sep
2018
Previous work has investigated the identification of mental health issues in social media users, yet the way that users’ mental states and related behavior change over time remains relatively understudied. This paper focuses on online mental health communities and studies how users’ contributions to these communities change over one year. We define a metric called the Mental Health Contribution Index (MHCI), which we use to measure the degree to which users’ contributions to mental health topics change over a one-year period. In this work, we study the relationship between MHCI scores and the online expression of mental health symptoms by extracting relevant linguistic features from user-generated content and conducting statistical analyses. Additionally, we build a classifier to predict whether a user’s contributions to mental health subreddits will increase or decrease. Finally, we employ propensity score matching to identify factors that correlate with an increase or a decrease in mental health forum contributions. Our work provides some of the first insights into detecting and understanding social media users’ changes in mental health states over time.
ArXiv
Direct network transfer: Transfer learning of sentence embeddings for semantic similarity
Li Zhang,
Steven Wilson,
and Rada Mihalcea
Sentence encoders, which produce sentence embeddings using neural networks, are typically evaluated by how well they transfer to downstream tasks. This includes semantic similarity, an important task in natural language understanding. Although there has been much work dedicated to building sentence encoders, the accompanying transfer learning techniques have received relatively little attention. In this paper, we propose a transfer learning setting specialized for semantic similarity, which we refer to as direct network transfer. Through experiments on several standard text similarity datasets, we show that applying direct network transfer to existing encoders can lead to state-of-the-art performance. Additionally, we compare several approaches to transfer sentence encoders to semantic similarity tasks, showing that the choice of transfer learning setting greatly affects the performance in many cases, and differs by encoder and dataset.
TAC
Entity and Event Extraction from Scratch Using Minimal Training Data
Laura Wendlandt,
Steve Wilson,
Oana Ignat,
Charles Welch,
Li Zhang,
Mingzhe Wang,
Jia Deng,
and Rada Mihalcea
The things people do in their daily lives can provide valuable insights into their personality, values, and interests. Unstructured text data on social media platforms are rich in behavioral content, and automated systems can be deployed to learn about human activity on a broad scale if these systems are able to reason about the content of interest. In order to aid in the evaluation of such systems, we introduce a new phrase-level semantic textual similarity dataset comprised of human activity phrases, providing a testbed for automated systems that analyze relationships between phrasal descriptions of people’s actions. Our set of 1,000 pairs of activities is annotated by human judges across four relational dimensions including similarity, relatedness, motivational alignment, and perceived actor congruence. We evaluate a set of strong baselines for the task of generating scores that correlate highly with human ratings, and we introduce several new approaches to the phrase-level similarity task in the domain of human activities.
2016
ACL
Finding Optimists and Pessimists on Twitter
Xianzhi Ruan,
Steven Wilson,
and Rada Mihalcea
In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Aug
2016
Optimism is linked to various personality factors as well as both psychological and physical health, but how does it relate to the way a person tweets? We analyze the online activity of a set of Twitter users in order to determine how well machine learning algorithms can detect a person’s outlook on life by reading their tweets. A sample of tweets from each user is manually annotated in order to establish ground truth labels, and classifiers are trained to distinguish between optimistic and pessimistic users. Our results suggest that the words in people’s tweets provide ample evidence to identify them as optimists, pessimists, or somewhere in between. Additionally, several applications of these trained models are explored.
NLP+CSS
Disentangling Topic Models: A Cross-cultural Analysis of Personal Values through Words
Steven Wilson,
Rada Mihalcea,
Ryan Boyd,
and James Pennebaker
In Proceedings of the First Workshop on NLP and Computational Social Science
Nov
2016
We present a methodology based on topic modeling that can be used to identify and quantify sociolinguistic differences between groups of people, and describe a regression method that can disentangle the influences of different attributes of the people in the group (e.g., culture, gender, age). As an example, we explore the concept of personal values, and present a cross-cultural analysis of value-behavior relationships spanning writers from the United States and India.
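As a loose illustration of the disentangling step (the file and column names are hypothetical; this is not the paper's exact regression specification), per-document topic usage can be regressed on several writer attributes at once, so each attribute's influence is estimated while controlling for the others:

```python
# Sketch: separate the influence of culture, gender, and age on topic usage.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical table: one row per document, with a topic weight per writer.
df = pd.read_csv("topic_weights_by_writer.csv")
model = smf.ols("topic_weight ~ C(culture) + C(gender) + age", data=df).fit()
print(model.summary())  # coefficients isolate each attribute's contribution
```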
AAAI-OSSM
Cultural influences on the measurement of personal values through words
Steven Wilson,
Rada Mihalcea,
Ryan L Boyd,
and James W Pennebaker
In 2016 AAAI Spring Symposium Series: Observational Studies using Social Media Data
2016
Texts posted on the web by users from diverse cultures provide a nearly endless source of data that researchers can use to study human thoughts and language patterns. However, unless care is taken to avoid it, models may be developed in one cultural setting and deployed in another, leading to unforeseen consequences. We explore the effects of using models built from a corpus of texts from multiple cultures in order to learn about each represented group separately. To do this, we employ a topic modeling approach to quantify open-ended writing responses describing personal values and everyday behaviors in two distinct cultures. We show that some topics are more prominent in one culture compared to the other, while other topics are mentioned to similar degrees. Furthermore, our results indicate that culture influences how value-behavior relationships are exhibited. While some relationships exist in both cultural groups, in most cases we see that the observed relations are dependent on the cultural background of the data set under examination.
arXiv
Cruciform: Solving crosswords with natural language processing
Dragomir Radev,
Rui Zhang,
Steve Wilson,
Derek Van Assche,
Henrique Spyra Gubert,
Alisa Krivokapic,
MeiXing Dong,
Chongruo Wu,
Spruce Bondera,
Luke Brandl,
and others
Crossword puzzles are popular word games that require not only a large vocabulary, but also a broad knowledge of topics. Answering each clue is a natural language task on its own as many clues contain nuances, puns, or counter-intuitive word definitions. Additionally, it can be extremely difficult to ascertain definitive answers without the constraints of the crossword grid itself. This task is challenging for both humans and computers. We describe here a new crossword solving system, Cruciform. We employ a group of natural language components, each of which returns a list of candidate words with scores when given a clue. These lists are used in conjunction with the fill intersections in the puzzle grid to formulate a constraint satisfaction problem, in a manner similar to the one used in the Dr. Fill system. We describe the results of several of our experiments with the system.
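A toy sketch of the constraint-satisfaction step (heavily simplified; Cruciform's actual solver and scoring are far richer): each slot carries a scored candidate list from the NLP components, and a backtracking search fills slots while enforcing the shared-cell constraints of the grid. Candidates are assumed to already match their slot lengths.

```python
# Sketch: backtracking over scored candidate lists with crossing constraints.
def solve(slots, crossings, assignment=None):
    """slots: {slot_id: [(word, score), ...]}; crossings: list of
    (slot_a, idx_a, slot_b, idx_b) tuples for cells two slots share."""
    assignment = assignment or {}
    if len(assignment) == len(slots):
        return assignment
    sid = next(s for s in slots if s not in assignment)
    for word, _score in sorted(slots[sid], key=lambda ws: -ws[1]):
        assignment[sid] = word
        consistent = all(assignment[a][i] == assignment[b][j]
                         for a, i, b, j in crossings
                         if a in assignment and b in assignment)
        if consistent and (result := solve(slots, crossings, assignment)):
            return result
        del assignment[sid]
    return None
```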
arXiv
Stateology: State-level interactive charting of language, feelings, and values
Konstantinos Pappas,
Steven Wilson,
and Rada Mihalcea
People’s personality and motivations are manifest in their everyday language usage. With the emergence of social media, ample examples of such usage are procurable. In this paper, we aim to analyze the vocabulary used by close to 200,000 Blogger users in the U.S. with the purpose of geographically portraying various demographic, linguistic, and psychological dimensions at the state level. We give a description of a web-based tool for viewing maps that depict various characteristics of the social media users as derived from this large blog dataset of over two billion words.
2015
ICWSM
Values in words: Using language to evaluate and understand personal values
Ryan Boyd,
Steven Wilson,
James Pennebaker,
Michal Kosinski,
David Stillwell,
and Rada Mihalcea
In Proceedings of the International AAAI Conference on Web and Social Media
May
2015
People’s values provide a decision-making framework that helps guide their everyday actions. Most popular methods of assessing values show tenuous relationships with everyday behaviors. Using a new Amazon Mechanical Turk dataset (N = 767) consisting of people’s language, values, and behaviors, we explore the degree to which attaining "ground truth" is possible with regards to such complicated mental phenomena. We then apply our findings to a corpus of Facebook users’ (N = 130,828) status updates in order to understand how core values influence the personal thoughts and behaviors that users share through social media. Our findings suggest that self-report questionnaires for abstract and complex phenomena, such as values, are inadequate for painting an accurate picture of individual mental life. Free response language data and language modeling show greater promise for understanding both the structure and content of concepts such as values and, additionally, exhibit a predictive edge over self-report questionnaires.