The vast amount of data on social media presents both significant opportunities and significant challenges for health informatics. The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to addressing the natural language processing challenges of using social media data for health informatics, including informal, colloquial expressions of clinical concepts, noise, data sparsity, ambiguity, and multilingual posts.
In this talk, Elena Tutubalina and Ilseyar Alimova will introduce the SMM4H 2020 and 2021 shared tasks and the first Russian adverse drug reaction tweet corpus. Elena will describe three tasks on mining adverse drug effects from annotated datasets, focusing on current challenges and the imbalanced nature of the data. Ilseyar will describe how the Russian dataset was created, focusing on the data collection and annotation process. The talk will close with a discussion of participants' results in the SMM4H shared tasks on the classification of Russian tweets. This is joint work with the University of Pennsylvania, USA.