Analyzing the Impact of COVID-19 Measures on Individuals with Chronic Illnesses through Social Media using Data Mining Methods
Context
COVID-19 was one of the most impactful events of recent years, owing to the rapid spread of the virus and its significant adverse effects on body and mind. Many countries and organizations therefore implemented measures to curb its spread and to reduce the death toll. While these measures were effective, they also had a deep economic and sociological impact on businesses, institutions, and the average individual. Certain demographic groups were disproportionately affected by the measures, among them individuals with chronic illnesses.
Goal
The aim of this thesis was to analyze the impact of COVID-19 measures on individuals with chronic illnesses, using social media data, data mining methods, and LLM tools as the main research methods. The main machine learning method used to achieve this goal was association rule mining. The following research questions guided the thesis:
- What meaningful association rules result from the social media data?
- Is there a relationship between specific chronic illnesses and sentiment towards certain COVID-19 measures?
- Which chronic illnesses were most affected by COVID-19 measures?
- How do the results of the social media analysis compare with the findings from the literature analysis?
The scope of this work was explicitly limited to data and text from the United States of America and to the social media site Twitter.
Methods
As a first step, a literature review was conducted to establish the most common chronic illnesses and COVID-19 measures in the U.S.A., which limited the data to a processable amount. A review of the state of the art was then compiled. Current research indicates that COVID-19 measures might have had an adverse effect on the mental health of chronically ill individuals due to reduced physical contact. However, remote work and remote healthcare contact were noted as positive developments, meaning that some aspects of COVID-19 measures helped individuals with chronic illnesses.
The next step was to conduct the Knowledge Discovery Process, which is commonly used in machine learning and data mining research and typically consists of the following steps:
- Data Selection
- Preprocessing
- Transformation
- Data Mining
- Interpretation and Evaluation
In the data selection phase, tweets were scraped from Twitter and stored in a CSV file, with each tweet categorized by the chronic illness and COVID-19 measure it mentioned. In the preprocessing phase, additional context, namely the sentiment and the stance for or against the mentioned measure, was added using the ChatGPT API. The text was also cleaned in this phase to remove unnecessary or duplicate values. In the transformation phase, the CSV file was converted into a relational database on which the actual data mining was performed.
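The cleaning and transformation steps described above can be sketched as follows. This is a minimal illustration only, not the thesis pipeline: the column names (`tweet_id`, `text`, `illness`, `measure`, `sentiment`, `stance`), the sample rows, and the use of SQLite are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Invented sample standing in for the scraped CSV file.
# Column names and rows are assumptions, not thesis data.
SAMPLE_CSV = """tweet_id,text,illness,measure,sentiment,stance
1,Masks keep my mum safe during chemo,cancer,masks,positive,for
2,Travel ban means no visits again,depression,travel,negative,against
2,Travel ban means no visits again,depression,travel,negative,against
3,  Quiet ward helps dad stay calm  ,dementia,isolation,positive,for
"""

LABEL_COLUMNS = ("illness", "measure", "sentiment", "stance")


def load_tweets(csv_text, conn):
    """Clean the CSV rows and load them into a relational table.

    Returns the number of rows stored after duplicates are dropped.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               tweet_id  INTEGER PRIMARY KEY,
               text      TEXT NOT NULL,
               illness   TEXT NOT NULL,
               measure   TEXT NOT NULL,
               sentiment TEXT NOT NULL,
               stance    TEXT NOT NULL
           )"""
    )
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Trim whitespace and normalize the categorical labels to lower case.
        labels = [row[col].strip().lower() for col in LABEL_COLUMNS]
        rows.append((int(row["tweet_id"]), row["text"].strip(), *labels))
    # The primary key plus INSERT OR IGNORE silently drops duplicate tweets.
    conn.executemany(
        "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]


conn = sqlite3.connect(":memory:")
stored = load_tweets(SAMPLE_CSV, conn)
print(f"{stored} unique tweets stored")
```

Deduplicating on a tweet identifier is one possible design choice; a real pipeline might instead need to detect near-duplicate text such as retweets.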
Association rules were then generated from this database, which contained about 11,000 rows of data.
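To illustrate what generating association rules involves, the sketch below mines rules from a handful of invented transactions, where each tweet is represented as a set of attribute labels. It uses brute-force enumeration of itemsets together with the standard support, confidence, and lift measures. The algorithm, the thresholds, and the toy data are assumptions for illustration; the source does not state which algorithm or tool was used.

```python
from itertools import combinations

# Invented toy transactions (NOT thesis data): one set of labels per tweet.
TRANSACTIONS = [
    {"illness=cancer", "measure=masks", "stance=for", "sentiment=positive"},
    {"illness=cancer", "measure=masks", "stance=for", "sentiment=positive"},
    {"illness=depression", "measure=travel", "stance=against", "sentiment=negative"},
    {"illness=depression", "measure=isolation", "stance=against", "sentiment=negative"},
    {"illness=dementia", "measure=isolation", "stance=for", "sentiment=positive"},
]


def mine_rules(transactions, min_support=0.2, min_confidence=0.6, max_len=3):
    """Return rules as (antecedent, consequent, support, confidence, lift)."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    items = sorted(set().union(*sets))

    # Support = share of transactions that contain the whole itemset.
    # Brute-force enumeration is only practical for few distinct items.
    support = {}
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            share = sum(1 for s in sets if candidate <= s) / n
            if share >= min_support:
                support[candidate] = share

    rules = []
    for itemset, supp in support.items():
        if len(itemset) < 2:
            continue
        # Split each frequent itemset into antecedent -> consequent.
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                # Subsets of a frequent itemset are frequent, so both lookups succeed.
                confidence = supp / support[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / support[consequent]
                    rules.append((antecedent, consequent, supp, confidence, lift))
    return rules


rules = mine_rules(TRANSACTIONS)
for antecedent, consequent, supp, conf, lift in rules:
    if antecedent == {"illness=cancer", "measure=masks"} and consequent == {"stance=for"}:
        print(sorted(antecedent), "->", sorted(consequent),
              f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 means the consequent appears together with the antecedent more often than it would if the two were independent. This is a statement about co-occurrence only, which is why such rules indicate correlation rather than causation.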
Results
In general, a tweet that is "against" a COVID-19 measure is often negative in sentiment, and a tweet that is "for" a measure is often positive. Rules of this kind made up the bulk of the association rules. However, there were also several more distinctive rules. For example, the chronic illness "cancer" combined with the measure "masks" often came with a "supportive" stance towards the measure. In contrast, "depression" combined with "travel" or with "isolation" often led to a negative sentiment. One of the more unusual rules was "dementia" and "isolation" often being followed by a positive sentiment. This suggests that dementia patients, or people working with dementia patients, were able to handle isolation-related COVID-19 measures better than others.
Overall, measures related to travel and isolation often led to a negative sentiment, with the exception of dementia combined with isolation. The most common association rules were found for the chronic illnesses dementia, depression, cancer, and arthritis.
The literature review and the association rule analysis mostly agree on the topics of isolation, travel, declining mental health, and the negative aspects of measures. They diverge on rules such as cancer and masks leading to a supportive stance, and dementia and isolation leading to a positive sentiment. It is important to note that association rules, and by extension this work, only show correlation and not causation. Further studies using supplementary methods are needed to improve the quality of the results and to establish causal relationships in the data.