Aspect Based Sentiment Analysis: Feature Extraction using Latent Dirichlet Allocation (LDA) and Term Frequency - Inverse Document Frequency (TF-IDF) in Machine Learning (ML)

Authors

  • Shakirah Mohd Sofi, Malaysia-Japan International Institute of Technology (MJIIT), Universiti Teknologi Malaysia Kuala Lumpur, Jalan Sultan Yahya Petra, Kuala Lumpur 54100, Malaysia
  • Ali Selamat, Center for Basic and Applied Research, Faculty of Informatics and Management, University of Hradec Kralove, Rokitanskeho 62, 50003 Hradec Kralove, Czech Republic

DOI:

https://doi.org/10.53840/myjict8-2-102

Keywords:

Aspect-Based Sentiment Analysis, Opinion Mining, Feature Extraction, Topic Modeling, LDA, Count Vectorizer, TF-IDF, SVM, NB

Abstract

The growth of social networks, blogs, forums, and e-commerce websites has produced a large and rapidly increasing volume of data, notably textual data. Twitter is one of the most popular social media platforms; during the COVID-19 pandemic, people all around the world used social media to share their opinions and concerns about the pandemic that changed their lives. This led to a significant rise in tweets about the coronavirus, including positive, negative, and neutral tweets about the virus's impact. Sentiment analysis faces challenges: sparse data limits understanding, while topic coherence and interpretability need improvement to yield clearer insights. The primary goal of this paper is to improve the accuracy and effectiveness of sentiment analysis of COVID-19 tweets through the application of advanced techniques and classifiers. In this article, we experiment with classifiers such as Support Vector Machines (SVM) and Naive Bayes (NB) on Twitter data to build high-accuracy machine learning models. Using Latent Dirichlet Allocation (LDA) for feature extraction, we aim to capture comprehensive aspects and topics for sentiment analysis. Additionally, we explore Count Vectorizer and Term Frequency - Inverse Document Frequency (TF-IDF) as text vectorization techniques. The main objectives are to extract topics, understand public concerns about COVID-19, and compare classifier performance in Aspect-Based Sentiment Analysis on COVID-19 tweets. This paper applies sentiment analysis techniques such as LDA, Count Vectorizer, and SVM to support more nuanced sentiment analysis during the COVID-19 pandemic, with SVM classification reaching a notable 85% accuracy.

Published

09-07-2024

Issue

Section

Articles

How to Cite

Aspect Based Sentiment Analysis: Feature Extraction using Latent Dirichlet Allocation (LDA) and Term Frequency - Inverse Document Frequency (TF-IDF) in Machine Learning (ML). (2024). Malaysian Journal of Information and Communication Technology (MyJICT), 8(2), 169-179. https://doi.org/10.53840/myjict8-2-102