Detection of Bahasa Cyberbullying Speech Using Large-scale N-Gram Machine Learning Models with Increased Document-Terms Probability

Yudi Setiawan; Endina Putri Purwandari; Andang Wijanarko; Yusran Putra Panca; Ferzha Putra Utama

doi:10.5269/bspm.78695

Yudi Setiawan Information Systems Study Program, University of Bengkulu
Endina Putri Purwandari Information System Study Program, Department of Engineering, University of Bengkulu, Bengkulu, Indonesia
Andang Wijanarko Information System Study Program, Department of Engineering, University of Bengkulu, Bengkulu, Indonesia
Yusran Putra Panca Information System Study Program, Department of Engineering, University of Bengkulu, Bengkulu, Indonesia
Ferzha Putra Utama Information System Study Program, Department of Engineering, University of Bengkulu, Bengkulu, Indonesia

Abstract

The proliferation of social media technologies and content-sharing websites has contributed to a rising number of bullying incidents carried out online between individuals or groups (cyberbullying). One difficulty in identifying cyberbullying in Bahasa (Indonesian) is that a word can take on a different meaning when combined with other words, which makes the text ambiguous or even negative. In this article, we examine how increasing the probability values of document-terms in a machine learning model can yield high classification accuracy when detecting Bahasa cyberbullying, which on social networking platforms features a wide range of meanings, spelling variants, and shifts in meaning. In addition, a language model built on sequences of n consecutive words (Large-scale N-Gram), which captures patterns and statistics in the text data, is applied during the detection phase to classify texts against the cyberbullying corpus built during training and testing. Our results show that capturing these patterns and boosting the probability values of document-terms may substantially improve the accuracy of Indonesian cyberbullying detection.
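The abstract does not state how the document-term probabilities are computed or boosted. As a minimal sketch, assuming a multinomial Naive Bayes classifier over word unigrams and bigrams, the snippet below shows how per-class document-term probabilities P(term | class) can be estimated from an n-gram corpus and then used to label a new text. The class name `NGramNaiveBayes`, the smoothing constant `alpha`, and the toy Indonesian corpus are hypothetical illustrations, not the authors' model or data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch: NOT the authors' implementation. Assumes a multinomial
# Naive Bayes classifier over word n-grams with additive (Laplace) smoothing.


def ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class NGramNaiveBayes:
    """Classifies texts by summing smoothed log P(n-gram | class) plus the log class prior."""

    def __init__(self, n_max=2, alpha=1.0):
        self.n_max = n_max
        self.alpha = alpha  # hypothetical smoothing constant added to every document-term count

    def fit(self, texts, labels):
        self.term_counts = defaultdict(Counter)  # class -> n-gram counts
        self.class_counts = Counter(labels)      # class -> number of training documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = ngrams(text, self.n_max)
            self.term_counts[label].update(grams)
            self.vocab.update(grams)
        # Total number of n-gram occurrences per class (denominator of P(term | class)).
        self.totals = {c: sum(cnt.values()) for c, cnt in self.term_counts.items()}
        return self

    def term_log_prob(self, term, label):
        # Smoothed P(term | class): n-grams unseen in a class still get a non-zero probability.
        count = self.term_counts[label][term]
        return math.log((count + self.alpha) / (self.totals[label] + self.alpha * len(self.vocab)))

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label in self.class_counts:
            score = math.log(self.class_counts[label] / n_docs)  # log prior
            for term in ngrams(text, self.n_max):
                if term in self.vocab:  # skip n-grams never seen during training
                    score += self.term_log_prob(term, label)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy corpus ("bully" vs "normal"); not the paper's dataset.
texts = ["dasar bodoh kamu", "kamu jelek sekali", "terima kasih kawan", "kamu hebat sekali"]
labels = ["bully", "bully", "normal", "normal"]
model = NGramNaiveBayes(n_max=2, alpha=1.0).fit(texts, labels)
print(model.predict("bodoh kamu"))  # → bully
```

Scoring bigrams alongside unigrams lets the model weigh a combination such as "bodoh kamu" separately from its individual words, which is one way to reflect the abstract's point that Bahasa words can change meaning when combined.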

