Call Number | T-1431 (softcopy T-1140) MAK PI-231 TR-CSUI-103 Source Code-396 |
Collection Type | Thesis |
Title | Ekspansi Data Menggunakan Forward-Backward Translation untuk Deteksi Ujaran Kebencian Multi-Label dalam Bahasa Indonesia |
Author | Fairuz Astari Devianty |
Publisher | Depok: Fasilkom UI, 2023 |
Subject | hate speech, multi-label classification |
Location | FASILKOM-UI |
Call Number | Collection ID | Status |
---|---|---|
T-1431 (softcopy T-1140) MAK PI-231 TR-CSUI-103 Source Code-396 | AVAILABLE |
Name: Fairuz Astari Devianty

Study Program: Master of Computer Science (Magister Ilmu Komputer)

Title: Data Expansion using Forward-Backward Translation for Multi-Label Hate Speech Detection in Bahasa Indonesia

Supervisors: Bayu Distiawan Trisedya, S.Kom., M.Kom., Ph.D.; Meganingrum Arista Jiwanggi, S.Kom., M.Kom., M.C.S.

The growth of social media platforms has made communication easier, but these platforms can also be misused: the spread of hate speech via social media is increasing. Freedom of speech is everyone's right in Indonesia, but malicious content must be removed because of its negative impact. One solution is to build a model that detects hate speech automatically. Building a good hate speech detection model requires a large amount of annotated training data, and the model must also account for the target, category, and level of hate speech. However, only one multi-label hate speech dataset in Bahasa Indonesia is currently available, and its labels are imbalanced.

To overcome this data scarcity, we propose a forward-backward translation method that generates data automatically by combining forward and backward translation: forward translation is applied to datasets in high-resource languages, and backward translation is applied to datasets in low-resource languages. Combining the two processes yields a dataset that is both large and of good translation quality. The method is used to expand the data for multi-label hate speech detection in Bahasa Indonesia with additional data from English.

In this study, multi-label hate speech detection performance on the new dataset improved compared with the existing Bahasa Indonesia hate speech dataset. The new dataset achieves an F1-score of 0.76 for multi-label classification and an F1-score of 0.78 for hierarchical classification.
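The data expansion idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the label names, and the toy translator are all hypothetical, and reading "backward translation" as a round trip through English (Indonesian to English and back) is an assumption, since the abstract does not spell out the procedure. A real pipeline would replace `toy_translate` with an actual machine translation system.

```python
from typing import Callable, List, Tuple

# One example is a text plus its multi-label annotations.
Example = Tuple[str, List[str]]
# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def forward_translate(examples: List[Example], translate: Translator,
                      src: str = "en", tgt: str = "id") -> List[Example]:
    """Translate high-resource examples into the target language, keeping labels."""
    return [(translate(text, src, tgt), list(labels)) for text, labels in examples]


def backward_translate(examples: List[Example], translate: Translator,
                       lang: str = "id", pivot: str = "en") -> List[Example]:
    """Round-trip low-resource examples through a pivot language (assumed reading)."""
    result = []
    for text, labels in examples:
        pivoted = translate(text, lang, pivot)
        result.append((translate(pivoted, pivot, lang), list(labels)))
    return result


def expand_dataset(low_resource: List[Example], high_resource: List[Example],
                   translate: Translator) -> List[Example]:
    """Merge original and synthetic examples, dropping exact duplicates."""
    combined = (list(low_resource)
                + backward_translate(low_resource, translate)
                + forward_translate(high_resource, translate))
    seen = set()
    unique = []
    for text, labels in combined:
        key = (text, tuple(sorted(labels)))
        if key not in seen:
            seen.add(key)
            unique.append((text, labels))
    return unique


# Toy lookup table standing in for a real translation system (demonstration only).
_TOY = {
    ("id", "en"): {"kamu goblok": "you are stupid"},
    ("en", "id"): {"you are stupid": "kamu bodoh", "you are foolish": "kamu tolol"},
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    # Unknown text is returned unchanged.
    return _TOY.get((src, tgt), {}).get(text, text)


# Hypothetical label names, chosen only for illustration.
low = [("kamu goblok", ["HS", "HS_Individual"])]
high = [("you are foolish", ["HS"])]
expanded = expand_dataset(low, high, toy_translate)
```

With the toy translator, `expanded` holds the original Indonesian example, a paraphrase of it produced by the round trip, and the English example translated into Indonesian. Labels are copied unchanged, which assumes that translation preserves the annotated meaning.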