Library Automation and Digital Archive
LONTAR
Fakultas Ilmu Komputer
Universitas Indonesia


Call Number T-1431 (softcopy T-1140) MAK PI-231 TR-CSUI-103 Source Code-396
Collection Type Thesis
Title Ekspansi Data Menggunakan Forward-Backward Translation untuk Deteksi Ujaran Kebencian Multi-Label dalam Bahasa Indonesia
Author Fairuz Astari Devianty
Publisher Depok: Fasilkom UI, 2023
Subject hate speech, multi-label classification
Location FASILKOM-UI
Location: Perpustakaan Fakultas Ilmu Komputer (Faculty of Computer Science Library)
Call Number | Collection ID | Status
T-1431 (softcopy T-1140) MAK PI-231 TR-CSUI-103 Source Code-396 | | Available
No reviews for this collection: 56241
ABSTRACT

Name: Fairuz Astari Devianty
Study Program: Magister Ilmu Komputer (Master of Computer Science)
Title: Data Expansion using Forward-Backward Translation for Multi-Label Hate Speech Detection in Bahasa Indonesia
Supervisors: Bayu Distiawan Trisedya, S.Kom., M.Kom., Ph.D.; Meganingrum Arista Jiwanggi, S.Kom., M.Kom., M.C.S.

The growth of social media platforms has made communication easier. However, these platforms can be misused: for example, the spread of hate speech via social media is increasing. Freedom of speech is everyone's right in Indonesia, but malicious content must be removed because of its negative impact. One solution is to build a model that detects hate speech automatically. Building a good hate speech detection model requires a large amount of annotated training data, and the model must also account for the target, category, and level of the hate speech. However, only one multi-label hate speech dataset in Bahasa Indonesia is currently available, and its labels are unevenly distributed.

To overcome this data scarcity problem, we propose a forward-backward translation method that generates data automatically. The method combines forward and backward translation: forward translation is applied to datasets in high-resource languages, and backward translation is applied to datasets in low-resource languages. Combining the two processes yields a dataset that is both large and of good translation quality. The method is used to add data for multi-label hate speech detection in Bahasa Indonesia, with the additional data drawn from English.

In this study, multi-label hate speech detection performed better on the new dataset than on the existing Bahasa Indonesia hate speech dataset. The new dataset achieves an F1-score of 0.76 for multi-label classification and an F1-score of 0.78 for hierarchical classification.
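The data expansion idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes that forward translation maps labelled English examples into Indonesian, and that backward translation round-trips Indonesian examples through English to obtain paraphrases (the abstract does not spell out this detail). All names here (`forward_translate`, `backward_translate`, `expand_dataset`, `toy_translate`) are hypothetical, and the word-lookup translator with harmless placeholder sentences only stands in for a real machine translation system.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, List[str]]              # (text, labels)
Translator = Callable[[str, str, str], str]  # (text, source, target) -> text

# Toy word-level lookup tables (assumption: stand-in for a real MT system).
_TOY: Dict[Tuple[str, str], Dict[str, str]] = {
    ("en", "id"): {"big": "besar", "house": "rumah"},
    ("id", "en"): {"besar": "big", "rumah": "house", "raya": "big"},
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Translate word by word; unknown words are passed through unchanged."""
    table = _TOY[(src, tgt)]
    return " ".join(table.get(word, word) for word in text.split())


def forward_translate(english_data: List[Example], translate: Translator) -> List[Example]:
    """Translate labelled English examples into Indonesian, keeping the labels."""
    return [(translate(text, "en", "id"), labels) for text, labels in english_data]


def backward_translate(indonesian_data: List[Example], translate: Translator) -> List[Example]:
    """Round-trip Indonesian examples through English (assumed interpretation)."""
    result = []
    for text, labels in indonesian_data:
        pivot = translate(text, "id", "en")
        result.append((translate(pivot, "en", "id"), labels))
    return result


def expand_dataset(
    indonesian_data: List[Example],
    english_data: List[Example],
    translate: Translator,
) -> List[Example]:
    """Combine original, forward-translated, and back-translated examples.

    Texts that already appear in the combined set are skipped.
    """
    combined = list(indonesian_data)
    seen = {text for text, _ in combined}
    generated = forward_translate(english_data, translate) + backward_translate(
        indonesian_data, translate
    )
    for text, labels in generated:
        if text not in seen:
            seen.add(text)
            combined.append((text, labels))
    return combined


# Usage with placeholder sentences and placeholder label names.
indonesian_data = [("rumah raya", ["label_a"])]
english_data = [("big house", ["label_b"])]
expanded = expand_dataset(indonesian_data, english_data, toy_translate)
for text, labels in expanded:
    print(text, labels)
```

In a real setting, `toy_translate` would be replaced by a call to an actual translation model or service, and the translated examples would likely need quality filtering before training.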