Abstract- The objective of our work is to detect hate speech in the Indonesian language. As far as we know, research on this subject is still very rare. The only prior study we found created a dataset that is inadequate. Our research aimed to create a new dataset that covers hate speech in general, including hatred based on religion, race, ethnicity, and gender. In addition, we conducted a preliminary study using a machine learning approach, which so far is the most frequently used approach to text classification. We compared the performance of several features and machine learning algorithms for hate speech detection. The extracted features were word n-grams with n=1 and n=2, character n-grams with n=3 and n=4, and negative sentiment. Classification was performed using naive Bayes, support vector machine, Bayesian logistic regression, and random forest decision tree. An F-measure of 93.5% was achieved when using the word n-gram feature with the random forest decision tree algorithm. The results also show that the word n-gram feature outperformed the character n-gram feature.
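As an illustration of the two n-gram feature types named above, the following is a minimal sketch in plain Python, not the implementation used in this study. The function names, the lowercasing, the whitespace tokenization, and the `w1:`/`c3:` key prefixes are all assumptions made for the example; the negative sentiment feature and the classifiers are not covered.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return word n-grams as space-joined strings (whitespace tokenization assumed)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return character n-grams of the lowercased text, spaces included."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def extract_features(text):
    """Count word 1- and 2-grams and character 3- and 4-grams in one bag of features."""
    features = Counter()
    for n in (1, 2):
        # Prefix keys (hypothetical naming) so word and character n-grams never collide.
        features.update(f"w{n}:{g}" for g in word_ngrams(text, n))
    for n in (3, 4):
        features.update(f"c{n}:{g}" for g in char_ngrams(text, n))
    return features


if __name__ == "__main__":
    example = "contoh kalimat pendek"  # hypothetical input, not taken from the dataset
    for name, count in sorted(extract_features(example).items()):
        print(name, count)
```

The resulting counts could then be turned into a vector per document and passed to any of the classifiers listed above; how the features were actually weighted or combined in the study is not specified here.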