ABSTRACT
Name : Hajra Faki Ali
Study Program : Master of Computer Science
Title : Monolingual BERT Is Better Than Multilingual
BERT For Natural Language Inference In Swahili
Supervisor : Adila Alfa Krisnadhi, S.Kom, M.Sc., Ph.D
This research proposes the development of a monolingual model for Natural Language
Inference (NLI) in Swahili to overcome the limitations of current multilingual models.
The study fine-tunes the pre-trained SwahBERT model to capture Swahili's unique
semantic relationships and contextual nuances. A critical component of this research is
the creation of a SwahiliNLI dataset, crafted to reflect the intricacies of the language,
thereby avoiding reliance on translated English text. Furthermore, the performance of the
fine-tuned SwahBERT model is evaluated using both SwahiliNLI and the XNLI dataset,
and compared with the multilingual mBERT model. The results reveal that the
SwahBERT model outperforms the multilingual model, achieving an accuracy rate of
78.78% on the SwahiliNLI dataset and 73.51% on the XNLI dataset. The monolingual
model also exhibits superior precision, recall, and F1 scores, particularly in recognizing
linguistic patterns and predicting sentence pairings. These findings underscore the
importance of manually constructed datasets and monolingual models for low-resource
languages, offering valuable insights for building more efficient and contextually
relevant NLI systems, advancing natural language processing for Swahili, and
potentially benefiting other languages facing similar resource constraints.
Keywords: Monolingual, Multilingual, Natural Language Inference, Swahili, SwahBERT