Announcing PuoBERTa: a tailor-made masked language model for Setswana
Work by Vukosi Marivate, Valencia K. Wagner, Moseli Motsoehli, Richard Lastrucci, Isheanesu Dzingirai
Exciting News! After years of dedicated work, coinciding with the challenges of the COVID-19 pandemic, our collaborative effort to bolster NLP resources for Setswana has borne fruit!
We're thrilled to unveil PuoBERTa, a tailor-made masked language model for Setswana. Our journey involved collecting, curating, and preparing a diverse set of monolingual texts to breathe life into a model that's not just technically adept but culturally attuned. [Example shown is of PuoBERTa-News, fine-tuned for news categorisation - test it here https://huggingface.co/dsfsi/PuoBERTa-News]
We've expanded the horizons for Setswana, enhancing part-of-speech tagging, named entity recognition, and news categorisation, marking a significant stride in reducing the language resource disparity. 💪🏽
Stay tuned for more as we continue exploring this terrain, ensuring languages like Setswana don't just survive but thrive in the world of AI! Together, we're weaving a world where every language finds its digital voice. 🗣️💻
Learn more about PuoBERTa:
- 👩🏾‍💻 GitHub: https://github.com/dsfsi/PuoBERTa
- 🤗 Hugging Face: https://huggingface.co/dsfsi/PuoBERTa
- arXiv preprint: https://arxiv.org/abs/2310.09141
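For readers who would like to try the base model, here is a minimal sketch of querying it as a masked language model through the Hugging Face `transformers` fill-mask pipeline. It assumes `transformers` and a backend such as PyTorch are installed, and that the `dsfsi/PuoBERTa` checkpoint listed above loads with the standard pipeline. The helper names `mask_word` and `predict_masked`, and the sample sentence, are illustrative only and not part of the PuoBERTa release.

```python
def mask_word(sentence: str, word: str, mask_token: str = "<mask>") -> str:
    """Replace the first whole-word occurrence of `word` with the mask token.

    The default "<mask>" is an assumption (RoBERTa-style tokenizers);
    predict_masked() below passes the tokenizer's own mask token instead.
    """
    tokens = sentence.split()
    if word not in tokens:
        raise ValueError(f"{word!r} not found in sentence")
    tokens[tokens.index(word)] = mask_token
    return " ".join(tokens)


def predict_masked(sentence: str, word: str, top_k: int = 5):
    """Hide `word` in `sentence` and return (token, score) candidates for it.

    Hypothetical helper: imports transformers lazily and downloads the
    model from the Hugging Face Hub on first call.
    """
    from transformers import pipeline

    fill = pipeline("fill-mask", model="dsfsi/PuoBERTa")
    masked = mask_word(sentence, word, fill.tokenizer.mask_token)
    # For a single mask, the pipeline returns a list of dicts that include
    # "token_str" and "score".
    return [(p["token_str"], p["score"]) for p in fill(masked, top_k=top_k)]
```

Calling `predict_masked("Ke rata go bala dibuka", "bala")` would hide the word "bala" and return the model's top candidate tokens with their scores. We make no claim here about what it will predict.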
Work with Valencia K. Wagner, Moseli Motsoehli, Richard Lastrucci, Isheanesu Dzingirai
We want to acknowledge the feedback received from colleagues at the Data Science for Social Impact Research Group and at Lelapa AI.
With generous support from NVIDIA, Google Research and Absa Group