18 Oct 2023

Announcing PuoBERTa: a tailor-made masked language model for Setswana

Work by Vukosi Marivate, Valencia K. Wagner, Moseli Motsoehli, Richard Lastrucci, Isheanesu Dzingirai


🎉 Exciting News! After years of dedicated work, coinciding with the challenges of the COVID-19 pandemic, our collaborative effort to bolster NLP resources for Setswana has borne fruit! 🚀

We’re thrilled to unveil PuoBERTa, a tailor-made masked language model for Setswana. Our journey involved collecting, curating, and preparing a diverse set of monolingual texts to breathe life into a model that’s not just technically adept but culturally attuned. 🌍📚 [The example shown is PuoBERTa-News, fine-tuned for news categorisation. Test it here: https://huggingface.co/dsfsi/PuoBERTa-News]
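If you would rather try PuoBERTa-News from code than in the browser, here is a minimal sketch. It assumes the Hugging Face `transformers` library (with a backend such as PyTorch) is installed and that the model loads through the standard `text-classification` pipeline; we have not reproduced the model card's own usage snippet here. The helper names `format_prediction` and `classify` are ours, purely for illustration, and the label names returned depend on the model.

```python
from typing import Dict, List

# Model id taken from the URL in the post.
MODEL_ID = "dsfsi/PuoBERTa-News"


def format_prediction(prediction: Dict[str, object]) -> str:
    """Render one {'label': ..., 'score': ...} dict as 'label (0.97)'."""
    return f"{prediction['label']} ({float(prediction['score']):.2f})"


def classify(texts: List[str]) -> List[str]:
    """Classify each text with PuoBERTa-News (downloads the model on first use)."""
    # Imported lazily so format_prediction works without transformers installed.
    from transformers import pipeline

    # Assumption: the checkpoint works with the generic text-classification pipeline.
    classifier = pipeline("text-classification", model=MODEL_ID)
    # For a list of strings, the pipeline returns one top-scoring dict per input.
    return [format_prediction(p) for p in classifier(texts)]


# Example call (requires network access; replace the placeholder with real text):
# print(classify(["<a Setswana news headline or article excerpt>"]))
```

The first call to `classify` downloads the model weights, so expect a short wait; later calls reuse the local cache.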

We’ve expanded the horizons for Setswana, enhancing part-of-speech tagging, named entity recognition, and news categorisation, marking a significant stride in reducing the language resource disparity. 💪🏽🌟

Stay tuned for more as we continue exploring this terrain, ensuring languages like Setswana don’t just survive but thrive in the world of AI! Together, we’re weaving a world where every language finds its digital voice. 🗣️💻

Learn more about PuoBERTa:

Work with Valencia K. Wagner, Moseli Motsoehli, Richard Lastrucci, Isheanesu Dzingirai

We want to acknowledge the feedback received from colleagues at the Data Science for Social Impact Research Group and at Lelapa AI.

With generous support from NVIDIA, Google Research, and Absa Group.