13 Oct 2023

#DS4SocietySeminar 2023 <> Digital Resources for Code-Switching in Under-Resourced Languages

Thipe Modipa

Talk Details


Code-switching (CS) refers to the use of more than one language within a single sentence. Multilingual communities are more likely to code-switch in speech than in text. Today, however, much everyday communication takes place as text on social media, and there is a lack of code-switched text data with which to model code-switching for languages with limited resources. Despite this challenge, several sources of code-switched data exist, including social media platforms, speech recordings, transcriptions, and news reports. These sources are nonetheless insufficient: borrowed words are prevalent in them, and the available data sets are normally small. These limitations have led to the generation of synthetic code-switched data as a means of compensating for them.

Speaker Bio

Thipe Modipa received his PhD (Information Technology) from North-West University. He is currently a senior lecturer at the University of Limpopo. He is also the node coordinator for the National e-Science Post-Graduate Teaching and Training Platform (NEPTTP) program, and he coordinates the Speech Technology research group of the CAIR Development Initiative. His research includes the modeling of code-switching for under-resourced languages.

Video, Slides and Notes

  • Content will be uploaded soon.