24 Jun 2022

[Publication] LiSTra Automatic Speech Translation: English to Lingala Case Study

Paper by Salomon Kabongo Kabenamualu, Vukosi Marivate, and Herman Kamper, African Masters of Machine Intelligence, University of Pretoria, Stellenbosch University

Members

Salomon Kabongo Kabenamualu, Vukosi Marivate, Herman Kamper.

Abstract

In recent years there has been great interest in addressing the data scarcity of African languages and providing baseline models for different Natural Language Processing (NLP) tasks (Orife et al., 2020). Several initiatives on the continent (Nekoto et al., 2020) use the Bible as a data source to provide proof of concept for some NLP tasks. In this work, we present the Lingala Speech Translation (LiSTra) dataset, release a full pipeline for the construction of such datasets in other languages, and report baselines using both the traditional cascade approach (Automatic Speech Recognition followed by Machine Translation) and a revolutionary Transformer-based end-to-end architecture (Liu et al., 2020) with a custom interactive attention that allows information sharing between the recognition decoder and the translation decoder.
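
The cascade baseline mentioned in the abstract can be pictured as two models chained together. The following is only a minimal sketch under stated assumptions, not the authors' pipeline: the names `cascade_translate`, `toy_asr`, and `toy_mt` are hypothetical, and the toy functions stand in for trained recognition and translation models.

```python
from typing import Callable, Sequence

# Hypothetical type aliases for illustration; not taken from the LiSTra code.
Audio = Sequence[float]
Recognizer = Callable[[Audio], str]  # ASR: English audio -> English text
Translator = Callable[[str], str]    # MT: English text -> Lingala text


def cascade_translate(audio: Audio, asr: Recognizer, mt: Translator) -> str:
    """Chain ASR and MT: the translator only ever sees the ASR transcript."""
    transcript = asr(audio)
    return mt(transcript)


# Toy stand-ins (assumptions) so the sketch runs without trained models.
def toy_asr(audio: Audio) -> str:
    # Pretend any non-empty audio is the word "hello".
    return "hello" if len(audio) > 0 else ""


def toy_mt(text: str) -> str:
    # One-entry toy lexicon; unknown text is passed through unchanged.
    return {"hello": "mbote"}.get(text, text)


print(cascade_translate([0.1, 0.2], toy_asr, toy_mt))  # prints "mbote"
```

Because the translation step depends entirely on the transcript, any recognition error is passed on to the translation. An end-to-end model with information sharing between the two decoders, as described in the abstract, is one way to avoid that strict hand-off.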

Publications

  • S. Kabongo Kabenamualu, V. Marivate, and H. Kamper. LiSTra Automatic Speech Translation: English to Lingala Case Study. Proceedings of the Workshop on Dataset Creation for Lower-Resourced Languages within the 13th Language Resources and Evaluation Conference, 2022. [NLP] [Paper URL]