22 Dec 2023

Combating Online Misinformation @ Data Science for Social Impact

Work advised by Vukosi Marivate and Seani Rananga

The internet, once a beacon of connectivity and knowledge, now struggles with an insidious threat: misinformation. False narratives and malicious disinformation campaigns threaten social cohesion and erode trust. Recognizing this danger, the Data Science for Social Impact (DSFSI) research group leverages cutting-edge natural language processing (NLP) to combat this phenomenon.

Unveiling Hidden Agendas: NLP Innovations in Misinformation Detection

Misinformation lurks within viral rumours and coordinated influence campaigns. Our NLP team analyses language signals across platforms to uncover hidden agendas and assess content credibility. Our research projects address diverse aspects of this challenge:

Multimodal Misinformation Detection: Amica de Jager’s research explores misinformation detection in the South African context using text, image, and combined text-image models. Her findings showcase the potential of multimodal approaches, particularly when training models on local data.
Video Misinformation Detection: David Walker’s research demonstrates the effectiveness of pre-trained deep learning models for identifying misinformation in YouTube videos based on their captions. This study highlights the potential of transfer learning across domains.
Misinformation in Underrepresented Languages: Mulweli Mukwevho’s project tackles the challenge of detecting misinformation in Tshivenda, an under-resourced language. Their work adapts existing NLP techniques and uses LSTM models to achieve promising results, highlighting the need for diverse datasets and language-specific approaches.
Image-based Misinformation in Low-Resource Languages: Kganshi Molokomme’s research tackles the complex problem of image-based misinformation in low-resource languages. While facing hurdles due to limited datasets, their work emphasises the critical need for creating and sharing language-specific datasets to address this growing challenge.
COVID-19 Misinformation: Nhlakanipho Ngwenya’s study focuses on evaluating the reliability of COVID-19-related statements using an NLP model. Their analysis reveals both the potential and limitations of such models, emphasising the importance of human judgement in misinformation detection.

Building Trust and Transparency in the Digital Landscape

Beyond these ongoing projects, DSFSI has a track record in misinformation detection. We encourage you to explore our past work, including research papers on South African media disinformation and semi-supervised learning for political sentiment prediction. Furthermore, we provide open access to datasets like the South African Disinformation Website Dataset, fostering broader collaboration and progress in this crucial field.

By illuminating the techniques used to distort truth and developing innovative NLP solutions, DSFSI strives to cultivate trust and transparency in the digital landscape. We invite you to join us in this critical fight against misinformation and to help build a healthier, more informed online world.

Publications and Datasets

Publications

A. De Jager, V. Marivate, and A. Modupe. Multimodal Misinformation Detection in a South African Social Media Environment, Artificial Intelligence Research. SACAIR 2023. Communications in Computer and Information Science. 2023. [NLP][SOC] [Paper URL] [Preprint URL] DOI: 10.1007/978-3-031-49002-6_19
H. de Wet and V. Marivate. Is it Fake? News Disinformation Detection on South African News Websites, 2021 IEEE AFRICON. 2021. [NLP][SOC] [Paper URL] [Preprint URL] [Dataset] DOI: 10.1109/AFRICON51333.2021.9570905
M. Ledwaba and V. Marivate. Semi-Supervised Learning Approaches for Predicting South African Political Sentiment for Local Government Elections, DG.O 2022: The 23rd Annual International Conference on Digital Government Research. 2022. [ML][NLP] [Paper URL] [Preprint URL] DOI: 10.1145/3543434.3543484

Datasets

South African Disinformation Website Dataset (Zenodo)