Readings

For this class, a number of readings are used to reinforce concepts or encourage debate.

Good to read (Online and Books)

Speech and Language Processing (3rd ed. draft) Dan Jurafsky and James H. Martin URL
Machine Learning for Humans: Demystifying artificial intelligence & machine learning URL [ML]
Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Christoph Molnar URL [ML]
Natural Language Processing with Python. Steven Bird, Ewan Klein, and Edward Loper URL [NLP]
Neural Network Methods for Natural Language Processing. Yoav Goldberg [NLP][ML]
NLP with sklearn. URL [NLP]
Automate the Boring Stuff with Python. See Chapter 7 on regular expressions. URL [NLP][Python]

On the Impact of Data Augmentation on Downstream Performance in Natural Language Processing - acl
Sentiment Classification in Swahili Language Using Multilingual BERT - arxiv
Learning Contextualised Cross-lingual Word Embeddings for Extremely Low-Resource Languages Using Parallel Corpora - arxiv
Massive vs. Curated Embeddings for Low-Resourced Languages: the Case of Yoruba and Twi - arxiv
MasakhaNER: Named Entity Recognition for African Languages - arxiv
Small Data? No Problem! Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages - acl
A New Corpus for Low-Resourced Sindhi Language with Word Embeddings - arxiv

Sentiment Classification in Swahili Language Using Multilingual BERT - arxiv
Learning Contextualised Cross-lingual Word Embeddings for Extremely Low-Resource Languages Using Parallel Corpora - arxiv
Massive vs. Curated Embeddings for Low-Resourced Languages: the Case of Yoruba and Twi - arxiv
Unlink the Link Between COVID-19 and 5G Networks: An NLP and SNA Based Approach - ieee
MasakhaNER: Named Entity Recognition for African Languages - arxiv
Automatic Detection of Cyberbullying in Social Media Text - arxiv
Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science - acl
A New Corpus for Low-Resourced Sindhi Language with Word Embeddings - arxiv

EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks - arxiv
GoodNewsEveryone: A Corpus of News Headlines Annotated with Emotions, Semantic Roles, and Reader Perception - arxiv
Breaking the News: First Impressions Matter on Online News - arxiv
Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science - aclweb
Predicting Sentiments and Aspects on Financial Tweets and News Headlines - acl
Fake News Detection on Social Media: A Data Mining Perspective - arxiv
Hate Speech Detection: A Solved Problem? The Challenging Case of Long Tail on Twitter - arxiv
Automatic Detection of Cyberbullying in Social Media Text - arxiv
Automatic Rumor Detection on Microblogs: A Survey - arxiv