For this class, a number of readings are used to reinforce concepts or encourage debate.
Recommended Reading (Online and Books)
- Speech and Language Processing (3rd ed. draft). Dan Jurafsky and James H. Martin URL
- Machine Learning for Humans: Demystifying artificial intelligence & machine learning URL [ML]
- Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Christoph Molnar URL [ML]
- Natural Language Processing with Python. Steven Bird, Ewan Klein, and Edward Loper URL [NLP]
- Neural Network Methods for Natural Language Processing. Yoav Goldberg [NLP][ML]
- NLP with sklearn. URL [NLP]
- Automate the Boring Stuff with Python. See Chapter 7 on regular expressions. URL [NLP][Python]
Papers
2022
- On the Impact of Data Augmentation on Downstream Performance in Natural Language Processing - acl
- Sentiment Classification in Swahili Language Using Multilingual BERT - arxiv
- Learning Contextualised Cross-lingual Word Embeddings for Extremely Low-Resource Languages Using Parallel Corpora - arxiv
- Massive vs. Curated Embeddings for Low-Resourced Languages: the Case of Yoruba and Twi - arxiv
- MasakhaNER: Named Entity Recognition for African Languages - arxiv
- Small Data? No Problem! Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages - acl
- A New Corpus for Low-Resourced Sindhi Language with Word Embeddings - arxiv
2021
- Sentiment Classification in Swahili Language Using Multilingual BERT - arxiv
- Learning Contextualised Cross-lingual Word Embeddings for Extremely Low-Resource Languages Using Parallel Corpora - arxiv
- Massive vs. Curated Embeddings for Low-Resourced Languages: the Case of Yoruba and Twi - arxiv
- Unlink the Link Between COVID-19 and 5G Networks: An NLP and SNA Based Approach - ieee
- MasakhaNER: Named Entity Recognition for African Languages - arxiv
- Automatic Detection of Cyberbullying in Social Media Text - arxiv
- Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science - acl
- A New Corpus for Low-Resourced Sindhi Language with Word Embeddings - arxiv
2020
- EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks - arxiv
- GoodNewsEveryone: A Corpus of News Headlines Annotated with Emotions, Semantic Roles, and Reader Perception - arxiv
- Breaking the News: First Impressions Matter on Online News - arxiv
- Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science - aclweb
- Predicting Sentiments and Aspects on Financial Tweets and News Headlines - acl
- Fake News Detection on Social Media: A Data Mining Perspective - arxiv
- Hate Speech Detection: A Solved Problem? The Challenging Case of Long Tail on Twitter - arxiv
- Automatic Detection of Cyberbullying in Social Media Text - arxiv
- Automatic Rumor Detection on Microblogs: A Survey - arxiv