Readings

For this class, a number of readings are used to reinforce concepts or encourage debate.

Good to read (Online and Books)

  • Speech and Language Processing (3rd ed. draft). Dan Jurafsky and James H. Martin URL
  • Machine Learning for Humans: Demystifying artificial intelligence & machine learning URL [ML]
  • Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Christoph Molnar URL [ML]
  • Natural Language Processing with Python. Steven Bird, Ewan Klein, and Edward Loper URL [NLP]
  • Neural Network Methods for Natural Language Processing. Yoav Goldberg [NLP][ML]
  • NLP with sklearn. URL [NLP]
  • Automate the Boring Stuff with Python. See Chapter 7 on regular expressions. URL [NLP][Python]

Papers

2022

  • On the Impact of Data Augmentation on Downstream Performance in Natural Language Processing - acl
  • Sentiment Classification in Swahili Language Using Multilingual BERT - arxiv
  • Learning Contextualised Cross-lingual Word Embeddings for Extremely Low-Resource Languages Using Parallel Corpora - arxiv
  • Massive vs. Curated Embeddings for Low-Resourced Languages: the Case of Yoruba and Twi - arxiv
  • MasakhaNER: Named Entity Recognition for African Languages - arxiv
  • Small Data? No Problem! Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages - acl
  • A New Corpus for Low-Resourced Sindhi Language with Word Embeddings - arxiv

2021

  • Sentiment Classification in Swahili Language Using Multilingual BERT - arxiv
  • Learning Contextualised Cross-lingual Word Embeddings for Extremely Low-Resource Languages Using Parallel Corpora - arxiv
  • Massive vs. Curated Embeddings for Low-Resourced Languages: the Case of Yoruba and Twi - arxiv
  • Unlink the Link Between COVID-19 and 5G Networks: An NLP and SNA Based Approach - ieee
  • MasakhaNER: Named Entity Recognition for African Languages - arxiv
  • Automatic Detection of Cyberbullying in Social Media Text - arxiv
  • Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science - acl
  • A New Corpus for Low-Resourced Sindhi Language with Word Embeddings - arxiv

2020

  • EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks - arxiv
  • GoodNewsEveryone: A Corpus of News Headlines Annotated with Emotions, Semantic Roles, and Reader Perception - arxiv
  • Breaking the News: First Impressions Matter on Online News - arxiv
  • Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science - acl
  • Predicting Sentiments and Aspects on Financial Tweets and News Headlines - acl
  • Fake News Detection on Social Media: A Data Mining Perspective - arxiv
  • Hate Speech Detection: A Solved Problem? The Challenging Case of Long Tail on Twitter - arxiv
  • Automatic Detection of Cyberbullying in Social Media Text - arxiv
  • Automatic Rumor Detection on Microblogs: A Survey - arxiv

Video

  • What is Natural Language Processing? Rachel Thomas URL [NLP]
  • A Code-First Introduction to Natural Language Processing. fast.ai URL