15 Sep 2023

#DS4SocietySeminar 2023 <> Lost in Translation: Large Language Models and Non-English Content Analysis

Aliya Bhatia and Gabriel Nicholas

Talk Details

Abstract

In recent years, large language models (e.g., OpenAI’s GPT-4, Meta’s LLaMA, Google’s PaLM) have become the dominant approach for building AI systems to generate and analyze language online. However, most of the automated systems that increasingly mediate our interactions online – such as chatbots, content moderation systems, and search engines – are primarily designed for English and work far more effectively in it than in the world’s other 7,000 languages. Recently, researchers and technology companies have attempted to extend the capabilities of large language models into languages other than English by building what are called multilingual language models. In this talk, we will explain how these multilingual language models work and explore their capabilities and limits. We will also talk more broadly about how companies, researchers, and policymakers can raise the bar for language resourcing and work towards building tools that work equitably across languages and speakers.

Speaker Bio

Aliya Bhatia is a policy analyst on the Free Expression team at the Center for Democracy and Technology (CDT). The team works to promote users’ free expression rights in the United States and around the world. Aliya works on issues regarding online safety and content moderation, and is dedicated to upholding media freedom and creative expression online.

Gabriel Nicholas is a Research Fellow at the Center for Democracy and Technology, where his research focuses on automated content moderation and data governance. He is also a joint fellow at the NYU School of Law Information Law Institute and the NYU Center for Cybersecurity.

Gabriel is a software engineer by training and has a Master’s in Information Management and Systems from the UC Berkeley School of Information. His written work has appeared in academic journals, law reviews, and journalistic outlets, including The Atlantic, The Washington Post, Slate, and Wired. His website can be found [here].

Video, Slides and Notes

  • content will be uploaded soon
