Power to the people: How crowdsourcing can democratize LLMs


In this talk, we discuss several gaping problems of modern LLMs that became particularly salient after the release of ChatGPT.

How can we make models like ChatGPT truly multilingual? We discuss a set of performance evaluation experiments, as well as the discrepancy between the distribution of languages in the training data and the distribution of native speakers of those languages across the globe. We illustrate how crowdsourcing can address these challenges using the example of the WMT'22 Shared Task on Large-Scale Machine Translation Evaluation for African Languages.

Another question we have to answer is how LLMs can work reliably and ethically in narrow, hands-on applications. Through a series of experiments with LLMs and multimodal generative models, we illustrate how modern crowdsourcing can facilitate prompt engineering and even help fine-tune a model for a narrower use case. Finally, we discuss the growing demand for skilled and committed data labelers as LLMs gain broader adoption.


When & where

Location: Tel Aviv University, Smolarz and Jagolm Auditoria
Dates: February 13, 2023 – February 15, 2023


Olga Megorskaya
