Aligning LLMs to Low-Resource Languages
Where:
Date: Feb 22, 2024, 00:00 GMT+2
Overview
This tutorial is a guide to collecting data for aligning large language models (LLMs) to low-resource languages (LRLs). It addresses the data scarcity these languages face and introduces a pipeline for generating high-quality data, using Swahili as the primary example. It covers strategies for both dataset collection and the alignment of LLMs to LRLs, showing how to produce and use high-quality data for language technology development in under-resourced languages.