Shambhavi Srivastava, Appen, AI Solutions Architect
Data annotation involves assigning relevant information to raw data to enhance machine learning (ML) model performance. While this process is crucial, it can be time-consuming and expensive. The emergence of Large Language Models (LLMs) offers a unique opportunity to automate data annotation. However, the complexity of data annotation, stemming from unclear task instructions and subjective human judgment on equivocal data points, presents challenges that are not immediately apparent.
In this session, Chris Stephens, Field CTO and Head of AI Solutions at Appen, will provide an overview of an experiment the company recently conducted to test the tradeoff between quality and cost when training ML models on LLM-generated annotations versus human input. The goal was to differentiate between utterances that could be confidently annotated by LLMs and those that required human intervention. This distinction matters because human input may be needed to ensure a diverse range of opinions or to prevent incorrect responses from overly general models. Chris will walk audience members through the dataset and methodology used in the experiment, as well as the company’s research findings.