ELOQUENT lab for evaluation of generative language model quality
The ELOQUENT lab for evaluation of generative language model quality and usefulness addresses high-level quality criteria through a set of open-ended shared tasks, designed to require minimal human assessment effort.
Tasks:
- Task 1 - Voight-Kampff: Generate text samples for a classifier to distinguish between human-authored and machine-generated text. (A baseline detector is sketched after this list.)
- Task 2 - Robustness and Consistency: Explore how much a generative language model's output is affected by stylistic, dialectal, or other non-topical variation in the input. (A minimal probe is sketched after this list.)
- Task 3 - Dictionary Definition: Have a generative language model provide a definition for a previously unseen word, and identify false or erroneous definitions.
- Task 4 - Preference Score Prediction: Predict human assessors' preferences between sets of LLM-generated responses, and generate judgments that explain the choices made.
- Task 5 - Sensemaking: Given a set of possibly noisy texts, generate questions and answers about the topic.
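
For Task 1, participation involves both generating samples and, on the detection side, telling human text from machine text. Below is a minimal sketch of a baseline detector; the texts, labels, and model choice are invented for illustration, since the lab supplies its own data and evaluation.

```python
# A minimal sketch of a Task 1-style detector on toy data. Everything
# here (corpus, labels, model choice) is illustrative, not the lab's.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus: 1 = human-authored, 0 = machine-generated (made up here).
texts = [
    "I missed the bus again, so I walked and got soaked.",      # human
    "The weather today presents suboptimal conditions.",        # machine
    "She laughed so hard that coffee came out of her nose.",    # human
    "In conclusion, laughter is a beneficial human activity.",  # machine
]
labels = [1, 0, 1, 0]

# Character n-grams are a common, robust baseline for authorship tasks.
detector = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    LogisticRegression(),
)
detector.fit(texts, labels)

print(detector.predict(["Honestly, the meeting could have been an email."]))
```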
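For Task 2, one way a participant might probe a model is to ask the same question in several registers and measure how stable the answers are. The sketch below uses a hypothetical generate() stand-in and a crude surface-level similarity; an actual submission would call a real model and likely use a semantic measure instead.

```python
# A minimal sketch of a Task 2-style consistency probe. generate() is a
# hypothetical stand-in for whichever model a participant uses.
from difflib import SequenceMatcher

def generate(prompt: str) -> str:
    # Stand-in so the sketch runs; replace with a real model call.
    return "The seasons are caused by the tilt of the Earth's axis."

# The same question in three registers; the topic is held constant.
variants = [
    "What causes the seasons on Earth?",
    "hey, why do we even get seasons??",
    "Could you kindly explain why the Earth has seasons?",
]

answers = [generate(p) for p in variants]

# Crude consistency score: pairwise surface similarity of the answers.
# A real submission would use a semantic measure (e.g. embeddings).
for i in range(len(answers)):
    for j in range(i + 1, len(answers)):
        score = SequenceMatcher(None, answers[i], answers[j]).ratio()
        print(f"variants {i} and {j}: similarity {score:.2f}")
```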
Organizers
- Jussi Karlgren, Silo AI
- Ekaterina Artemova, Toloka AI
- Ondřej Bojar, Charles University
- Timothee Mickus, University of Helsinki
- Vladislav Mikhailov, University of Oslo
- Aarne Talman, University of Helsinki
- Magnus Sahlgren, AI Sweden
- Erik Velldal, University of Oslo
- Lilja Øvrelid, University of Oslo
Contact
- eloquent-clef2025-organizers@gmail.com