ELOQUENT lab for evaluation of generative language model quality
The ELOQUENT lab for evaluation of generative language model quality and usefulness addresses high-level quality criteria through a set of open-ended shared tasks designed to require minimal human assessment effort.
Tasks:
- Task 1 - Voight-Kampff: Generate text samples for a classifier to distinguish between human-authored and machine-generated text.
- Task 2 - Robustness and Consistency: Explore how much a generative language model's output is affected by stylistic, dialectal, or other non-topical variation in the input.
- Task 3 - Preference Score Prediction: Predict human preference judgments, collected from human assessors, between sets of LLM-generated responses, and generate explanations for the choices made.
- Task 4 - Sensemaking: Given a set of possibly noisy texts, generate questions and answers about the topic.
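As a rough illustration of the kind of measurement Task 2 asks about, the sketch below scores how consistent a model's outputs are across stylistic rephrasings of the same prompt, using mean pairwise Jaccard overlap of token sets. This is a hypothetical minimal example, not an official ELOQUENT baseline or metric; the example outputs are invented, and actual submissions would use the lab's data and stronger similarity measures.

```python
# Hedged sketch in the spirit of Task 2 (not an official ELOQUENT baseline):
# given the outputs a model produced for stylistic variants of one prompt,
# score their similarity via mean pairwise Jaccard overlap of token sets.
import re
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word-token sets of two texts."""
    sa = set(re.findall(r"[a-z']+", a.lower()))
    sb = set(re.findall(r"[a-z']+", b.lower()))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def consistency_score(outputs: list[str]) -> float:
    """Mean pairwise Jaccard similarity; 1.0 means identical token sets."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Invented example: outputs for three rephrasings of the same question.
outputs = [
    "The capital of Finland is Helsinki.",
    "Helsinki is the capital of Finland.",
    "Finland's capital city is Helsinki.",
]
print(round(consistency_score(outputs), 2))
```

A topically robust model should keep this score high across paraphrases; lexical overlap is only a crude proxy, and semantic similarity measures would be a natural refinement.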
Organizers
- Jussi Karlgren, Silo AI
- Ekaterina Artemova, Toloka AI
- Ondřej Bojar, Charles University
- Pavel Šindelář, Charles University
- Vladislav Mikhailov, University of Oslo
- Marie Engels, Fraunhofer IAIS
- Erik Velldal, University of Oslo
- Lilja Øvrelid, University of Oslo
Contact
- eloquent-clef2025-organizers@gmail.com