Search Results - RepositoryStats

6

152

other

4

[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

gpt llm phi qwen regex chatglm dataset xfinder benchmark evaluation judge-model reliability open-compass lm-evaluation llm-as-a-judge llm-as-evaluator reliable-evaluation key-answer-extraction large-language-models

Created 2024-05-19

40 commits to main branch, last one about a month ago

4

40

other

10

CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts.

lm-evaluation citation-dataset citation-attribution

Created 2024-06-11

23 commits to main branch, last one 3 months ago

1

26

MIT

4

Latxa: An Open Language Model and Evaluation Suite for Basque

llm latxa basque gpt-neox evaluation huggingface lm-evaluation language-model

Created 2024-02-19

55 commits to main branch, last one 8 months ago