6 results found

OpenAI API client library for Rust (unofficial)
Created 2022-12-12
313 commits to main branch, last one 7 days ago
This benchmark tests how well LLMs incorporate a set of 10 mandatory story elements (characters, objects, core concepts, attributes, motivations, etc.) in a short creative story
Created 2025-01-05
44 commits to main branch, last one 2 days ago
A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private conversations, form alliances, and vote to eliminate each other
Created 2025-02-22
27 commits to main branch, last one 2 days ago
Benchmark that evaluates LLMs using 601 NYT Connections puzzles extended with extra trick words
Created 2024-10-15
38 commits to master branch, last one 2 days ago
Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a mo...
Created 2025-01-21
36 commits to main branch, last one 2 days ago
Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a small set of examples and anti-examples, then detect which item t...
Created 2025-01-14
29 commits to main branch, last one 2 days ago