4 results found

Hallucinations (Confabulations) Document-Based Benchmark for RAG
Created 2024-10-10
60 commits to master branch, last one 3 days ago
Ranking LLMs on agentic tasks
Created 2025-02-10
7 commits to main branch, last one 20 days ago
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
Created 2024-08-08
514 commits to main branch, last one 4 days ago
Open multiple AI sites with one click and view the AI results
Created 2020-05-20
58 commits to master branch, last one about a month ago