4 results found
Hallucinations (Confabulations) Document-Based Benchmark for RAG
Created
2024-10-10
60 commits to master branch, last one 3 days ago
Ranking LLMs on agentic tasks
Created
2025-02-10
7 commits to main branch, last one 20 days ago
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
Created
2024-08-08
514 commits to main branch, last one 4 days ago
One click to open multiple AI sites and view their results
Created
2020-05-20
58 commits to master branch, last one about a month ago