7 results found

Examining how large language models (LLMs) perform across various synthetic regression tasks when given (input, output) examples in their context, without any parameter updates
Created 2024-01-16
62 commits to main branch, last one 4 months ago
A benchmark for prompt injection detection systems.
Created 2024-03-27
57 commits to main branch, last one 4 months ago
A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessmen...
Created 2024-04-02
325 commits to main branch, last one 3 days ago
Hallucinations (Confabulations) Document-Based Benchmark for RAG
Created 2024-10-10
41 commits to master branch, last one 12 days ago
A collection of LLM-related papers, theses, tools, datasets, courses, open-source models, and benchmarks
Created 2024-01-18
84 commits to main branch, last one 3 months ago
Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"
Created 2024-06-11
29 commits to main branch, last one 2 months ago
LLM-KG-Bench is a framework and task collection for automated benchmarking of Large Language Models (LLMs) on Knowledge Graph (KG) related tasks.
Created 2023-05-24
513 commits to main branch, last one 19 days ago