4 results found

Examining how large language models (LLMs) perform across various synthetic regression tasks when given (input, output) examples in their context, without any parameter updates.
Created 2024-01-16
62 commits to main branch, last one 18 days ago
A benchmark for prompt injection detection systems.
Created 2024-03-27
57 commits to main branch, last one 18 days ago
Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"
Created 2024-06-11
25 commits to main branch, last one 2 months ago
A collection of LLM-related papers, theses, tools, datasets, courses, open-source models, and benchmarks.
Created 2024-01-18
83 commits to main branch, last one 24 days ago