1 result found

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
Created 2024-06-13
4 commits to main branch, last one 4 months ago