3 results found
An easy-to-use Python framework to generate adversarial jailbreak prompts.
Created 2024-01-31 · 89 commits to master branch, last one 2 months ago
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
Created 2024-04-06 · 24 commits to master branch, last one 2 months ago
Restore safety in fine-tuned language models through task arithmetic
Created 2024-02-17 · 83 commits to main branch, last one 7 months ago
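
For context, the task-arithmetic idea behind the third result generally works by computing a "safety vector" as the parameter-wise difference between a safety-aligned checkpoint and an unaligned counterpart, then adding a scaled copy of that vector back into the fine-tuned model's weights. Below is a minimal sketch of that general technique, assuming PyTorch state dicts; the function name restore_safety, the alpha scaling factor, and the three-checkpoint interface are illustrative assumptions, not the repository's actual API.

    import torch  # tensors in the state dicts are assumed to be torch.Tensor

    def restore_safety(finetuned, aligned, unaligned, alpha=1.0):
        # Hypothetical helper, not the repo's actual code.
        # Safety vector: parameter-wise difference between a safety-aligned
        # checkpoint and its unaligned counterpart (task arithmetic).
        # Adding alpha * vector back into the fine-tuned weights is intended
        # to recover safety behavior lost during fine-tuning.
        restored = {}
        for name, weight in finetuned.items():
            safety_vector = aligned[name] - unaligned[name]
            restored[name] = weight + alpha * safety_vector
        return restored

    # Usage with hypothetical checkpoints:
    # restored = restore_safety(ft_model.state_dict(),
    #                           aligned_model.state_dict(),
    #                           base_model.state_dict())

In such schemes, alpha typically trades off how much safety behavior is restored against how much downstream task performance from fine-tuning is preserved.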