Search Results - RepositoryStats

19

145

mit

3

Attack to induce LLMs within hallucinations

llm nlp ai-safety llm-safety deep-learning hallucinations machine-learning adversarial-attacks

Created 2023-09-29

22 commits to master branch, last one about a year ago

5

98

mit

9

Papers about red teaming LLMs and Multimodal models.

papers safety llm-safety redteaming awesome-list language-model

Created 2024-02-05

36 commits to main branch, last one 3 months ago

8

39

other

2

Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"

ai llm nlp benchmark llm-safety red-teaming bias-detection llm-evaluation safety-monitoring transformers-models llm-safety-benchmark nlp-machine-learning artificial-intelligence

Created 2024-04-06

24 commits to master branch, last one 5 months ago

2

27

unknown

1

Restore safety in fine-tuned language models through task arithmetic

llm llms safety alignment llm-safety llms-benchmarking alignment-algorithm llm-safety-benchmark

Created 2024-02-17

83 commits to main branch, last one 11 months ago