Search Results - RepositoryStats

2 results found

129

unknown

A survey on harmful fine-tuning attacks for large language models

llms attack safety survey defense harmful alignment finetuning fine-tuning

Created 2024-09-04

68 commits to main branch, last one 14 days ago

apache-2.0

This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS 2024)

llm attack harmful finetuning fine-tuning

Created 2024-01-30

17 commits to main branch, last one 2 months ago