Statistics for topic adversarial-attacks
RepositoryStats tracks 631,873 GitHub repositories; 139 of these are tagged with the adversarial-attacks topic. The most common primary language for repositories using this topic is Python (95). Other languages include Jupyter Notebook (19).
Stargazers over time for topic adversarial-attacks
Most starred repositories for topic adversarial-attacks
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/
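To give a flavor of the two libraries above, here are two minimal, hedged sketches. The first shows an FGSM evasion attack with the Adversarial Robustness Toolbox (ART); the tiny untrained model, input shape, batch, and eps value are placeholder choices for illustration, not anything prescribed by ART.

```python
import numpy as np
import torch
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod

# Tiny untrained classifier on 28x28 grayscale inputs (placeholder for a real trained model).
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))

classifier = PyTorchClassifier(
    model=model,
    loss=torch.nn.CrossEntropyLoss(),
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

# FGSM evasion: nudge each input along the sign of the loss gradient, bounded by eps.
x = np.random.rand(8, 1, 28, 28).astype(np.float32)
attack = FastGradientMethod(estimator=classifier, eps=0.1)
x_adv = attack.generate(x=x)
print(classifier.predict(x_adv).shape)  # (8, 10) class scores for the perturbed batch
```

The second is a sketch of running the TextFooler recipe with TextAttack against a Hugging Face sequence classifier; the checkpoint name, dataset, and num_examples are illustrative, and details may vary across TextAttack versions.

```python
import transformers
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Wrap a fine-tuned sentiment classifier from the Hugging Face Hub (checkpoint name is illustrative).
model = transformers.AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-uncased-imdb")
tokenizer = transformers.AutoTokenizer.from_pretrained("textattack/bert-base-uncased-imdb")
model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

# TextFooler: a word-level synonym-substitution attack built from the published recipe.
attack = TextFoolerJin2019.build(model_wrapper)
dataset = HuggingFaceDataset("imdb", split="test")

attacker = Attacker(attack, dataset, AttackArgs(num_examples=5))
attacker.attack_dataset()
```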
Trending repositories for topic adversarial-attacks
⚡ Vigil ⚡ Detect prompt injections, jailbreaks, and other potentially risky Large Language Model (LLM) inputs
Spectrum simulation attack (ECCV'2022 Oral) towards boosting the transferability of adversarial examples
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Sa...
[NeurIPS 2020, Spotlight] Code for "Robust Deep Reinforcement Learning against Adversarial Perturbations on Observations"
[ICLR 2025] Dissecting Adversarial Robustness of Multimodal LM Agents
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
This repository provides studies on the security of language models for code (CodeLMs).
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024
(CVPR 2024) "Unsegment Anything by Simulating Deformation"