Trending repositories for the adversarial-attacks topic
A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
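A minimal usage sketch for the ART library above, assuming a PyTorch model wrapped in its PyTorchClassifier estimator and an FGSM evasion attack; the toy model and random inputs are placeholders, not part of the library.

    # Hedged sketch: wrap a placeholder PyTorch model in ART and craft FGSM adversarial examples.
    import numpy as np
    import torch.nn as nn
    from art.estimators.classification import PyTorchClassifier
    from art.attacks.evasion import FastGradientMethod

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in classifier
    classifier = PyTorchClassifier(
        model=model,
        loss=nn.CrossEntropyLoss(),
        input_shape=(3, 32, 32),
        nb_classes=10,
    )
    x = np.random.rand(8, 3, 32, 32).astype(np.float32)  # placeholder inputs in [0, 1]
    x_adv = FastGradientMethod(estimator=classifier, eps=0.03).generate(x=x)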
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/
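A minimal sketch of TextAttack's Python API for the entry above, assuming a HuggingFace sentiment classifier; the checkpoint name and example sentence are illustrative choices, not fixed by the framework.

    # Hedged sketch: build the TextFooler recipe against a HuggingFace model wrapper.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from textattack.models.wrappers import HuggingFaceModelWrapper
    from textattack.attack_recipes import TextFoolerJin2019

    name = "textattack/bert-base-uncased-SST-2"  # assumed checkpoint, for illustration only
    model = AutoModelForSequenceClassification.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    attack = TextFoolerJin2019.build(HuggingFaceModelWrapper(model, tokenizer))
    result = attack.attack("An engaging and moving film.", 1)  # (input text, ground-truth label)
    print(result)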
⚡ Vigil ⚡ Detect prompt injections, jailbreaks, and other potentially risky Large Language Model (LLM) inputs
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Sa...
PyTorch implementation of adversarial attacks [torchattacks]
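A minimal sketch of the torchattacks calling convention (PGD here), assuming image tensors scaled to [0, 1]; the toy model and random batch are stand-ins.

    # Hedged sketch: PGD via torchattacks on a placeholder model and batch.
    import torch
    import torch.nn as nn
    import torchattacks

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()  # stand-in model
    images = torch.rand(4, 3, 32, 32)   # torchattacks expects inputs in [0, 1]
    labels = torch.randint(0, 10, (4,))
    atk = torchattacks.PGD(model, eps=8 / 255, alpha=2 / 255, steps=10)
    adv_images = atk(images, labels)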
Spectrum simulation attack (ECCV'2022 Oral) towards boosting the transferability of adversarial examples
[NeurIPS 2020, Spotlight] Code for "Robust Deep Reinforcement Learning against Adversarial Perturbations on Observations"
A suite for hunting suspicious targets and exposed domains, and for phishing discovery
[ICLR 2025] Dissecting Adversarial Robustness of Multimodal LM Agents
A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX
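A minimal sketch of the Foolbox 3 API referenced above, using its PyTorch backend; the toy model, bounds, and epsilon are illustrative assumptions.

    # Hedged sketch: L-inf PGD with Foolbox on a placeholder PyTorch model.
    import torch
    import torch.nn as nn
    import foolbox as fb

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()  # stand-in model
    fmodel = fb.PyTorchModel(model, bounds=(0, 1))
    images = torch.rand(4, 3, 32, 32)
    labels = torch.randint(0, 10, (4,))
    raw, clipped, is_adv = fb.attacks.LinfPGD()(fmodel, images, labels, epsilons=8 / 255)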
A Model for Natural Language Attack on Text Classification and Inference
A pytorch adversarial library for attack and defense methods on images and graphs
Code for the paper "Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks"
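A minimal sketch of how that repository's AutoAttack evaluation is typically invoked, assuming the standard L-inf configuration; the toy model and random tensors are placeholders for a real checkpoint and test set.

    # Hedged sketch: run AutoAttack's standard evaluation on placeholder data.
    import torch
    import torch.nn as nn
    from autoattack import AutoAttack

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()  # stand-in model
    x_test = torch.rand(16, 3, 32, 32)
    y_test = torch.randint(0, 10, (16,))
    adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='standard')
    x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=16)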
A curated list of adversarial attacks and defenses papers on graph-structured data.
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
This repository provides studies on the security of language models for code (CodeLMs).
The official code for the paper "Delving Deep into Label Smoothing", IEEE TIP 2021
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLMs
APBench: A Unified Availability Poisoning Attack and Defenses Benchmark (TMLR 08/2024)
Evading Provenance-Based ML Detectors with Adversarial System Actions
Defending graph neural networks against adversarial attacks (NeurIPS 2020)
This repository includes code for the AutoML-based IDS and adversarial attack defense case studies presented in the paper "Enabling AutoML for Zero-Touch Network Security: Use-Case Driven Analysis" pu...
The official code of the KDD 2022 paper "FLDetector: Defending Federated Learning Against Model Poisoning Attacks via Detecting Malicious Clients"
[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
[IEEE TGRS 2022] Universal Adversarial Examples in Remote Sensing: Methodology and Benchmark
A toolkit for detecting and protecting against vulnerabilities in Large Language Models (LLMs).
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024
Beacon Object File (BOF) launcher - library for executing BOF files in C/C++/Zig applications
[ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
Security and Privacy Risk Simulator for Machine Learning (arXiv:2312.17667)
(CVPR 2024) "Unsegment Anything by Simulating Deformation"
The official implementation of ECCV'24 paper "To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now". This work introduces one fast and effe...
A re-implementation of the "Red Teaming Language Models with Language Models" paper by Perez et al., 2022
🛡 A curated list of adversarial attacks in PyTorch, with a focus on transferable black-box attacks.
A library of reference materials, tools, and other resources to aid threat profiling, threat quantification, and cyber adversary defense
Fantastic Robustness Measures: The Secrets of Robust Generalization [NeurIPS 2023]
WACV 2024 Papers: Discover cutting-edge research from WACV 2024, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support ...
6G Wireless Communication Security - Deep Learning Based Channel Estimation Dataset