11 results found
A curated list of trustworthy deep learning papers. Daily updating...
privacy
backdoor
fairness
green-ai
security
causality
ownership
poisoning
robustness
uncertainty
ai-alignment
watermarking
deep-learning
hallucinations
gradient-leakage
machine-unlearning
interpretable-deep-learning
membership-inference-attack
adversarial-machine-learning
out-of-distribution-generalization
Created
2020-07-19
646 commits to master branch, last one a day ago
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Sa...
Created
2022-10-25
2 commits to main branch, last one 2 years ago
Code accompanying the paper Pretraining Language Models with Human Preferences
Created
2023-02-20
5 commits to master branch, last one 10 months ago
How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚
Created
2023-02-27
11 commits to main branch, last one about a year ago
📚 A curated list of papers & technical articles on AI Quality & Safety
Created
2023-04-19
28 commits to main branch, last one about a year ago
[AAAI'25 Oral] "MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector".
Created
2023-12-20
1 commit to main branch, last one 7 days ago
Reading list for adversarial perspective and robustness in deep reinforcement learning.
ai-safety
safe-rlhf
ai-alignment
responsible-ai
adversarial-policies
machine-learning-safety
robust-machine-learning
deep-reinforcement-learning
meta-reinforcement-learning
safe-reinforcement-learning
adversarial-machine-learning
explainable-machine-learning
reinforcement-learning-safety
robust-reinforcement-learning
reinforcement-learning-alignment
artificial-intelligence-alignment
multiagent-reinforcement-learning
adversarial-reinforcement-learning
robust-deep-reinforcement-learning
Created
2023-09-08
15 commits to main branch, last one 6 months ago
A curated list of awesome resources for Artificial Intelligence Alignment research
Created
2018-11-16
19 commits to master branch, last one about a year ago
A curated list of awesome academic research, books, code of ethics, data sets, institutes, newsletters, principles, podcasts, reports, tools, regulations and standards related to Responsible, Trustwor...
Created
2021-09-05
296 commits to main branch, last one 2 days ago
Full code for the sparse probing paper.
Created
2023-05-02
5 commits to main branch, last one about a year ago
Directional Preference Alignment
Created
2024-02-27
11 commits to main branch, last one 2 months ago