9 results found
A curated list of trustworthy deep learning papers. Daily updating...
Topics: privacy, backdoor, fairness, green-ai, security, causality, ownership, poisoning, robustness, uncertainty, ai-alignment, watermarking, deep-learning, hallucinations, gradient-leakage, machine-unlearning, interpretable-deep-learning, membership-inference-attack, adversarial-machine-learning, out-of-distribution-generalization
Created
2020-07-19
604 commits to master branch, last one 5 days ago
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Sa...
Created
2022-10-25
2 commits to main branch, last one about a year ago
How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚
Created
2023-02-27
11 commits to main branch, last one about a year ago
Code accompanying the paper Pretraining Language Models with Human Preferences
Created
2023-02-20
5 commits to master branch, last one 4 months ago
📚 A curated list of papers & technical articles on AI Quality & Safety
Created
2023-04-19
28 commits to main branch, last one about a year ago
Reading list for adversarial perspective and robustness in deep reinforcement learning.
Topics: ai-safety, safe-rlhf, ai-alignment, explainable-rl, responsible-ai, adversarial-attacks, adversarial-policies, machine-learning-safety, robust-machine-learning, deep-reinforcement-learning, meta-reinforcement-learning, safe-reinforcement-learning, adversarial-machine-learning, explainable-machine-learning, reinforcement-learning-safety, robust-reinforcement-learning, multiagent-reinforcement-learning, adversarial-reinforcement-learning, reinforcement-learning-generalization, robust-adversarial-reinforcement-learning
Created
2023-09-08
15 commits to main branch, last one 10 days ago
A curated list of awesome resources for getting-started-with and staying-in-touch-with Artificial Intelligence Alignment research.
Created
2018-11-16
19 commits to master branch, last one 11 months ago
Full code for the sparse probing paper.
Created
2023-05-02
5 commits to main branch, last one 6 months ago
Directional Preference Alignment
Created
2024-02-27
10 commits to main branch, last one about a month ago