12 results found
Control GenAI interactions with power, precision, and consistency using Conversation Modeling paradigms
Created 2024-02-15
2,790 commits to develop branch, last one 2 days ago

A curated list of trustworthy deep learning papers. Daily updating...
privacy
backdoor
fairness
green-ai
security
causality
ownership
poisoning
robustness
uncertainty
ai-alignment
watermarking
deep-learning
hallucinations
gradient-leakage
machine-unlearning
interpretable-deep-learning
membership-inference-attack
adversarial-machine-learning
out-of-distribution-generalization
Created 2020-07-19
664 commits to master branch, last one 3 days ago

PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Sa...
Created 2022-10-25
2 commits to main branch, last one 2 years ago

Code accompanying the paper "Pretraining Language Models with Human Preferences"
Created 2023-02-20
5 commits to master branch, last one about a year ago

📚 A curated list of papers & technical articles on AI Quality & Safety
Created 2023-04-19
28 commits to main branch, last one about a year ago

How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚
Created 2023-02-27
11 commits to main branch, last one 2 years ago

[AAAI'25 Oral] "MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector".
Created 2023-12-20
2 commits to main branch, last one 26 days ago

Reading list for adversarial perspective and robustness in deep reinforcement learning.
ai-safety
safe-rlhf
ai-alignment
robot-safety
responsible-ai
adversarial-policies
machine-learning-safety
robust-machine-learning
deep-reinforcement-learning
safe-reinforcement-learning
adversarial-machine-learning
explainable-machine-learning
reinforcement-learning-safety
robust-reinforcement-learning
reinforcement-learning-alignment
artificial-intelligence-alignment
multiagent-reinforcement-learning
adversarial-reinforcement-learning
robust-deep-reinforcement-learning
Created 2023-09-08
16 commits to main branch, last one 2 days ago

A curated list of awesome academic research, books, code of ethics, data sets, institutes, maturity models, newsletters, principles, podcasts, reports, tools, regulations and standards related to Resp...
Created 2021-09-05
372 commits to main branch, last one 4 days ago

A curated list of awesome resources for Artificial Intelligence Alignment research
Created 2018-11-16
19 commits to master branch, last one about a year ago

Full code for the sparse probing paper.
Created 2023-05-02
5 commits to main branch, last one about a year ago

Directional Preference Alignment
Created 2024-02-27
11 commits to main branch, last one 6 months ago