25 results found
A curated list of awesome responsible machine learning resources.
Tags: r, xai, python, awesome, fairness, ai-safety, secure-ml, reliable-ai, awesome-list, data-science, transparency, explainable-ml, interpretability, interpretable-ai, interpretable-ml, machine-learning, interpretable-machine-learning, privacy-enhancing-technologies, machine-learning-interpretability, privacy-preserving-machine-learning
Created 2018-06-21 · 1,043 commits to master branch, last one a day ago

🐢 Open-Source Evaluation & Testing for LLMs and ML models
Created 2022-03-06 · 9,675 commits to main branch, last one a day ago

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Created 2023-05-15 · 110 commits to main branch, last one about a month ago

Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning)
Created 2023-10-23 · 137 commits to main branch, last one 6 months ago

PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Sa...
Created 2022-10-25 · 2 commits to main branch, last one about a year ago

[NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
Created 2023-01-13 · 15 commits to main branch, last one 3 months ago

Aligning AI With Shared Human Values (ICLR 2021)
Created 2020-08-06 · 25 commits to master branch, last one about a year ago

RuLES: a benchmark for evaluating rule-following in language models
Created 2023-11-03 · 22 commits to main branch, last one about a month ago

How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚
Created 2023-02-27 · 11 commits to main branch, last one about a year ago

Code accompanying the paper Pretraining Language Models with Human Preferences
Created 2023-02-20 · 5 commits to master branch, last one 3 months ago

📚 A curated list of papers & technical articles on AI Quality & Safety
Created 2023-04-19 · 28 commits to main branch, last one about a year ago

An unrestricted attack based on diffusion models that can achieve both good transferability and imperceptibility.
Created 2023-04-29 · 11 commits to main branch, last one 3 months ago

A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
Created 2023-09-26 · 13 commits to main branch, last one 2 months ago

Safety Score for Pre-Trained Language Models
Created 2022-07-02 · 31 commits to main branch, last one 7 months ago

An attack to induce hallucinations in LLMs
Created 2023-09-29 · 22 commits to master branch, last one 7 months ago

BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Created 2023-06-14 · 3 commits to main branch, last one 10 months ago

Feature Space Singularity for Out-of-Distribution Detection. (SafeAI 2021)
Created 2020-08-05 · 15 commits to master branch, last one 3 years ago

A reading list on adversarial perspectives and robustness in deep reinforcement learning.
Tags: ai-safety, safe-rlhf, ai-alignment, explainable-rl, responsible-ai, adversarial-attacks, adversarial-policies, machine-learning-safety, robust-machine-learning, deep-reinforcement-learning, meta-reinforcement-learning, safe-reinforcement-learning, adversarial-machine-learning, explainable-machine-learning, reinforcement-learning-safety, robust-reinforcement-learning, multiagent-reinforcement-learning, adversarial-reinforcement-learning, reinforcement-learning-generalization, robust-adversarial-reinforcement-learning
Created 2023-09-08 · 14 commits to main branch, last one 8 months ago

A project to add scalable state-of-the-art out-of-distribution detection (open set recognition) support by changing two lines of code! Perform efficient inferences (i.e., do not increase inference tim...
Created 2019-08-16 · 50 commits to master branch, last one about a year ago

[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
Created 2023-10-08 · 31 commits to main branch, last one 9 days ago

[ICCV2021 Oral] Fooling LiDAR by Attacking GPS Trajectory
Created 2020-10-06 · 27 commits to master branch, last one about a year ago

A curated list of awesome resources for getting started with, and staying in touch with, Artificial Intelligence Alignment research.
Created 2018-11-16 · 19 commits to master branch, last one 10 months ago

A project to improve out-of-distribution detection (open set recognition) and uncertainty estimation by changing a few lines of code in your project! Perform efficient inferences (i.e., do not increas...
Created 2022-05-10 · 40 commits to master branch, last one about a year ago

Full code for the sparse probing paper.
Created 2023-05-02 · 5 commits to main branch, last one 5 months ago

AI Safety Q&A web frontend
Created 2022-02-17 · 1,059 commits to master branch, last one 6 days ago