33 results found Sort:
- Filter by Primary Language:
- Python (20)
- Jupyter Notebook (3)
- Makefile (1)
- Svelte (1)
- TypeScript (1)
- +
🐢 Open-Source Evaluation & Testing for ML & LLM systems
Created
2022-03-06
10,112 commits to main branch, last one a day ago
A curated list of awesome responsible machine learning resources.
r
xai
python
awesome
fairness
ai-safety
secure-ml
reliable-ai
awesome-list
data-science
transparency
explainable-ml
interpretability
interpretable-ai
interpretable-ml
machine-learning
interpretable-machine-learning
privacy-enhancing-technologies
machine-learning-interpretability
privacy-preserving-machine-learning
Created
2018-06-21
1,165 commits to master branch, last one 7 days ago
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Created
2023-05-15
111 commits to main branch, last one 5 months ago
Deliver safe & effective language models
Created
2022-11-18
5,461 commits to main branch, last one about a month ago
Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning)
Created
2023-10-23
137 commits to main branch, last one 11 months ago
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Sa...
Created
2022-10-25
2 commits to main branch, last one 2 years ago
Aligning AI With Shared Human Values (ICLR 2021)
Created
2020-08-06
25 commits to master branch, last one about a year ago
[NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
Created
2023-01-13
16 commits to main branch, last one 4 months ago
RuLES: a benchmark for evaluating rule-following in language models
Created
2023-11-03
31 commits to main branch, last one about a month ago
Code accompanying the paper Pretraining Language Models with Human Preferences
Created
2023-02-20
5 commits to master branch, last one 9 months ago
How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚
Created
2023-02-27
11 commits to main branch, last one about a year ago
📚 A curated list of papers & technical articles on AI Quality & Safety
Created
2023-04-19
28 commits to main branch, last one about a year ago
An unrestricted attack based on diffusion models that can achieve both good transferability and imperceptibility.
Created
2023-04-29
16 commits to main branch, last one about a month ago
Toolkits to create a human-in-the-loop approval layer to monitor and guide AI agents workflow in real-time.
Created
2024-10-13
137 commits to main branch, last one a day ago
[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
Created
2023-09-26
13 commits to main branch, last one 8 months ago
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Created
2023-06-14
3 commits to main branch, last one about a year ago
Attack to induce LLMs within hallucinations
Created
2023-09-29
22 commits to master branch, last one about a year ago
[CCS'24] SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models
Created
2023-12-04
17 commits to main branch, last one about a month ago
Safety Score for Pre-Trained Language Models
Created
2022-07-02
31 commits to main branch, last one about a year ago
Reading list for adversarial perspective and robustness in deep reinforcement learning.
ai-safety
safe-rlhf
ai-alignment
responsible-ai
adversarial-policies
machine-learning-safety
robust-machine-learning
deep-reinforcement-learning
meta-reinforcement-learning
safe-reinforcement-learning
adversarial-machine-learning
explainable-machine-learning
reinforcement-learning-safety
robust-reinforcement-learning
reinforcement-learning-alignment
artificial-intelligence-alignment
multiagent-reinforcement-learning
adversarial-reinforcement-learning
robust-deep-reinforcement-learning
Created
2023-09-08
15 commits to main branch, last one 5 months ago
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
Created
2023-10-08
31 commits to main branch, last one 6 months ago
[SafeAI'21] Feature Space Singularity for Out-of-Distribution Detection.
Created
2020-08-05
15 commits to master branch, last one 3 years ago
A project to add scalable state-of-the-art out-of-distribution detection (open set recognition) support by changing two lines of code! Perform efficient inferences (i.e., do not increase inference tim...
Created
2019-08-16
50 commits to master branch, last one 2 years ago
[ICCV2021 Oral] Fooling LiDAR by Attacking GPS Trajectory
Created
2020-10-06
27 commits to master branch, last one 2 years ago
A curated list of awesome resources for Artificial Intelligence Alignment research
Created
2018-11-16
19 commits to master branch, last one about a year ago
LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
Created
2024-09-20
69 commits to main branch, last one 9 days ago
A curated list of awesome academic research, books, code of ethics, data sets, institutes, newsletters, principles, podcasts, reports, tools, regulations and standards related to Responsible, Trustwor...
Created
2021-09-05
270 commits to main branch, last one 3 days ago
Sparse probing paper full code.
Created
2023-05-02
5 commits to main branch, last one 11 months ago
A project to improve out-of-distribution detection (open set recognition) and uncertainty estimation by changing a few lines of code in your project! Perform efficient inferences (i.e., do not increas...
Created
2022-05-10
40 commits to master branch, last one 2 years ago
AI Safety Q&A web frontend
Created
2022-02-17
1,480 commits to master branch, last one 5 days ago