35 results found

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
1.4k stars · apache-2.0 · Created 2023-05-15 · 111 commits to main branch, last one 6 months ago
Secrets of RLHF in Large Language Models Part I: PPO
1.3k stars · apache-2.0 · Created 2023-07-05 · 47 commits to main branch, last one 9 months ago
Open-source LLM toolkit to build trustworthy LLM applications: TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning)
390 stars · apache-2.0 · Created 2023-10-23 · 137 commits to main branch, last one about a year ago
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop
Created 2022-10-25 · 2 commits to main branch, last one 2 years ago
Aligning AI With Shared Human Values (ICLR 2021)
Created 2020-08-06 · 25 commits to master branch, last one about a year ago
[NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
Created 2023-01-13 · 16 commits to main branch, last one 5 months ago
RuLES: a benchmark for evaluating rule-following in language models
214 stars · apache-2.0 · Created 2023-11-03 · 32 commits to main branch, last one 27 days ago
Code accompanying the paper Pretraining Language Models with Human Preferences
Created 2023-02-20 · 5 commits to master branch, last one 10 months ago
How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚
Created 2023-02-27 · 11 commits to main branch, last one about a year ago
📚 A curated list of papers & technical articles on AI Quality & Safety
Created 2023-04-19 · 28 commits to main branch, last one about a year ago
An unrestricted attack based on diffusion models that can achieve both good transferability and imperceptibility.
165 stars · apache-2.0 · Created 2023-04-29 · 16 commits to main branch, last one 2 months ago
Toolkits to create a human-in-the-loop approval layer to monitor and guide AI agent workflows in real time.
Created 2024-10-13 · 146 commits to main branch, last one 25 days ago
[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
123 stars · apache-2.0 · Created 2023-09-26 · 13 commits to main branch, last one 9 months ago
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Created 2023-06-14 · 3 commits to main branch, last one about a year ago
An attack that induces hallucinations in LLMs
Created 2023-09-29 · 22 commits to master branch, last one about a year ago
[CCS'24] SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models
108 stars · apache-2.0 · Created 2023-12-04 · 17 commits to main branch, last one 2 months ago
Safety Score for Pre-Trained Language Models
Created 2022-07-02 · 31 commits to main branch, last one about a year ago
LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
Created 2024-09-20 · 183 commits to main branch, last one 20 hours ago
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
85 stars · bsd-2-clause · Created 2023-10-08 · 31 commits to main branch, last one 7 months ago
[SafeAI'21] Feature Space Singularity for Out-of-Distribution Detection.
Created 2020-08-05 · 15 commits to master branch, last one 3 years ago
A project to add scalable state-of-the-art out-of-distribution detection (open set recognition) support by changing two lines of code! Perform efficient inferences (i.e., do not increase inference time).
Created 2019-08-16 · 50 commits to master branch, last one 2 years ago
[ICCV2021 Oral] Fooling LiDAR by Attacking GPS Trajectory
67 stars · apache-2.0 · Created 2020-10-06 · 27 commits to master branch, last one 2 years ago
A curated list of awesome resources for Artificial Intelligence Alignment research
Created 2018-11-16 · 19 commits to master branch, last one about a year ago
A curated list of awesome academic research, books, codes of ethics, data sets, institutes, newsletters, principles, podcasts, reports, tools, regulations and standards related to Responsible, Trustworthy AI
Created 2021-09-05 · 296 commits to main branch, last one a day ago
Full code for the sparse probing paper.
Created 2023-05-02 · 5 commits to main branch, last one about a year ago
[AAAI 2025] Official repository of Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection
50 stars · apache-2.0 · Created 2024-11-25 · 12 commits to main branch, last one 4 days ago