Search Results - RepositoryStats

45

518

apache-2.0

8

Deliver safe & effective language models

llm nlp mlops llm-test ai-safety ml-safety ai-testing benchmarks ml-testing llm-testing ethics-in-ai responsible-ai trustworthy-ai llm-as-evaluator model-assessment benchmark-framework large-language-models llm-evaluation-toolkit artificial-intelligence

Created 2022-11-18

5,706 commits to main branch, last one about a month ago

athina-evals athina-ai

17

277

unknown

5

Python SDK for running evaluations on LLM generated responses

llmops llm-ops llm-eval evaluation llm-evaluation evaluation-metrics evaluation-framework llm-evaluation-toolkit

Created 2023-11-22

791 commits to main branch, last one 8 days ago

just-eval Re-Align

6

85

mit

3

A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.

llm gpt4 llm-eval evaluation llm-evaluation llm-evaluation-toolkit

Created 2023-11-19

40 commits to main branch, last one about a year ago

parea-sdk-py parea-ai

6

76

apache-2.0

1

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

llm llmops metrics llm-eval llm-tools generative-ai llm-evaluation good-first-issue llms-benchmarking prompt-engineering llm-evaluation-toolkit llm-evaluation-framework

Created 2023-07-24

1,092 commits to main branch, last one 2 months ago

KIEval zhuohaoyu

2

36

unknown

2

[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

llm acl2024 explainable-ai llm-evaluation machine-learning llm-evaluation-metrics llm-evaluation-toolkit llm-evaluation-framework

Created 2024-02-23

19 commits to master branch, last one 9 months ago

CodeEval-Pro CodeEval-Pro

2

27

unknown

2

Official repo for "HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task"

llm llm4code llm-reasoning llm-evaluation code-generation llm-evaluation-toolkit

Created 2024-12-05

27 commits to main branch, last one 15 days ago