242 results found

:metal: awesome-semantic-segmentation
Created 2015-10-03
417 commits to master branch, last one 3 years ago
771
7.6k
apache-2.0
38
Supercharge Your LLM Application Evaluations 🚀
Created 2023-05-08
705 commits to main branch, last one 2 days ago
667
7.2k
other
28
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Created 2023-05-18
3,236 commits to main branch, last one 22 hours ago
405
5.0k
mit
21
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
Created 2023-04-28
3,116 commits to main branch, last one 16 hours ago
457
4.3k
apache-2.0
26
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Created 2023-06-15
807 commits to main branch, last one a day ago
507
3.8k
mit
64
Arbitrary expression evaluation for golang
Created 2014-12-19
311 commits to master branch, last one 7 years ago
760
3.5k
gpl-3.0
49
Python package for the evaluation of odometry and SLAM
Created 2017-09-13
453 commits to master branch, last one 3 days ago
Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)
Created 2015-01-05
299 commits to master branch, last one 7 years ago
147
3.1k
gpl-3.0
54
Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.
Created 2015-11-19
1,028 commits to master branch, last one 2 months ago
97
3.0k
unknown
39
SuperCLUE: A Comprehensive Benchmark for Chinese General-Purpose Foundation Models
Created 2023-05-02
247 commits to main branch, last one 6 months ago
228
3.0k
apache-2.0
23
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
Created 2024-01-10
835 commits to main branch, last one a day ago
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Created 2016-11-13
266 commits to master branch, last one 3 years ago
A unified evaluation framework for large language models
Created 2023-06-13
259 commits to main branch, last one 3 months ago
An open-source visual programming environment for battle-testing prompts to LLMs.
Created 2023-03-26
373 commits to main branch, last one a day ago
193
2.2k
apache-2.0
21
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform ro...
Created 2022-11-07
770 commits to main branch, last one 4 months ago
250
2.2k
apache-2.0
10
🧊 Open source LLM-Observability Platform for Developers. One-line integration for monitoring, metrics, evals, agent tracing, prompt management, playground, etc. Supports OpenAI SDK, Vercel AI SDK, An...
Created 2023-01-31
3,061 commits to main branch, last one 18 hours ago
263
2.1k
apache-2.0
47
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Created 2022-03-30
953 commits to main branch, last one 3 months ago
Avalanche: an End-to-End Library for Continual Learning based on PyTorch.
Created 2020-03-05
3,929 commits to master branch, last one about a month ago
800
1.8k
other
54
:cloud: :rocket: :bar_chart: :chart_with_upwards_trend: Evaluating state of the art in AI
Created 2016-10-21
2,426 commits to master branch, last one 3 months ago
403
1.7k
other
49
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Created 2019-06-19
221 commits to master branch, last one about a year ago
245
1.6k
apache-2.0
8
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Created 2023-05-25
589 commits to main branch, last one about a month ago
212
1.5k
apache-2.0
11
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
Created 2023-12-01
1,084 commits to main branch, last one a day ago
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
Created 2023-07-02
256 commits to main branch, last one 6 months ago
224
1.4k
other
28
Evaluation code for various unsupervised automated metrics for Natural Language Generation.
Created 2017-06-27
87 commits to master branch, last one 9 months ago
68
1.3k
apache-2.0
5
Laminar - open-source all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.
Created 2024-08-29
240 commits to main branch, last one 13 days ago
132
1.2k
unknown
27
Short and sweet LISP editing
Created 2014-01-10
2,707 commits to master branch, last one about a year ago
273
1.1k
apache-2.0
35
FuzzBench - Fuzzer benchmarking as a service.
Created 2020-02-04
1,349 commits to master branch, last one 2 months ago
137
1.1k
apache-2.0
5
The production toolkit for LLMs. Observability, prompt management and evaluations.
Created 2023-05-12
1,362 commits to main branch, last one 6 days ago