266 results found

:metal: awesome-semantic-segmentation
Created 2015-10-03 · 417 commits to master branch, last one 4 years ago
911 forks · 9.9k stars · license: other · 31

πŸͺ’ Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Created 2023-05-18 · 3,908 commits to main branch, last one 19 hours ago
878 forks · 8.6k stars · license: apache-2.0 · 42

Supercharge Your LLM Application Evaluations πŸš€
Created 2023-05-08 · 791 commits to main branch, last one 5 days ago
498 forks · 6.0k stars · license: mit · 20

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
Created 2023-04-28 · 4,071 commits to main branch, last one 7 hours ago
529 forks · 5.1k stars · license: apache-2.0 · 27

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Created 2023-06-15 · 890 commits to main branch, last one a day ago
512 forks · 3.9k stars · license: mit · 63

Arbitrary expression evaluation for golang
This repository has been archived.
Created 2014-12-19 · 312 commits to master branch, last one 7 days ago
291 forks · 3.8k stars · license: apache-2.0 · 30

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
Created 2024-01-10 · 847 commits to main branch, last one 29 days ago
763 forks · 3.7k stars · license: gpl-3.0 · 48

Python package for the evaluation of odometry and SLAM
Created 2017-09-13 · 468 commits to master branch, last one 12 days ago
352 forks · 3.5k stars · license: apache-2.0 · 20

🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 πŸ“
Created 2023-01-31 · 3,607 commits to main branch, last one 7 hours ago

Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)
Created 2015-01-05 · 299 commits to master branch, last one 7 years ago
230 forks · 3.3k stars · license: other · 33

The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.
Created 2024-07-23 · 1,481 commits to main branch, last one a day ago
104 forks · 3.1k stars · license: unknown · 38

SuperCLUE: A Comprehensive Benchmark for Chinese General-Purpose Foundation Models
Created 2023-05-02 · 247 commits to main branch, last one 10 months ago
148 forks · 3.1k stars · license: gpl-3.0 · 53

Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.
Created 2015-11-19 · 1,028 commits to master branch, last one 6 months ago

End-to-end Automatic Speech Recognition for Mandarin and English in Tensorflow
Created 2016-11-13 · 266 commits to master branch, last one 3 years ago

A unified evaluation framework for large language models
Created 2023-06-13 · 259 commits to main branch, last one 6 months ago

An open-source visual programming environment for battle-testing prompts to LLMs.
Created 2023-03-26 · 396 commits to main branch, last one 5 days ago

Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
Created 2024-03-07 · 1,296 commits to main branch, last one 22 hours ago
199 forks · 2.3k stars · license: apache-2.0 · 19

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform ro...
Created 2022-11-07 · 770 commits to main branch, last one 8 months ago
271 forks · 2.2k stars · license: apache-2.0 · 44

πŸ€— Evaluate: A library for easily evaluating machine learning models and datasets.
Created 2022-03-30 · 954 commits to main branch, last one 2 months ago
308 forks · 2.1k stars · license: apache-2.0 · 12

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Created 2023-12-01 · 1,260 commits to main branch, last one 20 hours ago

Avalanche: an End-to-End Library for Continual Learning based on PyTorch.
Created 2020-03-05 · 3,948 commits to master branch, last one 21 days ago
850 forks · 1.8k stars · license: other · 52

:cloud: :rocket: :bar_chart: :chart_with_upwards_trend: Evaluating state of the art in AI
Created 2016-10-21 · 2,456 commits to master branch, last one a day ago
103 forks · 1.8k stars · license: apache-2.0 · 11

Laminar - open-source all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.
Created 2024-08-29 · 408 commits to main branch, last one 16 days ago
406 forks · 1.7k stars · license: other · 50

(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Created 2019-06-19 · 221 commits to master branch, last one about a year ago
263 forks · 1.7k stars · license: apache-2.0 · 9

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Created 2023-05-25 · 595 commits to main branch, last one 3 months ago

The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
Created 2023-07-02 · 256 commits to main branch, last one 10 months ago

πŸ“° Must-read papers and blogs on LLM based Long Context Modeling πŸ”₯
Created 2023-09-17 · 227 commits to main branch, last one a day ago
224 forks · 1.4k stars · license: other · 27

Evaluation code for various unsupervised automated metrics for Natural Language Generation.
Created 2017-06-27 · 87 commits to master branch, last one about a year ago

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Created 2024-01-26 · 351 commits to main branch, last one a day ago