193 results found

🤘 awesome-semantic-segmentation
Created 2015-10-03
417 commits to master branch, last one 3 years ago
368
4.0k
other
14
🪢 Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Created 2023-05-18
1,889 commits to main branch, last one 20 hours ago
493
3.6k
mit
66
Arbitrary expression evaluation for golang
Created 2014-12-19
311 commits to master branch, last one 6 years ago
Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)
Created 2015-01-05
299 commits to master branch, last one 7 years ago
736
3.3k
gpl-3.0
50
Python package for the evaluation of odometry and SLAM
Created 2017-09-13
428 commits to master branch, last one 6 days ago
206
3.1k
mit
17
Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models wit...
Created 2023-04-28
1,179 commits to main branch, last one 16 hours ago
153
3.1k
gpl-3.0
55
Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.
Created 2015-11-19
1,024 commits to master branch, last one about a year ago
306
2.9k
apache-2.0
21
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
Created 2023-06-15
565 commits to main branch, last one a day ago
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Created 2016-11-13
266 commits to master branch, last one 2 years ago
89
2.7k
unknown
35
SuperCLUE: A comprehensive benchmark for general-purpose Chinese foundation models | A Benchmark for Foundation Models in Chinese
Created 2023-05-02
247 commits to main branch, last one 9 days ago
A unified evaluation framework for large language models
Created 2023-06-13
239 commits to main branch, last one 5 days ago
An open-source visual programming environment for battle-testing prompts to LLMs.
Created 2023-03-26
369 commits to main branch, last one 14 days ago
170
2.0k
apache-2.0
20
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform ro...
Created 2022-11-07
763 commits to main branch, last one 23 hours ago
228
1.8k
apache-2.0
50
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Created 2022-03-30
943 commits to main branch, last one about a month ago
768
1.7k
other
54
☁️ 🚀 📊 📈 Evaluating state of the art in AI
Created 2016-10-21
2,422 commits to master branch, last one about a month ago
Avalanche: an End-to-End Library for Continual Learning based on PyTorch.
Created 2020-03-05
3,909 commits to master branch, last one 22 hours ago
401
1.6k
other
48
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Created 2019-06-19
221 commits to master branch, last one about a year ago
218
1.3k
other
29
Evaluation code for various unsupervised automated metrics for Natural Language Generation.
Created 2017-06-27
87 commits to master branch, last one 2 months ago
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
Created 2023-07-02
254 commits to main branch, last one 5 months ago
174
1.2k
apache-2.0
9
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Created 2023-05-25
498 commits to main branch, last one a day ago
129
1.2k
unknown
26
Short and sweet LISP editing
Created 2014-01-10
2,707 commits to master branch, last one about a year ago
255
1.1k
apache-2.0
36
FuzzBench - Fuzzer benchmarking as a service.
Created 2020-02-04
1,336 commits to master branch, last one 12 days ago
94
904
apache-2.0
8
The production toolkit for LLMs. Observability, prompt management and evaluations.
Created 2023-05-12
1,072 commits to main branch, last one a day ago
High-fidelity performance metrics for generative models in PyTorch
Created 2020-04-23
202 commits to master branch, last one 4 months ago
SemanticKITTI API for visualizing dataset, processing data, and evaluating results.
Created 2019-07-24
45 commits to master branch, last one 17 days ago
82
705
bsd-3-clause
25
Expression evaluation in golang
Created 2017-09-27
153 commits to master branch, last one 8 days ago
115
692
mit
17
A General Toolbox for Identifying Object Detection Errors
Created 2020-07-16
16 commits to master branch, last one 3 years ago
121
682
apache-2.0
18
CBLUE (a Chinese medical information processing benchmark): A Chinese Biomedical Language Understanding Evaluation Benchmark
Created 2021-04-30
98 commits to main branch, last one about a year ago