237 results found
🤘 awesome-semantic-segmentation
Created
2015-10-03
417 commits to master branch, last one 3 years ago
Supercharge Your LLM Application Evaluations 🚀
Created
2023-05-08
670 commits to main branch, last one 19 hours ago
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Created
2023-05-18
2,966 commits to main branch, last one 13 hours ago
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
Created
2023-04-28
2,765 commits to main branch, last one 12 hours ago
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Created
2023-06-15
762 commits to main branch, last one 22 hours ago
Arbitrary expression evaluation for golang
Created
2014-12-19
311 commits to master branch, last one 7 years ago
Python package for the evaluation of odometry and SLAM
Created
2017-09-13
448 commits to master branch, last one 15 days ago
Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)
Created
2015-01-05
299 commits to master branch, last one 7 years ago
Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.
Created
2015-11-19
1,028 commits to master branch, last one about a month ago
SuperCLUE: A comprehensive benchmark for general-purpose Chinese large models | A Benchmark for Foundation Models in Chinese
Created
2023-05-02
247 commits to main branch, last one 5 months ago
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Created
2016-11-13
266 commits to master branch, last one 3 years ago
A unified evaluation framework for large language models
Created
2023-06-13
259 commits to main branch, last one 2 months ago
An open-source visual programming environment for battle-testing prompts to LLMs.
Created
2023-03-26
371 commits to main branch, last one 16 days ago
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform ro...
Created
2022-11-07
770 commits to main branch, last one 3 months ago
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Created
2022-03-30
953 commits to main branch, last one about a month ago
🧊 Open source LLM-Observability Platform for Developers. One-line integration for monitoring, metrics, evals, agent tracing, prompt management, playground, etc. Supports OpenAI SDK, Vercel AI SDK, An...
Created
2023-01-31
2,947 commits to main branch, last one 14 hours ago
Avalanche: an End-to-End Library for Continual Learning based on PyTorch.
Created
2020-03-05
3,929 commits to master branch, last one 16 days ago
☁️ 🚀 📊 📈 Evaluating state of the art in AI
Created
2016-10-21
2,426 commits to master branch, last one 2 months ago
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Created
2019-06-19
221 commits to master branch, last one about a year ago
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Created
2023-05-25
589 commits to main branch, last one 4 days ago
Multi-class confusion matrix library in Python
Created
2018-01-22
3,072 commits to master branch, last one about a month ago
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
Created
2023-07-02
256 commits to main branch, last one 5 months ago
Evaluation code for various unsupervised automated metrics for Natural Language Generation.
Created
2017-06-27
87 commits to master branch, last one 8 months ago
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
Created
2023-12-01
1,006 commits to main branch, last one 24 hours ago
Short and sweet LISP editing
Created
2014-01-10
2,707 commits to master branch, last one about a year ago
XAI - An eXplainability toolbox for machine learning
Created
2019-01-11
91 commits to master branch, last one 3 years ago
Laminar - open-source all-in-one platform for engineering AI products. Traces, Evals, Datasets, Labels. YC S24.
Created
2024-08-29
186 commits to main branch, last one a day ago
FuzzBench - Fuzzer benchmarking as a service.
Created
2020-02-04
1,349 commits to master branch, last one about a month ago
The production toolkit for LLMs. Observability, prompt management and evaluations.
Created
2023-05-12
1,339 commits to main branch, last one 2 days ago
High-fidelity performance metrics for generative models in PyTorch
Created
2020-04-23
202 commits to master branch, last one 9 months ago