193 results found

Primary language breakdown:
- Python (115)
- Jupyter Notebook (12)
- TypeScript (8)
- C++ (6)
- JavaScript (5)
- C# (5)
- MATLAB (4)
- Go (3)
- Java (3)
- CSS (2)
- Shell (2)
- Kotlin (2)
- Emacs Lisp (2)
- Rust (1)
- Svelte (1)
- PHP (1)
- Haskell (1)
- DM (1)
- MDX (1)
- HTML (1)
- R (1)
:metal: awesome-semantic-segmentation
Created
2015-10-03
417 commits to master branch, last one 3 years ago
🪢 Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Created
2023-05-18
1,889 commits to main branch, last one 20 hours ago
Arbitrary expression evaluation for golang
Created
2014-12-19
311 commits to master branch, last one 6 years ago
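The entry above is a Go expression-evaluation library. As a language-neutral illustration of the underlying technique (parse an expression, walk a restricted AST, reject anything unsupported), here is a minimal Python sketch; the function names and the operator whitelist are illustrative, not the library's API:

```python
import ast
import operator

# Whitelisted operators; any other AST node type is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Gt: operator.gt,
    ast.Lt: operator.lt,
}

def safe_eval(expr: str, variables: dict):
    """Evaluate an arithmetic/comparison expression without exec or eval."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Compare) and len(node.ops) == 1:
            return _OPS[type(node.ops[0])](walk(node.left),
                                           walk(node.comparators[0]))
        raise ValueError(f"unsupported node: {type(node).__name__}")
    return walk(ast.parse(expr, mode="eval"))

print(safe_eval("price * qty > 100", {"price": 12, "qty": 10}))  # True
```

Walking a restricted AST (rather than calling `eval`) is what makes user-supplied expressions safe: unknown node types fail loudly instead of executing arbitrary code.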
Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)
Created
2015-01-05
299 commits to master branch, last one 7 years ago
Python package for the evaluation of odometry and SLAM
Created
2017-09-13
428 commits to master branch, last one 6 days ago
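Odometry/SLAM evaluation packages like the one above typically report absolute trajectory error (ATE). A minimal sketch of the core computation, assuming the two trajectories are already time-synchronized and expressed in the same frame (real tools also align them first, e.g. with a Umeyama/SE(3) fit):

```python
import math

def ate_rmse(gt, est):
    """Absolute trajectory error: RMSE of per-pose Euclidean distances
    between ground-truth and estimated positions."""
    assert len(gt) == len(est), "trajectories must be synchronized"
    sq = [sum((g - e) ** 2 for g, e in zip(p, q))
          for p, q in zip(gt, est)]
    return math.sqrt(sum(sq) / len(sq))

gt  = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
est = [(0.0, 0.1), (1.0, -0.1), (2.0, 0.1)]
print(ate_rmse(gt, est))  # ≈ 0.1
```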
Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models wit...
Created
2023-04-28
1,179 commits to main branch, last one 16 hours ago
Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.
Created
2015-11-19
1,024 commits to master branch, last one about a year ago
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) across 100+ datasets.
Created
2023-06-15
565 commits to main branch, last one a day ago
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Created
2016-11-13
266 commits to master branch, last one 2 years ago
SuperCLUE: A Comprehensive Benchmark for General-Purpose Chinese Large Models | A Benchmark for Foundation Models in Chinese
Created
2023-05-02
247 commits to main branch, last one 9 days ago
A unified evaluation framework for large language models
Created
2023-06-13
239 commits to main branch, last one 5 days ago
An open-source visual programming environment for battle-testing prompts to LLMs.
Created
2023-03-26
369 commits to main branch, last one 14 days ago
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform ro...
Created
2022-11-07
763 commits to main branch, last one 23 hours ago
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Created
2022-03-30
943 commits to main branch, last one about a month ago
:cloud: :rocket: :bar_chart: :chart_with_upwards_trend: Evaluating state of the art in AI
Created
2016-10-21
2,422 commits to master branch, last one about a month ago
Avalanche: an End-to-End Library for Continual Learning based on PyTorch.
Created
2020-03-05
3,909 commits to master branch, last one 22 hours ago
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Created
2019-06-19
221 commits to master branch, last one about a year ago
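Multi-object tracking evaluations such as the one above are usually summarized with the CLEAR-MOT accuracy score. A sketch of that summary formula, with illustrative numbers:

```python
def mota(num_gt, false_negatives, false_positives, id_switches):
    """Multi-Object Tracking Accuracy (CLEAR-MOT):
    MOTA = 1 - (FN + FP + IDSW) / GT, accumulated over all frames."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt

# e.g. 100 ground-truth boxes, 5 misses, 3 false alarms, 2 identity switches
print(mota(100, 5, 3, 2))  # 0.9
```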
Multi-class confusion matrix library in Python
Created
2018-01-22
3,033 commits to master branch, last one 12 months ago
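The idea behind a multi-class confusion matrix library can be sketched in a few lines: count (true label, predicted label) pairs, then derive per-class statistics from rows and columns. This is a conceptual sketch, not the library's API:

```python
from collections import defaultdict

def confusion_matrix(actual, predicted):
    """Build a multi-class confusion matrix as counts[true][pred]."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts

def per_class_precision(counts, label):
    """Precision for one class: TP / (TP + FP), i.e. the diagonal cell
    divided by that class's predicted column total."""
    tp = counts[label][label]
    predicted_as_label = sum(row[label] for row in counts.values())
    return tp / predicted_as_label if predicted_as_label else 0.0

actual    = ["cat", "dog", "cat", "bird", "dog", "cat"]
predicted = ["cat", "dog", "dog", "bird", "dog", "cat"]
cm = confusion_matrix(actual, predicted)
print(cm["cat"]["dog"])                # 1: one cat mislabeled as dog
print(per_class_precision(cm, "dog"))  # 2/3: two of three "dog" calls correct
```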
Evaluation code for various unsupervised automated metrics for Natural Language Generation.
Created
2017-06-27
87 commits to master branch, last one 2 months ago
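A representative unsupervised NLG metric of the kind such toolkits implement is clipped n-gram precision, the building block of BLEU. A unigram-only sketch (real BLEU combines several n-gram orders and adds a brevity penalty):

```python
from collections import Counter

def unigram_precision(hypothesis, reference):
    """Clipped unigram precision: fraction of hypothesis tokens that also
    appear in the reference, crediting each reference token at most as
    many times as it occurs there."""
    hyp = Counter(hypothesis.split())
    ref = Counter(reference.split())
    overlap = sum(min(count, ref[word]) for word, count in hyp.items())
    return overlap / max(sum(hyp.values()), 1)

print(unigram_precision("the cat sat", "the cat sat down"))  # 1.0
print(unigram_precision("the the the", "the cat"))           # clipped to 1/3
```

Clipping is what stops a degenerate hypothesis like "the the the" from scoring perfectly against any reference containing "the".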
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
Created
2023-07-02
254 commits to main branch, last one 5 months ago
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Created
2023-05-25
498 commits to main branch, last one a day ago
Short and sweet LISP editing
Created
2014-01-10
2,707 commits to master branch, last one about a year ago
XAI - An eXplainability toolbox for machine learning
Created
2019-01-11
91 commits to master branch, last one 2 years ago
FuzzBench - Fuzzer benchmarking as a service.
Created
2020-02-04
1,336 commits to master branch, last one 12 days ago
The production toolkit for LLMs. Observability, prompt management and evaluations.
Created
2023-05-12
1,072 commits to main branch, last one a day ago
High-fidelity performance metrics for generative models in PyTorch
Created
2020-04-23
202 commits to master branch, last one 4 months ago
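A central metric in this space is the Fréchet Inception Distance (FID). For intuition, here is the Fréchet distance between two one-dimensional Gaussians; real FID applies the multivariate form to Gaussians fit to Inception-network features of real and generated images:

```python
import math

def frechet_distance_1d(mu1, var1, mu2, var2):
    """Fréchet (2-Wasserstein) distance between two 1-D Gaussians.
    Multivariate form: ||mu1 - mu2||^2 + Tr(C1 + C2 - 2*(C1 C2)^(1/2))."""
    return (mu1 - mu2) ** 2 + var1 + var2 - 2 * math.sqrt(var1 * var2)

print(frechet_distance_1d(0.0, 1.0, 0.0, 1.0))  # 0.0: identical distributions
print(frechet_distance_1d(0.0, 1.0, 2.0, 1.0))  # 4.0: mean shifted by 2
```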
SemanticKITTI API for visualizing dataset, processing data, and evaluating results.
Created
2019-07-24
45 commits to master branch, last one 17 days ago
Expression evaluation in golang
Created
2017-09-27
153 commits to master branch, last one 8 days ago
A General Toolbox for Identifying Object Detection Errors
Created
2020-07-16
16 commits to master branch, last one 3 years ago
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark (a benchmark for Chinese medical information processing)
Created
2021-04-30
98 commits to main branch, last one about a year ago