10 results found
AI Observability & Evaluation
Created 2022-11-09; 4,134 commits to main branch, last one a day ago.
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, including CrewAI, LangChain, AutoGen, AG2, and CamelAI.
Created 2023-08-15; 597 commits to main branch, last one 4 days ago.
Laminar - open-source, all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.
Created 2024-08-29; 325 commits to main branch, last one 21 hours ago.
The TypeScript AI framework.
Created 2024-08-06; 6,318 commits to main branch, last one 3 hours ago.
🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite
Created 2024-06-10; 63 commits to main branch, last one 27 days ago.
Test your LLM-powered apps with TypeScript. No API key required.
Created 2024-11-12; 345 commits to main branch, last one 4 days ago.
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
Created 2024-08-08; 488 commits to main branch, last one 2 days ago.
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
Topics: evals, gpt-4, reasoning, gemini-pro, navigation, perception, neurips-2024, summarization, visual-reasoning, benchmark-dataset, egocentric-videos, spatial-intelligence, multiple-choice-questions, long-context-understanding, video-language-understanding, multimodal-large-language-models, 1-hour-video-language-understanding, long-form-video-language-understanding
Created 2024-11-27; 9 commits to main branch, last one 24 days ago.
A library for evaluating Retrieval-Augmented Generation (RAG) systems using traditional methods.
Created 2024-05-21; 31 commits to main branch, last one 5 months ago.
Evalica, your favourite evaluation toolkit
Created 2024-06-15; 345 commits to master branch, last one 7 days ago.