65 results found

7.8k
97.8k
mit
587
Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
Created 2023-06-26
3,602 commits to main branch, last one 12 hours ago
2.2k
20.2k
apache-2.0
157
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Created 2023-04-17
460 commits to main branch, last one 6 months ago
503
6.0k
apache-2.0
56
SGLang is a fast serving framework for large language models and vision language models.
Created 2024-01-08
1,219 commits to main branch, last one 13 hours ago
384
4.4k
other
65
SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo is also released at suppixel.ai.
Created 2023-12-21
36 commits to master branch, last one 3 months ago
373
4.2k
apache-2.0
23
Use PEFT or full-parameter training to fine-tune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vis...
Created 2023-08-01
1,163 commits to main branch, last one 21 hours ago
309
4.0k
apache-2.0
34
An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
Created 2023-07-11
330 commits to main branch, last one 7 days ago
175
2.9k
apache-2.0
19
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Created 2023-08-01
244 commits to main branch, last one 2 days ago
347
2.7k
mit
56
A C#/.NET library to run LLMs (🦙LLaMA/LLaVA) on your local device efficiently.
Created 2023-05-09
1,726 commits to master branch, last one 20 hours ago
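With ChatGPT's explosive popularity marking a key step toward AGI, this project aims to collect open-source alternatives to ChatGPT, including text LLMs and multimodal LLMs, as a convenience for everyone.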
Created 2023-04-07
65 commits to main branch, last one about a year ago
188
1.3k
apache-2.0
10
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
Created 2023-12-01
1,006 commits to main branch, last one 23 hours ago
108
1.2k
cc-by-4.0
15
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for ...
Created 2023-05-18
43 commits to main branch, last one 2 months ago
62
1.0k
apache-2.0
15
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Created 2023-02-21
293 commits to main branch, last one about a month ago
60
810
unknown
10
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
Created 2024-04-26
11 commits to main branch, last one 6 months ago
35
749
gpl-3.0
13
Tag manager and captioner for image datasets
Created 2023-03-08
530 commits to main branch, last one 13 days ago
28
691
unknown
2
OpenCV+YOLO+LLAVA powered video surveillance system
Created 2024-10-07
13 commits to main branch, last one 24 days ago
66
647
apache-2.0
12
A Framework of Small-scale Large Multimodal Models
Created 2024-02-21
211 commits to main branch, last one 29 days ago
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Created 2023-10-08
31 commits to master branch, last one 8 months ago
44
538
apache-2.0
32
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Created 2024-06-27
81 commits to main branch, last one about a month ago
41
489
mit
6
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
Created 2024-04-16
163 commits to main branch, last one a day ago
Famous Vision Language Models and Their Architectures
Created 2024-02-15
231 commits to main branch, last one 2 months ago
AI-powered assistant to help you with your daily tasks, powered by Llama 3.2. It can recognize your voice, process natural language, and perform various actions based on your commands: summarizing tex...
Created 2024-09-26
62 commits to main branch, last one about a month ago
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Created 2024-01-24
271 commits to main branch, last one 9 days ago
An open-source implementation for training LLaVA-NeXT.
Created 2024-05-11
36 commits to master branch, last one 23 days ago
74
386
apache-2.0
10
RESTai is an AIaaS (AI as a Service) open-source platform. Built on top of LlamaIndex & Langchain. Supports any public LLM supported by LlamaIndex and any local LLM supported by Ollama/vLLM/etc. Precis...
Created 2023-05-18
798 commits to master branch, last one 21 hours ago
A Discord LLM chat bot that supports any OpenAI compatible API (OpenAI, xAI, Mistral, Groq, OpenRouter, Ollama, LM Studio and more)
Created 2023-05-08
326 commits to main branch, last one 12 hours ago
133
352
apache-2.0
22
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high per...
Created 2023-07-05
909 commits to develop branch, last one a day ago
52
309
apache-2.0
10
InternEvo is an open-source lightweight training framework that aims to support model pre-training without the need for extensive dependencies.
Created 2024-01-16
478 commits to develop branch, last one 17 days ago
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Created 2023-12-02
44 commits to main branch, last one 4 months ago
AI Device Template Featuring Whisper, TTS, Groq, Llama3, OpenAI and more
Created 2024-04-20
42 commits to main branch, last one 6 months ago
12
258
apache-2.0
5
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
Created 2023-06-27
38 commits to main branch, last one about a year ago