56 results found

5.7k
75.6k
mit
452
Get up and running with Llama 3, Mistral, Gemma, and other large language models.
Created 2023-06-26
2,989 commits to main branch, last one a day ago
1.9k
17.7k
apache-2.0
157
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Created 2023-04-17
460 commits to main branch, last one about a month ago
340
3.8k
other
69
SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild
Created 2023-12-21
26 commits to master branch, last one 16 days ago
251
3.1k
apache-2.0
31
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
Created 2023-07-11
303 commits to main branch, last one 2 days ago
296
2.2k
mit
52
A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
Created 2023-05-09
1,460 commits to master branch, last one 18 hours ago
203
2.1k
apache-2.0
19
ms-swift: Use PEFT or Full-parameter to finetune 250+ LLMs or 35+ MLLMs. (Qwen2, GLM4, Internlm2, Yi, Llama3, Llava, MiniCPM-V, Deepseek, Baichuan2, Phi3-Vision, ...)
Created 2023-08-01
667 commits to main branch, last one 10 hours ago
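ChatGPT's explosive popularity marked a key step toward AGI. This project collects open-source alternatives to ChatGPT, including large text models and large multimodal models, as a convenience for everyone.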
Created 2023-04-07
65 commits to main branch, last one 10 months ago
109
1.7k
apache-2.0
14
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Created 2023-08-01
173 commits to main branch, last one 10 hours ago
92
1.0k
cc-by-4.0
14
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for ...
Created 2023-05-18
42 commits to main branch, last one 10 days ago
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
Created 2023-11-24
71 commits to develop branch, last one 4 months ago
56
956
apache-2.0
13
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Created 2023-02-21
286 commits to main branch, last one 2 months ago
48
726
unknown
10
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
Created 2024-04-26
11 commits to main branch, last one about a month ago
68
603
apache-2.0
8
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4v, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks
Created 2023-12-01
526 commits to main branch, last one 4 hours ago
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Created 2023-10-08
31 commits to master branch, last one 3 months ago
38
454
apache-2.0
13
A Framework of Small-scale Large Multimodal Models
Created 2024-02-21
174 commits to main branch, last one 7 days ago
21
440
gpl-3.0
8
Tag manager and captioner for image datasets
Created 2023-03-08
511 commits to main branch, last one 5 days ago
68
337
apache-2.0
8
RestAI is an AIaaS (AI as a Service) open-source platform. Built on top of LlamaIndex, Ollama and HF Pipelines. Supports any public LLM supported by LlamaIndex and any local LLM supported by Ollama. Pr...
Created 2023-05-18
750 commits to master branch, last one 4 days ago
Ollama API bindings for .NET
Created 2023-10-15
116 commits to main branch, last one a day ago
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Created 2024-01-24
228 commits to main branch, last one 24 days ago
AI Device Template Featuring Whisper, TTS, Groq, Llama3, OpenAI and more
Created 2024-04-20
42 commits to main branch, last one about a month ago
12
247
apache-2.0
5
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
Created 2023-06-27
38 commits to main branch, last one 9 months ago
99
235
apache-2.0
21
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high per...
Created 2023-07-05
675 commits to develop branch, last one 12 days ago
llmcord.py • Talk to LLMs with your friends!
Created 2023-05-08
199 commits to main branch, last one 21 hours ago
14
230
bsd-3-clause
11
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Created 2023-06-15
366 commits to main branch, last one 3 months ago
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Created 2023-12-02
39 commits to main branch, last one 13 days ago
3
200
bsd-3-clause
4
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Created 2023-10-22
132 commits to main branch, last one 3 months ago
LLaVA server (llama.cpp).
Created 2023-10-13
26 commits to main branch, last one 8 months ago
Famous Vision Language Models and Their Architectures
Created 2024-02-15
221 commits to main branch, last one 12 days ago
8
158
apache-2.0
3
Embed arbitrary modalities (images, audio, documents, etc) into large language models.
Created 2023-10-11
84 commits to main branch, last one 3 months ago
12
138
mit
5
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
Created 2024-04-16
142 commits to main branch, last one 2 days ago