81 results found

2.6k
21.1k
mit
305
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Created 2019-07-23
1,236 commits to master branch, last one about a month ago
406
4.0k
mit
62
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Created 2024-01-26
197 commits to main branch, last one a day ago
349
3.5k
bsd-3-clause
61
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
Created 2023-08-30
249 commits to main branch, last one 5 months ago
[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
Created 2024-11-12
62 commits to main branch, last one about a month ago
Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓
Created 2022-11-05
185 commits to main branch, last one 23 days ago
SpatialLM: Large Language Model for Spatial Understanding
Created 2025-03-14
8 commits to main branch, last one 14 days ago
172
2.8k
apache-2.0
43
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Created 2023-09-26
416 commits to main branch, last one 2 months ago
128
2.2k
apache-2.0
33
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Created 2023-07-04
135 commits to main branch, last one 3 months ago
236
2.1k
apache-2.0
32
Agent S: an open agentic framework that uses computers like a human
Created 2024-10-09
213 commits to main branch, last one 2 days ago
129
1.9k
apache-2.0
23
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Created 2024-06-17
59 commits to main branch, last one 5 months ago
176
1.8k
mit
100
Pioneering Multimodal Reasoning with CoT
Created 2025-03-15
123 commits to main branch, last one 2 days ago
🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.
Created 2022-02-19
408 commits to main branch, last one 2 days ago
65
1.0k
apache-2.0
26
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Created 2025-01-06
39 commits to main branch, last one 3 days ago
75
1.0k
apache-2.0
19
A family of lightweight multimodal models.
Created 2024-01-31
114 commits to main branch, last one 4 months ago
43
816
apache-2.0
14
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Created 2023-12-17
31 commits to main branch, last one about a month ago
🚀🚀🚀 A collection of some awesome public projects about Large Language Model (LLM), Vision Language Model (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), the related Datasets and Applica...
Created 2023-02-15
163 commits to main branch, last one 2 days ago
39
654
unknown
28
Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs
Created 2024-06-27
106 commits to main branch, last one 5 days ago
31
635
unknown
16
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
Created 2023-09-26
107 commits to main branch, last one 3 months ago
76
613
apache-2.0
8
OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.
Created 2024-10-30
34 commits to main branch, last one about a month ago
44
555
apache-2.0
27
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Created 2024-04-21
30 commits to main branch, last one 10 months ago
30
520
unknown
15
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Created 2024-03-18
78 commits to main branch, last one 5 months ago
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Created 2024-01-24
274 commits to main branch, last one about a month ago
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Created 2023-11-17
357 commits to main branch, last one 7 days ago
23
437
unknown
6
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train...
Created 2023-10-24
135 commits to master branch, last one about a month ago
24
391
apache-2.0
8
This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV2024 Oral
Created 2023-11-27
2 commits to master branch, last one 8 months ago
8
319
mit
10
EVE Series: Encoder-Free Vision-Language Models from BAAI
Created 2024-06-14
26 commits to main branch, last one about a month ago
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context l...
Created 2023-06-11
42 commits to main branch, last one 23 days ago
11
296
apache-2.0
6
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Created 2023-06-06
18 commits to main branch, last one about a year ago
14
291
apache-2.0
6
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
Created 2025-01-21
14 commits to main branch, last one 3 days ago
29
268
other
15
✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
Created 2024-12-14
46 commits to main branch, last one 22 days ago