56 results found

2.6k
20.4k
mit
307
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Created 2019-07-23
1,229 commits to master branch, last one 6 days ago
300
3.2k
mit
53
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Created 2024-01-26
139 commits to main branch, last one 2 months ago
159
2.7k
apache-2.0
43
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Created 2023-09-26
409 commits to main branch, last one 3 days ago
203
2.3k
other
25
Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
Created 2024-11-12
50 commits to main branch, last one 5 days ago
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought and OpenAI o1 🍓
Created 2022-11-05
173 commits to main branch, last one 4 days ago
115
2.0k
apache-2.0
33
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Created 2023-07-04
134 commits to main branch, last one 2 months ago
118
1.8k
apache-2.0
23
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Created 2024-06-17
59 commits to main branch, last one about a month ago
71
962
apache-2.0
20
A family of lightweight multimodal models.
Created 2024-01-31
114 commits to main branch, last one about a month ago
42
780
apache-2.0
14
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Created 2023-12-17
29 commits to main branch, last one 4 months ago
95
703
apache-2.0
22
Agent S: an open agentic framework that uses computers like a human
Created 2024-10-09
88 commits to main branch, last one 15 days ago
29
619
unknown
15
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
Created 2023-09-26
106 commits to main branch, last one 6 months ago
61
585
apache-2.0
36
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Created 2024-04-21
30 commits to main branch, last one 6 months ago
45
552
apache-2.0
31
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Created 2024-06-27
81 commits to main branch, last one 3 months ago
29
465
apache-2.0
13
This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV2024 Oral
Created 2023-11-27
2 commits to master branch, last one 4 months ago
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Created 2024-01-24
271 commits to main branch, last one about a month ago
24
414
unknown
14
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Created 2024-03-18
78 commits to main branch, last one 2 months ago
20
390
unknown
4
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train...
Created 2023-10-24
134 commits to master branch, last one 11 days ago
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Created 2023-11-17
346 commits to main branch, last one about a month ago
11
286
apache-2.0
5
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Created 2023-06-06
18 commits to main branch, last one 11 months ago
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context l...
Created 2023-06-11
35 commits to main branch, last one 4 months ago
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
Created 2024-06-14
19 commits to main branch, last one 2 months ago
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
Created 2024-07-03
20 commits to main branch, last one 25 days ago
19
221
apache-2.0
5
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Created 2023-05-22
4 commits to main branch, last one about a year ago
15
190
apache-2.0
9
Official code for Paper "Mantis: Multi-Image Instruction Tuning" (TMLR2024)
Created 2024-04-12
230 commits to main branch, last one 2 days ago
22
182
bsd-3-clause
3
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
Created 2023-05-10
36 commits to main branch, last one 11 months ago
12
167
unknown
3
AUITestAgent is the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification.
Created 2024-07-11
22 commits to main branch, last one 5 months ago
14
150
bsd-3-clause
7
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Created 2024-05-22
96 commits to main branch, last one 10 days ago
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
Created 2024-03-15
2 commits to main branch, last one 9 months ago
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)
Created 2024-04-15
98 commits to main branch, last one 4 months ago
7
140
apache-2.0
4
An RLHF Infrastructure for Vision-Language Models
Created 2023-12-27
7 commits to main branch, last one about a month ago