78 results found

279
3.0k
mit
49
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Created 2024-01-26
139 commits to main branch, last one about a month ago
311
2.7k
apache-2.0
38
ModelScope-Agent: An agent framework connecting models in ModelScope with the world
Created 2023-08-03
474 commits to master branch, last one 7 days ago
174
2.6k
apache-2.0
28
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Created 2024-09-10
13 commits to main branch, last one 7 days ago
115
1.8k
apache-2.0
21
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Created 2024-06-17
59 commits to main branch, last one 21 days ago
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Created 2024-01-22
58 commits to main branch, last one about a month ago
101
1.6k
apache-2.0
31
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Created 2023-07-04
134 commits to main branch, last one about a month ago
59
968
other
40
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
Created 2024-08-10
65 commits to main branch, last one 28 days ago
69
933
apache-2.0
18
A family of lightweight multimodal models.
Created 2024-01-31
114 commits to main branch, last one 2 days ago
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
Created 2023-11-07
404 commits to main branch, last one 11 months ago
29
611
unknown
15
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
Created 2023-09-26
106 commits to main branch, last one 5 months ago
52
587
mit
21
Speech, Language, Audio, Music Processing with Large Language Model
Created 2023-10-23
696 commits to main branch, last one 3 days ago
A collection of resources on applications of multi-modal learning in medical imaging.
Created 2022-07-13
151 commits to main branch, last one 9 days ago
41
530
bsd-3-clause
10
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Created 2023-06-26
107 commits to main branch, last one 22 days ago
31
527
apache-2.0
7
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Created 2024-06-13
31 commits to main branch, last one 16 days ago
Real-time voice interactive digital human, supporting end-to-end voice solutions (GLM-4-Voice - THG) and cascaded solutions (ASR-LLM-TTS-THG). Appearance and voice are customizable with no training required; supports voice cloning, with first-packet latency as low as 3s.
Created 2024-10-18
40 commits to master branch, last one 6 days ago
12
406
unknown
5
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Created 2024-06-02
46 commits to main branch, last one 5 months ago
24
386
unknown
14
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Created 2024-03-18
78 commits to main branch, last one about a month ago
20
381
unknown
4
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train...
Created 2023-10-24
133 commits to master branch, last one about a month ago
19
379
unknown
8
MLCD & UNICOM : Large-Scale Visual Representation Model
Created 2023-02-15
101 commits to main branch, last one 2 days ago
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Created 2023-11-17
346 commits to main branch, last one 6 days ago
Research Trends in LLM-guided Multimodal Learning.
Created 2023-05-29
16 commits to main branch, last one about a year ago
29
344
unknown
12
A Gradio demo of MGIE
Created 2023-09-28
1 commit to main branch, last one 9 months ago
14
313
apache-2.0
7
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
Created 2024-06-12
47 commits to main branch, last one 4 days ago
11
284
apache-2.0
5
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Created 2023-06-06
18 commits to main branch, last one 10 months ago
Curated papers on Large Language Models in Healthcare and Medical domain
Created 2023-06-28
45 commits to main branch, last one 4 months ago
[WACV 2024 Survey Paper] Multimodal Large Language Models for Autonomous Driving
Created 2023-10-22
90 commits to main branch, last one 8 months ago
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
Created 2024-06-14
19 commits to main branch, last one about a month ago
22
181
apache-2.0
5
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, qwen-vl, qwen2-vl, phi3-v etc.
Created 2024-07-20
95 commits to main branch, last one 29 days ago