95 results found
✨✨Latest Advances on Multimodal Large Language Models
multi-modality
chain-of-thought
instruction-tuning
in-context-learning
instruction-following
large-language-models
visual-instruction-tuning
large-vision-language-model
multimodal-chain-of-thought
large-vision-language-models
multimodal-instruction-tuning
multimodal-in-context-learning
multimodal-large-language-models
Created
2023-05-19
793 commits to main branch, last one 3 days ago
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Created
2024-01-26
159 commits to main branch, last one 3 days ago
ModelScope-Agent: An agent framework connecting models in ModelScope with the world
Created
2023-08-03
474 commits to master branch, last one 2 months ago
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Created
2024-09-10
13 commits to main branch, last one 2 months ago
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Created
2023-07-04
135 commits to main branch, last one about a month ago
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Created
2024-08-10
126 commits to main branch, last one 10 days ago
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Created
2024-06-17
59 commits to main branch, last one 3 months ago
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Created
2024-01-22
59 commits to main branch, last one about a month ago
A family of lightweight multimodal models.
Created
2024-01-31
114 commits to main branch, last one 2 months ago
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
Created
2023-11-07
404 commits to main branch, last one about a year ago
Speech, Language, Audio, Music Processing with Large Language Model
Created
2023-10-23
882 commits to main branch, last one 9 days ago
Real-time voice-interactive digital human, supporting an end-to-end voice solution (GLM-4-Voice - THG) and a cascaded solution (ASR-LLM-TTS-THG). Appearance and voice are customizable with no training required, voice cloning is supported, and first-packet latency is as low as 3 s.
Created
2024-10-18
40 commits to master branch, last one 2 months ago
A collection of resources on applications of multi-modal learning in medical imaging.
Created
2022-07-13
154 commits to main branch, last one 8 days ago
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs
Created
2023-09-26
107 commits to main branch, last one about a month ago
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Created
2024-06-13
32 commits to main branch, last one 2 months ago
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Created
2023-06-26
122 commits to main branch, last one 2 days ago
MLCD & UNICOM: Large-Scale Visual Representation Model
Created
2023-02-15
123 commits to main branch, last one 8 days ago
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Created
2024-03-18
78 commits to main branch, last one 3 months ago
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Created
2024-06-02
50 commits to main branch, last one about a month ago
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
Created
2023-11-17
348 commits to main branch, last one 13 days ago
The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insigh...
tutorial
awesome-list
image-text-matching
large-vision-models
vision-and-language
image-text-retrieval
large-language-model
video-text-retrieval
cross-modal-retrieval
large-language-models
multimodal-pretraining
video-text-recognition
memory-efficient-tuning
text-to-image-synthesis
text-to-image-generation
text-to-video-generation
visual-semantic-embedding
large-vision-language-models
parameter-efficient-fine-tuning
multimodal-large-language-models
Created
2020-12-22
130 commits to main branch, last one about a month ago
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train...
Created
2023-10-24
134 commits to master branch, last one about a month ago
Research Trends in LLM-guided Multimodal Learning.
Created
2023-05-29
16 commits to main branch, last one about a year ago
A Gradio demo of MGIE
Created
2023-09-28
1 commit to main branch, last one 11 months ago
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
Created
2024-06-12
48 commits to main branch, last one 17 days ago
LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.
Created
2025-01-07
8 commits to main branch, last one 18 days ago
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Created
2023-06-06
18 commits to main branch, last one about a year ago
Curated papers on Large Language Models in Healthcare and Medical domain
Created
2023-06-28
45 commits to main branch, last one 6 months ago
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
Created
2024-06-14
19 commits to main branch, last one 4 months ago
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Created
2024-11-04
21 commits to main branch, last one 29 days ago