7 results found
✨✨ Latest Advances on Multimodal Large Language Models
Topics: multi-modality, chain-of-thought, instruction-tuning, in-context-learning, instruction-following, large-language-models, visual-instruction-tuning, large-vision-language-model, multimodal-chain-of-thought, large-vision-language-models, multimodal-instruction-tuning, multimodal-in-context-learning, multimodal-large-language-models
Created 2023-05-19 · 793 commits to main branch, last one 6 days ago
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Created 2023-12-17 · 29 commits to main branch, last one 6 months ago
LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports understanding of images, high-resolution images, and videos.
Created 2025-01-07 · 8 commits to main branch, last one 21 days ago
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, and more.
Created 2024-07-20 · 109 commits to main branch, last one a day ago
A collection of visual instruction tuning datasets.
Created 2023-10-07 · 24 commits to main branch, last one 10 months ago
🦩 Visual Instruction Tuning with Polite Flamingo: training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
Created 2023-07-02 · 35 commits to main branch, last one about a year ago
[EMNLP 2024] A Video Chat Agent with Temporal Prior
Created 2024-02-25 · 17 commits to main branch, last one about a month ago