6 results found
✨✨ Latest Advances on Multimodal Large Language Models
multi-modality
chain-of-thought
instruction-tuning
in-context-learning
instruction-following
large-language-models
visual-instruction-tuning
large-vision-language-model
multimodal-chain-of-thought
large-vision-language-models
multimodal-instruction-tuning
multimodal-in-context-learning
multimodal-large-language-models
Created 2023-05-19
782 commits to main branch, last one 17 hours ago
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Created 2023-12-17
29 commits to main branch, last one 4 months ago
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.
Created 2024-07-20
104 commits to main branch, last one 5 days ago
A collection of visual instruction tuning datasets.
Created 2023-10-07
24 commits to main branch, last one 9 months ago
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
Created 2023-07-02
35 commits to main branch, last one about a year ago
[EMNLP 2024] A Video Chat Agent with Temporal Prior
Created 2024-02-25
15 commits to main branch, last one 2 days ago