7 results found

42
784
apache-2.0
14
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Created 2023-12-17
29 commits to main branch, last one 6 months ago
13
303
apache-2.0
8
LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.
Created 2025-01-07
8 commits to main branch, last one 21 days ago
27
231
apache-2.0
7
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.
Created 2024-07-20
109 commits to main branch, last one a day ago
A collection of visual instruction tuning datasets.
Created 2023-10-07
24 commits to main branch, last one 10 months ago
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
Created 2023-07-02
35 commits to main branch, last one about a year ago
[EMNLP 2024] A Video Chat Agent with Temporal Prior
Created 2024-02-25
17 commits to main branch, last one about a month ago