6 results found

42
780
apache-2.0
14
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Created 2023-12-17
29 commits to main branch, last one 4 months ago
24
208
apache-2.0
5
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.
Created 2024-07-20
104 commits to main branch, last one 5 days ago
A collection of visual instruction tuning datasets.
Created 2023-10-07
24 commits to main branch, last one 9 months ago
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
Created 2023-07-02
35 commits to main branch, last one about a year ago
[EMNLP 2024] A Video Chat Agent with Temporal Prior
Created 2024-02-25
15 commits to main branch, last one 2 days ago