9 results found

2.3k
20.8k
apache-2.0
157
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Created 2023-04-17
460 commits to main branch, last one 7 months ago
243
3.6k
mit
100
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
Created 2023-04-01
626 commits to main branch, last one 9 months ago
341
3.4k
bsd-3-clause
60
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Created 2023-08-30
249 commits to main branch, last one about a month ago
159
2.7k
apache-2.0
43
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Created 2023-09-26
409 commits to main branch, last one 3 days ago
An open-source implementation for training LLaVA-NeXT.
Created 2024-05-11
36 commits to master branch, last one about a month ago
28
271
bsd-3-clause
12
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
Created 2023-08-02
26 commits to main branch, last one 8 months ago
6
250
unknown
2
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Created 2023-11-29
73 commits to main branch, last one 3 months ago
3
84
apache-2.0
1
🧘🏻‍♂️KarmaVLM (相生): A family of highly efficient and powerful visual language models.
Created 2024-01-23
45 commits to main branch, last one 7 months ago