10 results found

22.2k stars · 2.4k forks · 158 watchers · apache-2.0
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Created 2023-04-17
460 commits to main branch, last one 11 months ago
3.5k stars · 350 forks · 60 watchers · bsd-3-clause
Code and models for the ICML 2024 paper "NExT-GPT: Any-to-Any Multimodal Large Language Model".
Created 2023-08-30
249 commits to main branch, last one 5 months ago
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
Created 2023-04-01
626 commits to main branch, last one about a year ago
2.8k stars · 172 forks · 44 watchers · apache-2.0
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Created 2023-09-26
416 commits to main branch, last one 2 months ago
An open-source implementation for training LLaVA-NeXT.
Created 2024-05-11
36 commits to master branch, last one 5 months ago
276 stars · 8 forks · 2 watchers · unknown license
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Created 2023-11-29
73 commits to main branch, last one 7 months ago
257 stars · 28 forks · 8 watchers · bsd-3-clause
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
Created 2023-08-02
26 commits to main branch, last one about a year ago
88 stars · 3 forks · 1 watcher · apache-2.0
🧘🏻‍♂️ KarmaVLM (相生): A family of high-efficiency and powerful visual language models.
Created 2024-01-23
45 commits to main branch, last one 11 months ago
Build a simple, basic multimodal large model from scratch 🤖
Created 2024-06-05
44 commits to main branch, last one 10 months ago