9 results found
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Created: 2023-04-17
460 commits to main branch, last one 7 months ago
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
Created: 2023-04-01
626 commits to main branch, last one 9 months ago
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Created: 2023-08-30
249 commits to main branch, last one about a month ago
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Created: 2023-09-26
409 commits to main branch, last one 3 days ago
An open-source implementation for training LLaVA-NeXT.
Created: 2024-05-11
36 commits to master branch, last one about a month ago
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
Created: 2023-08-02
26 commits to main branch, last one 8 months ago
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Created: 2023-11-29
73 commits to main branch, last one 3 months ago
🧘🏻‍♂️ KarmaVLM (相生): A family of highly efficient and powerful visual language models.
Created: 2024-01-23
45 commits to main branch, last one 7 months ago
Multimodal Instruction Tuning for Llama 3
Created: 2024-04-22
4 commits to main branch, last one 7 months ago