Search Results - RepositoryStats

2.4k

22.2k

apache-2.0

158

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

gpt-4 llama llava llama2 chatbot chatgpt llama-2 multimodal multi-modality foundation-models instruction-tuning vision-language-model visual-language-learning

Created 2023-04-17

460 commits to main branch, last one 11 months ago

NExT-GPT NExT-GPT

350

3.5k

bsd-3-clause

60

Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model

llm mllm gpt-4 chatgpt multimodal foundation-models instruction-tuning multi-modal-chatgpt large-language-models visual-language-learning

Created 2023-08-30

249 commits to main branch, last one 5 months ago

Otter EvolvingLMMs-Lab

213

3.2k

mit

80

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

gpt-4 chatgpt embodied-ai deep-learning multi-modality machine-learning foundation-models instruction-tuning large-scale-models artificial-inteligence visual-language-learning

Created 2023-04-01

626 commits to main branch, last one about a year ago

InternLM-XComposer InternLM

172

2.8k

apache-2.0

44

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

gpt llm mllm gpt-4 chatgpt foundation multimodal language-model multi-modality instruction-tuning vision-transformer large-language-model supervised-finetuning vision-language-model visual-language-learning large-vision-language-model

Created 2023-09-26

416 commits to main branch, last one 2 months ago

Open-LLaVA-NeXT xiaoachen98

19

392

unknown

10

An open-source implementation for training LLaVA-NeXT.

gpt-4 gpt4o llama llava llama3 chatbot chatgpt llava-next multimodal multi-modality vision-language-model large-multimodal-models visual-language-learning

Created 2024-05-11

36 commits to master branch, last one 5 months ago

RLHF-V RLHF-V

8

276

unknown

2

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

gpt-4 llama rlhf-v chatbot multimodal multi-modality visual-language-learning

Created 2023-11-29

73 commits to main branch, last one 7 months ago

BLIVA mlpc-ucsd

28

257

bsd-3-clause

8

(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions

llm lora blip2 bliva llama chatbot multimodal instruction-tuning visual-language-learning

Created 2023-08-02

26 commits to main branch, last one about a year ago

KarmaVLM thomas-yanxin

3

88

apache-2.0

1

🧘🏻‍♂️ KarmaVLM (相生): A family of highly efficient and powerful visual language models.

vlm llava qwen2 llama2 multimodel vision-language-model visual-language-learning

Created 2024-01-23

45 commits to main branch, last one 11 months ago

llama-multimodal-vqa AdrianBZG

10

48

mit

1

Multimodal Instruction Tuning for Llama 3

vqa gpt-4 llama llama2 llama3 chatbot chatgpt multimodal huggingface language-models instruction-tuning visual-language-learning visual-question-answering multimodal-instruction-tuning

Created 2024-04-22

4 commits to main branch, last one 11 months ago

Basic-Visual-Language-Model xinyanghuang7

6

34

unknown

3

Build a simple, basic multimodal large model from scratch. 🤖

large-language-models visual-language-models visual-language-learning multimodel-large-language-model

Created 2024-06-05

44 commits to main branch, last one 10 months ago