Search Results - RepositoryStats

VITA VITA-MLLM

168

2.2k

other

49

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

large-multimodal-models multimodal-large-language-models

Created 2024-08-10

128 commits to main branch, last one 29 days ago

OpenAdapt OpenAdaptAI

175

1.2k

mit

13

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models

Created 2023-04-12

939 commits to main branch, last one 2 months ago

ShareGPT4Video ShareGPT4Omni

41

1.1k

unknown

24

[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

gpt sora gpt-4v chatgpt text-to-video large-language-models large-multimodal-models large-video-language-models large-vision-language-models

Created 2024-06-06

44 commits to master branch, last one 6 months ago

TinyLLaVA_Factory TinyLLaVA

83

800

apache-2.0

11

A Framework of Small-scale Large Multimodal Models

nlp llama llava tinyllama transformers vision-language large-multimodal-models

Created 2024-02-21

225 commits to main branch, last one 20 hours ago

LLaVA-Plus-Codebase LLaVA-VL

59

739

apache-2.0

11

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

agent tool-use large-language-models large-multimodal-models multimodal-large-language-models

Created 2023-11-07

404 commits to main branch, last one about a year ago

awesome-multimodal-in-medical-imaging richard-peng-xia

65

723

mit

17

A collection of resources on applications of multi-modal learning in medical imaging.

medical-imaging multimodal-learning large-language-models large-multimodal-models multimodal-deep-learning medical-report-generation visual-question-answering multimodal-large-language-models

Created 2022-07-13

161 commits to main branch, last one 23 days ago

describe-anything NVlabs

15

491

apache-2.0

10

Implementation for Describe Anything: Detailed Localized Image and Video Captioning

describe-anything vision-language-model large-multimodal-models detailed-localized-captioning

Created 2025-04-04

11 commits to main branch, last one a day ago

LLaVA-Mini ictnlp

20

456

apache-2.0

9

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

gpt4o gpt4v llama llava video vision efficient multimodal large-language-models vision-language-model large-multimodal-models visual-instruction-tuning multimodal-large-language-models

Created 2025-01-07

8 commits to main branch, last one 3 months ago

MMMU MMMU-Benchmark

34

420

apache-2.0

3

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

llm llms stem evaluation multimodal deep-learning multimodality computer-vision machine-learning foundation-models question-answering multimodal-learning deep-neural-networks large-language-models large-multimodal-models multimodal-deep-learning visual-question-answering natural-language-processing

Created 2023-11-23

153 commits to main branch, last one 10 days ago

Open-LLaVA-NeXT xiaoachen98

19

392

unknown

10

An open-source implementation for training LLaVA-NeXT.

gpt-4 gpt4o llama llava llama3 chatbot chatgpt llava-next multimodal multi-modality vision-language-model large-multimodal-models visual-language-learning

Created 2024-05-11

36 commits to master branch, last one 6 months ago

OPERA shikiw

28

332

mit

3

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

gpt-4 llama chatbot chatgpt multimodal vision-language-model large-multimodal-models vision-language-learning

Created 2023-11-30

50 commits to main branch, last one 8 months ago

LEGENT thunlp

18

308

apache-2.0

14

Open Platform for Embodied Agents

embodied-ai physics-engine robot-simulator language-grounding large-multimodal-models

Created 2024-03-13

129 commits to main branch, last one 3 months ago

lmms-finetune zjysteven

31

293

apache-2.0

8

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.

llava qwen-vl finetuning llava-next multimodal vision-language foundation-models instruction-tuning large-language-model large-multimodal-models visual-instruction-tuning multimodal-large-language-models

Created 2024-07-20

109 commits to main branch, last one 2 months ago

MixEval JinjieNi

41

237

unknown

1

The official evaluation suite and dynamic data release for MixEval.

mixeval benchmark evaluation llm-inference llm-evaluation benchmark-mixture foundation-models benchmarking-suite evaluation-framework large-language-model large-language-models benchmarking-framework large-multimodal-models llm-evaluation-framework

Created 2024-06-01

120 commits to main branch, last one 5 months ago

ShareGPT4V ShareGPT4Omni

5

214

unknown

3

[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions

gpt gpt4v gpt-4v chatgpt eccv2024 language-model instruction-tuning large-language-models vision-language-model large-multimodal-models large-vision-language-models

Created 2024-06-06

3 commits to master branch, last one 9 months ago

Awesome-Multimodal-Papers friedrichor

18

185

mit

4

A curated list of awesome Multimodal studies.

multimodal deep-learning multimodal-data multimodal-dialogue multimodal-learning large-multimodal-models multimodal-deep-learning multimodal-large-language-models

Created 2024-04-05

75 commits to main branch, last one 20 days ago

multi_token sshh12

15

184

apache-2.0

3

Embed arbitrary modalities (images, audio, documents, etc) into large language models.

llm llava multimodal large-context multi-modality large-language-models vision-language-model large-multimodal-models

Created 2023-10-11

84 commits to main branch, last one about a year ago

MMStar MMStar-Benchmark

5

175

unknown

1

[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"

llm llms lvlm lvlms evaluation multimodal multimodality multimodal-learning large-language-models large-multimodal-models visual-question-answering large-vision-language-model large-vision-language-models

Created 2024-03-29

19 commits to main branch, last one 7 months ago

Modality-Integration-Rate shikiw

3

98

mit

2

The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".

llama llava gpt-4o chatbot multimodal vision-language-model large-multimodal-models vision-language-learning

Created 2024-10-09

17 commits to main branch, last one 4 months ago

BenchLMM AIFEG

7

85

apache-2.0

0

[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

cv dataset benchmark large-language-models large-multimodal-models

Created 2023-11-20

93 commits to main branch, last one 8 months ago

apiprompting yu-rp

6

84

mit

1

[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models

prompting visual-prompting vision-language-model vision-language-models large-multimodal-models large-vision-language-model large-vision-language-models

Created 2024-09-04

14 commits to master branch, last one 6 months ago

LOVA3 showlab

2

82

unknown

5

(NeurIPS 2024) Official PyTorch implementation of LOVA3

benchmark large-multimodal-models visual-question-answering visual-question-generation multimodal-large-language-models

Created 2024-05-19

49 commits to main branch, last one about a month ago

ross Haochen-Wang409

3

80

apache-2.0

1

[ICLR'25] Reconstructive Visual Instruction Tuning

iclr diffusion large-multimodal-models multimodal-large-language-models

Created 2024-10-11

10 commits to master branch, last one 16 days ago

GeoPixel mbzuai-oryx

5

75

apache-2.0

9

GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing is specifically developed for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding capabi...

grounding-llms remote-sensing foundation-models segmentation-models vision-language-models large-multimodal-models large-vision-language-models

Created 2025-01-23

82 commits to main branch, last one 26 days ago

MMRole YanqiDai

2

68

mit

2

(ICLR'25) A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

role-playing large-multimodal-models

Created 2024-07-26

16 commits to main branch, last one 2 months ago

GUI-Thinker showlab

5

62

unknown

1

Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.

agents gui-agent gui-application large-multimodal-models

Created 2025-02-12

140 commits to main branch, last one 14 days ago

VisualWebBench VisualWebBench

1

55

unknown

2

Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"

llm llms mllm evaluation multimodal deep-learning computer-vision machine-learning foundation-models question-answering large-language-models large-multimodal-models multimodal-deep-learning visual-question-answering natural-language-processing multimodal-large-language-models

Created 2024-04-02

26 commits to main branch, last one 6 months ago

FLAME xyz9911

3

48

apache-2.0

1

[AAAI-25 Oral] Official Implementation of "FLAME: Learning to Navigate with Multimodal LLM in Urban Environments"

streetview embodied-agent vision-language-model large-multimodal-models vision-and-language-navigation multimodal-large-language-models

Created 2024-08-20

9 commits to main branch, last one 2 months ago

GUI-R1 ritzz-ai

0

46

apache-2.0

0

Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents

o1 r1 grpo gui-agent multimodal mllm-reasoning large-multimodal-models deep-reinforcement-learning multimodal-large-language-models

Created 2025-04-18

4 commits to main branch, last one 5 days ago

OpenOmni RainBowLuoCS

2

41

unknown

2

OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis

omni image speech multimodal large-language-model large-multimodal-models multimodal-large-language-models

Created 2025-01-11

52 commits to main branch, last one about a month ago