Search Results - RepositoryStats

Awesome-Multimodal-Large-Language-Models BradyFU

941

14.6k

unknown

276

✨✨Latest Advances on Multimodal Large Language Models

multi-modality chain-of-thought instruction-tuning in-context-learning instruction-following large-language-models visual-instruction-tuning large-vision-language-model multimodal-chain-of-thought large-vision-language-models multimodal-instruction-tuning multimodal-in-context-learning multimodal-large-language-models

Created 2023-05-19

829 commits to main branch, last one 2 days ago

MobileAgent X-PLUG

399

4.0k

mit

62

Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

app gui ios mllm agent gpt4v mobile android copilot harmony automation multimodal mobile-agents multimodal-agent multimodal-large-language-models

Created 2024-01-26

192 commits to main branch, last one 5 days ago

star-vector joanrod

171

3.3k

apache-2.0

45

StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textu...

llm svg vlm multimodal-large-language-models

Created 2023-12-11

10 commits to main branch, last one 11 days ago

modelscope-agent modelscope

345

3.1k

apache-2.0

40

ModelScope-Agent: An agent framework connecting models in ModelScope with the world

llm rag code gpts qwen agent chatbot chatglm-4 open-gpts codexgraph assistantapi data-science mobile-agent multi-agents mobile-agents android-application data-science-assistant multimodal-large-language-models

Created 2023-08-03

475 commits to master branch, last one about a month ago

LLaMA-Omni ictnlp

194

2.9k

apache-2.0

31

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

speech-to-text speech-to-speech speech-interaction large-language-models speech-language-model multimodal-large-language-models

Created 2024-09-10

13 commits to main branch, last one 4 months ago

VITA VITA-MLLM

165

2.2k

other

49

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

large-multimodal-models multimodal-large-language-models

Created 2024-08-10

128 commits to main branch, last one 10 days ago

mPLUG-DocOwl X-PLUG

127

2.1k

apache-2.0

33

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

mllm multimodal chart-understanding table-understanding document-understanding multimodal-large-language-models

Created 2023-07-04

135 commits to main branch, last one 3 months ago

cambrian cambrian-mllm

129

1.9k

apache-2.0

23

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

clip dino llms mllm chatbot computer-vision instruction-tuning large-language-models representation-learning multimodal-large-language-models

Created 2024-06-17

59 commits to main branch, last one 5 months ago

RPG-DiffusionMaster YangLing0818

102

1.8k

mit

24

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

text-to-image image-editting large-language-models multimodal-large-language-models

Created 2024-01-22

60 commits to main branch, last one 2 months ago

Bunny BAAI-DCAI

76

1.0k

apache-2.0

19

A family of lightweight multimodal models.

vlm mllm gpt-4 chatgpt chinese english multimodal-large-language-models

Created 2024-01-31

114 commits to main branch, last one 4 months ago

Ovis AIDC-AI

57

868

apache-2.0

13

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

qwen llama3 chatbot multimodal multimodality vision-language-model vision-language-learning multimodal-large-language-models

Created 2024-06-13

40 commits to main branch, last one 12 days ago

VideoChat Henry-23

111

842

mit

12

Real-time voice-interactive digital human supporting both an end-to-end voice pipeline (GLM-4-Voice - THG) and a cascaded pipeline (ASR-LLM-TTS-THG). The avatar's appearance and voice can be customized without training, voice cloning is supported, and first-packet latency is as low as 3 s.

asr tts lip-sync musetalk real-time streaming end-to-end talking-head digital-human dialogue-systems gradio-python-app multimodal-large-language-models

Created 2024-10-18

41 commits to master branch, last one 16 days ago

SLAM-LLM X-LANCE

77

768

mit

23

Speech, Language, Audio, Music Processing with Large Language Model

peft audio-processing music-processing speech-processing large-language-model multimodal-large-language-models

Created 2023-10-23

886 commits to main branch, last one about a month ago

LLaVA-Plus-Codebase LLaVA-VL

59

734

apache-2.0

12

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

agent tool-use large-language-models large-multimodal-models multimodal-large-language-models

Created 2023-11-07

404 commits to main branch, last one about a year ago

awesome-multimodal-in-medical-imaging richard-peng-xia

67

713

mit

17

A collection of resources on applications of multi-modal learning in medical imaging.

medical-imaging multimodal-learning large-language-models large-multimodal-models multimodal-deep-learning medical-report-generation visual-question-answering multimodal-large-language-models

Created 2022-07-13

161 commits to main branch, last one 5 days ago

unicom deepglint

27

656

mit

10

Large-Scale Visual Representation Model

laion400m vision-transformer large-language-models large-sacle-pretrained-model embodied-artificial-intelligence multimodal-large-language-models

Created 2023-02-15

174 commits to main branch, last one 18 hours ago

Woodpecker BradyFU

31

634

unknown

16

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

llm mllm hallucination multimodality hallucinations large-language-models multimodal-large-language-models

Created 2023-09-26

107 commits to main branch, last one 3 months ago

MovieChat rese1f

42

606

bsd-3-clause

12

[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

llama dataset computer-vision large-language-models long-video-understanding multimodal-large-language-models

Created 2023-06-26

122 commits to main branch, last one 2 months ago

Vitron SkyworkAI

30

519

unknown

15

NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

mllm segmentation multimodal-large-language-models

Created 2024-03-18

78 commits to main branch, last one 5 months ago

Video-MME BradyFU

20

500

unknown

5

✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

mme video video-mme large-language-models large-vision-language-models multimodal-large-language-models

Created 2024-06-02

55 commits to main branch, last one 10 days ago

Awesome-LLMs-meet-Multimodal-Generation YingqingHe

26

450

unknown

17

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

llm aigc lvlm mllm text-to-3d multimodality text-to-audio text-to-image text-to-music text-to-sound text-to-video text-to-speech multimodal-models large-language-models multimodal-generation large-vision-language-models multimodal-large-language-models

Created 2023-11-17

357 commits to main branch, last one 2 days ago

audio-flamingo NVIDIA

25

443

unknown

10

PyTorch implementation of Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities.

audio-reasoning audio-captioning audio-language-models audio-question-answering multimodal-large-language-models

Created 2024-05-20

19 commits to main branch, last one 26 days ago

LLaVA-Mini ictnlp

19

436

apache-2.0

9

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

gpt4o gpt4v llama llava video vision efficient multimodal large-language-models vision-language-model large-multimodal-models visual-instruction-tuning multimodal-large-language-models

Created 2025-01-07

8 commits to main branch, last one 2 months ago

MPP-LLaVA Coobiw

23

432

unknown

6

Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train...

mllm qwen deepspeed fine-tuning pretraining model-parallel pipeline-parallelism video-language-model video-large-language-models multimodal-large-language-models

Created 2023-10-24

135 commits to master branch, last one 27 days ago

Awesome_Matching_Pretraining_Transfering Paranioar

48

424

mit

13

The Paper List of Large Multi-Modality Models (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insigh...

tutorial awesome-list image-text-matching large-vision-models vision-and-language image-text-retrieval large-language-model video-text-retrieval cross-modal-retrieval large-language-models multimodal-pretraining video-text-recognition memory-efficient-tuning text-to-image-synthesis text-to-image-generation text-to-video-generation visual-semantic-embedding large-vision-language-models parameter-efficient-fine-tuning multimodal-large-language-models

Created 2020-12-22

130 commits to main branch, last one 3 months ago

Awesome-MCoT yaotingwangofficial

7

392

unknown

8

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

cot mcts survey system-2 openai-o1 reasoning multimodal deepseek-r1 slow-thinking mllm-reasoning chain-of-thought instruction-tuning large-vision-language-model multimodal-chain-of-thought multimodal-large-language-models

Created 2025-02-15

57 commits to main branch, last one 18 hours ago