Search Results - RepositoryStats

unilm microsoft

2.6k

21.1k

mit

305

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Created 2019-07-23

1,236 commits to master branch, last one about a month ago

MobileAgent X-PLUG

406

4.0k

mit

62

Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

app gui ios mllm agent gpt4v mobile android copilot harmony automation multimodal mobile-agents multimodal-agent multimodal-large-language-models

Created 2024-01-26

197 commits to main branch, last one a day ago

NExT-GPT NExT-GPT

349

3.5k

bsd-3-clause

61

Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model

llm mllm gpt-4 chatgpt multimodal foundation-models instruction-tuning multi-modal-chatgpt large-language-models visual-language-learning

Created 2023-08-30

249 commits to main branch, last one 5 months ago

MagicQuill ant-research

328

3.3k

other

40

[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System

aigc mllm gradio image-editing

Created 2024-11-12

62 commits to main branch, last one about a month ago

Awesome-LLM-Reasoning atfortes

166

3.0k

mit

47

Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓

cot gpt mllm gpt-4o papers prompt awesome chatgpt deepseek openai-o1 reasoning multimodal strawberry deepseek-r1 language-models chain-of-thought prompt-engineering in-context-learning

Created 2022-11-05

185 commits to main branch, last one 23 days ago

SpatialLM manycore-research

218

2.9k

other

35

SpatialLM: Large Language Model for Spatial Understanding

mllm point-clouds scene-understanding spatial-intelligence

Created 2025-03-14

8 commits to main branch, last one 14 days ago

InternLM-XComposer InternLM

172

2.8k

apache-2.0

43

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

gpt llm mllm gpt-4 chatgpt foundation multimodal language-model multi-modality instruction-tuning vision-transformer large-language-model supervised-finetuning vision-language-model visual-language-learning large-vision-language-model

Created 2023-09-26

416 commits to main branch, last one 2 months ago

mPLUG-DocOwl X-PLUG

128

2.2k

apache-2.0

33

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

mllm multimodal chart-understanding table-understanding document-understanding multimodal-large-language-models

Created 2023-07-04

135 commits to main branch, last one 3 months ago

Agent-S simular-ai

236

2.1k

apache-2.0

32

Agent S: an open agentic framework that uses computers like a human

mllm memory planning ai-agents grounding gui-agents computer-use computer-automation agent-computer-interface retrieval-augmented-generation in-context-reinforcement-learning

Created 2024-10-09

213 commits to main branch, last one 2 days ago

cambrian cambrian-mllm

129

1.9k

apache-2.0

23

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

clip dino llms mllm chatbot computer-vision instruction-tuning large-language-models representation-learning multimodal-large-language-models

Created 2024-06-17

59 commits to main branch, last one 5 months ago

Skywork-R1V SkyworkAI

176

1.8k

mit

100

Pioneering Multimodal Reasoning with CoT

llm mllm deepseek-r1

Created 2025-03-15

123 commits to main branch, last one 2 days ago

awesome-yolo-object-detection coderonion

202

1.4k

unknown

34

🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.

Created 2022-02-19

408 commits to main branch, last one 2 days ago

Sa2VA magic-research

65

1.0k

apache-2.0

26

🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

mllm computer-vision

Created 2025-01-06

39 commits to main branch, last one 3 days ago

Bunny BAAI-DCAI

75

1.0k

apache-2.0

19

A family of lightweight multimodal models.

vlm mllm gpt-4 chatgpt chinese english multimodal-large-language-models

Created 2024-01-31

114 commits to main branch, last one 4 months ago

Osprey CircleRadon

43

816

apache-2.0

14

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

sam mllm pixel-understanding visual-instruction-tuning

Created 2023-12-17

31 commits to main branch, last one about a month ago

awesome-llm-and-aigc coderonion

59

654

unknown

14

🚀🚀🚀 A collection of some awesome public projects about Large Language Model (LLM), Vision Language Model (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), the related Datasets and Applica...

Created 2023-02-15

163 commits to main branch, last one 2 days ago

EAGLE NVlabs

39

654

unknown

28

Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs

llm lmm demo gpt4 lvlm mllm eagle llama llava nvdia llama3 huggingface large-language-models

Created 2024-06-27

106 commits to main branch, last one 5 days ago

Woodpecker VITA-MLLM

31

635

unknown

16

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

llm mllm hallucination multimodality hallucinations large-language-models multimodal-large-language-models

Created 2023-09-26

107 commits to main branch, last one 3 months ago

OpenEMMA taco-group

76

613

apache-2.0

8

OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.

emma mllm autonomy open-emma algorithms large-lang networking perception generative-ai autonomous-car transportation machine-learning autonomous-driving autonomous-vehicles artificial-intelligence

Created 2024-10-30

34 commits to main branch, last one about a month ago

Groma FoundationVision

44

555

apache-2.0

27

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

llm mllm llama llama2 grounding multimodal foundation-models large-language-models vision-language-model

Created 2024-04-21

30 commits to main branch, last one 10 months ago

Vitron SkyworkAI

30

520

unknown

15

NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

mllm segmentation multimodal-large-language-models

Created 2024-03-18

78 commits to main branch, last one 5 months ago

ComfyUI_VLM_nodes gokayfem

48

484

apache-2.0

7

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

llm vlm mllm llava nodes phi15 joytag siglip comfyui img2sfx img2text custom-nodes image-captioning

Created 2024-01-24

274 commits to main branch, last one about a month ago

Awesome-LLMs-meet-Multimodal-Generation YingqingHe

26

452

unknown

17

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

llm aigc lvlm mllm text-to-3d multimodality text-to-audio text-to-image text-to-music text-to-sound text-to-video text-to-speech multimodal-models large-language-models multimodal-generation large-vision-language-models multimodal-large-language-models

Created 2023-11-17

357 commits to main branch, last one 7 days ago

MPP-LLaVA Coobiw

23

437

unknown

6

Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train...

mllm qwen deepspeed fine-tuning pretraining model-parallel pipeline-parallelism video-language-model video-large-language-models multimodal-large-language-models

Created 2023-10-24

135 commits to master branch, last one about a month ago

LLMGA dvlab-research

24

391

apache-2.0

8

This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV2024 Oral

llm aigc mllm multi-modal image-editing image-generation large-language-model image-design-assistant

Created 2023-11-27

2 commits to master branch, last one 8 months ago

EVE baaivision

8

319

mit

10

EVE Series: Encoder-Free Vision-Language Models from BAAI

llm vlm clip mllm encoder-free-vlm instruction-following large-language-models vision-language-models multimodal-large-language-models

Created 2024-06-14

26 commits to main branch, last one about a month ago

Awesome_Multimodel_LLM Atomic-man007

21

314

unknown

8

Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context l...

gpt llm nlp mllm chatgpt dataset multimodel pretrained-models

Created 2023-06-11

42 commits to main branch, last one 23 days ago

Youku-mPLUG X-PLUG

11

296

apache-2.0

6

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

mllm video youku chinese dataset benchmark multimodal video-retrieval multimodal-pretraining video-question-answering multimodal-large-language-models

Created 2023-06-06

18 commits to main branch, last one about a year ago

VARGPT VARGPT-family

14

291

apache-2.0

6

VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model

mllm unified-model

Created 2025-01-21

14 commits to main branch, last one 3 days ago

Long-VITA VITA-MLLM

29

268

other

15

✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

mllm long-context vision-language-model

Created 2024-12-14

46 commits to main branch, last one 22 days ago