Search Results - RepositoryStats

184

2.0k

mit

27

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...

ai gcc llm lmm vlm cradle ai-agent grounding personoid generative-ai multimodality computer-control foundation-agent ai-agents-framework large-language-models vision-language-model general-computer-control

Created 2024-03-03

35 commits to main branch, last one 4 months ago

groundingLMM mbzuai-oryx

45

854

unknown

32

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

lmm llm-agent foundation-models vision-and-language vision-language-model

Created 2023-11-02

43 commits to main branch, last one 3 months ago

EAGLE NVlabs

39

628

unknown

27

Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs

llm lmm demo gpt4 lvlm mllm eagle llama llava nvdia llama3 huggingface large-language-models

Created 2024-06-27

102 commits to main branch, last one about a month ago

LLaVA-Interactive-Demo LLaVA-VL

29

366

apache-2.0

16

LLaVA-Interactive-Demo

lmm multimodal

Created 2023-10-12

37 commits to main branch, last one 7 months ago

HallusionBench tianyi-lab

8

273

bsd-3-clause

5

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

llm lmm vlms gpt-4 llava gpt-4v benchmark benchmarks hallucination large-language-models large-vision-language-models

Created 2023-10-22

136 commits to main branch, last one 4 months ago

Video-LLaVA mbzuai-oryx

12

256

unknown

14

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

llm lmm video grounding transcription video-grounding video-conversation

Created 2023-11-20

8 commits to main branch, last one about a year ago

TokenPacker CircleRadon

9

239

unknown

8

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

lmm mllm connector tokenpacker token-reduction visual-projector

Created 2024-07-03

21 commits to main branch, last one 2 months ago

Mantis TIGER-AI-Lab

20

208

apache-2.0

9

Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR2024]

lmm vlm fuyu mllm video mantis vision language multimodal llava-llama3 multi-image-understanding

Created 2024-04-12

290 commits to main branch, last one about a month ago

VL-RLHF TideDra

7

167

apache-2.0

4

An RLHF Infrastructure for Vision-Language Models

dpo llm lmm vlm mllm rlhf

Created 2023-12-27

7 commits to main branch, last one 4 months ago

awesome-Large-MultiModal-Hallucination xieyuquanxx

14

149

unknown

5

😎 curated list of awesome LMM hallucinations papers, methods & resources.

lmm multimodal multi-modal hallucination

This repository has been archived

Created 2023-10-11

57 commits to main branch, last one 12 months ago

A-Bench Q-Future

3

142

unknown

6

[ICLR 2025] What do we expect from LMMs as AIGI evaluators and how do they perform?

lmm evaluation ai-generated-images

Created 2024-05-29

49 commits to main branch, last one about a month ago

graphist graphic-design-ai

3

112

unknown

13

Official Repo of Graphist

hlg llm lmm mllm graphic-design layout-generation

Created 2024-03-24

3 commits to main branch, last one 11 months ago

MLLM-Tool Chenyu-Wang567

4

109

mit

2

MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning

llm lmm gpt4 tool-agent

Created 2024-01-08

59 commits to main branch, last one 10 months ago

Discord-AIBot Javis603

0

86

mit

1

🤖 Discord AI assistant with OpenAI, Gemini, Claude & DeepSeek integration, multilingual support, multimodal chat, image generation, web search, and deep thinking | A powerful Discord AI assistant that integrates multiple top-tier AI models and supports...

ai llm lmm xai claude gemini nodejs openai chatbot chatgpt discord deepseek discord-js discord-bot

Created 2025-03-02

15 commits to main branch, last one 17 days ago

YoLLaVA WisconsinAIVision

6

83

unknown

2

🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant

llm lmm llms lmms llava neurips neurips2024 personalized personalization multi-modal-models

Created 2024-06-13

10 commits to main branch, last one 4 months ago

VideoGLaMM mbzuai-oryx

1

49

unknown

6

[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

lmm cvpr2025 llm-agent foundation-models vision-and-language vision-language-model

Created 2024-10-31

5 commits to main branch, last one 17 days ago

LLaVA-CLI-with-multiple-images mapluisch

4

48

mit

2

LLaVA inference with multiple images at once for cross-image analysis.

lmm vqa lmms llava llama2 pillow python python3 pytorch inference llama2-13b image-processing image-concatenation visual-question-answering

Created 2023-11-30

18 commits to main branch, last one 12 months ago

GMAI-MMBench uni-medical

2

46

apache-2.0

2

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.

llm lmm vlm gmai medagi medical benchmark

Created 2024-08-08

46 commits to main branch, last one 3 months ago

Idea23D yisuanwang

1

46

unknown

2

[COLING 2025] Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs

3d lmm aigc agent

Created 2023-12-05

39 commits to master branch, last one about a month ago

Inner-Adaptor-Architecture 360CVGroup

4

39

apache-2.0

0

LMM solved catastrophic forgetting, AAAI2025

lmm large-multimodal-models

Created 2024-08-23

5 commits to main branch, last one 4 months ago

AIN mbzuai-oryx

0

34

mit

2

AIN - The First Arabic Inclusive Large Multimodal Model. It is a versatile bilingual LMM excelling in visual and contextual understanding across diverse domains.

lmm ocr vlm vqa culture multi-images remote-sensing vision-and-language

Created 2025-01-27

171 commits to main branch, last one 7 days ago