21 results found

184
2.0k
mit
27
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...
Created 2024-03-03
35 commits to main branch, last one 4 months ago
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Created 2023-11-02
43 commits to main branch, last one 3 months ago
39
628
unknown
27
Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs
Created 2024-06-27
102 commits to main branch, last one about a month ago
LLaVA-Interactive-Demo
Created 2023-10-12
37 commits to main branch, last one 7 months ago
8
273
bsd-3-clause
5
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Created 2023-10-22
136 commits to main branch, last one 4 months ago
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
Created 2023-11-20
8 commits to main branch, last one about a year ago
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
Created 2024-07-03
21 commits to main branch, last one 2 months ago
20
208
apache-2.0
9
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR2024]
Created 2024-04-12
290 commits to main branch, last one about a month ago
7
167
apache-2.0
4
An RLHF Infrastructure for Vision-Language Models
Created 2023-12-27
7 commits to main branch, last one 4 months ago
😎 curated list of awesome LMM hallucinations papers, methods & resources.
This repository has been archived
Created 2023-10-11
57 commits to main branch, last one 12 months ago
3
142
unknown
6
[ICLR 2025] What do we expect from LMMs as AIGI evaluators and how do they perform?
Created 2024-05-29
49 commits to main branch, last one about a month ago
Official Repo of Graphist
Created 2024-03-24
3 commits to main branch, last one 11 months ago
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
Created 2024-01-08
59 commits to main branch, last one 10 months ago
🤖 Discord AI assistant with OpenAI, Gemini, Claude & DeepSeek integration, multilingual support, multimodal chat, image generation, web search, and deep thinking | A powerful Discord AI assistant that integrates multiple top-tier AI models, supporting...
Created 2025-03-02
15 commits to main branch, last one 17 days ago
🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant
Created 2024-06-13
10 commits to main branch, last one 4 months ago
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Created 2024-10-31
5 commits to main branch, last one 17 days ago
LLaVA inference with multiple images at once for cross-image analysis.
Created 2023-11-30
18 commits to main branch, last one 12 months ago
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.
Created 2024-08-08
46 commits to main branch, last one 3 months ago
1
46
unknown
2
[COLING 2025] Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs
Created 2023-12-05
39 commits to master branch, last one about a month ago
LMM solved catastrophic forgetting, AAAI2025
Created 2024-08-23
5 commits to main branch, last one 4 months ago
AIN - The First Arabic Inclusive Large Multimodal Model. It is a versatile bilingual LMM excelling in visual and contextual understanding across diverse domains.
Created 2025-01-27
171 commits to main branch, last one 7 days ago