142 results found

2.4k
22.1k
apache-2.0
158
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Created 2023-04-17
460 commits to main branch, last one 11 months ago
571
7.4k
mit
57
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Created 2023-11-22
236 commits to main branch, last one 18 days ago
436
5.7k
other
49
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Created 2023-08-21
136 commits to master branch, last one 12 months ago
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Created 2024-03-07
11 commits to main branch, last one 11 months ago
282
3.3k
apache-2.0
28
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Created 2024-03-26
35 commits to main branch, last one 11 months ago
398
3.3k
apache-2.0
261
Align Anything: Training All-modality Model with Feedback
Created 2024-07-14
117 commits to main branch, last one 7 days ago
171
2.8k
apache-2.0
43
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Created 2023-09-26
416 commits to main branch, last one 2 months ago
203
2.6k
unknown
99
Collection of AWESOME vision-language models for vision tasks
Created 2023-03-30
91 commits to main branch, last one 14 days ago
The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on Linear Attention
Created 2025-01-14
29 commits to main branch, last one 5 days ago
240
2.3k
apache-2.0
29
🚀 [Large Models] Train a 26M-parameter visual multimodal VLM from scratch in just 1 hour! 🌏
Created 2024-09-11
105 commits to master branch, last one 4 days ago
184
2.1k
mit
27
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...
Created 2024-03-03
35 commits to main branch, last one 5 months ago
144
1.7k
mit
18
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
Created 2024-06-20
182 commits to main branch, last one a day ago
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Created 2022-09-28
69 commits to main branch, last one 3 months ago
73
1.3k
other
16
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Created 2023-03-02
36 commits to main branch, last one about a year ago
74
1.2k
apache-2.0
15
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Created 2024-10-31
267 commits to main branch, last one 24 days ago
107
1.1k
mit
13
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
Created 2024-04-16
224 commits to main branch, last one 10 days ago
97
1.1k
apache-2.0
44
This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models.
Created 2024-12-20
6 commits to master branch, last one 2 months ago
45
930
apache-2.0
9
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Created 2023-11-13
83 commits to main branch, last one 5 months ago
57
873
apache-2.0
13
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Created 2024-06-13
40 commits to main branch, last one 13 days ago
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Created 2023-11-02
43 commits to main branch, last one 4 months ago
55
798
apache-2.0
12
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Created 2023-11-27
97 commits to main branch, last one 8 months ago
Famous Vision Language Models and Their Architectures
Created 2024-02-15
240 commits to main branch, last one about a month ago
A curated list of 3D Vision papers relating to the Robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, codes, and related websites
Created 2024-08-12
41 commits to main branch, last one 5 months ago
52
656
apache-2.0
12
Parsing-free RAG supported by VLMs
Created 2024-10-14
119 commits to master branch, last one about a month ago
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Created 2023-11-02
5 commits to main branch, last one about a month ago
66
590
apache-2.0
7
An open-source implementation for fine-tuning Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.
Created 2024-09-10
98 commits to master branch, last one 4 days ago
44
553
apache-2.0
27
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Created 2024-04-21
30 commits to main branch, last one 10 months ago
23
531
apache-2.0
4
The open source Meme Search Engine and Finder. Free and built to self-host locally with Python, Ruby, and Docker.
Created 2024-06-08
453 commits to main branch, last one 7 days ago
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...
Created 2023-05-10
86 commits to main branch, last one 11 months ago