120 results found

2.3k
21.2k
apache-2.0
159
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Created 2023-04-17
460 commits to main branch, last one 8 months ago
528
6.9k
mit
58
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Created 2023-11-22
234 commits to main branch, last one about a month ago
412
5.4k
other
50
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Created 2023-08-21
136 commits to master branch, last one 9 months ago
282
3.2k
apache-2.0
29
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Created 2024-03-26
35 commits to main branch, last one 9 months ago
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Created 2024-03-07
11 commits to main branch, last one 9 months ago
165
2.7k
apache-2.0
44
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Created 2023-09-26
416 commits to main branch, last one 10 days ago
195
2.5k
unknown
97
Collection of AWESOME vision-language models for vision tasks
Created 2023-03-30
89 commits to main branch, last one about a month ago
173
2.0k
mit
27
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...
Created 2024-03-03
35 commits to main branch, last one 2 months ago
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Created 2022-09-28
69 commits to main branch, last one about a month ago
124
1.4k
mit
15
The code used to train and run inference with the ColPali architecture.
Created 2024-06-20
134 commits to main branch, last one 2 days ago
73
1.3k
other
16
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Created 2023-03-02
36 commits to main branch, last one about a year ago
168
1.0k
apache-2.0
92
Align Anything: Training All-modality Model with Feedback
Created 2024-07-14
89 commits to main branch, last one 7 days ago
87
958
apache-2.0
37
This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models.
Created 2024-12-20
6 commits to master branch, last one 8 days ago
44
903
apache-2.0
9
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Created 2023-11-13
83 commits to main branch, last one 3 months ago
51
889
apache-2.0
14
Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Created 2024-10-31
218 commits to main branch, last one 2 days ago
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Created 2023-11-02
43 commits to main branch, last one 2 months ago
83
815
apache-2.0
17
🚀 [Large Models] Train a 27M-parameter visual multimodal VLM from scratch in just 3 hours!
Created 2024-09-11
96 commits to master branch, last one about a month ago
50
768
apache-2.0
13
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Created 2023-11-27
97 commits to main branch, last one 6 months ago
62
757
mit
9
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
Created 2024-04-16
197 commits to main branch, last one 2 days ago
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Created 2023-11-02
3 commits to main branch, last one about a year ago
A curated list of 3D vision papers related to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
Created 2024-08-12
41 commits to main branch, last one 2 months ago
33
599
apache-2.0
7
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Created 2024-06-13
32 commits to main branch, last one 2 months ago
Famous Vision Language Models and Their Architectures
Created 2024-02-15
231 commits to main branch, last one 4 months ago
45
570
apache-2.0
11
Parsing-free RAG supported by VLMs
Created 2024-10-14
117 commits to master branch, last one 12 days ago
61
533
apache-2.0
27
[ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Created 2024-04-21
30 commits to main branch, last one 7 months ago
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...
Created 2023-05-10
86 commits to main branch, last one 9 months ago
50
460
other
20
[ICLR 2024] Official Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists"
Created 2023-02-04
128 commits to main branch, last one 9 months ago
18
430
apache-2.0
4
The open source Meme Search Engine and Finder. Free and built to self-host locally with Python, Ruby, and Docker.
Created 2024-06-08
315 commits to main branch, last one 8 days ago
A curated list of awesome knowledge-driven autonomous driving (continually updated)
Created 2023-10-24
51 commits to main branch, last one 7 months ago