97 results found

2.2k
20.3k
apache-2.0
158
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Created 2023-04-17
460 commits to main branch, last one 6 months ago
471
6.1k
mit
53
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
Created 2023-11-22
216 commits to main branch, last one a day ago
385
5.1k
other
49
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Created 2023-08-21
136 commits to master branch, last one 7 months ago
279
3.2k
apache-2.0
28
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Created 2024-03-26
35 commits to main branch, last one 6 months ago
154
2.5k
apache-2.0
43
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Created 2023-09-26
395 commits to main branch, last one about a month ago
220
2.5k
unknown
122
Collection of AWESOME vision-language models for vision tasks
Created 2023-03-30
86 commits to main branch, last one 18 days ago
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Created 2024-03-07
11 commits to main branch, last one 7 months ago
163
1.9k
mit
26
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents in acing any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...
Created 2024-03-03
35 commits to main branch, last one 14 days ago
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Created 2022-09-28
62 commits to main branch, last one about a month ago
75
1.3k
other
16
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Created 2023-03-02
36 commits to main branch, last one 10 months ago
102
1.1k
mit
14
The code used to train and run inference with the ColPali architecture.
Created 2024-06-20
119 commits to main branch, last one 10 days ago
43
857
apache-2.0
7
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Created 2023-11-13
83 commits to main branch, last one about a month ago
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Created 2023-11-02
42 commits to main branch, last one 5 months ago
43
705
apache-2.0
13
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Created 2023-11-27
97 commits to main branch, last one 3 months ago
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Created 2023-11-02
3 commits to main branch, last one about a year ago
61
563
apache-2.0
35
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Created 2024-04-21
30 commits to main branch, last one 5 months ago
A curated list of 3D vision papers relating to the robotics domain in the era of large models (i.e. LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
Created 2024-08-12
41 commits to main branch, last one 17 days ago
31
527
apache-2.0
7
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Created 2024-06-13
31 commits to main branch, last one 16 days ago
47
520
other
32
[ICLR 2024] Official Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists"
Created 2023-02-04
128 commits to main branch, last one 6 months ago
45
498
mit
6
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
Created 2024-04-16
165 commits to main branch, last one 5 days ago
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...
Created 2023-05-10
86 commits to main branch, last one 7 months ago
Famous Vision Language Models and Their Architectures
Created 2024-02-15
231 commits to main branch, last one 2 months ago
16
408
apache-2.0
3
Index your memes by their content and text, making them easily retrievable for your meme warfare pleasures. Find funny fast.
Created 2024-06-08
310 commits to main branch, last one 3 days ago
30
403
apache-2.0
9
Parsing-free RAG supported by VLMs
Created 2024-10-14
66 commits to master branch, last one a day ago
A curated list of awesome knowledge-driven autonomous driving (continually updated)
Created 2023-10-24
51 commits to main branch, last one 5 months ago
An open-source implementation for training LLaVA-NeXT.
Created 2024-05-11
36 commits to master branch, last one 29 days ago
40
368
apache-2.0
13
[Large Model] Train a 27M-parameter visual multimodal VLM from scratch in 3 hours; inference and training both run on a personal GPU!
Created 2024-09-11
95 commits to master branch, last one 2 days ago
🎉 PILOT: A Pre-trained Model-Based Continual Learning Toolbox
Created 2023-08-30
69 commits to main branch, last one 14 days ago
Code for RoboFlamingo
Created 2023-11-02
34 commits to main branch, last one 7 months ago