95 results found

2.2k
20.1k
apache-2.0
157
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Created 2023-04-17
460 commits to main branch, last one 5 months ago
461
5.9k
mit
52
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
Created 2023-11-22
188 commits to main branch, last one 8 days ago
378
5.0k
other
49
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Created 2023-08-21
136 commits to master branch, last one 7 months ago
277
3.2k
apache-2.0
28
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Created 2024-03-26
35 commits to main branch, last one 6 months ago
154
2.5k
apache-2.0
43
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Created 2023-09-26
395 commits to main branch, last one 27 days ago
216
2.5k
unknown
121
Collection of AWESOME vision-language models for vision tasks
Created 2023-03-30
86 commits to main branch, last one 3 days ago
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Created 2024-03-07
11 commits to main branch, last one 6 months ago
161
1.8k
mit
26
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...
Created 2024-03-03
34 commits to main branch, last one 21 hours ago
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Created 2022-09-28
62 commits to main branch, last one about a month ago
75
1.3k
other
16
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Created 2023-03-02
36 commits to main branch, last one 9 months ago
The code used to train and run inference with the ColPali architecture.
Created 2024-06-20
113 commits to main branch, last one 14 hours ago
43
830
apache-2.0
7
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Created 2023-11-13
83 commits to main branch, last one 21 days ago
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Created 2023-11-02
42 commits to main branch, last one 5 months ago
42
687
apache-2.0
13
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Created 2023-11-27
97 commits to main branch, last one 3 months ago
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Created 2023-11-02
3 commits to main branch, last one about a year ago
59
556
apache-2.0
35
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Created 2024-04-21
30 commits to main branch, last one 5 months ago
A curated list of 3D vision papers related to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
Created 2024-08-12
41 commits to main branch, last one 2 days ago
47
520
other
32
[ICLR 2024] Official Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists"
Created 2023-02-04
128 commits to main branch, last one 6 months ago
29
503
apache-2.0
6
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Created 2024-06-13
31 commits to main branch, last one 2 days ago
35
467
mit
6
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
Created 2024-04-16
161 commits to main branch, last one 15 days ago
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...
Created 2023-05-10
86 commits to main branch, last one 6 months ago
Famous Vision Language Models and Their Architectures
Created 2024-02-15
231 commits to main branch, last one about a month ago
A curated list of awesome knowledge-driven autonomous driving (continually updated)
Created 2023-10-24
51 commits to main branch, last one 5 months ago
An open-source implementation for training LLaVA-NeXT.
Created 2024-05-11
36 commits to master branch, last one 14 days ago
13
355
apache-2.0
4
Index your memes by their content and text, making them easily retrievable for your meme warfare pleasures. Find funny fast.
Created 2024-06-08
83 commits to main branch, last one 3 months ago
36
328
apache-2.0
11
[Large Model] Train a 27M-parameter visual multimodal VLM from scratch in 3 hours; inference and training both run on a personal GPU!
Created 2024-09-11
94 commits to master branch, last one 4 days ago
26
321
apache-2.0
9
Parsing-free RAG supported by VLMs
Created 2024-10-14
55 commits to master branch, last one 2 days ago
Code for RoboFlamingo
Created 2023-11-02
34 commits to main branch, last one 6 months ago
🎉 PILOT: A Pre-trained Model-Based Continual Learning Toolbox
Created 2023-08-30
68 commits to main branch, last one about a month ago