95 results found
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Created
2023-04-17
460 commits to main branch, last one 5 months ago
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Created
2023-11-22
188 commits to main branch, last one 8 days ago
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Created
2023-08-21
136 commits to master branch, last one 7 months ago
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Created
2024-03-26
35 commits to main branch, last one 6 months ago
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Created
2023-09-26
395 commits to main branch, last one 27 days ago
Collection of AWESOME vision-language models for vision tasks
Created
2023-03-30
86 commits to main branch, last one 3 days ago
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Created
2024-03-07
11 commits to main branch, last one 6 months ago
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...
Created
2024-03-03
34 commits to main branch, last one 21 hours ago
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Topics: ocr, document, documentai, multimodal, end-to-end-ocr, text-detection, computer-vision, vision-language, text-recognition, document-analysis, document-recognition, scene-text-detection, document-intelligence, vision-language-model, document-understanding, scene-text-recognition, artificial-intelligence, multimodal-deep-learning, vision-language-transformer, scene-text-detection-recognition
Created
2022-09-28
62 commits to main branch, last one about a month ago
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Created
2023-03-02
36 commits to main branch, last one 9 months ago
The code used to train and run inference with the ColPali architecture.
Created
2024-06-20
113 commits to main branch, last one 14 hours ago
Overview of Japanese LLMs
Created
2023-07-09
469 commits to main branch, last one 5 days ago
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Created
2023-11-13
83 commits to main branch, last one 21 days ago
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Created
2023-11-02
42 commits to main branch, last one 5 months ago
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Created
2023-11-27
97 commits to main branch, last one 3 months ago
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Created
2023-11-02
3 commits to main branch, last one about a year ago
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Created
2024-04-21
30 commits to main branch, last one 5 months ago
A curated list of 3D Vision papers relating to the Robotics domain in the era of large models, i.e. LLMs/VLMs, inspired by awesome-computer-vision, including papers, code, and related websites
Created
2024-08-12
41 commits to main branch, last one 2 days ago
[ICLR 2024] Official Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists"
Created
2023-02-04
128 commits to main branch, last one 6 months ago
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Created
2024-06-13
31 commits to main branch, last one 2 days ago
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
Created
2024-04-16
161 commits to main branch, last one 15 days ago
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...
Created
2023-05-10
86 commits to main branch, last one 6 months ago
Famous Vision Language Models and Their Architectures
Created
2024-02-15
231 commits to main branch, last one about a month ago
A curated list of awesome knowledge-driven autonomous driving (continually updated)
Created
2023-10-24
51 commits to main branch, last one 5 months ago
An open-source implementation for training LLaVA-NeXT.
Created
2024-05-11
36 commits to master branch, last one 14 days ago
Index your memes by their content and text, making them easily retrievable for your meme warfare pleasures. Find funny fast.
Created
2024-06-08
83 commits to main branch, last one 3 months ago
[Large Models] Train a 27M-parameter vision multimodal VLM from scratch in 3 hours; inference and training both run on a personal GPU!
Created
2024-09-11
94 commits to master branch, last one 4 days ago
Parsing-free RAG supported by VLMs
Created
2024-10-14
55 commits to master branch, last one 2 days ago
Code for RoboFlamingo
Created
2023-11-02
34 commits to main branch, last one 6 months ago
🎉 PILOT: A Pre-trained Model-Based Continual Learning Toolbox
Created
2023-08-30
68 commits to main branch, last one about a month ago