140 results found

A one stop repository for generative AI research updates, interview resources, notebooks and much more!
Created 2024-02-06
267 commits to main branch, last one 13 days ago
1.0k
10.4k
bsd-3-clause
95
LAVIS - A One-stop Library for Language-Vision Intelligence
Created 2022-08-24
492 commits to main branch, last one 4 months ago
203
2.5k
apache-2.0
34
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Created 2023-11-24
505 commits to develop branch, last one 7 days ago
271
2.5k
apache-2.0
133
Build multimodal language agents for fast prototype and production
Created 2024-07-04
531 commits to main branch, last one about a month ago
205
1.6k
bsd-3-clause
12
Code for ALBEF: a new vision-language pre-training method
Created 2021-07-13
39 commits to main branch, last one 2 years ago
131
1.5k
apache-2.0
13
Multimodal-GPT
Created 2023-04-26
24 commits to main branch, last one about a year ago
216
1.4k
apache-2.0
14
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
Created 2021-03-25
19 commits to master branch, last one 3 years ago
111
1.3k
apache-2.0
69
Real-time and accurate open-vocabulary end-to-end object detection
Created 2024-03-11
32 commits to main branch, last one 3 months ago
73
1.3k
other
16
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Created 2023-03-02
36 commits to main branch, last one about a year ago
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
Created 2020-03-25
38 commits to master branch, last one 3 years ago
252
1.0k
mit
25
Oscar and VinVL
This repository has been archived
Created 2020-05-14
28 commits to master branch, last one about a year ago
86
1.0k
apache-2.0
20
Codebase for Aria - an Open Multimodal Native MoE
Created 2024-09-29
207 commits to main branch, last one 2 months ago
71
1.0k
apache-2.0
14
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Created 2023-05-18
136 commits to main branch, last one 6 months ago
105
970
other
28
X-modaler is a versatile and high-performance codebase for cross-modal analytics(e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...
Created 2021-06-25
84 commits to master branch, last one 2 years ago
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Created 2023-11-02
43 commits to main branch, last one 4 months ago
My Reading Lists of Deep Learning and Natural Language Processing
Created 2018-02-02
18 commits to master branch, last one 2 years ago
55
802
apache-2.0
12
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Created 2023-11-27
97 commits to main branch, last one 8 months ago
112
792
unknown
17
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
Created 2020-01-28
70 commits to master branch, last one 3 years ago
53
763
other
10
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
Created 2024-04-11
45 commits to main branch, last one 6 months ago
37
760
unknown
13
[ECCV 2024 Best Paper Candidate] PointLLM: Empowering Large Language Models to Understand Point Clouds
Created 2023-08-17
51 commits to master branch, last one 5 months ago
111
740
mit
14
Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
Created 2019-11-22
25 commits to master branch, last one 4 years ago
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
Created 2021-02-10
14 commits to main branch, last one 2 years ago
This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]
Created 2024-04-10
56 commits to master branch, last one 9 months ago
This repository is a curated collection of the most exciting and influential CVPR 2023 papers. 🔥 [Paper + Code]
Created 2023-06-15
30 commits to master branch, last one 9 months ago
Creating software for automatic monitoring in online proctoring
Created 2020-05-04
86 commits to master branch, last one 4 months ago
AI Research Platform for Reinforcement Learning from Real Panoramic Images.
Created 2017-09-29
210 commits to master branch, last one 3 years ago
73
533
mit
5
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Created 2021-07-20
271 commits to main branch, last one about a month ago
A curated list of awesome vision and language resources (still under construction... stay tuned!)
Created 2019-10-25
48 commits to master branch, last one 5 months ago
52
477
bsd-3-clause
4
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
Created 2021-11-15
31 commits to master branch, last one 2 years ago