20 results found

974
9.9k
bsd-3-clause
98
LAVIS - A One-stop Library for Language-Vision Intelligence
Created 2022-08-24
492 commits to main branch, last one 2 days ago
261
2.8k
bsd-3-clause
32
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Created 2023-05-06
145 commits to main branch, last one 5 months ago
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Created 2024-03-07
11 commits to main branch, last one 7 months ago
108
1.2k
cc-by-4.0
15
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for ...
Created 2023-05-18
43 commits to main branch, last one 2 months ago
51
1.1k
mit
22
Janus-Series: Unified Multimodal Understanding and Generation Models
Created 2024-10-18
16 commits to main branch, last one 8 days ago
31
637
unknown
19
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Created 2021-10-09
34 commits to main branch, last one 2 years ago
Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Created 2023-01-23
47 commits to master branch, last one 5 months ago
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
Created 2024-06-13
6 commits to main branch, last one 4 months ago
4
149
apache-2.0
7
[CVPR2023] The code for 《Position-guided Text Prompt for Vision-Language Pre-training》
Created 2022-12-16
16 commits to main branch, last one about a year ago
10
86
apache-2.0
2
[MedIA'24] FLAIR: A Foundation LAnguage-Image model of the Retina for fundus image understanding.
Created 2023-07-17
25 commits to main branch, last one 6 months ago
8
83
unknown
9
PyTorch implementation of ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"
Created 2022-11-11
8 commits to main branch, last one about a year ago
Official repository for "CLIP model is an Efficient Continual Learner".
Created 2022-10-03
5 commits to master branch, last one about a year ago
This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the survey just accepted https://dl....
Created 2022-03-13
66 commits to main branch, last one about a year ago
Multi-Aspect Vision Language Pretraining - CVPR2024
Created 2024-03-06
72 commits to main branch, last one 3 months ago
📍 Official pytorch implementation of paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)
Created 2022-06-20
4 commits to main branch, last one about a year ago
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models. [ICCV 2023 Oral]
Created 2023-07-30
7 commits to main branch, last one about a year ago
Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Created 2023-05-24
8 commits to master branch, last one about a year ago
1
31
unknown
6
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
Created 2023-02-28
6 commits to master branch, last one about a year ago
0
25
unknown
3
Bias-to-Text: Debiasing Unknown Visual Biases through Language Interpretation
Created 2023-01-31
4 commits to master branch, last one about a year ago