28 results found
[ECCV 2024] Video Foundation Models & Data for Multimodal Understanding
benchmark
multimodal
video-clip
video-data
video-dataset
self-supervised
video-retrieval
foundation-models
action-recognition
instruction-tuning
masked-autoencoder
vision-transformer
video-understanding
zero-shot-retrieval
contrastive-learning
open-set-recognition
video-question-answering
zero-shot-classification
temporal-action-localization
spatio-temporal-action-localization
Created
2022-11-23
213 commits to main branch, last one 4 days ago
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
Created
2021-02-10
14 commits to main branch, last one 2 years ago
Official code for the Goldfish model for long video understanding and MiniGPT4-video for short video understanding
Created
2024-03-23
189 commits to main branch, last one 3 months ago
Video embeddings for retrieval with natural language queries
Created
2019-07-17
90 commits to master branch, last one 2 years ago
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Created
2023-06-06
18 commits to main branch, last one 10 months ago
[NeurIPS 2021] Moment-DETR code and QVHighlights dataset
Created
2021-07-20
9 commits to main branch, last one about a year ago
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Created
2023-05-22
4 commits to main branch, last one about a year ago
Authors' official PyTorch implementation of "ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning" [ICCV 2019]
Created
2019-08-14
38 commits to master branch, last one about a year ago
Official PyTorch repository for "QD-DETR: Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023 Paper)
Created
2023-01-30
31 commits to main branch, last one about a year ago
[ECCV 2020] PyTorch code for XML on TVRetrieval dataset - TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Created
2020-01-27
17 commits to master branch, last one 5 months ago
A PyTorch implementation of VIOLET
Created
2021-11-24
47 commits to main branch, last one 11 months ago
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Created
2022-09-23
33 commits to main branch, last one 7 months ago
[ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
Created
2023-03-16
20 commits to main branch, last one 7 months ago
Authors' official TensorFlow implementation of "Near-Duplicate Video Retrieval with Deep Metric Learning" [ICCVW 2017]
Created
2018-09-13
29 commits to master branch, last one about a year ago
[CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Created
2023-02-28
33 commits to main branch, last one 7 months ago
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
Created
2023-03-24
9 commits to main branch, last one about a year ago
[arXiv22] Disentangled Representation Learning for Text-Video Retrieval
Created
2022-04-07
4 commits to main branch, last one 2 years ago
Authors' official PyTorch implementation of "DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval" [IJCV 2022]
Created
2021-06-24
42 commits to main branch, last one about a year ago
TransVCL: Attention-enhanced Video Copy Localization Network with Flexible Supervision [AAAI 2023 Oral]
Created
2022-08-08
7 commits to main branch, last one about a year ago
[IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
Created
2023-04-29
15 commits to main branch, last one 7 months ago
[TIP 2022] Official code for the paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning”
Created
2022-01-24
43 commits to main branch, last one 9 months ago
Video-aided Unsupervised Grammar Induction, NAACL'21 [best long paper]
Created
2021-04-09
18 commits to master branch, last one 2 years ago
Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Created
2023-05-24
8 commits to master branch, last one about a year ago
A PyTorch implementation of EmpiricalMVM
Created
2023-03-09
9 commits to main branch, last one 11 months ago
Authors' official PyTorch implementation of "Self-Supervised Video Similarity Learning" [CVPRW 2023]
Created
2023-04-05
21 commits to main branch, last one 12 months ago
[WACV'22] Code repository for the paper "Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting", https://arxiv.org/abs/2106.10137.
Created
2021-06-16
17 commits to main branch, last one 2 years ago
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)
Created
2023-03-23
9 commits to master branch, last one 7 months ago
A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.
Created
2023-01-03
26 commits to main branch, last one 9 months ago