1 result found Sort:
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
benchmark
multimodal
video-clip
video-data
video-dataset
self-supervised
video-retrieval
foundation-models
action-recognition
instruction-tuning
masked-autoencoder
vision-transformer
video-understanding
zero-shot-retrieval
contrastive-learning
open-set-recognition
video-question-answering
zero-shot-classification
temporal-action-localization
spatio-temporal-action-localization
Created
2022-11-23
213 commits to main branch, last one 6 days ago