3 results found

[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
Created 2021-03-27
39 commits to master branch, last one 5 months ago
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
Created 2023-10-29
9 commits to main branch, last one about a year ago
[ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Created 2023-05-24
9 commits to master branch, last one about a month ago