3 results found

[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
Created 2021-03-27
39 commits to master branch, last one 4 months ago
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
Created 2023-10-29
9 commits to main branch, last one 11 months ago
Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Created 2023-05-24
8 commits to master branch, last one about a year ago