3 results found
[ICLR 2025] SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Created 2024-10-09
7 commits to main branch, last one 22 days ago
Code for "StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation", arXiv 2025.
Created 2025-01-13
2 commits to main branch, last one about a month ago
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
Topics: evals, gpt-4, reasoning, gemini-pro, navigation, perception, neurips-2024, summarization, visual-reasoning, benchmark-dataset, egocentric-videos, spatial-intelligence, multiple-choice-questions, long-context-understanding, video-language-understanding, multimodal-large-language-models, 1-hour-video-language-understanding, long-form-video-language-understanding
Created 2024-11-27
9 commits to main branch, last one about a month ago