17 results found

172
2.0k
mit
27
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...
Created 2024-03-03
35 commits to main branch, last one 2 months ago
61
531
apache-2.0
27
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Created 2024-04-21
30 commits to main branch, last one 7 months ago
85
475
apache-2.0
6
CLIPort: What and Where Pathways for Robotic Manipulation
Created 2021-09-20
91 commits to master branch, last one about a year ago
65
461
mit
5
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Created 2021-07-20
269 commits to main branch, last one 23 days ago
30
460
mit
10
Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"
Created 2023-10-01
115 commits to main branch, last one 10 months ago
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
Created 2023-11-20
8 commits to main branch, last one about a year ago
We perform functional grounding of LLMs' knowledge in BabyAI-Text
Created 2023-02-01
52 commits to main branch, last one 5 months ago
8
115
apache-2.0
4
[TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.
Created 2023-05-13
32 commits to master branch, last one 9 days ago
Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)
Created 2024-02-25
41 commits to main branch, last one 3 months ago
Hierarchical Universal Language Conditioned Policies
Created 2022-04-12
47 commits to main branch, last one 10 months ago
[TPAMI reviewing] Towards Visual Grounding: A Survey
Created 2024-07-03
60 commits to master branch, last one 17 days ago
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Created 2021-02-10
57 commits to main branch, last one 3 years ago
8
49
mit
3
[Paper][AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning
Created 2022-11-27
48 commits to main branch, last one 11 months ago
3
38
mit
3
[ICRA2023] Grounding Language with Visual Affordances over Unstructured Data
Created 2022-11-06
4 commits to main branch, last one about a year ago
4
35
apache-2.0
2
[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.
Created 2024-04-20
17 commits to master branch, last one 9 days ago
This is the official implementation for our paper "LAR: Look Around and Refer".
Created 2022-03-14
22 commits to main branch, last one 2 years ago