13 results found

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...
Created 2024-03-03
17 commits to main branch, last one 2 days ago
55
466
apache-2.0
36
Grounded Multimodal Large Language Model with Localized Visual Tokenization
Created 2024-04-21
30 commits to main branch, last one 22 days ago
80
432
apache-2.0
6
CLIPort: What and Where Pathways for Robotic Manipulation
Created 2021-09-20
91 commits to master branch, last one about a year ago
25
423
mit
10
Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"
Created 2023-10-01
115 commits to main branch, last one 3 months ago
49
303
mit
6
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Created 2021-07-20
259 commits to main branch, last one 3 months ago
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
Created 2023-11-20
8 commits to main branch, last one 5 months ago
We perform functional grounding of LLMs' knowledge in BabyAI-Text
Created 2023-02-01
39 commits to main branch, last one 4 months ago
4
98
apache-2.0
3
Self-paced Curriculum Adapting of CLIP for Visual Grounding.
Created 2023-05-13
23 commits to master branch, last one about a month ago
Hierarchical Universal Language Conditioned Policies
Created 2022-04-12
47 commits to main branch, last one 3 months ago
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Created 2021-02-10
57 commits to main branch, last one 2 years ago
8
41
mit
4
[Paper][AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning
Created 2022-11-27
48 commits to main branch, last one 4 months ago
2
31
mit
3
[ICRA2023] Grounding Language with Visual Affordances over Unstructured Data
Created 2022-11-06
4 commits to main branch, last one 8 months ago