15 results found

161
1.8k
mit
26
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...
Created 2024-03-03
34 commits to main branch, last one 22 hours ago
59
556
apache-2.0
35
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Created 2024-04-21
30 commits to main branch, last one 5 months ago
81
461
apache-2.0
6
CLIPort: What and Where Pathways for Robotic Manipulation
Created 2021-09-20
91 commits to master branch, last one about a year ago
29
447
mit
10
Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"
Created 2023-10-01
115 commits to main branch, last one 7 months ago
57
394
mit
6
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Created 2021-07-20
263 commits to main branch, last one 2 months ago
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
Created 2023-11-20
8 commits to main branch, last one 10 months ago
We perform functional grounding of LLMs' knowledge in BabyAI-Text
Created 2023-02-01
52 commits to main branch, last one 2 months ago
6
107
apache-2.0
4
[TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.
Created 2023-05-13
30 commits to master branch, last one 3 months ago
Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)
Created 2024-02-25
41 commits to main branch, last one 18 days ago
Hierarchical Universal Language Conditioned Policies
Created 2022-04-12
47 commits to main branch, last one 7 months ago
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Created 2021-02-10
57 commits to main branch, last one 3 years ago
8
49
mit
4
[Paper][AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning
Created 2022-11-27
48 commits to main branch, last one 9 months ago
3
36
mit
3
[ICRA2023] Grounding Language with Visual Affordances over Unstructured Data
Created 2022-11-06
4 commits to main branch, last one about a year ago
3
31
apache-2.0
2
[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.
Created 2024-04-20
15 commits to master branch, last one 19 days ago