7 results found

Meshed-Memory Transformer for Image Captioning. CVPR 2020
Created 2019-12-12
10 commits to master branch, last one about a year ago
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019
Created 2019-02-14
12 commits to master branch, last one about a year ago
[CVPR 2021] Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Created 2020-12-07
18 commits to main branch, last one 2 years ago
ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model, a unified and user-friendly shape-language model
Created 2023-11-30
22 commits to main branch, last one 11 months ago
[CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning methods
Created 2022-11-28
101 commits to master branch, last one 3 months ago
[ECCV2022] D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
Created 2021-11-30
12 commits to main branch, last one 2 years ago
Implemented 3 different architectures to tackle the image captioning problem, i.e., Merged Encoder-Decoder, Bahdanau Attention, and Transformers
Created 2021-01-04
20 commits to main branch, last one 3 years ago