10 results found
LAVIS - A One-stop Library for Language-Vision Intelligence
Created
2022-08-24
492 commits to main branch, last one 2 days ago
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Created
2023-03-09
84 commits to main branch, last one 3 months ago
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Created
2022-01-25
64 commits to main branch, last one 2 years ago
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
ocr
document
documentai
multimodal
end-to-end-ocr
text-detection
computer-vision
vision-language
text-recognition
document-analysis
document-recognition
scene-text-detection
document-intelligence
vision-language-model
document-understanding
scene-text-recognition
artificial-intelligence
multimodal-deep-learning
vision-language-transformer
scene-text-detection-recognition
Created
2022-09-28
62 commits to main branch, last one about a month ago
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
Created
2023-03-11
14 commits to main branch, last one about a year ago
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
Created
2023-08-25
57 commits to main branch, last one 6 months ago
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
Created
2021-07-23
7 commits to main branch, last one 3 years ago
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
Created
2023-05-27
146 commits to main branch, last one about a year ago
Instruction Following Agents with Multimodal Transformers
Created
2022-10-23
5 commits to main branch, last one 2 years ago
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
Created
2023-05-27
11 commits to main branch, last one about a year ago