Search Results - RepositoryStats

1.0k

10.5k

bsd-3-clause

95

LAVIS - A One-stop Library for Language-Vision Intelligence

salesforce deep-learning image-captioning vision-framework multimodal-datasets vision-and-language deep-learning-library multimodal-deep-learning visual-question-anwsering vision-language-pretraining vision-language-transformer

Created 2022-08-24

492 commits to main branch, last one 5 months ago

GroundingDINO IDEA-Research

795

7.9k

apache-2.0

46

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

open-world vision-language object-detection open-world-detection vision-language-transformer

Created 2023-03-09

84 commits to main branch, last one 8 months ago

BLIP salesforce

683

5.2k

bsd-3-clause

31

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

vision-language image-captioning visual-reasoning image-text-retrieval visual-question-answering vision-language-transformer vision-and-language-pre-training

Created 2022-01-25

64 commits to main branch, last one 2 years ago

AdvancedLiterateMachinery AlibabaResearch

190

1.7k

apache-2.0

41

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Created 2022-09-28

70 commits to main branch, last one 15 days ago

ReLA henghuiding

19

693

mit

5

[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation

cvpr2023 multimodal-learning vision-language-transformer referring-image-segmentation referring-expression-segmentation referring-expression-comprehension

Created 2023-03-11

14 commits to main branch, last one about a year ago

APE shenyunhang

42

563

apache-2.0

9

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

open-world object-detection image-segmentation vision-language-transformer referring-expression-comprehension

Created 2023-08-25

57 commits to main branch, last one 11 months ago

Vision-Language-Transformer henghuiding

23

353

mit

4

[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation

keras tpami iccv2021 tensorflow transformer vision-language referring-segmentation vision-language-transformer

Created 2021-07-23

7 commits to main branch, last one 3 years ago

instructrl haoliuhl

5

52

mit

1

Instruction Following Agents with Multimodal Transformers

jax flax transformer instructions machine-learning instruction-following reinforcement-learning vision-language-transformer

Created 2022-10-23

5 commits to main branch, last one 2 years ago