Search Results - RepositoryStats

1.1k

11.6k

other

84

An open source implementation of CLIP.

pytorch deep-learning language-model computer-vision contrastive-loss pretrained-models multi-modal-learning zero-shot-classification

Created 2021-07-28

588 commits to main branch, last one a day ago

Chinese-CLIP OFA-Sys

494

5.1k

mit

37

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

nlp clip chinese pytorch multi-modal transformers coreml-models deep-learning computer-vision vision-language contrastive-loss pretrained-models image-text-retrieval multi-modal-learning vision-and-language-pre-training

Created 2022-07-08

382 commits to master branch, last one 8 months ago

Macaw-LLM lyuchenyang

125

1.6k

apache-2.0

25

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

deep-learning language-model neural-networks machine-learning multi-modal-learning natural-language-processing

Created 2023-05-23

144 commits to main branch, last one 3 months ago

prismer NVlabs

73

1.3k

other

16

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

vqa language-model image-captioning multi-task-learning vision-and-language multi-modal-learning vision-language-model

Created 2023-03-02

36 commits to main branch, last one about a year ago

x-clip lucidrains

47

707

mit

25

A concise but complete implementation of CLIP with various experimental improvements from recent papers

deep-learning zero-shot-learning contrastive-learning multi-modal-learning artificial-intelligence

Created 2021-12-01

76 commits to main branch, last one about a year ago

awesome-visual-question-answering jokieleung

95

662

unknown

24

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

vqa multi-modal awesome-list attention-networks multi-modal-learning

Created 2019-03-03

36 commits to master branch, last one 2 years ago

EmbodiedScan OpenRobotLab

44

585

apache-2.0

7

[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

robotics 3d-vision computer-vision multi-modal-learning

Created 2023-12-11

106 commits to main branch, last one 3 months ago

zeta kyegomez

50

497

apache-2.0

6

Build high-performance AI models with modular building blocks

gpt4 llama2 longnet pytorch multi-modal transformer transformers deep-learning multi-platform speech-recognition multi-agent-systems multi-modal-learning artificial-intelligence

Created 2023-07-09

887 commits to master branch, last one 2 months ago

CVPR-2023-24-Papers DmitryRyumin

29

449

mit

9

CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included...

Created 2023-08-09

1,146 commits to main branch, last one 9 months ago

KG-MM-Survey zjukg

19

404

mit

9

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

awsome survey surveys paper-list awsome-list entity-linking knowledge-graph entity-alignment image-generation multi-modal-fusion image-classification multi-modal-learning cross-modal-retrieval large-language-models information-extraction visual-question-answering knowledge-graph-embeddings multi-modal-knowledge-graph

Created 2024-01-29

83 commits to main branch, last one 4 months ago

PromptKD zhengli97

4

292

apache-2.0

4

[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"

clip cvpr2024 prompt-learning multi-modal-learning vision-language-model knowledge-distillation

Created 2024-02-27

48 commits to main branch, last one about a month ago

NeRCo Ysz2022

16

243

unknown

4

[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement

iccv iccv2023 low-light-image multi-modal-learning neural-representation low-light-image-enhancement

Created 2023-03-20

38 commits to main branch, last one about a year ago

chug huggingface

11

157

apache-2.0

10

Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.

datasets webdataset dataloading pdf-document computer-vision distributed-training multi-modal-learning document-understanding

Created 2023-06-06

36 commits to main branch, last one about a year ago

Achelous GuanRunwei

10

152

unknown

4

The official repository of Achelous and Achelous++

4d-mmwave-radar object-tracking object-detection multi-modal-fusion multi-task-learning panoptic-perception multi-modal-learning semantic-segmentation point-cloud-segmentation

Created 2023-03-17

150 commits to main branch, last one 9 months ago

ReCon qizekun

14

144

mit

7

[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

3d-point-clouds multi-modal-learning representation-learning self-supervised-learning

Created 2023-02-07

13 commits to main branch, last one 9 months ago

CGDETR wjun0830

13

129

other

6

Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"

detr pytorch computer-vision video-grounding moment-retrieval temporal-grounding highlight-detection video-summarization video-understanding multi-modal-learning text-video-retrieval detection-transformer

Created 2023-11-10

18 commits to main branch, last one 8 months ago

d-cube shikras

7

119

other

6

A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).

dataset vision-language object-detection multi-modal-learning open-vocabulary-detection referring-expression-comprehension

Created 2023-07-19

13 commits to main branch, last one about a year ago

EDITOR 924973292

6

100

mit

1

[CVPR 2024] Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

reid msvr310 cvpr2024 rgbnt100 rgbnt201 multi-modal person-reid token-selection frequency-analysis multi-modal-learning vehicle-reidentification

Created 2024-03-14

124 commits to master branch, last one 6 months ago

Multimodal-Remote-Sensing-Toolkit likyoo

12

88

gpl-3.0

3

A Python tool to perform deep learning experiments on multimodal remote sensing data.

python pytorch remote-sensing multi-modal-learning

Created 2021-02-28

26 commits to main branch, last one 3 years ago

Aurora WillDreamer

7

86

unknown

5

[NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model

multi-modal-learning parameter-efficient-tuning

Created 2023-05-14

20 commits to main branch, last one about a year ago

MRM-pytorch RL4M

5

79

mit

1

An official implementation of Advancing Radiograph Representation Learning with Masked Record Modeling (ICLR'23)

chest-xray-images pre-trained-model multi-modal-learning representation-learning self-supervised-learning

Created 2023-01-18

14 commits to main branch, last one 2 years ago

sugar-crepe RAIVNLab

9

78

mit

10

[NeurIPS 2023] A faithful benchmark for vision-language compositionality

pytorch benchmark deep-learning vision-and-language multi-modal-learning

Created 2023-06-04

54 commits to main branch, last one about a year ago

japanese-clip rinnakk

9

72

apache-2.0

8

Japanese CLIP by rinna Co., Ltd.

clip cloob vision japanese language-model pretrained-models multi-modal-learning

Created 2022-04-25

31 commits to master branch, last one 2 years ago

TRAR-VQA rentainhe

18

66

mit

2

[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"

clevr vqav2 pytorch iccv2021 official attention multi-modal transformer visualization multi-modality dynamic-network local-and-global vision-and-language multi-modal-learning multi-scale-features visual-question-answering

Created 2021-07-23

61 commits to main branch, last one 3 years ago