Search Results - RepositoryStats

676

5.1k

bsd-3-clause

31

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

vision-language image-captioning visual-reasoning image-text-retrieval visual-question-answering vision-language-transformer vision-and-language-pre-training

Created 2022-01-25

64 commits to main branch, last one 2 years ago

OFA OFA-Sys

249

2.5k

apache-2.0

20

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

prompt chinese multimodal pretraining prompt-tuning vision-language image-captioning pretrained-models text-to-image-synthesis visual-question-answering referring-expression-comprehension

Created 2022-01-29

712 commits to main branch, last one about a year ago

bottom-up-attention peteanderson80

377

1.4k

mit

25

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

vqa caffe mscoco faster-rcnn mscoco-dataset image-captioning captioning-images visual-question-answering

Created 2017-05-26

55 commits to master branch, last one 4 years ago

flamingo-pytorch lucidrains

62

1.2k

mit

21

Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch

transformers deep-learning attention-mechanism artificial-intelligence visual-question-answering

Created 2022-04-28

36 commits to main branch, last one 2 years ago

xmodaler YehLi

105

970

other

28

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...

tden pretraining image-captioning video-captioning vision-and-language cross-modal-retrieval visual-question-answering

Created 2021-06-25

84 commits to master branch, last one 2 years ago

awesome-multimodal-in-medical-imaging richard-peng-xia

65

690

mit

17

A collection of resources on applications of multi-modal learning in medical imaging.

medical-imaging multimodal-learning large-language-models large-multimodal-models multimodal-deep-learning medical-report-generation visual-question-answering multimodal-large-language-models

Created 2022-07-13

158 commits to main branch, last one 16 days ago

ban-vqa jnhwkim

100

545

mit

13

Bilinear attention networks for visual question answering

attention bilinear-pooling pytorch-implmention visual-question-answering

This repository has been archived

Created 2018-06-12

48 commits to master branch, last one about a year ago

MMMU MMMU-Benchmark

33

401

apache-2.0

3

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

llm llms stem evaluation multimodal deep-learning multimodality computer-vision machine-learning foundation-models question-answering multimodal-learning deep-neural-networks large-language-models large-multimodal-models multimodal-deep-learning visual-question-answering natural-language-processing

Created 2023-11-23

147 commits to main branch, last one 6 days ago

KG-MM-Survey zjukg

19

393

mit

8

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

awsome survey surveys paper-list awsome-list entity-linking knowledge-graph entity-alignment image-generation multi-modal-fusion image-classification multi-modal-learning cross-modal-retrieval large-language-models information-extraction visual-question-answering knowledge-graph-embeddings multi-modal-knowledge-graph

Created 2024-01-29

83 commits to main branch, last one 3 months ago

tbd-nets davidmascharka

74

348

mit

15

PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

vqa pytorch deep-learning visualization neural-networks machine-learning visual-question-answering

Created 2018-03-13

34 commits to master branch, last one 6 years ago

openvqa MILVLG

64

321

apache-2.0

11

A lightweight, scalable, and general framework for visual question answering research

vqa pytorch benchmark deep-learning visual-question-answering

Created 2019-07-04

254 commits to master branch, last one 3 years ago

MathVista lupantech

47

283

cc-by-sa-4.0

5

MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts

mathqa ai4math science mathematics machine-learning large-language-models visual-question-answering large-multimadality-models

Created 2023-10-04

112 commits to main branch, last one 3 months ago

prophet MILVLG

27

271

apache-2.0

2

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

gpt-3 okvqa a-okvqa pytorch prompt-engineering multimodal-deep-learning visual-question-answering

Created 2023-01-09

63 commits to main branch, last one about a year ago

pytorch-vqa Cyanogenoid

99

239

unknown

7

Strong baseline for visual question answering

vqa pytorch baseline visual-question-answering

Created 2017-07-30

18 commits to master branch, last one 2 years ago

awesome-computer-vision-resources HanXinzi-AI

31

231

unknown

2

A collection of computer vision projects and tools.

Created 2021-04-28

47 commits to main branch, last one 3 years ago

NuScenes-QA qiantianwen

2

177

mit

14

[AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.

vision-language autonomous-driving visual-question-answering

Created 2023-05-24

16 commits to main branch, last one 4 months ago

MMStar MMStar-Benchmark

5

168

unknown

1

[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"

llm llms lvlm lvlms evaluation multimodal multimodality multimodal-learning large-language-models large-multimodal-models visual-question-answering large-vision-language-model large-vision-language-models

Created 2024-03-29

19 commits to main branch, last one 5 months ago

FrozenBiLM antoyang

23

156

apache-2.0

4

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

vqa videoqa pre-training multimodal-learning video-understanding vision-and-language large-language-models video-question-answering visual-question-answering weakly-supervised-learning

Created 2022-09-25

18 commits to main branch, last one 3 months ago

tifa Yushi-Hu

9

153

apache-2.0

2

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

image-to-text text-to-image large-language-models visual-question-answering

Created 2023-03-21

76 commits to main branch, last one 10 months ago

just-ask antoyang

15

119

apache-2.0

4

[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

vqa videoqa pre-training multimodal-learning question-generation video-understanding vision-and-language video-question-answering visual-question-answering weakly-supervised-learning

Created 2021-03-22

47 commits to main branch, last one about a year ago

VILLA zhegan27

14

119

mit

8

Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part

pretraining neurips-2020 vision-and-language adversarial-training visual-question-answering

Created 2020-10-20

5 commits to main branch, last one 4 years ago

aokvqa allenai

8

79

apache-2.0

5

Official repository for the A-OKVQA dataset

dataset computer-vision visual-question-answering natural-language-processing

Created 2022-05-10

28 commits to main branch, last one 10 months ago

LOVA3 showlab

2

78

unknown

5

(NeurIPS 2024) Official PyTorch implementation of LOVA3

benchmark large-multimodal-models visual-question-answering visual-question-generation multimodal-large-language-models

Created 2024-05-19

40 commits to main branch, last one 12 days ago

Flipped-VQA mlvlab

10

74

mit

5

Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

emnlp2023 multi-modal large-language-models video-question-answering visual-question-answering

Created 2023-10-19

20 commits to main branch, last one 7 months ago

ZS-F-VQA China-UK-ZSL

15

70

mit

2

[Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph

vqa zsl fvqa zs-f-vqa zero-shot commonsense knowledge-graph commonsense-reasoning visual-question-answering

Created 2021-07-11

93 commits to main branch, last one about a year ago

TRAR-VQA rentainhe

18

66

mit

2

[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"

clevr vqav2 pytorch iccv2021 official attention multi-modal transformer visualization multi-modality dynamic-network local-and-global vision-and-language multi-modal-learning multi-scale-features visual-question-answering

Created 2021-07-23

61 commits to main branch, last one 3 years ago

multimodal-meta-learn ivonajdenkoska

2

59

mit

5

[ICLR 2023] Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning"

iclr-2023 meta-learning vision-language image-captioning few-shot-learning visual-question-answering

Created 2023-02-24

6 commits to main branch, last one about a year ago

MICCAI19-MedVQA aioz-ai

35

58

mit

6

AIOZ AI - Overcoming Data Limitation in Medical Visual Question Answering (MICCAI 2019)

ai vqa aioz medvqa miccai aioz-ai medical deep-learning medical-image-processing visual-question-answering

Created 2019-07-24

10 commits to master branch, last one 2 years ago

fusion_brain_aij2021 ai-forever

15

50

unknown

5

Creating multimodal multitask models

bilingual multitask java-to-python multimodal-fusion visual-question-answering zero-shot-object-detection handwritten-text-recognition

Created 2021-09-13

271 commits to main branch, last one 3 years ago

VisualWebBench VisualWebBench

1

49

unknown

2

Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"

llm llms mllm evaluation multimodal deep-learning computer-vision machine-learning foundation-models question-answering large-language-models large-multimodal-models multimodal-deep-learning visual-question-answering natural-language-processing multimodal-large-language-models

Created 2024-04-02

26 commits to main branch, last one 4 months ago