Search Results - RepositoryStats

mmf facebookresearch

940

5.6k

other

110

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

vqa dialog pytorch textvqa captioning multimodal deep-learning hateful-memes multi-tasking pretrained-models

Created 2018-06-27

1,103 commits to main branch, last one 5 days ago

InternGPT OpenGVLab

231

3.2k

apache-2.0

41

InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, ...

Created 2023-05-08

261 commits to main branch, last one 7 months ago

maestro roboflow

201

2.5k

apache-2.0

32

streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

vqa qwen2-vl paligemma captioning florence-2 multimodal fine-tuning phi-3-vision transformers objectdetection vision-and-language

Created 2023-11-24

497 commits to develop branch, last one 5 days ago

VLMEvalKit open-compass

300

2.1k

apache-2.0

12

Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks

gpt llm vit vqa clip gpt4 qwen llava claude gemini gpt-4v openai chatgpt pytorch evaluation openai-api multi-modal computer-vision large-language-models

Created 2023-12-01

1,246 commits to main branch, last one 2 days ago

QA-Survey-CN BDBC-KG-NLP

263

1.8k

unknown

40

北京航空航天大学大数据高精尖中心自然语言处理研究团队开展了智能问答的研究与应用总结。包括基于知识图谱的问答（KBQA），基于文本的问答系统（TextQA），基于表格的问答系统（TableQA）、基于视觉的问答系统（VisualQA）和机器阅读理解（MRC）等，每类任务分别对学术界和工业界进行了相关总结。

qa cqa nlp tqa vqa kbqa survey qa-survey question-answering

Created 2020-04-28

215 commits to master branch, last one about a year ago

bottom-up-attention peteanderson80

377

1.4k

mit

25

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

vqa caffe mscoco faster-rcnn mscoco-dataset image-captioning captioning-images visual-question-answering

Created 2017-05-26

55 commits to master branch, last one 4 years ago

prismer NVlabs

73

1.3k

other

16

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

vqa language-model image-captioning multi-task-learning vision-and-language multi-modal-learning vision-language-model

Created 2023-03-02

36 commits to main branch, last one about a year ago

Oscar microsoft

253

1.0k

mit

25

Oscar and VinVL

vqa oscar vinvl pre-training image-captioning image-text-search vision-and-language

This repository has been archived (exclude archived)

Created 2020-05-14

28 commits to master branch, last one about a year ago

Transformer-MM-Explainability hila-chefer

109

838

mit

8

[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-bas...

vqa clip detr lxmert visualbert transformer transformers visualization explainability explainable-ai interpretability

Created 2021-03-23

77 commits to main branch, last one 2 years ago

bottom-up-attention-vqa hengyuan-hu

181

757

gpl-3.0

34

An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.

vqa pytorch bottom-up-attention

Created 2017-12-16

24 commits to master branch, last one 5 years ago

ClipBERT jayleicn

86

717

mit

8

[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

vqa pytorch cvpr2021 video-retrieval vision-and-language video-question-answering

Created 2021-02-10

14 commits to main branch, last one 2 years ago

awesome-visual-question-answering jokieleung

95

661

unknown

24

A curated list of Visual Question Answering(VQA)(Image/Video Question Answering),Visual Question Generation ,Visual Dialog ,Visual Commonsense Reasoning and related area.

vqa multi-modal awesome-list attention-networks multi-modal-learning

Created 2019-03-03

36 commits to master branch, last one 2 years ago

Multi-Modality-Arena OpenGVLab

37

507

unknown

7

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...

vqa chat llms gradio chatbot chatgpt multi-modality large-language-models vision-language-model

Created 2023-05-10

86 commits to main branch, last one 11 months ago

mac-network stanfordnlp

119

501

apache-2.0

30

Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)

vqa clevr attention tensorflow machine-reasoning question-answering compositional-attention-networks

Created 2018-04-14

71 commits to master branch, last one 3 years ago

tbd-nets davidmascharka

74

348

mit

15

PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

vqa pytorch deep-learning visualization neural-networks machine-learning visual-question-answering

Created 2018-03-13

34 commits to master branch, last one 6 years ago

openvqa MILVLG

64

321

apache-2.0

11

A lightweight, scalable, and general framework for visual question answering research

vqa pytorch benchmark deep-learning visual-question-answering

Created 2019-07-04

254 commits to master branch, last one 3 years ago

Existing-Medical-QA-Datasets abachaa

33

281

unknown

12

Multimodal Question Answering in the Medical Domain: A summary of Existing Datasets and Systems

qa nlp vqa bionlp datasets radiology medical-qa computer-vision question-answering medical-informatics medical-qa-datasets consumer-health-questions

Created 2020-04-23

12 commits to master branch, last one about a year ago

LRV-Instruction FuxiaoLiu

13

272

bsd-3-clause

12

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

gpt vqa iclr gpt-4 llama llava vicuna vision chatgpt iclr2024 evaluation multimodal hallucination object-detection foundation-models evaluation-metrics prompt-engineering vision-and-language

Created 2023-06-15

366 commits to main branch, last one about a year ago

pytorch-vqa Cyanogenoid

99

239

unknown

7

Strong baseline for visual question answering

vqa pytorch baseline visual-question-answering

Created 2017-07-30

18 commits to master branch, last one 2 years ago

OmniFusion FusionBrainLab

26

230

apache-2.0

5

OmniFusion — a multimodal model to communicate using text and images

vcr vqa multimodal transformer ai-assistant visual-encoding large-language-models

Created 2023-11-20

80 commits to main branch, last one 10 months ago

mPLUG-2 X-PLUG

19

223

apache-2.0

4

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)

vqa mllm mplug video multimodal image-retrieval video-retrieval foundation-models multimodal-pretraining video-question-answering

Created 2023-05-22

4 commits to main branch, last one about a year ago

Awesome-LLM-Papers-Comprehensive-Topics shure-dev

22

210

unknown

11

Awesome LLM Papers and repos on very comprehensive topics.

Created 2024-01-13

171 commits to main branch, last one 7 months ago

VQA_ReGAT linjieli222

38

182

mit

6

Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"

vqa pytorch attention

Created 2019-10-07

14 commits to master branch, last one 3 years ago

FineR OatmealLiu

13

171

apache-2.0

3

[ICLR'24] Democratizing Fine-grained Visual Recognition with Large Language Models

vqa reasoning vision-language large-language-models fine-grained-recognition

Created 2023-10-02

24 commits to main branch, last one 8 months ago

FrozenBiLM antoyang

23

156

apache-2.0

4

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

vqa videoqa pre-training multimodal-learning video-understanding vision-and-language large-language-models video-question-answering visual-question-answering weakly-supervised-learning

Created 2022-09-25

18 commits to main branch, last one 3 months ago

hcrn-videoqa thaolmk54

26

132

apache-2.0

7

Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)

vqa tgif-qa videoqa question-answering

Created 2020-02-28

15 commits to master branch, last one 4 years ago

VIDEVAL vztu

20

129

mit

6

[IEEE TIP'2021] "UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content", Zhengzhong Tu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik

iqa vqa dataset bvqa-model evaluation feature-extraction user-generated-content image-quality-assessment video-quality-assessment

Created 2020-05-18

41 commits to master branch, last one 3 years ago

cfvqa yuleiniu

13

120

apache-2.0

2

[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias

vqa cvpr pytorch cvpr2021 causality language-bias counterfactual causal-inference

Created 2021-03-02

9 commits to master branch, last one 3 years ago

just-ask antoyang

15

120

apache-2.0

4

[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

vqa videoqa pre-training multimodal-learning question-generation video-understanding vision-and-language video-question-answering visual-question-answering weakly-supervised-learning

Created 2021-03-22

47 commits to main branch, last one about a year ago

SlotFormer pairlab

22

106

mit

5

Code release for ICLR 2023 paper: SlotFormer on object-centric dynamics models

vqa pytorch planning deep-learning dynamics-model computer-vision video-prediction

Created 2023-01-23

35 commits to master branch, last one about a year ago