58 results found

939
5.5k
other
114
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Created 2018-06-27
1,099 commits to main branch, last one 5 days ago
232
3.2k
apache-2.0
43
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, ...
Created 2023-05-08
261 commits to main branch, last one 3 months ago
261
1.7k
unknown
42
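A summary of research and applications in intelligent question answering, compiled by the natural language processing research team at the Big Data Advanced Innovation Center, Beihang University. It covers knowledge-graph-based question answering (KBQA), text-based question answering (TextQA), table-based question answering (TableQA), visual question answering (VisualQA), and machine reading comprehension (MRC). For each task type, work from both academia and industry is summarized.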
Created 2020-04-28
215 commits to master branch, last one about a year ago
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Created 2017-05-26
55 commits to master branch, last one 3 years ago
102
1.4k
apache-2.0
20
Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL
Created 2023-11-24
255 commits to develop branch, last one 16 days ago
193
1.4k
apache-2.0
11
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
Created 2023-12-01
1,017 commits to main branch, last one a day ago
75
1.3k
other
16
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Created 2023-03-02
36 commits to main branch, last one 10 months ago
252
1.0k
mit
25
Oscar and VinVL
This repository has been archived
Created 2020-05-14
28 commits to master branch, last one about a year ago
[ICCV 2021 Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-bas...
Created 2021-03-23
77 commits to main branch, last one about a year ago
An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
Created 2017-12-16
24 commits to master branch, last one 5 years ago
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
Created 2021-02-10
14 commits to main branch, last one 2 years ago
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Created 2019-03-03
36 commits to master branch, last one about a year ago
119
496
apache-2.0
31
Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
Created 2018-04-14
71 commits to master branch, last one 3 years ago
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...
Created 2023-05-10
86 commits to main branch, last one 7 months ago
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
Created 2018-03-13
34 commits to master branch, last one 6 years ago
64
321
apache-2.0
12
A lightweight, scalable, and general framework for visual question answering research
Created 2019-07-04
254 commits to master branch, last one 3 years ago
13
256
bsd-3-clause
11
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Created 2023-06-15
366 commits to main branch, last one 8 months ago
Multimodal Question Answering in the Medical Domain: A Summary of Existing Datasets and Systems
Created 2020-04-23
12 commits to master branch, last one about a year ago
Strong baseline for visual question answering
Created 2017-07-30
18 commits to master branch, last one about a year ago
18
220
apache-2.0
5
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Created 2023-05-22
4 commits to main branch, last one about a year ago
13
219
apache-2.0
3
[ICLR'24] Democratizing Fine-grained Visual Recognition with Large Language Models
Created 2023-10-02
24 commits to main branch, last one 4 months ago
Awesome LLM Papers and repos on very comprehensive topics.
Created 2024-01-13
171 commits to main branch, last one 3 months ago
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
Created 2019-10-07
14 commits to master branch, last one 3 years ago
23
156
apache-2.0
5
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Created 2022-09-25
16 commits to main branch, last one about a year ago
10
134
other
9
[NeurIPS 2022] Official Code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
Created 2022-09-21
83 commits to main branch, last one 2 months ago
26
131
apache-2.0
7
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
Created 2020-02-28
15 commits to master branch, last one 4 years ago
19
125
mit
6
[IEEE TIP'2021] "UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content", Zhengzhong Tu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik
Created 2020-05-18
41 commits to master branch, last one 3 years ago
15
117
apache-2.0
5
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Created 2021-03-22
47 commits to main branch, last one about a year ago
14
116
apache-2.0
2
[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
Created 2021-03-02
9 commits to master branch, last one 2 years ago
Code release for ICLR 2023 paper: SlotFormer on object-centric dynamics models
Created 2023-01-23
35 commits to master branch, last one about a year ago