Search Results - RepositoryStats

mmf facebookresearch

941

5.6k

other

110

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

vqa dialog pytorch textvqa captioning multimodal deep-learning hateful-memes multi-tasking pretrained-models

Created 2018-06-27

1,103 commits to main branch, last one a day ago

maestro roboflow

199

2.5k

apache-2.0

32

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

vqa qwen2-vl paligemma captioning florence-2 multimodal fine-tuning phi-3-vision transformers objectdetection vision-and-language

Created 2023-11-24

497 commits to develop branch, last one a day ago

joycaption fpgaminer

17

377

apache-2.0

7

JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.

vlm captioning joycaption

Created 2024-10-12

4 commits to main branch, last one 3 months ago

CapDec DavidHuji

21

191

mit

4

CapDec: SOTA Zero-Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (Findings)

clip gpt-2 clipcap captioning zero-shot-learning multimodal-deep-learning

Created 2022-05-22

291 commits to main branch, last one about a year ago

aac-datasets Labbeti

6

115

mit

2

Audio Captioning datasets for PyTorch.

audio caption dataset pytorch datasets captioning deep-learning audio-captioning

Created 2022-05-19

13 commits to main branch, last one 12 months ago

vistext mitvis

6

90

gpl-3.0

6

VisText is a benchmark dataset for semantically rich chart captioning.

t5 charts dataset captioning captioning-images

Created 2023-04-04

93 commits to main branch, last one about a year ago

Awesome-RS-Temporal-VLM Chen-Yang-Liu

4

78

unknown

2

Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey

captioning change-detection multimodal-deep-learning

Created 2024-12-03

12 commits to main branch, last one 5 days ago

MedCLIP Mauville

16

69

unknown

3

Medical image captioning using OpenAI's CLIP

clip captioning deep-learning medical-imaging machine-learning what-a-challenge-this-was

Created 2021-12-02

16 commits to main branch, last one 2 years ago

MTL-AQA ParitoshParmar

15

66

unknown

3

What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]

c3d lstm mtl-aqa pytorch captioning dilated-c3d video-captioning video-processing action-recognition multitask-learning dilated-convolution video-understanding representation-learning action-quality-assessment fine-grained-classification fine-grained-action-recognition

Created 2019-06-11

26 commits to master branch, last one 4 months ago

pacscore aimagelab

5

60

unknown

5

Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation. (CVPR 2023)

cvpr cvpr2023 captioning computer-vision captioning-images captioning-videos vision-and-language

Created 2023-01-29

19 commits to main branch, last one 15 days ago

VidSitu TheShadow29

8

59

mit

2

[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

nlp srl video vision grounding captioning semantic-roles video-language event-relations captioning-videos vision-and-language

Created 2021-02-10

57 commits to main branch, last one 3 years ago

CaptainCaption 42lux

9

58

mit

2

A Gradio-based image captioning tool that uses the GPT-4-Vision API to generate detailed descriptions of images.

gradio tagging captioning openai-api gpt-4-vision

Created 2023-12-19

17 commits to main branch, last one 4 months ago

aac-metrics Labbeti

3

43

mit

2

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

text audio metrics captioning audio-captioning

Created 2022-09-20

19 commits to main branch, last one about a month ago

caption-by-committee DavidMChan

4

40

other

2

Using LLMs and pre-trained caption models for super-human performance on image captioning.

ai image python chatgpt captioning deep-learning machine-learning

Created 2022-12-14

50 commits to main branch, last one about a year ago

video-chat deepgram-devs

14

37

mit

13

Sample app that displays live captioning in a WebRTC video session using the Deepgram API.

webrtc deepgram captioning speech-to-text speech-recognition

Created 2021-04-07

46 commits to main branch, last one 3 years ago

X-Trans2Cap CurryYuan

3

34

apache-2.0

3

[CVPR 2022] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

cvpr2022 captioning

Created 2022-03-08

14 commits to main branch, last one 2 years ago