16 results found

941
5.6k
other
110
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Created 2018-06-27
1,103 commits to main branch, last one a day ago
199
2.5k
apache-2.0
32
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Created 2023-11-24
497 commits to develop branch, last one a day ago
17
377
apache-2.0
7
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
Created 2024-10-12
4 commits to main branch, last one 3 months ago
CapDec: SOTA Zero-Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (Findings)
Created 2022-05-22
291 commits to main branch, last one about a year ago
Audio Captioning datasets for PyTorch.
Created 2022-05-19
13 commits to main branch, last one 12 months ago
6
90
gpl-3.0
6
VisText is a benchmark dataset for semantically rich chart captioning.
Created 2023-04-04
93 commits to main branch, last one about a year ago
Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey
Created 2024-12-03
12 commits to main branch, last one 5 days ago
16
69
unknown
3
Medical image captioning using OpenAI's CLIP
Created 2021-12-02
16 commits to main branch, last one 2 years ago
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
Created 2019-06-11
26 commits to master branch, last one 4 months ago
5
60
unknown
5
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation (CVPR 2023)
Created 2023-01-29
19 commits to main branch, last one 15 days ago
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Created 2021-02-10
57 commits to main branch, last one 3 years ago
A Gradio-based image captioning tool that uses the GPT-4-Vision API to generate detailed descriptions of images.
Created 2023-12-19
17 commits to main branch, last one 4 months ago
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
Created 2022-09-20
19 commits to main branch, last one about a month ago
Using LLMs and pre-trained caption models for super-human performance on image captioning.
Created 2022-12-14
50 commits to main branch, last one about a year ago
Sample app that displays live captions in a WebRTC video session using the Deepgram API.
Created 2021-04-07
46 commits to main branch, last one 3 years ago
3
34
apache-2.0
3
[CVPR 2022] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
Created 2022-03-08
14 commits to main branch, last one 2 years ago