Trending repositories for the image-captioning topic
LAVIS - A One-stop Library for Language-Vision Intelligence
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
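For orientation, LAVIS exposes BLIP's captioner behind a one-call loader. A minimal sketch, assuming `salesforce-lavis` is installed and a local image at the hypothetical path `photo.jpg`:

```python
# Minimal BLIP captioning sketch via LAVIS (pip install salesforce-lavis).
# "photo.jpg" is a hypothetical local image path.
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the BLIP captioning model together with its matching image preprocessor.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

raw_image = Image.open("photo.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# Generate a caption; returns a list of strings, e.g. ["a dog running on the beach"].
print(model.generate({"image": image}))
```

Because `load_model_and_preprocess` returns the preprocessors alongside the model, the image transform always agrees with the chosen checkpoint.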
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Awesome radiology report generation and image captioning papers.
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/space...
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
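The "attend" in Show, Attend, and Tell is soft attention: at each decoding step the LSTM's hidden state scores every spatial cell of the CNN feature map, and the decoder reads the attention-weighted average. A schematic sketch of that step (illustrative dimensions, not the tutorial's exact code):

```python
# Schematic soft-attention step from "Show, Attend and Tell" in PyTorch.
# Dimensions (2048-d features on a 14x14 grid, 512-d hidden state) are illustrative.
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, attn_dim=512):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (batch, num_pixels, feat_dim); hidden: (batch, hidden_dim)
        e = self.score(torch.tanh(
            self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1)))
        alpha = torch.softmax(e.squeeze(-1), dim=1)         # attention weights over pixels
        context = (feats * alpha.unsqueeze(-1)).sum(dim=1)  # weighted average of features
        return context, alpha

attn = SoftAttention()
feats = torch.randn(4, 196, 2048)  # 14x14 CNN grid features
hidden = torch.randn(4, 512)       # current LSTM hidden state
context, alpha = attn(feats, hidden)
print(context.shape, alpha.shape)  # torch.Size([4, 2048]) torch.Size([4, 196])
```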
A collection of computer vision projects and tools.
PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, ...
Video to Text: natural-language description generator for a given video. [Video Captioning]
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning" (published at ICLR 2023).
A list of awesome remote sensing image captioning resources
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
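That repo builds its encoder-decoder pair from scratch; the same ViT-encoder plus GPT-2-decoder pattern can also be tried off the shelf with Hugging Face's `VisionEncoderDecoderModel`, sketched here against the community `nlpconnect/vit-gpt2-image-captioning` checkpoint (an illustrative stand-in, not that repo's weights):

```python
# ViT encoder + GPT-2 decoder captioning via Hugging Face transformers.
# The checkpoint is a community model used for illustration; "photo.jpg" is hypothetical.
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

ckpt = "nlpconnect/vit-gpt2-image-captioning"
model = VisionEncoderDecoderModel.from_pretrained(ckpt)
processor = ViTImageProcessor.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

image = Image.open("photo.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Beam-search decode the caption tokens, then detokenize.
output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```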
Image Captioning Vision Transformers (ViTs) are transformer models that generate descriptive captions for images by combining the power of Transformers and computer vision, leveraging state-of-the-a...
An image captioning web application that combines React.js for the front end with Flask and Node.js for the back end, utilizing the MERN stack. Users can upload images and instantly receive automatic capt...
PyTorch implementation of image captioning using a transformer-based model.
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Simple Swift class to provide all the configurations you need to create a custom camera view in your app
Transformer & CNN Image Captioning model in PyTorch.
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
Improving Chest X-Ray Report Generation by Leveraging Warm-Starting
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)