Statistics for topic image-captioning
RepositoryStats tracks 639,258 GitHub repositories; 83 of them are tagged with the image-captioning topic. The most common primary language for repositories using this topic is Python (50), followed by Jupyter Notebook (20).
Stargazers over time for topic image-captioning
Most starred repositories for topic image-captioning
Trending repositories for topic image-captioning
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
LAVIS - A One-stop Library for Language-Vision Intelligence
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning"
:fire: :fire: :fire: A paper list of some recent Computer Vision(CV) works
A collection of computer vision projects and tools.
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/space...
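The Caption-Anything description names a three-stage pipeline: segment the clicked region, caption it, then have an LLM rewrite the caption to match user preferences. A minimal sketch of that chaining, with stand-in stub functions (this is an assumed structure for illustration, not Caption-Anything's actual API; in the real tool a SAM-style segmenter, a visual captioner, and ChatGPT fill these roles):

```python
# Illustrative pipeline sketch only -- all three stages are stubs.
def segment(image, click_xy):
    """Stub segmenter (a SAM-style model plays this role in the real tool):
    returns the region of interest around the user's click."""
    x, y = click_xy
    return {"mask": "region", "center": (x, y)}

def caption_region(region):
    """Stub visual captioner (e.g. a BLIP-style model in the real tool)."""
    return "a dog lying on the grass"

def refine(raw_caption, style):
    """Stub LLM rewrite step applying a user-chosen style control."""
    if style == "humorous":
        return raw_caption + ", clearly plotting something"
    return raw_caption

def caption_anything(image, click_xy, style="factual"):
    # Chain the three stages: segmentation -> captioning -> LLM refinement.
    return refine(caption_region(segment(image, click_xy)), style)

print(caption_anything("img.png", (120, 80), style="humorous"))
# -> a dog lying on the grass, clearly plotting something
```

The "diverse controls" in the description correspond to the `style` parameter here: the raw visual caption is fixed, and only the final LLM rewrite varies per user preference.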
AI VTuber with LLM, ASR, TTS, OCR, CV and more technologies to live stream or play Minecraft with you.
Image Captioning with Vision Transformers (ViTs): transformer models that generate descriptive captions for images by combining the power of Transformers and computer vision. It leverages state-of-the-art...
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
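The ViT+GPT-2 entry describes the standard encoder-decoder recipe: a ViT encodes the image into features, and a GPT-2-style decoder generates the caption token by token, conditioned on those features. A toy sketch of that greedy decoding loop, with stub encoder/decoder functions and a hypothetical six-token vocabulary (none of this is the linked repository's code; a real model would use learned weights and cross-attention):

```python
# Toy greedy captioning loop -- encoder and decoder are stand-in stubs.
import random

random.seed(0)

VOCAB = ["<bos>", "a", "cat", "on", "mat", "<eos>"]

def vit_encode(image):
    """Stub for a ViT encoder: returns a fixed-size feature vector."""
    return [sum(image) / len(image)] * 4  # placeholder pooled features

def gpt2_step(features, tokens):
    """Stub decoder step: scores the next token given the image features
    and the caption so far (a real decoder attends over both)."""
    scores = {t: random.random() for t in VOCAB[1:]}
    scores["<eos>"] += 0.2 * len(tokens)  # favor stopping as caption grows
    return scores

def generate_caption(image, max_len=8):
    tokens = ["<bos>"]
    features = vit_encode(image)  # encode the image once
    for _ in range(max_len):
        scores = gpt2_step(features, tokens)
        next_tok = max(scores, key=scores.get)  # greedy: pick the argmax
        if next_tok == "<eos>":
            break
        tokens.append(next_tok)
    return " ".join(tokens[1:])  # drop <bos>

print(generate_caption([0.1, 0.5, 0.9]))
```

The key structural point the description implies: the image is encoded once, while the decoder runs once per generated token until it emits an end-of-sequence marker or hits the length cap.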