Statistics for topic image-captioning
RepositoryStats tracks 610,147 GitHub repositories, of which 80 are tagged with the image-captioning topic. The most common primary language for repositories using this topic is Python (49). Other languages include Jupyter Notebook (19).
Stargazers over time for topic image-captioning
Most starred repositories for topic image-captioning
Trending repositories for topic image-captioning
LAVIS - A One-stop Library for Language-Vision Intelligence
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Video to Text: Natural language description generator for some given video. [Video Captioning]
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/space...
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Awesome radiology report generation and image captioning papers.
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]