Trending repositories for the image-captioning topic
LAVIS - A One-stop Library for Language-Vision Intelligence
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
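For orientation, LAVIS exposes BLIP's captioner behind a one-call loader. A minimal sketch, assuming `salesforce-lavis` is installed and a local image at the hypothetical path `photo.jpg`:

```python
# Minimal BLIP captioning sketch via LAVIS (pip install salesforce-lavis).
# "photo.jpg" is a hypothetical local image path.
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the BLIP captioning model together with its matching image preprocessor.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

raw_image = Image.open("photo.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# Generate a caption; returns a list of strings, e.g. ["a dog running on the beach"].
print(model.generate({"image": image}))
```

Because `load_model_and_preprocess` returns the preprocessors alongside the model, the image transform always agrees with the chosen checkpoint.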
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Awesome radiology report generation and image captioning papers.
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/space...
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
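The "attend" in Show, Attend, and Tell is soft attention: at each decoding step the LSTM's hidden state scores every spatial cell of the CNN feature map, and the decoder reads the attention-weighted average. A schematic sketch of that step (illustrative dimensions, not the tutorial's exact code):

```python
# Schematic soft-attention step from "Show, Attend and Tell" in PyTorch.
# Dimensions (2048-d features on a 14x14 grid, 512-d hidden state) are illustrative.
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, attn_dim=512):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (batch, num_pixels, feat_dim); hidden: (batch, hidden_dim)
        e = self.score(torch.tanh(
            self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1)))
        alpha = torch.softmax(e.squeeze(-1), dim=1)         # attention weights over pixels
        context = (feats * alpha.unsqueeze(-1)).sum(dim=1)  # weighted average of features
        return context, alpha

attn = SoftAttention()
feats = torch.randn(4, 196, 2048)  # 14x14 CNN grid features
hidden = torch.randn(4, 512)       # current LSTM hidden state
context, alpha = attn(feats, hidden)
print(context.shape, alpha.shape)  # torch.Size([4, 2048]) torch.Size([4, 196])
```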
A collection of computer vision projects and tools.
PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, ...
Video to Text: natural-language description generator for a given video. [Video Captioning]
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning" (published at ICLR 2023).
A list of awesome remote sensing image captioning resources
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
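That repo builds its encoder-decoder pair from scratch; the same ViT-encoder plus GPT-2-decoder pattern can also be tried off the shelf with Hugging Face's `VisionEncoderDecoderModel`, sketched here against the community `nlpconnect/vit-gpt2-image-captioning` checkpoint (an illustrative stand-in, not that repo's weights):

```python
# ViT encoder + GPT-2 decoder captioning via Hugging Face transformers.
# The checkpoint is a community model used for illustration; "photo.jpg" is hypothetical.
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

ckpt = "nlpconnect/vit-gpt2-image-captioning"
model = VisionEncoderDecoderModel.from_pretrained(ckpt)
processor = ViTImageProcessor.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

image = Image.open("photo.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Beam-search decode the caption tokens, then detokenize.
output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```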
Image Captioning Vision Transformers (ViTs) are transformer models that generate descriptive captions for images by combining the power of Transformers and computer vision, leveraging state-of-the-a...
An image captioning web application that combines React.js for the front end with Flask and Node.js for the back end, utilizing the MERN stack. Users can upload images and instantly receive automatic capt...
PyTorch implementation of image captioning using a transformer-based model.
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Simple Swift class to provide all the configurations you need to create a custom camera view in your app
Transformer & CNN Image Captioning model in PyTorch.
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
Improving Chest X-Ray Report Generation by Leveraging Warm-Starting
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)