Search Results - RepositoryStats

1 result found

unknown

Image Captioning Vision Transformers (ViTs) are transformer models that generate descriptive captions for images by combining Transformers with computer vision. They leverage state-of-the-a...

genai image analyticsvidhya image-captioning vision-transformer transformers-models

Created 2023-06-18

22 commits to main branch, last one 4 months ago