Image Captioning Vision Transformers (ViTs) are transformer models that generate descriptive captions for images by combining Transformer architectures with computer vision. They leverage state-of-the-a...
Created: 2023-06-18
22 commits to main branch, last one about a month ago
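The listing does not show the repository's code. As a minimal sketch of how a Vision Transformer consumes an image, assuming the standard ViT input step from the original ViT paper, the code below splits an image into non-overlapping, flattened patches. A ViT encoder would then linearly project these patches into tokens, and a captioning model would pair that encoder with a text decoder. The `patchify` helper and the nested-list image layout are hypothetical illustrations, not taken from the repository.

```python
from typing import List

# Hypothetical layout: image[row][col][channel], pure Python for portability.
Image = List[List[List[float]]]


def patchify(image: Image, patch_size: int) -> List[List[float]]:
    """Split an H x W x C image into flattened, non-overlapping patches.

    Patches are ordered row-major (left to right, then top to bottom), and
    each patch is flattened in (row, col, channel) order. This mirrors the
    ViT step that turns an image into a sequence of patch vectors before
    the linear projection (which is not shown here).
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    patches: List[List[float]] = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat: List[float] = []
            # Walk the pixels of this patch and append every channel value.
            for r in range(top, top + patch_size):
                for c in range(left, left + patch_size):
                    flat.extend(image[r][c])
            patches.append(flat)
    return patches


if __name__ == "__main__":
    # A 4 x 4 RGB image whose pixel value encodes its position.
    img = [[[float(r * 4 + c)] * 3 for c in range(4)] for r in range(4)]
    tokens = patchify(img, 2)
    print(len(tokens), len(tokens[0]))  # 4 patches, each 2 * 2 * 3 = 12 values
```

For an H x W x C image and patch size P, this yields (H/P) x (W/P) patches of length P * P * C; the original ViT setup uses 16 x 16 patches on 224 x 224 inputs, giving 196 tokens.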