Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.
Created 2021-03-03
16 commits to the main branch; the last one was about a year ago