8 results found

270
3.0k
bsd-3-clause
33
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Created 2023-05-06
145 commits to main branch, last one 9 months ago
21
308
apache-2.0
5
[ICRA 2024] Chat with NeRF enables users to interact with a NeRF model by typing in natural language.
Created 2023-04-27
137 commits to main branch, last one 11 months ago
28
255
bsd-3-clause
8
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
Created 2023-08-02
26 commits to main branch, last one 11 months ago
4
83
unknown
5
Official code base for NeuroClips
Created 2024-05-15
135 commits to main branch, last one about a month ago
Automate Fashion Image Captioning using BLIP-2. Automatically generates descriptions of clothes on shopping websites, which can help customers without fashion knowledge better understand the features ...
Created 2023-05-23
16 commits to master branch, last one about a year ago
Implementation of Qformer from BLIP2 in Zeta Lego blocks.
Created 2023-12-29
28 commits to main branch, last one about a year ago
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
Created 2023-11-10
13 commits to main branch, last one 11 months ago
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
Created 2024-04-12
12 commits to master branch, last one 4 months ago