Search Results - RepositoryStats

270

3.0k

bsd-3-clause

33

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

blip2 llama minigpt4 multi-modal-chatgpt large-language-models cross-modal-pretraining video-language-pretraining vision-language-pretraining

Created 2023-05-06

145 commits to main branch, last one 9 months ago

chat-with-nerf sled-group

21

308

apache-2.0

5

[ICRA 2024] Chat with NeRF enables users to interact with a NeRF model by typing in natural language.

lerf nerf blip2 gpt-4 chatgpt nerfstudio

Created 2023-04-27

137 commits to main branch, last one 11 months ago

BLIVA mlpc-ucsd

28

255

bsd-3-clause

8

(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions

llm lora blip2 bliva llama chatbot multimodal instruction-tuning visual-language-learning

Created 2023-08-02

26 commits to main branch, last one 11 months ago

NeuroClips gongzix

4

83

unknown

5

Official code base for NeuroClips

fmri blip2 fmri-to-video brain-decoding videodiffusion

Created 2024-05-15

135 commits to main branch, last one about a month ago

fashion_image_caption SmithaUpadhyaya

10

57

unknown

2

Automate Fashion Image Captioning using BLIP-2. Automatically generates descriptions of clothes on shopping websites, which can help customers without fashion knowledge better understand the features ...

blip2 image transformer huggingface-datasets image-caption-generator huggingface-transformers multimodal-deep-learning

Created 2023-05-23

16 commits to master branch, last one about a year ago

qformer kyegomez

0

36

mit

2

Implementation of Qformer from BLIP2 in Zeta Lego blocks.

ai blip2 machine multi-modal multi-modality machine-learning attention-mechanism artificial-intelligence

Created 2023-12-29

28 commits to main branch, last one about a year ago

ComCLIP eric-ai-lab

3

35

mit

2

Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"

svo clip slip blip2 causality flickr30k winoground compositionality flickr8k-dataset image-text-matching vision-and-language image-text-retrieval

Created 2023-11-10

13 commits to main branch, last one 11 months ago

SPN4CIR BUAADreamer

3

30

mit

1

[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives

blip clip blip2 llama llava acmmm2024 memory-bank transformer data-generation image-retrieval multimodal-learning cross-modal-retrieval multi-modal-retrieval composed-image-retrieval

Created 2024-04-12

12 commits to master branch, last one 4 months ago