6 results found
Primary language:
- Python (5)
- Jupyter Notebook (1)
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Created 2023-05-06
145 commits to main branch, last one 25 days ago
Chat with NeRF enables users to interact with a NeRF model by typing in natural language.
Created 2023-04-27
137 commits to main branch, last one 2 months ago
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
Created 2023-08-02
26 commits to main branch, last one 2 months ago
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high per...
Created 2023-07-05
675 commits to develop branch, last one 15 days ago
Automate fashion image captioning using BLIP-2. Automatically generates descriptions of clothes on shopping websites, which can help customers without fashion knowledge better understand the features ...
Created 2023-05-23
16 commits to master branch, last one 10 months ago
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
Created 2023-11-10
13 commits to main branch, last one 2 months ago