Search Results - RepositoryStats

MiniGPT-5 eric-ai-lab

52

864

apache-2.0

13

Official implementation of the paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"

transformers multimodal-llm diffusion-models multimodal-generation

Created 2023-10-02

21 commits to main branch, last one 4 months ago

Awesome-LLMs-meet-Multimodal-Generation YingqingHe

26

452

unknown

17

🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).

llm aigc lvlm mllm text-to-3d multimodality text-to-audio text-to-image text-to-music text-to-sound text-to-video text-to-speech multimodal-models large-language-models multimodal-generation large-vision-language-models multimodal-large-language-models

Created 2023-11-17

357 commits to main branch, last one 7 days ago

Text2Poster-ICASSP-22 chuhaojin

18

211

mit

4

Official implementation of the ICASSP-2022 paper "Text2Poster: Laying Out Stylized Texts on Retrieved Images"

aigc pytorch deep-learning layout-design image-retrieval banner-generator image-processing object-detection poster-generation geneative-creation image-text-retrieval banner-advertisements multimodal-generation artificial-neural-networks encoder-decoder-architecture

Created 2022-09-18

61 commits to master branch, last one about a year ago

ContextDiff YangLing0818

4

67

unknown

4

[ICLR 2024] Contextualized Diffusion Models for Text-Guided Image and Video Generation

text-to-video diffusion-models multimodal-generation text-to-image-generation

Created 2024-02-06

18 commits to main branch, last one 10 months ago

HermesFlow Gen-Verse

3

54

unknown

2

HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation

image-to-text text-to-image multimodal-generation multimodal-large-language-models

Created 2025-02-11

43 commits to main branch, last one about a month ago

Awesome-Vision-to-Music-Generation wzk1015

1

50

mit

4

A curated list of vision-to-music generation: methods, datasets, evaluation and challenges.

survey image-to-music video-to-music vision-to-music music-generation multimodal-generation

Created 2025-03-24

11 commits to main branch, last one 15 days ago

UniteandConquer Nithin-GK

3

36

apache-2.0

4

[CVPR '23] Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models

imagenet multimodal ffhq-dataset plug-and-play text-to-image face-synthesis face-generation diffusion-models celeba-hq-dataset multimodal-generation semantic-segmentation text-to-image-diffusion text-to-image-synthesis multimodal-deep-learning text-to-image-generation

Created 2023-04-20

47 commits to main branch, last one about a year ago