Trending repositories for topic text-to-image
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Liquid: Language Models are Scalable and Unified Multi-modal Generators
A Python tool that uses GPT-4, FFmpeg, and OpenCV to automatically analyze videos, extract the most interesting sections, and crop them for an improved viewing experience.
[TMLR 2025🔥] A survey of autoregressive models in vision.
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high per...
FLUX, Stable Diffusion, SDXL, SD3, LoRA, Fine Tuning, DreamBooth, Training, Automatic1111, Forge WebUI, SwarmUI, DeepFake, TTS, Animation, Text To Video, Tutorials, Guides, Lectures, Courses, ComfyUI,...
The official implementation of "Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models"
This repository contains hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM, etc.
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch
SynCD: Generating Multi-Image Synthetic Data for Text-to-Image Customization
A curated list of Generative AI tools, works, models, and references
🧠 The most comprehensive collection of excellent Qwen prompts in the world. Feel free to contribute your own prompts!
🚀 Cross attention map tools for huggingface/diffusers
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
The official implementation of "Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models"
Liquid: Language Models are Scalable and Unified Multi-modal Generators
[TMLR 2025🔥] A survey of autoregressive models in vision.
SynCD: Generating Multi-Image Synthetic Data for Text-to-Image Customization
A curated list of recent style transfer methods with diffusion models
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high per...
TrustEval: A modular and extensible toolkit for comprehensive trust evaluation of generative foundation models (GenFMs)
🧠 The most comprehensive collection of excellent Qwen prompts in the world. Feel free to contribute your own prompts!
🚀 Cross attention map tools for huggingface/diffusers
A Python tool that uses GPT-4, FFmpeg, and OpenCV to automatically analyze videos, extract the most interesting sections, and crop them for an improved viewing experience.
[CCS'24] SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
A collection of awesome text-to-image generation studies.
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Liquid: Language Models are Scalable and Unified Multi-modal Generators
The official implementation of "Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models"
A Python tool that uses GPT-4, FFmpeg, and OpenCV to automatically analyze videos, extract the most interesting sections, and crop them for an improved viewing experience.
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high per...
Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
[TMLR 2025🔥] A survey of autoregressive models in vision.
FLUX, Stable Diffusion, SDXL, SD3, LoRA, Fine Tuning, DreamBooth, Training, Automatic1111, Forge WebUI, SwarmUI, DeepFake, TTS, Animation, Text To Video, Tutorials, Guides, Lectures, Courses, ComfyUI,...
This repository contains hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM, etc.
A curated list of Generative AI tools, works, models, and references
A collection of awesome text-to-image generation studies.
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch
[ECCV 2024] The official implementation of the paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
The official implementation of "Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models"
Liquid: Language Models are Scalable and Unified Multi-modal Generators
A curated list of recent style transfer methods with diffusion models
Create and customize your own open-source AI influencer
[CVPR 2025] Official implementation of StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements
SynCD: Generating Multi-Image Synthetic Data for Text-to-Image Customization
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high per...
[CVPR 2024] InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
TrustEval: A modular and extensible toolkit for comprehensive trust evaluation of generative foundation models (GenFMs)
🚀 Cross attention map tools for huggingface/diffusers
A curated list of awesome stable diffusion resources 🌟
🧠 The most comprehensive collection of excellent Qwen prompts in the world. Feel free to contribute your own prompts!
AI Plugin is a powerful extension for the Payload CMS, integrating advanced AI capabilities to enhance content creation and management.
Generate a video script, voice and a talking face completely with AI
The official implementation of "Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models"
Vision Transformers Need Registers. And Gated MLPs. And +20M params. Tiny modality gap ensues!
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Liquid: Language Models are Scalable and Unified Multi-modal Generators
A Python tool that uses GPT-4, FFmpeg, and OpenCV to automatically analyze videos, extract the most interesting sections, and crop them for an improved viewing experience.
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high per...
SynCD: Generating Multi-Image Synthetic Data for Text-to-Image Customization
This repository contains hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM, etc.
The official implementation of "Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models"
FLUX, Stable Diffusion, SDXL, SD3, LoRA, Fine Tuning, DreamBooth, Training, Automatic1111, Forge WebUI, SwarmUI, DeepFake, TTS, Animation, Text To Video, Tutorials, Guides, Lectures, Courses, ComfyUI,...
Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
[TMLR 2025🔥] A survey of autoregressive models in vision.
A curated list of Generative AI tools, works, models, and references
Official implementation for "Stable Flow: Vital Layers for Training-Free Image Editing" [CVPR 2025]
A collection of awesome text-to-image generation studies.
[ECCV 2024] The official implementation of the paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
Diffusion model papers, survey, and taxonomy
📚 Collection of awesome generation acceleration resources.
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
A Unified Tokenizer for Visual Generation and Understanding
The official implementation of "Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models"
Liquid: Language Models are Scalable and Unified Multi-modal Generators
Vision Transformers Need Registers. And Gated MLPs. And +20M params. Tiny modality gap ensues!
SynCD: Generating Multi-Image Synthetic Data for Text-to-Image Customization
A curated list of recent style transfer methods with diffusion models
Create and customize your own open-source AI influencer
[CVPR 2025] Official implementation of StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements
🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
TrustEval: A modular and extensible toolkit for comprehensive trust evaluation of generative foundation models (GenFMs)
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high per...
📚 Collection of awesome generation acceleration resources.
Official implementation for "Stable Flow: Vital Layers for Training-Free Image Editing" [CVPR 2025]
AI Plugin is a powerful extension for the Payload CMS, integrating advanced AI capabilities to enhance content creation and management.
[CVPR 2024] InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
A Python tool that uses GPT-4, FFmpeg, and OpenCV to automatically analyze videos, extract the most interesting sections, and crop them for an improved viewing experience.
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation (AAAI 2025 Oral)
[TMLR 2025🔥] A survey of autoregressive models in vision.
StyleShot: A SnapShot on Any Style. A model that transfers any style onto any content and generates high-quality, personalized stylized images without per-image fine-tuning!
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).
Official implementation for "Stable Flow: Vital Layers for Training-Free Image Editing" [CVPR 2025]
Liquid: Language Models are Scalable and Unified Multi-modal Generators
[ICLR 2025] Official Implementation of Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Generate a video script, voice and a talking face completely with AI
An 8-step inversion and 8-step editing process works effectively with the FLUX-dev model. (3x speedup with results that are comparable or even superior to baseline methods)
🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
Multi-model simultaneous chat and text-to-image generation, all done through a pure front end (API mode, no server side needed).
🧠 The most comprehensive collection of excellent Qwen prompts in the world. Feel free to contribute your own prompts!
A Python tool that uses GPT-4, FFmpeg, and OpenCV to automatically analyze videos, extract the most interesting sections, and crop them for an improved viewing experience.
[ECCV 2024] The official implementation of the paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
This repository contains hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM, etc.
Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
A curated list of Generative AI tools, works, models, and references
FLUX, Stable Diffusion, SDXL, SD3, LoRA, Fine Tuning, DreamBooth, Training, Automatic1111, Forge WebUI, SwarmUI, DeepFake, TTS, Animation, Text To Video, Tutorials, Guides, Lectures, Courses, ComfyUI,...
A microframework on top of PyTorch with first-class citizen APIs for foundation model adaptation
⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation (AAAI 2025 Oral)
Diffusion model papers, survey, and taxonomy
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch
Implementation of Imagen, Google's Text-to-Image Neural Network, in PyTorch
A collection of awesome text-to-image generation studies.
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
[TMLR 2025🔥] A survey of autoregressive models in vision.
A collection of resources on controllable generation with text-to-image diffusion models.
A Python tool that uses GPT-4, FFmpeg, and OpenCV to automatically analyze videos, extract the most interesting sections, and crop them for an improved viewing experience.
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
StyleShot: A SnapShot on Any Style. A model that transfers any style onto any content and generates high-quality, personalized stylized images without per-image fine-tuning!
Generate a video script, voice and a talking face completely with AI
📚 Collection of awesome generation acceleration resources.
🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
Create and customize your own open-source AI influencer
A Unified Tokenizer for Visual Generation and Understanding
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
🧠 The most comprehensive collection of excellent Qwen prompts in the world. Feel free to contribute your own prompts!
A versatile multi-modal chat application that enables users to develop custom agents, create images, leverage visual recognition, and engage in voice interactions. It integrates seamlessly with local ...
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).
[CCS'24] SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models
Official repository for "CFG++: manifold-constrained classifier free guidance for diffusion models" (ICLR2025)
[ECCV 2024] The official implementation of the paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
CustomDiffusion360: Customizing Text-to-Image Diffusion with Camera Viewpoint Control