Trending repositories for topic diffusion-models
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
[ICLR 2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models (a conceptual sketch of the low-rank + 4-bit split appears after this list)
High-Resolution 3D Asset Generation with Large-Scale Hunyuan3D Diffusion Models.
HunyuanVideo: A Systematic Framework for Large Video Generation Models
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultr...
《Pytorch实用教程》 (PyTorch Practical Tutorial, 2nd Edition): whether you are starting from zero, working on CV, NLP, or LLM projects, or moving on to production engineering and deployment, it is all covered here. With the book's help, readers should be able to master PyTorch with ease and become excellent deep learning engineers.
HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
[ICLR24] Official implementation of the paper “MagicDrive: Street View Generation with Diverse 3D Geometry Control”
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
A collection of resources and papers on Diffusion Models
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
A curated list of recent diffusion models for video generation, editing, and various other applications.
[ICLR 2025 Oral] The official implementation of "Diffusion-Based Planning for Autonomous Driving with Flexible Guidance"
A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
A collection of diffusion model papers categorized by their subareas
Official implementation of the paper “MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control”
[ICLR 2025] Autoregressive Video Generation without Vector Quantization
The collection of awesome papers on alignment of diffusion models.
ACTalker: an end-to-end video diffusion framework for talking head synthesis that supports both single and multi-signal control (e.g., audio, expression).
(CVPR 2025) Adversarial Diffusion Compression for Real-World Image Super-Resolution [PyTorch]
The official implementation of "Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models"
A curated list of recent style transfer methods with diffusion models
Collection of tutorials on diffusion models, step-by-step implementation guide, scripts for generating images with AI, prompt engineering guide, and resources for further learning.
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling.
(TPAMI 2025) Invertible Diffusion Models for Compressed Sensing [PyTorch]
[CVPR 2025] FaithDiff for Classic Film Rejuvenation, Old Photo Revival, Social Media Restoration, Image Enhancement and AIGC Enhancement.
Unofficial implementation of "Simplifying, Stabilizing & Scaling Continuous-Time Consistency Models" for MNIST
IDDM (industrial, landscape, animate, spectrogram, ...): supports DDPM, DDIM, PLMS, a WebUI, and distributed training. A PyTorch implementation of diffusion models and generative models with distributed training.
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
📦A portable package for running Hunyuan3D-2 on Windows. | Hunyuan 3D 2.0 all-in-one package
Official Implementation Code of Our Paper "LightMotion: A Light and Tuning-free Method for Simulating Camera Motion in Video Generation"
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
[ECAI 2024] Official code for "TwinDiffusion: Enhancing Coherence and Efficiency in Panoramic Image Generation with Diffusion Models".
Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency
Official implementation of ICLR 2025 paper: "Unify ML4TSP: Drawing Methodological Principles for TSP and Beyond from Streamlined Design Space of Learning and Search".
[CVPR 2025] h-Edit: Effective and Flexible Diffusion-Based Editing via Doob’s h-Transform
A PyTorch implementation of diffusion models built from scratch (see the minimal DDPM sketch after this list)
[ICLR 2025] The official implementation of "VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing"
[CVPR 2025] Official implementation of StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements
[ICLR 2025] The code for Z-Sampling, proposed in our paper "Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection".
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.
Official implementation of "MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling"
[ICLR 2025] CatVTON is a simple and efficient virtual try-on diffusion model with 1) Lightweight Network (899.06M parameters totally), 2) Parameter-Efficient Training (49.57M parameters trainable) and...
[AAAI 2025]👔IMAGDressing👔: Interactive Modular Apparel Generation for Virtual Dressing. It enables customizable human image generation with flexible garment, pose, and scene control, ensuring high f...
[CVPR 2025] 3DTopia-XL: High-Quality 3D PBR Asset Generation via Primitive Diffusion
[ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.
A curated list of 3D Vision papers relating to the Robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
Lumina-T2X is a unified framework for Text to Any Modality Generation
A general fine-tuning kit geared toward diffusion models.
SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo is also released at suppixel.ai.
[ICLR 2025] Official implementation of Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation (ECCV 2024)
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
Official code repository of "CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph"
📚 Collection of awesome generation acceleration resources.
Official repo for VGGHeads: 3D Multi Head Alignment with a Large-Scale Synthetic Dataset.
[Single/Sparse View-to-Scene on a 4090(24G)] VistaDream: Sampling multiview consistent images for single-view scene reconstruction
Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis (ECCV 2024 Oral) - Official Implementation
Official Code Release for [SIGGRAPH 2024] DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation
Live2Diff: A pipeline that processes live video streams with a uni-directional video diffusion model.
[arXiv 2024] From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation
Official codebase for the paper "EZIGen: Enhancing zero-shot personalized image generation with precise subject encoding and decoupled guidance"
Official implementation of the paper “MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes”
[NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
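For readers who want to dig into the from-scratch implementations listed above (the "PyTorch implementation of diffusion models built from scratch" and IDDM entries), here is a minimal, hedged sketch of the DDPM forward-noising process and the standard epsilon-prediction loss. The schedule values, tensor shapes, and the `model(x_t, t)` interface are illustrative assumptions, not any particular repository's API.

```python
# Minimal DDPM sketch (illustrative assumptions, not any listed repo's code).
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule (assumed)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative product \bar{alpha}_t

def q_sample(x0, t, noise):
    # Forward process: q(x_t | x_0) = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps
    abar = alphas_bar.to(x0.device)[t].view(-1, 1, 1, 1)
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise

def ddpm_loss(model, x0):
    # Epsilon-prediction objective: MSE between true noise and the model's prediction.
    t = torch.randint(0, T, (x0.size(0),), device=x0.device)
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)
    return F.mse_loss(model(x_t, t), noise)
```

Sampling then runs the reverse chain, repeatedly denoising x_t with the predicted noise; DDIM-style samplers (as supported by IDDM) reuse the same trained model with a deterministic, fewer-step update.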
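The SVDQuant entry states its mechanism in the title: absorb weight outliers with a low-rank component so that the residual quantizes well to 4 bits. Below is a conceptual sketch of that split using a plain SVD and a simple symmetric int4 quantizer; it illustrates the idea only, is not the official SVDQuant code, and the rank and quantizer choices are arbitrary assumptions.

```python
# Conceptual low-rank + int4 split (not the official SVDQuant implementation).
import torch

def lowrank_plus_int4(W: torch.Tensor, rank: int = 32):
    # Low-rank branch: the top-`rank` singular components stay in high precision
    # and absorb the large-magnitude structure of W.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    L = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]

    # The residual is then handed to a crude symmetric int4 quantizer
    # (integer range [-8, 7]); the rationale is that it has less extreme
    # values left to represent than W itself.
    R = W - L
    scale = R.abs().max() / 7.0
    Rq = torch.clamp(torch.round(R / scale), -8, 7)
    return L, Rq, scale

def approx_matmul(x, L, Rq, scale):
    # Inference-time view: high-precision low-rank branch plus dequantized residual.
    return x @ (L + Rq * scale).T

W = torch.randn(512, 512)
L, Rq, scale = lowrank_plus_int4(W)
rel_err = (W - (L + Rq * scale)).norm() / W.norm()  # relative reconstruction error of the split
```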