Description
DeciDiffusion 2.0 is a 732 million parameter text-to-image latent diffusion model.
Publishers
Deci AI Team
Submitted Version
January 17, 2024
Latest Version
N/A
Size
N/A
DeciDiffusion 2.0 is a 732 million parameter text-to-image latent diffusion model. Advanced training techniques were used to speed up training, improve training performance, and achieve better inference quality. The model’s architecture was generated by AutoNAC, Deci’s Neural Architecture Search engine, to run efficiently on cost-efficient hardware such as the Qualcomm Cloud AI 100 and to generate images in under a second.
DeciDiffusion 2.0 is a diffusion-based text-to-image generation model that builds upon the core architecture of Stable Diffusion. It reuses key components such as the Variational Autoencoder (VAE) and the pre-trained CLIP text encoder. Its distinguishing feature is the U-Net component, which is optimized for performance on cost-effective hardware.
DeciDiffusion’s AutoNAC-generated U-Net-NAS has 525 million parameters, compared with 860 million in Stable Diffusion 1.5’s U-Net. This optimized design significantly improves processing speed, making DeciDiffusion an efficient option for text-to-image generation.
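To put these parameter counts in proportion, the short sketch below works through the arithmetic. It uses only the figures quoted in this card; the variable and function names are illustrative, and the "outside the U-Net" figure is simply the difference between the quoted total and the quoted U-Net size, not a separately reported number.

```python
# Parameter counts quoted in this card, in millions.
DECI_TOTAL_M = 732   # full DeciDiffusion 2.0 model
DECI_UNET_M = 525    # AutoNAC-generated U-Net-NAS
SD15_UNET_M = 860    # Stable Diffusion 1.5 U-Net


def relative_reduction(baseline: float, value: float) -> float:
    """Return the fraction by which `value` is smaller than `baseline`."""
    return (baseline - value) / baseline


# How much smaller is the DeciDiffusion U-Net than the Stable Diffusion 1.5 U-Net?
unet_saved_m = SD15_UNET_M - DECI_UNET_M
unet_reduction = relative_reduction(SD15_UNET_M, DECI_UNET_M)

# Parameters left for everything outside the U-Net (derived by subtraction only).
non_unet_m = DECI_TOTAL_M - DECI_UNET_M

print(f"U-Net parameters saved: {unet_saved_m}M ({unet_reduction:.1%})")
print(f"Parameters outside the U-Net: {non_unet_m}M")
```

By this arithmetic, the U-Net is roughly 39% smaller than its Stable Diffusion 1.5 counterpart.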
Misuse, Malicious Use, and Out-of-Scope Use
The model must not be used to deliberately produce or spread images that foster hostile or unwelcoming settings for individuals. This includes generating visuals that might be predictably offensive, upsetting, disturbing, distressing, or inappropriate, as well as content that perpetuates existing or historical biases and stereotypes.
The model isn’t designed to produce accurate or truthful depictions of people or events. Thus, using it for such purposes exceeds its intended capabilities.
Using the model to produce content that harms or maligns individuals is a misuse of this model. Such misuses include, but aren’t limited to:
The model has certain limitations and may not function optimally in the following scenarios:
The remarkable abilities of image generation models can unintentionally amplify societal biases. DeciDiffusion 2.0 was mainly trained on subsets of LAION-v2 that focus on English descriptions. Consequently, non-English communities and cultures might be underrepresented, leading to a bias towards white and Western norms. Outputs from non-English prompts are notably less accurate. Given these biases, users should approach DeciDiffusion with discretion, regardless of input.
Training Procedure
The model was trained in 4 phases:
DeciDiffusion 2.0 marks a significant advancement over previous latent diffusion models, particularly in terms of sample efficiency. This means it can produce high-quality images with fewer diffusion timesteps during the inference process. To attain such efficiency, Deci has refined the DPM++ scheduler, effectively cutting down the number of steps needed to generate a quality image from 16 to just 10.
Additionally, the following training techniques were used to improve the model’s sample efficiency:
The following table compares per-image latency for DeciDiffusion 2.0 and Stable Diffusion v1.5.
DeciDiffusion 2.0 vs. Stable Diffusion v1.5 at FP16 precision:
| Implementation + Iterations | DeciDiffusion 2.0 on AI 100 (seconds/image) | Stable Diffusion v1.5 on A10 (seconds/image) |
| --- | --- | --- |
| Compiled 16 iterations | 1.335 | 2.478 |
| Compiled 10 iterations | 0.971 | 1.684 |
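The ratios implied by the table can be worked out as in the sketch below. It uses only the four latencies listed above; the dictionary keys and helper name are illustrative. Note that the two columns were measured on different hardware (AI 100 versus A10), so the ratios reflect the combined effect of model and accelerator rather than the model alone.

```python
# Latencies from the table above, in seconds per image (FP16, compiled).
# Keys are iteration counts; the inner keys are hypothetical labels for the two columns.
latency = {
    16: {"deci_ai100": 1.335, "sd15_a10": 2.478},
    10: {"deci_ai100": 0.971, "sd15_a10": 1.684},
}


def speedup(row: dict) -> float:
    """Ratio of Stable Diffusion v1.5 latency to DeciDiffusion 2.0 latency."""
    return row["sd15_a10"] / row["deci_ai100"]


# Per-row ratio at equal iteration counts (different hardware per column).
speedups = {steps: speedup(row) for steps, row in latency.items()}
for steps, ratio in speedups.items():
    print(f"{steps} iterations: {ratio:.2f}x lower latency per image")

# Fraction of DeciDiffusion latency saved by going from 16 to 10 iterations.
step_saving = 1 - latency[10]["deci_ai100"] / latency[16]["deci_ai100"]
print(f"16 -> 10 iterations saves {step_saving:.1%} of DeciDiffusion latency")
```

The latency saving from 16 to 10 iterations is smaller than the 37.5% cut in iteration count, which suggests that part of the per-image time does not scale with the number of iterations.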
You can use the DeciDiffusion model for text-to-image generation. The example below shows how to load the model and generate an image.
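# pip install diffusers transformers torch
from diffusers import StableDiffusionPipeline
import torch

# Use the GPU when available; the float16 weights below are intended for GPU inference.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
checkpoint = "Deci/DeciDiffusion-v2-0"

# Load the pipeline, including the custom pipeline code from the checkpoint repository.
pipeline = StableDiffusionPipeline.from_pretrained(checkpoint, custom_pipeline=checkpoint, torch_dtype=torch.float16)
# Replace the default U-Net with the one stored in the 'flexible_unet' subfolder.
pipeline.unet = pipeline.unet.from_pretrained(checkpoint, subfolder='flexible_unet', torch_dtype=torch.float16)
pipeline = pipeline.to(device)

# Generate one image from a text prompt.
img = pipeline(prompt=['A photo of an astronaut riding a horse on Mars']).images[0]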
How to Cite
Please cite this model using this format:
@misc{DeciFoundationModels,
  title = {DeciDiffusion 2.0},
  author = {DeciAI Research Team},
  year = {2024},
  url = {https://huggingface.co/deci/decidiffusion-v2-0},
}
We’d love your feedback on the information presented in this card. Please also share any unexpected results.