Deci AI Team

Submitted Version: September 13, 2023

Text to Image


DeciDiffusion 1.0 is an 820 million parameter text-to-image latent diffusion model trained on the LAION-v2 dataset and fine-tuned on the LAION-ART dataset. Advanced training techniques were used to speed up training, improve training performance, and achieve better inference quality.
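To give a rough sense of scale for the 820 million parameter figure, the following back-of-the-envelope sketch estimates the memory needed just to hold the weights at two common precisions. It is an illustration only: it ignores activations, caches, and framework overhead, and it assumes the 820 million count covers every weight that must be loaded.

```python
def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory for the raw weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 820_000_000  # parameter count stated for DeciDiffusion 1.0

# float32 stores 4 bytes per parameter, float16 stores 2.
for dtype_name, nbytes in [("float32", 4), ("float16", 2)]:
    print(f"{dtype_name}: {weight_memory_gb(PARAMS, nbytes):.2f} GB")
```

Under these assumptions the weights take about 3.28 GB in float32 and 1.64 GB in float16.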

Model Highlights

  • Task: Image generation
  • Model type: Diffusion-based text-to-image generation model
  • Languages (NLP): English
  • Dataset: Trained on LAION-v2 and fine-tuned on LAION-ART

Model Architecture

DeciDiffusion 1.0 is a diffusion-based text-to-image generation model. It retains foundational elements of the Stable Diffusion architecture, such as the Variational Autoencoder (VAE) and CLIP's pre-trained text encoder. The primary change is that the U-Net is replaced with U-Net-NAS, a more efficient design developed by Deci. U-Net-NAS has fewer parameters than the original U-Net, which improves computational efficiency.
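The components above interact at inference time roughly as follows: starting from random noise in the VAE's latent space, a text-conditioned denoiser repeatedly removes predicted noise, and the VAE decoder turns the final latent into an image. The following is a minimal, framework-free sketch of a generic latent-diffusion sampling loop, not Deci's implementation. `predict_noise` is a hypothetical stand-in for the U-Net-NAS (with the CLIP text encoder folded into it), the linear beta schedule and the DDIM-style deterministic update with classifier-free guidance are illustrative choices rather than DeciDiffusion's documented scheduler, and the VAE decoding step is omitted.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
# Hypothetical noise predictor standing in for the text-conditioned U-Net-NAS:
# (noisy latent, timestep, prompt) -> predicted noise.
NoisePredictor = Callable[[Vector, int, str], Vector]


def linear_alphas_cumprod(num_train_steps: int = 1000,
                          beta_start: float = 0.00085,
                          beta_end: float = 0.012) -> List[float]:
    """Cumulative products of (1 - beta_t) for a linear beta schedule (illustrative values)."""
    alphas_cumprod, prod = [], 1.0
    for i in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_train_steps - 1)
        prod *= 1.0 - beta
        alphas_cumprod.append(prod)
    return alphas_cumprod


def sample_latent(predict_noise: NoisePredictor,
                  prompt: str,
                  latent_dim: int,
                  alphas_cumprod: List[float],
                  num_steps: int = 25,
                  guidance_scale: float = 7.5,
                  seed: int = 0) -> Vector:
    """Deterministic DDIM-style denoising with classifier-free guidance, run in latent space."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]  # start from Gaussian noise
    stride = max(1, len(alphas_cumprod) // num_steps)
    timesteps = list(range(0, len(alphas_cumprod), stride))[::-1]  # e.g. 960, 920, ..., 0
    for i, t in enumerate(timesteps):
        a_t = alphas_cumprod[t]
        # Noise level of the next (less noisy) step; 1.0 means "fully denoised".
        a_prev = alphas_cumprod[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        # Classifier-free guidance: push the prediction toward the prompt-conditioned one.
        eps_cond = predict_noise(x, t, prompt)
        eps_uncond = predict_noise(x, t, "")
        eps = [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]
        # Estimate the clean latent, then re-noise it to the next step's noise level.
        x0 = [(xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t) for xi, e in zip(x, eps)]
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * e for x0i, e in zip(x0, eps)]
    return x  # a real pipeline would now decode this latent with the VAE decoder
```

To check the update rule, one can pass an "oracle" predictor that returns the exact noise separating the current latent from a known target latent; the loop then returns that target up to floating-point error.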


Misuse, Malicious Use, and Out-of-Scope Use

The model must not be used to deliberately create or spread images that produce hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive, as well as content that perpetuates existing or historical biases.

Out-of-Scope Use

The model isn’t designed to produce accurate or truthful depictions of people or events. Thus, using it for such purposes exceeds its intended capabilities.

Misuse and Malicious Use

Using the model to produce content that harms or maligns individuals is strictly discouraged. Such misuse includes, but is not limited to:

  • Creating offensive, degrading, or damaging portrayals of individuals, their cultures, religions, or surroundings.
  • Intentionally promoting or propagating discriminatory content or harmful stereotypes.
  • Impersonating individuals without their consent.
  • Generating explicit content without the knowledge or consent of people who may see it.
  • Distributing copyrighted or licensed content against its usage terms.
  • Sharing modified versions of copyrighted or licensed content in breach of its usage guidelines.

Limitations and Bias


The model has certain limitations and may not function optimally in the following scenarios:

  • It doesn’t produce completely photorealistic images.
  • Rendering legible text is beyond its capability.
  • Complex compositions, like visualizing “A green sphere to the left of a blue square”, are challenging for the model.
  • Generation of faces and human figures may be imprecise.
  • It is primarily optimized for English captions and might not be as effective with other languages.
  • The autoencoding component of the model is lossy.
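The autoencoder is lossy because the VAE compresses each image into a much smaller latent before diffusion runs. Assuming DeciDiffusion keeps the Stable Diffusion v1 VAE geometry (8× spatial downsampling, 4 latent channels) and a 512×512 RGB output, neither of which this page states explicitly, the compression can be sketched as:

```python
def latent_compression_ratio(height: int = 512, width: int = 512,
                             image_channels: int = 3,
                             downsample: int = 8,
                             latent_channels: int = 4) -> float:
    """Ratio of pixel values to latent values for a VAE with the assumed geometry."""
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    pixel_values = height * width * image_channels                                   # 512*512*3
    latent_values = (height // downsample) * (width // downsample) * latent_channels  # 64*64*4
    return pixel_values / latent_values


print(latent_compression_ratio())
```

Because the ratio reduces to 3·8²/4, it comes out to 48 at any resolution divisible by 8 under these assumptions; fine detail such as small text and facial features must be reconstructed from roughly 48 times fewer values than the image contains.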


The impressive capabilities of image generation models can also unintentionally amplify societal biases. DeciDiffusion was trained mainly on subsets of LAION-v2 with English descriptions. As a result, non-English communities and cultures may be underrepresented, biasing outputs toward white and Western norms, and outputs from non-English prompts are notably less accurate. Given these biases, users should exercise discretion when using DeciDiffusion, regardless of the input.
