Description

DeciDiffusion 2.0 is a 732 million parameter text-to-image latent diffusion model.

Publishers
Deci AI Team

Submitted Version
January 17, 2024

Latest Version
N/A

Size
N/A 

Text to Image

Overview


DeciDiffusion 2.0 is a 732 million parameter text-to-image latent diffusion model. Advanced training techniques were used to speed up training, improve training performance, and achieve better inference quality. The model’s unique architecture was generated by AutoNAC, Deci’s advanced Neural Architecture Search engine, to run optimally on cost-efficient hardware such as the Qualcomm AI 100 and generate images in under a second.

Model Highlights

  • Task: Image generation
  • Model type: Diffusion-based text-to-image generation model
  • Languages (NLP): English

Model Architecture

DeciDiffusion 2.0, a state-of-the-art diffusion-based text-to-image generation model, builds on the core architecture of Stable Diffusion. It retains key components such as the Variational Autoencoder (VAE) and the pre-trained CLIP text encoder. A standout feature of DeciDiffusion is its U-Net component, which is optimized for performance on cost-effective hardware.

DeciDiffusion’s AutoNAC-generated U-Net-NAS has 525 million parameters, compared with 860 million in Stable Diffusion 1.5’s U-Net. This leaner design significantly improves processing speed, making DeciDiffusion a highly efficient and effective solution for text-to-image generation.
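As a rough check on these component sizes, the sketch below loads the pipeline with the diffusers library and prints the parameter count of each component. The model id "Deci/DeciDiffusion-v2-0", the custom_pipeline argument, and the flexible_unet subfolder are assumptions based on the model’s Hugging Face page; verify them against that page before running.

import torch
from diffusers import StableDiffusionPipeline

checkpoint = "Deci/DeciDiffusion-v2-0"  # assumed Hugging Face model id

# Load the pipeline; the repo is assumed to ship its own custom pipeline code.
pipeline = StableDiffusionPipeline.from_pretrained(
    checkpoint, custom_pipeline=checkpoint, torch_dtype=torch.float16
)
# Swap in the AutoNAC-generated U-Net, assumed to live in a "flexible_unet" subfolder.
pipeline.unet = pipeline.unet.from_pretrained(
    checkpoint, subfolder="flexible_unet", torch_dtype=torch.float16
)

def millions(module: torch.nn.Module) -> float:
    # Parameter count reported in millions.
    return sum(p.numel() for p in module.parameters()) / 1e6

# Expected: U-Net around 525M (vs. roughly 860M in Stable Diffusion 1.5),
# plus the VAE and CLIP text encoder inherited from the Stable Diffusion design.
print(f"U-Net:        {millions(pipeline.unet):.0f}M parameters")
print(f"Text encoder: {millions(pipeline.text_encoder):.0f}M parameters")
print(f"VAE:          {millions(pipeline.vae):.0f}M parameters")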

Uses

Misuse, Malicious Use, and Out-of-Scope Use

The model must not be employed to deliberately produce or spread images that foster hostile or unwelcoming settings for individuals. This encompasses generating visuals that might be predictably offensive, upsetting, disturbing, distressing, or inappropriate, as well as content that perpetuates existing or historical biases and stereotypes.

Out-of-Scope Use

The model isn’t designed to produce accurate or truthful depictions of people or events. Thus, using it for such purposes exceeds its intended capabilities.

Misuse and Malicious Use

Using the model to produce content that harms or maligns individuals is a misuse of this model. Such misuses include, but aren’t limited to:

  • Creating offensive, degrading, or damaging portrayals of individuals, their cultures, religions, or surroundings.
  • Intentionally promoting or propagating discriminatory content or harmful stereotypes.
  • Posing as someone else without their agreement.
  • Generating explicit content without the knowledge or agreement of potential viewers.
  • Spreading misinformation and disinformation.
  • Distributing copyrighted or licensed content against its usage terms.
  • Sharing modified versions of copyrighted or licensed content in breach of its usage guidelines.

Limitations and Bias

Limitations

The model has certain limitations and may not function optimally in the following scenarios:

  • It doesn’t produce completely photorealistic images.
  • Rendering legible text is beyond its capability.
  • Complex compositions, like visualizing “A green sphere to the left of a blue square”, are challenging for the model.
  • Generation of faces and human figures may be imprecise.
  • It is primarily optimized for English captions and might not be as effective with other languages.
  • The autoencoding component of the model is lossy.

Bias

The remarkable abilities of image generation models can unintentionally amplify societal biases. DeciDiffusion 2.0 was mainly trained on subsets of LAION-v2, focused on English descriptions. Consequently, non-English communities and cultures might be underrepresented, leading to a bias towards white and western norms. Outputs from non-English prompts are notably less accurate. Given these biases, users should approach DeciDiffusion with discretion, regardless of input.

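How to Use

The snippet below is a minimal text-to-image sketch using the diffusers library. The model id "Deci/DeciDiffusion-v2-0", the custom_pipeline argument, and the flexible_unet subfolder for the AutoNAC-generated U-Net are assumptions based on the model’s Hugging Face page; adjust them if your checkpoint differs, and prefer a CUDA GPU since the weights are loaded in float16.

import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
checkpoint = "Deci/DeciDiffusion-v2-0"  # assumed Hugging Face model id

pipeline = StableDiffusionPipeline.from_pretrained(
    checkpoint,
    custom_pipeline=checkpoint,  # the repo is assumed to provide its own pipeline code
    torch_dtype=torch.float16,   # use torch.float32 if running on CPU
)
# Replace the default U-Net with the AutoNAC-generated variant (assumed subfolder name).
pipeline.unet = pipeline.unet.from_pretrained(
    checkpoint, subfolder="flexible_unet", torch_dtype=torch.float16
)
pipeline = pipeline.to(device)

# Generate a single image from an English prompt and save it to disk.
image = pipeline(prompt=["A photo of an astronaut riding a horse on Mars"]).images[0]
image.save("astronaut.png")

Prompts should be written in English, since the model was trained primarily on English captions (see Limitations and Bias above).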