DeciDiffusion 1.0 is an 820 million parameter text-to-image latent diffusion model trained on the LAION-v2 dataset and fine-tuned on the LAION-ART dataset.
Deci AI Team
September 13, 2023
DeciDiffusion 1.0 is an 820 million parameter text-to-image latent diffusion model trained on the LAION-v2 dataset and fine-tuned on the LAION-ART dataset. Advanced training techniques were used to speed up training, improve training performance, and achieve better inference quality.
DeciDiffusion 1.0 is a diffusion-based text-to-image generation model. While it maintains foundational architecture elements from Stable Diffusion, such as the Variational Autoencoder (VAE) and CLIP’s pre-trained Text Encoder, DeciDiffusion introduces significant enhancements. The primary innovation is the substitution of U-Net with the more efficient U-Net-NAS, a design pioneered by Deci. This novel component streamlines the model by reducing the number of parameters, leading to superior computational efficiency.
Misuse, Malicious Use, and Out-of-Scope Use
The model must not be employed to deliberately produce or spread images that foster hostile or unwelcoming settings for individuals. This encompasses generating visuals that might be predictably upsetting, distressing, or inappropriate, as well as content that perpetuates existing or historical biases.
The model isn’t designed to produce accurate or truthful depictions of people or events. Thus, using it for such purposes exceeds its intended capabilities.
Misusing the model to produce content that harms or maligns individuals is strictly discouraged. Such misuses include, but aren’t limited to:
The model has certain limitations and may not function optimally in the following scenarios:
The remarkable abilities of image generation models can unintentionally amplify societal biases. DeciDiffusion was mainly trained on subsets of LAION-v2, focused on English descriptions. Consequently, non-English communities and cultures might be underrepresented, leading to a bias towards white and western norms. Outputs from non-English prompts are notably less accurate. Given these biases, users should approach DeciDiffusion with discretion, regardless of input.
from transformers import AutoFeatureExtractor, AutoModelForImageClassification
extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50")
model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")