Description

DeciLM 6B is a 5.7 billion parameter decoder-only text generation model. With a context window of 4096 tokens, the highly efficient model uses variable Grouped-Query Attention (GQA) to achieve an optimal balance between performance and computational efficiency. The model’s architecture was generated using Deci’s proprietary Neural Architecture Search-based technology, AutoNAC.

Publishers
Deci AI Team

Submitted Version
September 13, 2023

Latest Version
N/A 

Size
N/A 

Text to Text

Overview

Deci developed and publicly released the DeciLM 6B large language model, a pretrained, high-efficiency generative text model with 5.7 billion parameters. DeciLM 6B outpaces pretrained models in its class, with a throughput up to 15 times that of Llama 2 7B. DeciLM 6B was further fine-tuned with LoRA for instruction following on a subset of the OpenOrca dataset, creating DeciLM 6B-Instruct.
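As a rough sketch of what such a LoRA setup might look like with the Hugging Face peft library (the checkpoint name, target modules, and every hyperparameter below are illustrative assumptions, not Deci's actual training configuration):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# All names and hyperparameters below are illustrative assumptions,
# not the configuration Deci actually used.
base_model = AutoModelForCausalLM.from_pretrained("Deci/DeciLM-6b", trust_remote_code=True)
lora_config = LoraConfig(
    r=16,                                 # adapter rank (assumption)
    lora_alpha=32,                        # scaling factor (assumption)
    target_modules=["q_proj", "v_proj"],  # projections to adapt (assumption)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapters train

Because only the low-rank adapter matrices are updated, this kind of fine-tuning touches a small fraction of the 5.7B parameters.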

Model Highlights

  • Task: Text generation
  • Model type: An auto-regressive language model using an optimized transformer decoder architecture that includes variable Grouped-Query Attention
  • Languages (NLP): English
  • Dataset: Trained on the SlimPajama dataset

Model Architecture

Parameters | Layers | Heads | Sequence Length | GQA num_key_value_heads* | Hidden Size
5.7B       | 32     | 32    | 4096            | Variable                 | 4096

*AutoNAC was employed to optimize the selection of the GQA num_key_value_heads for each layer of the model.
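
To make the num_key_value_heads knob concrete, here is a minimal PyTorch sketch of a grouped-query attention layer in which the number of key/value heads is a per-layer constructor argument. The hidden size and head count follow the table above; the value 4 for num_key_value_heads is purely illustrative, since the real per-layer values were selected by AutoNAC:

import torch
import torch.nn.functional as F
from torch import nn

class VariableGQAttention(nn.Module):
    # Grouped-query attention: query heads are split into groups that share
    # a single key/value head. num_key_value_heads is the per-layer knob.
    def __init__(self, hidden_size=4096, num_heads=32, num_key_value_heads=4):
        super().__init__()
        assert num_heads % num_key_value_heads == 0
        self.num_heads = num_heads
        self.num_kv_heads = num_key_value_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, num_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(hidden_size, num_key_value_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_size, num_key_value_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(num_heads * self.head_dim, hidden_size, bias=False)

    def forward(self, x):
        bsz, seq_len, _ = x.shape
        q = self.q_proj(x).view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(bsz, seq_len, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(bsz, seq_len, self.num_kv_heads, self.head_dim).transpose(1, 2)
        # Replicate each K/V head across its group of query heads.
        groups = self.num_heads // self.num_kv_heads
        k = k.repeat_interleave(groups, dim=1)
        v = v.repeat_interleave(groups, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(bsz, seq_len, -1))

Fewer key/value heads shrink the KV cache during generation, which is where GQA's inference-throughput gains come from.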

  • Decoder layer: Variable Grouped-Query Attention. Grouped-Query Attention (GQA) was introduced in Ainslie et al., 2023.
  • Position Embeddings: Dynamic NTK-scaled Rotary Position Embeddings (Su et al., 2021); a minimal sketch of the scaling rule follows this list.
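
For reference, a small sketch of the dynamic NTK scaling rule, following the formulation popularized in the Hugging Face transformers rotary-embedding implementation. The function and parameter names are illustrative, not DeciLM's internal API:

import torch

def dynamic_ntk_inv_freq(head_dim, seq_len, max_position_embeddings=4096,
                         base=10000.0, scaling_factor=1.0):
    # Dynamic NTK scaling: once the sequence exceeds the training context,
    # grow the RoPE base so the rotary frequencies stretch over the longer span.
    if seq_len > max_position_embeddings:
        base = base * (
            scaling_factor * seq_len / max_position_embeddings - (scaling_factor - 1)
        ) ** (head_dim / (head_dim - 2))
    return 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))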

Uses

The model is intended for commercial and research use in English and can be fine-tuned for use in other languages.
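The snippet below shows a minimal way to run text generation with the model through Hugging Face Transformers. It assumes the checkpoint is published as Deci/DeciLM-6b, and trust_remote_code=True is passed because the model ships a custom architecture implementation: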

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Deci/DeciLM-6b"  # assumed Hugging Face checkpoint name

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, trust_remote_code=True)

inputs = tokenizer("In a shocking finding, scientists discovered", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))