Description

DeciLM-7B is a 7.04-billion-parameter decoder-only text generation model. This Apache 2.0-licensed model is currently the top-performing 7-billion-parameter base language model on the Open LLM Leaderboard.

Publishers
Deci AI Team

Submitted Version
December 12, 2023

Latest Version
N/A 

Size
N/A 

Text to Text

Overview


DeciLM-7B is a pretrained large language model for generative text with 7.04 billion parameters. It outperforms models in its class and delivers a 4.4x increase in throughput over Mistral 7B. DeciLM-7B was further fine-tuned for instruction following with LoRA on the SlimOrca dataset, creating DeciLM-7B-Instruct.

With support for 8K-token sequence length, the highly efficient model uses variable Grouped-Query Attention (GQA) to achieve an optimal balance between performance and computational efficiency.
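To make the efficiency argument concrete, here is a minimal sketch of how the number of key/value heads per layer affects key/value cache size at the full sequence length. The layer count and head count come from this card. The head dimension (128), the 16-bit cache precision, the reading of "8K" as 8192 tokens, and the per-layer head counts are assumptions chosen for illustration, not DeciLM-7B's published configuration.

```python
def kv_cache_bytes(kv_heads_per_layer, seq_len, head_dim, bytes_per_value=2, batch_size=1):
    """Bytes needed to cache keys and values for a batch of sequences."""
    # Factor 2: every layer stores one key tensor and one value tensor.
    per_token = sum(
        2 * kv_heads * head_dim * bytes_per_value for kv_heads in kv_heads_per_layer
    )
    return batch_size * seq_len * per_token


NUM_LAYERS = 32  # from the model card
NUM_HEADS = 32   # from the model card
SEQ_LEN = 8192   # assumption: "8K" read as 8192 tokens
HEAD_DIM = 128   # assumption: not stated in the model card

# Baseline: standard multi-head attention keeps one KV head per query head.
mha_layout = [NUM_HEADS] * NUM_LAYERS

# Hypothetical variable-GQA layout, for illustration only.
# The real per-layer values were selected by AutoNAC and are not listed here.
variable_gqa_layout = [4] * 8 + [2] * 16 + [1] * 8

mha_bytes = kv_cache_bytes(mha_layout, SEQ_LEN, HEAD_DIM)
gqa_bytes = kv_cache_bytes(variable_gqa_layout, SEQ_LEN, HEAD_DIM)

print(f"MHA cache:          {mha_bytes / 1024**3:.2f} GiB")
print(f"Variable GQA cache: {gqa_bytes / 1024**3:.2f} GiB")
print(f"Reduction factor:   {mha_bytes / gqa_bytes:.1f}x")
```

A smaller cache leaves more accelerator memory for larger batches, which is one plausible route to higher throughput. The exact savings depend on the real per-layer head counts.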

The model’s architecture was generated using Deci’s proprietary Neural Architecture Search-based technology, AutoNAC.

DeciLM-7B is available under the permissive Apache 2.0 license, which allows both commercial and research use. It’s designed for versatile deployment, whether locally or on any cloud platform.

Model Highlights

  • Task: Text Generation 
  • Model Type: An auto-regressive language model using an optimized transformer decoder architecture that includes variable Grouped-Query Attention
  • Languages (NLP): English 

Model Architecture

Parameters | Layers | Heads | Sequence Length | GQA Key Value Heads
7.04B      | 32     | 32    | 8K              | Variable*

*AutoNAC was employed to optimize the selection of the GQA num_key_value_heads for each model layer.
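As an illustration of what a per-layer num_key_value_heads means, the sketch below maps each of the 32 query heads to the key/value head it shares within a layer. It assumes the contiguous grouping commonly used in open-source GQA implementations. The helper names and the per-layer values are hypothetical and are not taken from DeciLM-7B's configuration.

```python
def kv_head_for_query_head(query_head, num_heads, num_kv_heads):
    """Return the index of the KV head that a given query head attends with."""
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be divisible by num_kv_heads")
    group_size = num_heads // num_kv_heads  # query heads sharing one KV head
    return query_head // group_size


def layer_groups(num_heads, num_kv_heads):
    """Group query head indices by the KV head they share."""
    groups = {kv_head: [] for kv_head in range(num_kv_heads)}
    for query_head in range(num_heads):
        kv_head = kv_head_for_query_head(query_head, num_heads, num_kv_heads)
        groups[kv_head].append(query_head)
    return groups


NUM_HEADS = 32  # from the model card

# Hypothetical values for three layers, for illustration only.
hypothetical_kv_heads_per_layer = [4, 2, 1]

for layer_index, num_kv_heads in enumerate(hypothetical_kv_heads_per_layer):
    groups = layer_groups(NUM_HEADS, num_kv_heads)
    sizes = sorted({len(heads) for heads in groups.values()})
    print(f"layer {layer_index}: {num_kv_heads} KV heads, group sizes {sizes}")
```

With one KV head the layer behaves like multi-query attention, and with 32 it reduces to standard multi-head attention. A variable layout lets each layer sit anywhere between those two extremes.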

  • Decoder layer: Variable Grouped-Query Attention. Grouped-Query Attention (GQA) was introduced in Ainslie et al., 2023.
  • Position Embeddings: Dynamic NTK Scaling Rotary Position Embeddings (Su et al., 2021)
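For readers unfamiliar with dynamic NTK scaling, the following sketch shows the formulation commonly used in open-source rotary embedding implementations: the rotary base stays fixed within the trained context length and grows once the input exceeds it. The base (10000), scaling factor (2.0), head dimension (128), and trained length (8192) are assumptions for illustration. This card does not state the values DeciLM-7B uses.

```python
def dynamic_ntk_base(seq_len, head_dim, base=10000.0,
                     max_position_embeddings=8192, scaling_factor=2.0):
    """Return the rotary base to use for an input of seq_len tokens."""
    if seq_len <= max_position_embeddings:
        # Within the trained context window: plain rotary embeddings.
        return base
    # Beyond the trained window: enlarge the base so that rotation
    # frequencies are stretched to cover the longer sequence.
    scale = (scaling_factor * seq_len / max_position_embeddings) - (scaling_factor - 1)
    return base * scale ** (head_dim / (head_dim - 2))


def inverse_frequencies(base, head_dim):
    """Per-pair rotation frequencies, as in Su et al., 2021."""
    return [1.0 / (base ** (i / head_dim)) for i in range(0, head_dim, 2)]


HEAD_DIM = 128  # assumption: not stated in the model card

for seq_len in (4096, 8192, 16384):
    base = dynamic_ntk_base(seq_len, HEAD_DIM)
    slowest = inverse_frequencies(base, HEAD_DIM)[-1]
    print(f"seq_len={seq_len}: base={base:.1f}, slowest frequency={slowest:.3e}")
```

Because the base changes only when the input is longer than the trained window, shorter prompts see the same embeddings as during training.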

Uses

This model is intended for commercial and research use in English and can be fine-tuned for use in other languages.
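A minimal sketch of English text generation with the Hugging Face transformers library follows. The repository id "Deci/DeciLM-7B" and the need for trust_remote_code=True are assumptions not stated in this card, so confirm both against the publisher's model page before use. The sampling defaults and helper names are illustrative only.

```python
MODEL_ID = "Deci/DeciLM-7B"  # assumption: Hugging Face repo id, not given in this card
MAX_CONTEXT_TOKENS = 8192    # assumption: "8K" read as 8192 tokens


def generation_settings(max_new_tokens=100, temperature=0.7, top_p=0.95):
    """Build keyword arguments for model.generate (illustrative defaults)."""
    sample = temperature > 0
    settings = {"max_new_tokens": max_new_tokens, "do_sample": sample}
    if sample:
        settings.update(temperature=temperature, top_p=top_p)
    return settings


def generate(prompt, model_id=MODEL_ID, **overrides):
    """Download the model and continue the prompt. Needs torch and transformers."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,  # assumption: the architecture ships as custom code
    ).to(device)

    settings = generation_settings(**overrides)
    # Leave room for the new tokens inside the context window.
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=MAX_CONTEXT_TOKENS - settings["max_new_tokens"],
    ).to(device)
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        **settings,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate("In a shocking finding, scientists discovered") downloads the weights on first use and returns the prompt followed by its continuation. Passing temperature=0 switches to greedy decoding.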
