DeciLM 6B

Description

DeciLM 6B is a 5.7 billion parameter decoder-only text generation model. With a context window of 4096 tokens, the highly efficient model uses variable Grouped-Query Attention (GQA) to achieve an optimal balance between performance and computational efficiency. The model’s architecture was generated using Deci’s proprietary Neural Architecture Search-based technology, AutoNAC.

Publishers
Deci AI Team

Submitted Version
September 13, 2023

Latest Version
N/A

Size
N/A

Text to Text

Overview

Deci developed and publicly released DeciLM 6B, a pretrained, high-efficiency generative text model with 5.7 billion parameters. DeciLM 6B outpaces pretrained models in its class, with throughput up to 15 times that of Llama 2 7B. DeciLM 6B was further fine-tuned with LoRA for instruction following on a subset of the OpenOrca dataset, creating DeciLM 6B-Instruct.

Model Highlights

Task: Text generation
Model type: An auto-regressive language model using an optimized transformer decoder architecture that includes variable Grouped-Query Attention
Languages (NLP): English
Dataset: Trained on the SlimPajama dataset

Model Architecture

Parameters	Layers	Heads	Sequence Length	GQA num_key_value_heads*	Hidden Size
5.7B	32	32	4096	Variable	4096

*AutoNAC was employed to optimize the selection of the GQA num_key_value_heads for each layer of the model.
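
To make the footnote concrete, here is a minimal sketch of what a per-layer num_key_value_heads value changes: which key/value head each query head reads from, and how large the per-token KV cache becomes. The head dimension (128) follows from the table (hidden size 4096 / 32 heads). The per-layer layout below is hypothetical, since this card only states that the value varies by layer, and 2 bytes per value assumes bfloat16.

```python
NUM_HEADS = 32                        # from the architecture table
HIDDEN_SIZE = 4096                    # from the architecture table
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS   # 128


def kv_head_for_query_head(query_head, num_kv_heads, num_heads=NUM_HEADS):
    """Return the index of the key/value head shared by a given query head."""
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be divisible by num_kv_heads")
    group_size = num_heads // num_kv_heads  # query heads per KV head
    return query_head // group_size


def kv_cache_bytes_per_token(kv_heads_per_layer, head_dim=HEAD_DIM, bytes_per_value=2):
    """Sum the K and V cache size for one token across all layers."""
    # Factor 2 accounts for storing both keys and values.
    return sum(2 * n * head_dim * bytes_per_value for n in kv_heads_per_layer)


# Hypothetical layout for illustration only (32 layers); not DeciLM's actual values.
hypothetical_layout = [4] * 16 + [2] * 8 + [1] * 8
# Standard multi-head attention: every query head has its own KV head.
full_mha_layout = [NUM_HEADS] * 32

print(kv_cache_bytes_per_token(hypothetical_layout))
print(kv_cache_bytes_per_token(full_mha_layout))
```

Fewer key/value heads in a layer means a smaller cache for that layer, which is why varying the value per layer allows a trade-off between quality and memory use.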

Decoder layer: Variable Grouped-Query Attention. Grouped-Query Attention (GQA) was introduced in Ainslie et al., 2023.
Position Embeddings: Dynamic NTK Scaling Rotary Position Embeddings (Su et al., 2021).
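
As a rough illustration of dynamic NTK scaling, the sketch below enlarges the rotary base once the input grows past the trained context length, following a formulation commonly used in open-source implementations. The base of 10000 and the scaling factor of 2.0 are assumptions; this card does not state DeciLM 6B's actual values.

```python
def dynamic_ntk_base(seq_len, dim=128, base=10000.0,
                     max_position_embeddings=4096, scaling_factor=2.0):
    """Return the rotary base to use for a given sequence length.

    Within the trained context window the base is unchanged; beyond it,
    the base grows so that rotary frequencies are stretched smoothly.
    base and scaling_factor are assumed values, not taken from the card.
    """
    if seq_len <= max_position_embeddings:
        return base
    scale = (scaling_factor * seq_len / max_position_embeddings) - (scaling_factor - 1)
    return base * scale ** (dim / (dim - 2))


def inverse_frequencies(base, dim=128):
    """Per-pair rotary inverse frequencies for a head of size dim."""
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]


for length in (2048, 4096, 8192):
    print(length, dynamic_ntk_base(length))
```

The head dimension of 128 matches hidden size 4096 divided by 32 heads from the architecture table.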

Uses

The model is intended for commercial and research use in English and can be fine-tuned for use in other languages.

Metrics and Performance

Training Details

DeciLM 6B was trained on a subset of the SlimPajama dataset, using proprietary methodologies that allow for fast training.

Evaluation

Below are DeciLM 6B’s evaluation results.

Average	ARC Challenge*	ARC Easy*	BoolQ	HellaSwag*	LAMBADA OpenAI	OpenBookQA	PIQA	TruthfulQA	Winogrande
60.33	42.06	70.02	71.01	74.58	69.78	34	77.09	36.19	68.03

*Accuracy-norm score

Runtime Benchmarks

Inference Tool/Hardware	A10 (tokens/sec)
PyTorch	652.49
Infery LLM	2,029.6

Throughput (tokens/sec) was measured with the optimal batch size for each tool: 64 for PyTorch and 128 for Infery LLM.
To replicate the results of the PyTorch benchmark, use this code example.
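
As a rough illustration of how a tokens-per-second figure like those in the table can be computed, here is a minimal timing sketch. It is not Deci's benchmark script: measure_throughput and generate_fn are hypothetical names, and generate_fn stands in for whatever call produces new_tokens tokens for each of batch_size sequences.

```python
import time


def measure_throughput(generate_fn, batch_size, new_tokens, warmup=1, runs=3):
    """Return generated tokens per second for a batched generation callable.

    generate_fn(batch_size, new_tokens) is a hypothetical stand-in for a
    model's generation call; warmup runs are excluded from the timing.
    """
    for _ in range(warmup):
        generate_fn(batch_size, new_tokens)
    start = time.perf_counter()
    for _ in range(runs):
        generate_fn(batch_size, new_tokens)
    elapsed = time.perf_counter() - start
    return runs * batch_size * new_tokens / elapsed


def speedup(fast_tokens_per_sec, slow_tokens_per_sec):
    """Ratio between two throughput figures."""
    return fast_tokens_per_sec / slow_tokens_per_sec


# Ratio of the two A10 figures reported in the table above.
print(f"{speedup(2029.6, 652.49):.2f}x")
```

On a GPU, the callable should wait for the device to finish before returning (for example, by synchronizing), otherwise the elapsed time will understate the real cost.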

How to Use

You can use DeciLM 6B for text generation. The example below loads the model and generates a completion.

# pip install -q transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Deci/DeciLM-6b"
device = "cuda" # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, trust_remote_code=True).to(device)

inputs = tokenizer.encode("In a shocking finding, scientists discovered a herd of unicorns living in", return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=100, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0]))

Resources

Improve Your DeciLM Training, Optimization, and Deployment

Community and Feedback

We’d love your feedback on the information presented in this card. Please also share any unexpected results.

To report a bug, file an issue on GitHub.
Join our Discord community to stay up to date with new features and models, important announcements, and upcoming events.

For a short meeting with the SuperGradients team, use this link and choose your preferred time.
