Description
DeciLM-7B is a 7.04-billion-parameter decoder-only text generation model. As of its submission (December 2023), this Apache 2.0-licensed model is the top-performing 7-billion-parameter base language model on the Open LLM Leaderboard.
Publishers
Deci AI Team
Submitted Version
December 12, 2023
Latest Version
N/A
Size
N/A
DeciLM-7B is a pretrained large language model for generative text with 7.04 billion parameters. It outperforms models in its class, delivering up to 4.4x the throughput of Mistral 7B. DeciLM-7B was further instruction-tuned with LoRA on the SlimOrca dataset, creating DeciLM-7B-Instruct.
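The LoRA fine-tuning mentioned above freezes the pretrained weights and trains only a low-rank update. The sketch below illustrates the core idea in NumPy; the dimensions, rank, and scaling factor are illustrative assumptions, not the actual DeciLM training configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 8          # illustrative sizes; r << d is the low-rank bottleneck

W = rng.normal(size=(d_out, d_in))  # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))            # B starts at zero, so training begins at the base model
alpha = 16                          # LoRA scaling factor (assumed value)

def lora_forward(x):
    # Only A and B are trained; W stays frozen.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B = 0 the adapted layer matches the frozen base layer exactly.
assert np.allclose(lora_forward(x), W @ x)
```

Because only A and B (2 * r * d parameters per layer) are updated, the adapter is a small fraction of the full model's size.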
With support for an 8K-token sequence length, the highly efficient model uses variable Grouped-Query Attention (GQA) to achieve an optimal balance between performance and computational efficiency.
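In GQA, several query heads share each key/value head, shrinking the KV cache relative to full multi-head attention. The toy NumPy sketch below shows the mechanism for a single fixed head count; the shapes are illustrative and do not reflect DeciLM-7B's per-layer (variable) head configuration.

```python
import numpy as np

def grouped_query_attention(q, k, v, num_kv_heads):
    """Toy GQA: q has H query heads; k and v have only num_kv_heads heads,
    each shared by H // num_kv_heads query heads.
    Shapes: q -> (H, T, d), k and v -> (num_kv_heads, T, d)."""
    H, T, d = q.shape
    group = H // num_kv_heads
    # Repeat each key/value head across its group of query heads.
    k_rep = np.repeat(k, group, axis=0)   # (H, T, d)
    v_rep = np.repeat(v, group, axis=0)   # (H, T, d)
    scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(d)  # (H, T, T)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ v_rep                              # (H, T, d)

rng = np.random.default_rng(0)
H, T, d = 8, 4, 16  # 8 query heads sharing 2 KV heads
out = grouped_query_attention(rng.normal(size=(H, T, d)),
                              rng.normal(size=(2, T, d)),
                              rng.normal(size=(2, T, d)),
                              num_kv_heads=2)
print(out.shape)  # (8, 4, 16)
```

Here 8 query heads reuse 2 key/value heads, so the KV cache is 4x smaller than with standard multi-head attention at the same query-head count.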
The model’s architecture was generated using Deci’s proprietary Neural Architecture Search-based technology, AutoNAC.
DeciLM-7B is available under the Apache 2.0 license, offering unrestricted use. It’s designed for versatile deployment, whether locally or on any cloud platform.
| Parameters | Layers | Heads | Sequence Length | GQA Key-Value Heads* |
|------------|--------|-------|-----------------|----------------------|
| 7.04B      | 32     | 32    | 8K              | Variable             |
*AutoNAC was employed to optimize the selection of the GQA num_key_value_heads for each model layer.
This model is intended for commercial and research use in English and can be fine-tuned for use in other languages.
Below are DeciLM-7B’s evaluation results:
| Model | Description | Leaderboard Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|-------|-------------|---------------------|-----|-----------|------|------------|------------|-------|
| DeciLM-7B-base | Base model, trained on permissively licensed data. | 61.55 | 59.39 | 82.51 | 59.76 | 40.33 | 79.95 | 47.38 |
| Inference Tool | Hardware | Prompt Length | Generated Length | Generated Tokens/s | Batch Size | # of Prompts |
|----------------|----------|---------------|------------------|--------------------|------------|--------------|
| HuggingFace (PyTorch) | A100-SXM4-80GB 400W | 512 | 512 | 1174 | 352 | 352 |
| HuggingFace (PyTorch) | A100-SXM4-80GB 400W | 2048 | 2048 | 328 | 72 | 72 |
| Infery-LLM | A100-SXM4-80GB 400W | 512 | 512 | 4558 | 1024 | 4096 |
| Infery-LLM | A100-SXM4-80GB 400W | 2048 | 2048 | 3997 | 512 | 2048 |
| Infery-LLM | A10 | 512 | 512 | 1345 | 128 | 512 |
| Infery-LLM | A10 | 2048 | 2048 | 599 | 32 | 128 |
To replicate the results of the PyTorch benchmark, use this code example.
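The tokens/s figures above are generated tokens divided by wall-clock generation time for a batched call. The harness below is a minimal sketch of that measurement, not the actual benchmark script; the dummy workload stands in for a real batched `model.generate(...)` call so the sketch runs anywhere.

```python
import time

def measure_throughput(generate_fn, batch_size, gen_len):
    """Hypothetical harness: time one batched generation call and return
    generated tokens per second (batch_size * gen_len / elapsed seconds)."""
    start = time.perf_counter()
    generate_fn()
    elapsed = time.perf_counter() - start
    return batch_size * gen_len / elapsed

# Dummy stand-in so the sketch is self-contained; with transformers you
# would pass something like:
#   lambda: model.generate(inputs, max_new_tokens=512, min_new_tokens=512)
tps = measure_throughput(lambda: time.sleep(0.01), batch_size=352, gen_len=512)
print(f"{tps:.0f} tokens/s")
```

For meaningful numbers, warm up the model with one untimed call first and force a fixed generation length (e.g. `min_new_tokens == max_new_tokens`) so every run produces the same token count.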
Infery-LLM, Deci’s inference and optimization SDK, features a suite of optimization techniques, including selective quantization, optimized beam search, continuous batching, and custom CUDA kernels. To explore the full capabilities of Infery-LLM, we invite you to book a demo with our experts.
Use the code below to get started with the model.
```python
# pip install -q transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Deci/DeciLM-7B"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, trust_remote_code=True
).to(device)

inputs = tokenizer.encode(
    "In a shocking finding, scientists discovered a herd of unicorns living in",
    return_tensors="pt",
).to(device)
outputs = model.generate(inputs, max_new_tokens=100, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0]))
```
Please cite this model using this format.
```
@misc{DeciFoundationModels,
  title  = {DeciLM 7B},
  author = {DeciAI Research Team},
  year   = {2023},
  url    = {https://huggingface.co/Deci/DeciLM-7b}
}
```
Apache 2.0
We’d love your feedback on the information presented in this card. Please also share any unexpected results.
For a short meeting with the SuperGradients team, use this link and choose your preferred time.
Deci is ISO 27001 Certified