Description

DeciLM-7B-instruct is a derivative of the recently released DeciLM-7B language model, a pretrained, high-efficiency generative text model with 7.04 billion parameters. DeciLM-7B-instruct is a model for short-form instruction following. It was built by LoRA fine-tuning on the SlimOrca dataset.

Publishers
Deci AI Team

Submitted Version
December 12, 2023

Latest Version
N/A

Size
N/A 

Text to Text

Overview


DeciLM-7B-instruct is a derivative of the recently released DeciLM-7B language model, a pretrained, high-efficiency generative text model with 7.04 billion parameters. DeciLM-7B-instruct is one of the best 7B instruct models obtained using simple LoRA fine-tuning, without relying on advanced techniques such as RLHF, DPO, etc.

DeciLM-7B-instruct is available under the Apache 2.0 license, offering unrestricted use. It’s designed for versatile deployment, whether locally or on any cloud platform.

Model Highlights

  • Task: Text Generation 
  • Model Type: An auto-regressive language model using an optimized transformer decoder architecture that includes variable Grouped-Query Attention
  • Languages (NLP): English 

Model Architecture

Parameters: 7.04B
Layers: 32
Heads: 32
Sequence Length: 8K
GQA Key Value Heads: Variable*

*AutoNAC was employed to optimize the selection of the GQA num_key_value_heads for each model layer.

  • Decoder layer: Variable Grouped-Query Attention. Grouped-Query Attention (GQA) was introduced in Ainslie et al., 2023
  • Position Embeddings: Dynamic NTK Scaling Rotary Position Embeddings Su et al., 2021
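To illustrate the grouped-query attention idea referenced above, here is a toy NumPy sketch, not DeciLM's actual implementation: all shapes and names are illustrative. Each group of query heads shares a single key/value head, shrinking the KV cache relative to full multi-head attention.

```python
import numpy as np

def grouped_query_attention(q, k, v, num_kv_heads):
    """Toy grouped-query attention (Ainslie et al., 2023) sketch.

    q: (num_q_heads, seq, dim); k, v: (num_kv_heads, seq, dim).
    Each group of query heads shares one key/value head.
    """
    num_q_heads, seq_len, dim = q.shape
    group = num_q_heads // num_kv_heads
    # Repeat each KV head so every query head has a matching KV head.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dim)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))  # 8 query heads
k = rng.standard_normal((2, 4, 16))  # only 2 KV heads -> smaller KV cache
v = rng.standard_normal((2, 4, 16))
out = grouped_query_attention(q, k, v, num_kv_heads=2)
print(out.shape)  # (8, 4, 16)
```

In DeciLM-7B the number of KV heads varies per layer (selected by AutoNAC) rather than being fixed as in this sketch.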

Uses

This model is intended for commercial and research use in English and can be fine-tuned for use in other languages.
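Since the model was produced by LoRA fine-tuning and can be further fine-tuned, the core idea is worth sketching. The NumPy snippet below is a minimal, hypothetical illustration of a LoRA-adapted linear layer, not the actual training setup used for this model: the frozen weight W is augmented by a trainable low-rank update B @ A.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Hypothetical sketch of a LoRA-adapted linear layer.

    The frozen weight W (d_out x d_in) is augmented by a low-rank
    update B @ A, so only A (r x d_in) and B (d_out x r) are trained.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 8
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))  # B starts at zero: the adapter is a no-op at init
x = rng.standard_normal((2, d_in))
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)  # matches the base layer
```

Because r is much smaller than the layer width, the adapter adds only a tiny fraction of trainable parameters, which is why LoRA fine-tuning is cheap compared to full fine-tuning.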

How to Use
from transformers import AutoModelForCausalLM, AutoTokenizer

# Note: the Hugging Face model ID below is assumed; DeciLM ships custom
# modeling code, so trust_remote_code=True is required.
model_id = "Deci/DeciLM-7B-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("How do I cook pancakes?", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=128)[0]))