Description
DeciLM-7B is a 7.04-billion-parameter decoder-only text generation model. As of its submission (December 2023), this Apache 2.0-licensed model is the top-performing 7-billion-parameter base language model on the Open LLM Leaderboard.
Publishers
Deci AI Team
Submitted Version
December 12, 2023
Latest Version
N/A
Size
N/A
DeciLM-7B is a pretrained large language model for generative text with 7.04 billion parameters. It outperforms models in its class, delivering up to 4.4x the throughput of Mistral 7B. DeciLM-7B was further instruction-tuned with LoRA on the SlimOrca dataset, creating DeciLM-7B-Instruct.
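The LoRA fine-tuning mentioned above freezes the pretrained weights and trains only a low-rank update. The sketch below illustrates the core idea in NumPy; the dimensions, rank, and scaling factor are illustrative assumptions, not the actual DeciLM training configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 8          # illustrative sizes; r << d is the low-rank bottleneck

W = rng.normal(size=(d_out, d_in))  # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))            # B starts at zero, so training begins at the base model
alpha = 16                          # LoRA scaling factor (assumed value)

def lora_forward(x):
    # Only A and B are trained; W stays frozen.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B = 0 the adapted layer matches the frozen base layer exactly.
assert np.allclose(lora_forward(x), W @ x)
```

Because only A and B (2 * r * d parameters per layer) are updated, the adapter is a small fraction of the full model's size.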
With support for an 8K-token sequence length, the highly efficient model uses variable Grouped-Query Attention (GQA) to achieve an optimal balance between performance and computational efficiency.
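In GQA, several query heads share each key/value head, shrinking the KV cache relative to full multi-head attention. The toy NumPy sketch below shows the mechanism for a single fixed head count; the shapes are illustrative and do not reflect DeciLM-7B's per-layer (variable) head configuration.

```python
import numpy as np

def grouped_query_attention(q, k, v, num_kv_heads):
    """Toy GQA: q has H query heads; k and v have only num_kv_heads heads,
    each shared by H // num_kv_heads query heads.
    Shapes: q -> (H, T, d), k and v -> (num_kv_heads, T, d)."""
    H, T, d = q.shape
    group = H // num_kv_heads
    # Repeat each key/value head across its group of query heads.
    k_rep = np.repeat(k, group, axis=0)   # (H, T, d)
    v_rep = np.repeat(v, group, axis=0)   # (H, T, d)
    scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(d)  # (H, T, T)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ v_rep                              # (H, T, d)

rng = np.random.default_rng(0)
H, T, d = 8, 4, 16  # 8 query heads sharing 2 KV heads
out = grouped_query_attention(rng.normal(size=(H, T, d)),
                              rng.normal(size=(2, T, d)),
                              rng.normal(size=(2, T, d)),
                              num_kv_heads=2)
print(out.shape)  # (8, 4, 16)
```

Here 8 query heads reuse 2 key/value heads, so the KV cache is 4x smaller than with standard multi-head attention at the same query-head count.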
The model’s architecture was generated using Deci’s proprietary Neural Architecture Search-based technology, AutoNAC.
DeciLM-7B is available under the Apache 2.0 license, offering unrestricted use. It’s designed for versatile deployment, whether locally or on any cloud platform.
| Parameters | Layers | Heads | Sequence Length | GQA Key-Value Heads* |
|------------|--------|-------|-----------------|----------------------|
| 7.04B      | 32     | 32    | 8K              | Variable             |
*AutoNAC was employed to optimize the selection of the GQA num_key_value_heads for each model layer.
This model is intended for commercial and research use in English and can be fine-tuned for use in other languages.
Below are DeciLM-7B’s evaluation results:
| Model | Description | Leaderboard Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|-------|-------------|---------------------|-----|-----------|------|------------|------------|-------|
| DeciLM-7B-base | Base model, trained on permissively licensed data. | 61.55 | 59.39 | 82.51 | 59.76 | 40.33 | 79.95 | 47.38 |
| Inference Tool | Hardware | Prompt Length | Generated Length | Generated Tokens/s | Batch Size | # of Prompts |
|----------------|----------|---------------|------------------|--------------------|------------|--------------|
| HuggingFace (PyTorch) | A100-SXM4-80GB 400W | 512 | 512 | 1174 | 352 | 352 |
| HuggingFace (PyTorch) | A100-SXM4-80GB 400W | 2048 | 2048 | 328 | 72 | 72 |
| Infery-LLM | A100-SXM4-80GB 400W | 512 | 512 | 4558 | 1024 | 4096 |
| Infery-LLM | A100-SXM4-80GB 400W | 2048 | 2048 | 3997 | 512 | 2048 |
| Infery-LLM | A10 | 512 | 512 | 1345 | 128 | 512 |
| Infery-LLM | A10 | 2048 | 2048 | 599 | 32 | 128 |
To replicate the results of the PyTorch benchmark, use this code example.
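The tokens/s figures above are generated tokens divided by wall-clock generation time for a batched call. The harness below is a minimal sketch of that measurement, not the actual benchmark script; the dummy workload stands in for a real batched `model.generate(...)` call so the sketch runs anywhere.

```python
import time

def measure_throughput(generate_fn, batch_size, gen_len):
    """Hypothetical harness: time one batched generation call and return
    generated tokens per second (batch_size * gen_len / elapsed seconds)."""
    start = time.perf_counter()
    generate_fn()
    elapsed = time.perf_counter() - start
    return batch_size * gen_len / elapsed

# Dummy stand-in so the sketch is self-contained; with transformers you
# would pass something like:
#   lambda: model.generate(inputs, max_new_tokens=512, min_new_tokens=512)
tps = measure_throughput(lambda: time.sleep(0.01), batch_size=352, gen_len=512)
print(f"{tps:.0f} tokens/s")
```

For meaningful numbers, warm up the model with one untimed call first and force a fixed generation length (e.g. `min_new_tokens == max_new_tokens`) so every run produces the same token count.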
Infery-LLM, Deci’s inference and optimization SDK, features a suite of optimization techniques, including selective quantization, optimized beam search, continuous batching, and custom CUDA kernels. To explore the full capabilities of Infery-LLM, we invite you to book a demo with our experts.
Use the code below to get started with the model.
```python
# pip install -q transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Deci/DeciLM-7B"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, trust_remote_code=True
).to(device)

inputs = tokenizer.encode(
    "In a shocking finding, scientists discovered a herd of unicorns living in",
    return_tensors="pt",
).to(device)
outputs = model.generate(inputs, max_new_tokens=100, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0]))
```
Please cite this model using this format.
```
@misc{DeciFoundationModels,
  title  = {DeciLM 7B},
  author = {DeciAI Research Team},
  year   = {2023},
  url    = {https://huggingface.co/Deci/DeciLM-7b}
}
```
Apache 2.0
We’d love your feedback on the information presented in this card. Please also share any unexpected results.
For a short meeting with the SuperGradients team, use this link and choose your preferred time.
Deci is ISO 27001 Certified