Description
DeciCoder 6B is a 6 billion parameter decoder-only code completion model trained on the Python, Java, JavaScript, Ruby, Rust, C++, C, and C# subsets of the StarCoder Training Dataset. The model uses variable Grouped Query Attention and has a context window of 4096 tokens.

Publishers
Deci AI Team

Submitted Version
January 17, 2024

Latest Version
N/A 

Size
N/A 

Code Generation

Overview

DeciCoder 6B is a 6 billion parameter decoder-only code completion model trained on the Python, Java, JavaScript, Ruby, Rust, C++, C, and C# subsets of the StarCoder Training Dataset. The model uses variable Grouped Query Attention and has a context window of 4096 tokens. It was trained using a Fill-in-the-Middle training objective. The model’s architecture was generated by Deci’s proprietary Neural Architecture Search-based technology, AutoNAC, and optimized to maximize performance on cost-efficient hardware such as the Qualcomm Cloud AI 100.
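Because the model was trained with a Fill-in-the-Middle objective, it can fill in a missing span of code given both the surrounding prefix and suffix. The minimal sketch below shows how such a prompt is typically assembled; the special token names (<fim_prefix>, <fim_suffix>, <fim_middle>) follow the StarCoder convention and are an assumption here, so verify them against the DeciCoder 6B tokenizer before use.

# Minimal Fill-in-the-Middle prompt sketch.
# NOTE: the special tokens below follow the StarCoder convention and are assumed,
# not confirmed, for DeciCoder 6B; check the released tokenizer's special tokens.
prefix = 'def absolute_value(x: int) -> int:\n    """Return the absolute value of an integer."""\n'
suffix = "\n    return result\n"
fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
# The model is asked to generate the missing middle (the function body)
# that connects the prefix to the suffix.
print(fim_prompt)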

Model Highlights

  • Task: Code Generation
  • Model Type: Auto-regressive language model based on the transformer architecture, using Grouped Query Attention
  • Languages: Python, Java, JavaScript, Ruby, Rust, Go, C++, C, and C#

Model Size and Parameters

 

Model        | Number of Parameters | Layers | Heads | Sequence Length | GQA num_key_value_heads
DeciCoder 6B | 6B                   | 32     | 32    | 4K              | Variable

 
  • Decoder layer: Grouped Query Attention
  • Position Embeddings: Rotary Position Embeddings (Su et al., 2021)

Model Architecture

DeciCoder 6B employs variable Grouped Query Attention to refine the balance between operational efficiency and model quality. It consistently uses 32 query heads per layer but varies the number of key/value groups (the GQA num_key_value_heads parameter) across layers. Some layers mimic Multi-Query Attention with only a single group, while others incorporate multiple groups.

By customizing the grouping according to the specific needs of each layer, DeciCoder 6B achieves an ideal equilibrium between the speed of inference and the fidelity of its outputs. It effectively leverages the computational and memory efficiencies of grouped attention, while also benefiting from the detailed and varied attention patterns that this approach facilitates.
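To make the grouping concrete, here is a minimal PyTorch sketch of grouped-query attention in which several query heads share each key/value head; the per-layer key/value head counts are illustrative assumptions, not DeciCoder 6B’s actual configuration.

import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, num_kv_heads):
    """Toy grouped-query attention: many query heads share fewer key/value heads.

    q: (batch, num_q_heads, seq_len, head_dim)
    k, v: (batch, num_kv_heads, seq_len, head_dim)
    """
    num_q_heads = q.shape[1]
    group_size = num_q_heads // num_kv_heads      # query heads per KV group
    k = k.repeat_interleave(group_size, dim=1)    # expand KV heads so every query head has a match
    v = v.repeat_interleave(group_size, dim=1)
    return F.scaled_dot_product_attention(q, k, v)

# Illustrative per-layer KV head counts (assumptions, not the real config):
# a value of 1 behaves like Multi-Query Attention; 32 would be standard Multi-Head Attention.
num_kv_heads_per_layer = [1, 1, 4, 4, 8, 8]  # "variable GQA" across layers

batch, seq_len, head_dim, num_q_heads = 2, 16, 64, 32
q = torch.randn(batch, num_q_heads, seq_len, head_dim)
for layer_idx, n_kv in enumerate(num_kv_heads_per_layer):
    k = torch.randn(batch, n_kv, seq_len, head_dim)
    v = torch.randn(batch, n_kv, seq_len, head_dim)
    out = grouped_query_attention(q, k, v, n_kv)
    print(layer_idx, out.shape)  # (2, 32, 16, 64) for every layer

Fewer KV groups shrink the key/value cache and memory traffic at inference time, while layers that keep more groups retain richer, more varied attention patterns.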

Uses

The model is intended to do single- and multi-line code completion from a context window of up to 4096 tokens. It is not an instruction-tuned model, so commands like “Write a function that computes the absolute value of an integer” won’t yield the desired results. A more effective approach is to frame instructions in the style of source code comments (e.g. # this function calculates the absolute value of an integer) or to present a function signature and docstring, enabling the model to complete the function’s body.
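For instance, here is a minimal sketch contrasting an instruction-style prompt, which the model is not designed for, with a completion-style prompt built from a comment, a signature, and a docstring; the function name and wording are only illustrative.

# Instruction-style prompt: not what a completion model like DeciCoder 6B expects.
instruction_prompt = "Write a function that computes the absolute value of an integer."

# Completion-style prompt: a comment plus a function signature and docstring
# that the model can continue with the function body.
completion_prompt = (
    "# this function calculates the absolute value of an integer\n"
    "def absolute_value(x: int) -> int:\n"
    '    """Return the absolute value of an integer."""\n'
)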

Limitations

The model has undergone training with source code from Python, Java, JavaScript, Ruby, Rust, C++, C, and C#. While the primary natural language in the source is English, it does contain other languages. Therefore, the model can produce code snippets given some context. However, there is no assurance that the resulting code will function as expected; it might be suboptimal, contain bugs, or even contain exploits.

The model can produce incorrect output. It was trained on a large volume of publicly available source code, and while great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased, or otherwise offensive outputs.

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

How to Use

The snippet below sketches basic usage with the Hugging Face transformers library. The checkpoint name "Deci/DeciCoder-6B" and the trust_remote_code flag are assumptions based on Deci's Hugging Face releases; verify both against the official model card.

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Deci/DeciCoder-6B"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)  # custom modeling code, if any, requires trust_remote_code

inputs = tokenizer("def absolute_value(x: int) -> int:\n", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))