DeciCoder

Description
DeciCoder 1B is a 1 billion parameter decoder-only code completion model trained on the Python, Java, and JavaScript subsets of the StarCoder training dataset.

Publishers
Deci AI Team

Submitted Version
August 15, 2023

Latest Version
N/A

Size
N/A

Code Generation

Overview

DeciCoder 1B is a 1 billion parameter decoder-only code completion model trained on the Python, Java, and JavaScript subsets of the StarCoder training dataset. The model uses Grouped Query Attention and has a context window of 2048 tokens. It was trained using a Fill-in-the-Middle training objective. The model’s architecture was generated by AutoNAC, Deci’s proprietary Neural Architecture Search-based technology.

Model Highlights

Task: Code generation
Model type: An auto-regressive language model based on the transformer decoder architecture, using Grouped Query Attention
Languages: Python, Java, and JavaScript
Dataset: Trained on The Stack and StarCoder datasets

Model Size and Parameters

Parameters	Layers	Heads	Sequence Length	GQA num_key_value_heads	Hidden Size
1.1B	20	32	2048	4	2048

Decoder layer: Grouped Query Attention (Ainslie et al., 2023)
Position Embeddings: Rotary Position Embeddings (Su et al., 2021)
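
To make the table concrete, the sketch below works out what the Grouped Query Attention settings imply for the key/value cache. It assumes each head spans hidden_size / num_heads dimensions and that the cache is stored in bfloat16 (2 bytes per value); neither is stated in this card. The full multi-head baseline is hypothetical and is included only for comparison.

```python
# Values taken from the table above.
HIDDEN_SIZE = 2048
NUM_HEADS = 32
NUM_KV_HEADS = 4
NUM_LAYERS = 20
SEQ_LEN = 2048

# Assumption: each head spans hidden_size / num_heads dimensions.
head_dim = HIDDEN_SIZE // NUM_HEADS
# Number of query heads that share one key/value head.
group_size = NUM_HEADS // NUM_KV_HEADS


def kv_cache_bytes(num_kv_heads: int, bytes_per_value: int = 2) -> int:
    """Bytes of key and value cache for one full-length sequence."""
    # Factor 2 covers keys and values; bytes_per_value=2 assumes bfloat16.
    per_token = 2 * NUM_LAYERS * num_kv_heads * head_dim * bytes_per_value
    return per_token * SEQ_LEN


gqa_cache = kv_cache_bytes(NUM_KV_HEADS)
mha_cache = kv_cache_bytes(NUM_HEADS)  # hypothetical baseline: one KV head per query head

print(f"head_dim={head_dim}, query heads per KV head={group_size}")
print(f"KV cache per sequence: {gqa_cache / 2**20:.0f} MiB with GQA, "
      f"{mha_cache / 2**20:.0f} MiB with full multi-head attention")
```

Under these assumptions the cache shrinks in proportion to the number of key/value heads, which is the main memory benefit of GQA at inference time. Real memory use also depends on the runtime and on batch size.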

Uses

The model is intended for single-line and multi-line code completion from a context window of up to 2048 tokens. It is not an instruction model, and commands like “Write a function that computes the absolute value of an integer” won’t yield the desired results. A more effective approach is to frame instructions in the style of source code comments (e.g. # this function calculates the absolute value of an integer) or to present a function signature and docstring, enabling the model to complete the function’s body.
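
The two prompt styles described above can be sketched as small helpers. The function names comment_prompt and signature_prompt are hypothetical and are not part of any DeciCoder or transformers API; they only build the text that would be passed to the tokenizer.

```python
def comment_prompt(instruction: str) -> str:
    """Phrase an instruction as a Python comment for the model to continue."""
    return f"# {instruction.strip()}\n"


def signature_prompt(signature: str, docstring: str, indent: str = "    ") -> str:
    """Build a function signature plus docstring, leaving the body open."""
    header = signature.rstrip()
    if not header.endswith(":"):
        header += ":"
    return f'{header}\n{indent}"""{docstring.strip()}"""\n'


print(comment_prompt("this function calculates the absolute value of an integer"))
print(signature_prompt("def absolute_value(x: int) -> int",
                       "Return the absolute value of x."))
```

Either string can be used in place of a natural-language command; the model then continues the code rather than answering a request.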

Limitations

The model was trained on source code in Python, Java, and JavaScript. The natural language in that source is mostly English, although other languages are present. The model can generate code snippets from some context, but there is no assurance that the generated code will work as intended: it may be suboptimal, contain bugs, or even include exploits.
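
Because generated code is not guaranteed to work, it helps to screen it before use. The following is a minimal sketch of one cheap first check for Python output, using the standard library's ast module. The helper name is_valid_python is hypothetical. Passing this check only shows that the text parses; it says nothing about correctness or safety, so tests and review are still needed.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python source code."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # SyntaxError: malformed code; ValueError: e.g. embedded null bytes.
        return False
    return True


good = "def absolute_value(x):\n    return x if x >= 0 else -x\n"
truncated = "def absolute_value(x:\n    return"

print(is_valid_python(good))
print(is_valid_python(truncated))
```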

Metrics and Performance

Training Data

DeciCoder was trained on the Python, Java, and JavaScript subsets of the StarCoder training dataset.

Training Procedure

Warm-Up Steps: 9000
Total Training Steps: 284k
Total Tokens: 446B
Global Batch Size: 768
Optimizer: AdamW
Optimizer Parameters: beta1=0.9, beta2=0.95
Weight Decay: 0.1
Learning Rate: 4e-4
Learning Rate Schedule: cosine
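
The hyperparameters above can be turned into a small schedule sketch. The card only says "cosine" with 9000 warm-up steps, so the linear warm-up shape and the decay to a minimum of 0 are assumptions, as is treating 4e-4 as the peak learning rate. The token arithmetic at the end further assumes the global batch size counts sequences and that each sequence fills the 2048-token window.

```python
import math

WARMUP_STEPS = 9_000
TOTAL_STEPS = 284_000
PEAK_LR = 4e-4  # assumed to be the peak value reached after warm-up


def learning_rate(step: int, min_lr: float = 0.0) -> float:
    """Linear warm-up to PEAK_LR, then cosine decay to min_lr (both assumed)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (PEAK_LR - min_lr) * (1.0 + math.cos(math.pi * progress))


# Rough token budget, assuming 768 sequences of 2048 tokens per step.
tokens_per_step = 768 * 2048
total_tokens = tokens_per_step * TOTAL_STEPS

for step in (0, 4_500, 9_000, 146_500, 284_000):
    print(step, f"{learning_rate(step):.2e}")
print(f"{total_tokens / 1e9:.1f}B tokens")
```

Under these assumptions the token budget comes out close to the 446B total listed above, which suggests the batch size is indeed counted in 2048-token sequences.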

Evaluation

Below are DeciCoder’s pass@1 scores on the MultiPL HumanEval benchmark.

Python	JavaScript	Java
19.1%	18.4%	16.6%
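
For readers unfamiliar with the metric, here is a minimal sketch of how pass@k is commonly estimated for HumanEval-style benchmarks, using the unbiased estimator introduced alongside HumanEval. The card does not state how many samples were drawn per problem or which sampling settings were used, so the counts in the example are made up purely for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems given (n_samples, n_correct) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Illustrative counts only: (samples drawn, samples that passed the unit tests).
example_results = [(10, 3), (10, 0), (10, 10)]
print(f"pass@1 = {mean_pass_at_k(example_results, k=1):.3f}")
```

For k = 1 the estimator reduces to the fraction of samples that pass the tests, averaged over problems.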

Runtime Benchmarks

Inference Tool/Hardware	A10 (tokens/sec)	A100 (tokens/sec)
PyTorch	1,364.2	3,244.4
Infery LLM	3,889.3	11,676.8

Throughput (tokens/sec) was measured with the optimal batch size for each GPU: 128 on the A10 and 512 on the A100.
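
The relative gain from Infery LLM over PyTorch follows directly from the table. The sketch below only divides the reported throughput numbers. Because they are aggregate figures at large batch sizes, they describe batch throughput and do not translate into per-request latency.

```python
# Throughput in tokens/sec, copied from the table above.
throughput = {
    "A10": {"PyTorch": 1364.2, "Infery LLM": 3889.3},
    "A100": {"PyTorch": 3244.4, "Infery LLM": 11676.8},
}


def speedup(hardware: str) -> float:
    """Ratio of Infery LLM throughput to PyTorch throughput on one GPU."""
    row = throughput[hardware]
    return row["Infery LLM"] / row["PyTorch"]


for gpu in throughput:
    print(f"{gpu}: {speedup(gpu):.2f}x")
```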

How to Use

DeciCoder can be used for code generation through the Hugging Face transformers library. The snippet below loads the model and completes a short prompt.

# pip install -q transformers torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Deci/DeciCoder-1b"
device = "cuda"  # set to "cpu" to run without a GPU

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, trust_remote_code=True).to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
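
Since the context window is 2048 tokens, the prompt and the generated tokens together should stay within that limit. The helper below is a minimal sketch of that bookkeeping. The name clamp_new_tokens is hypothetical and it is not part of transformers.

```python
CONTEXT_WINDOW = 2048  # from the model description above


def clamp_new_tokens(prompt_tokens: int, requested: int = 100,
                     context_window: int = CONTEXT_WINDOW) -> int:
    """Limit the number of generated tokens so prompt + output fit the window."""
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"Prompt uses {prompt_tokens} tokens; no room left in a "
            f"{context_window}-token context window."
        )
    return min(requested, remaining)


print(clamp_new_tokens(10))
print(clamp_new_tokens(2000))
```

With the loading snippet above, this could be used as max_new_tokens=clamp_new_tokens(inputs.shape[1]), since inputs holds one row of prompt token ids.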

Resources

Improve Your DeciCoder Training, Optimization, and Deployment

Community and Feedback

We’d love your feedback on the information presented in this card. Please also share any unexpected results.

To report a bug, file an issue on GitHub.
Join our Discord community to stay up to date with new features and models, important announcements, and upcoming events.

For a short meeting with the SuperGradients team, use this link and choose your preferred time.
