AMD Instinct™ GPUs Power DeepSeek-V3: Revolutionizing AI Development with SGLang 

Jan 07, 2025

Overview

AMD is excited to announce support for the new DeepSeek-V3 model from DeepSeek on AMD Instinct™ GPUs, with performance optimized through SGLang (https://github.com/sgl-project/sglang/releases). This integration will help accelerate the development of cutting-edge AI applications and experiences. DeepSeek-V3 is an open-source large language model designed to empower developers with strong performance and efficiency, setting a new benchmark for productivity and enabling them to create cutting-edge AI applications.

The DeepSeek-V3 model is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, both introduced in its predecessor, DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and uses a multi-token prediction training objective for stronger performance. As an openly released model, DeepSeek-V3 gives developers broad access to these latest advancements. DeepSeek-V3 achieves state-of-the-art performance among open-source models on most benchmarks, especially on math and code tasks.
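
To make the sparse-activation idea behind MoE concrete, below is a minimal, hypothetical top-k routing sketch in PyTorch. It is a toy illustration only: the layer sizes, gating scheme, and expert design here are invented for clarity and do not reflect DeepSeek-V3's actual implementation (which adds MLA, auxiliary-loss-free load balancing, and far larger experts).

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    # Toy top-k gated MoE layer: each token activates only k of n experts,
    # mirroring (at toy scale) how DeepSeek-V3 activates 37B of its 671B
    # parameters per token. All names and sizes are illustrative.
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: [tokens, d_model]
        weights, idx = self.gate(x).topk(self.k, dim=-1)  # pick k experts per token
        weights = F.softmax(weights, dim=-1)              # normalize gate weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

x = torch.randn(16, 64)
print(ToyMoE()(x).shape)  # torch.Size([16, 64])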

AMD Instinct™ GPU Accelerators and DeepSeek-V3 

AMD Instinct™ GPU accelerators are transforming the landscape of large AI models such as DeepSeek-V3, which require immense computational resources and memory bandwidth. AMD Instinct™ accelerators deliver outstanding performance in both areas.

DeepSeek's use of AMD ROCm™ software and AMD Instinct™ GPU accelerators across key stages of DeepSeek-V3 development strengthens a long-standing collaboration with AMD and underscores a shared commitment to an open software approach for AI. Scalable infrastructure from AMD enables developers to build powerful reasoning and understanding applications.

Extensive FP8 support in ROCm can significantly improve the process of running AI models, especially on the inference side. Because FP8 halves the memory footprint of higher-precision formats, it helps relieve memory bottlenecks and reduces the latency of memory reads and writes, enabling larger models or batches to be processed within the same hardware constraints and making both training and inference more efficient. FP8 reduced-precision arithmetic also cuts delays in data transfer and computation. AMD ROCm extends FP8 support across its ecosystem, enabling performance and efficiency improvements in everything from frameworks to libraries.
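
As a rough illustration of the memory savings, the following PyTorch sketch casts a BF16 weight tensor to FP8 and compares footprints. It assumes a PyTorch build with the experimental torch.float8_e4m3fn dtype (torch.float8_e4m3fnuz is the variant native to ROCm hardware); production FP8 inference additionally requires per-tensor scaling, which is omitted here.

import torch

# Cast a BF16 weight tensor to FP8 (1 byte/element vs. 2 for BF16).
w_bf16 = torch.randn(4096, 4096, dtype=torch.bfloat16)
w_fp8 = w_bf16.to(torch.float8_e4m3fn)  # experimental dtype; e4m3fnuz on ROCm

print(w_bf16.numel() * w_bf16.element_size() / 2**20, "MiB in BF16")  # 32.0 MiB
print(w_fp8.numel() * w_fp8.element_size() / 2**20, "MiB in FP8")     # 16.0 MiB

# The round trip back to BF16 shows the precision cost of the smaller format.
err = (w_bf16 - w_fp8.to(torch.bfloat16)).abs().max()
print("max abs round-trip error:", err.item())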

Inference with SGLang on AMD Instinct™ GPUs 

SGLang fully supports DeepSeek-V3 model inference: https://github.com/sgl-project/sglang/releases

Prebuilt ROCm Docker Image

To run SGLang with ROCm support using the prebuilt Docker image, follow these steps:

  1. Launch the Docker Container: 
    docker run -it --ipc=host --cap-add=SYS_PTRACE --network=host \
     --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined \
     --group-add video --privileged -w /workspace lmsysorg/sglang:v0.4.2.post3-rocm630
  2. Get Started: 
    1. Log in to Hugging Face using the CLI:
      huggingface-cli login
    2. Start the SGLang Server:
      Launch the server to host the DeepSeek-V3 FP8 model on your local machine:
      python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --port 30000 --tp 8 --trust-remote-code
    3. Generate Text:
      Once the server is running, open another terminal and send a request to generate text (a Python equivalent is shown after this list):

      curl http://localhost:30000/generate \
       -H "Content-Type: application/json" \
       -d '{
        "text": "Once upon a time,",
        "sampling_params": {
         "max_new_tokens": 16,
         "temperature": 0
        }
       }'


  3. Benchmark:
    Set this environment variable before benchmarking:
      export HSA_NO_SCRATCH_RECLAIM=1
    Measure one-batch throughput and latency:
      python3 -m sglang.bench_one_batch --batch-size 32 --input 128 --output 32 --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code
    Benchmark the server: launch it, then run the GSM8K evaluation in another terminal:
      python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code
      python3 benchmark/gsm8k/bench_sglang.py --num-questions 2000 --parallel 2000 --num-shots 8

    Expected results:
      Accuracy: 0.952
      Invalid: 0.000
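
As referenced in the Generate Text step above, here is a minimal Python equivalent of the curl request, assuming the requests package is installed and the server was launched with the command shown earlier:

import requests

# Send the same payload as the curl example to the SGLang /generate endpoint.
response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "Once upon a time,",
        "sampling_params": {"max_new_tokens": 16, "temperature": 0},
    },
)
print(response.json())  # the generated text is returned under the "text" key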

Note: since FP8 training is natively adopted in the DeepSeek-V3 framework, only FP8 weights are provided. If you require BF16 weights for experimentation, you can use the provided conversion script to perform the transformation. Here is an example of converting FP8 weights to BF16:

cd inference
python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights
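
Assuming the conversion completes, the resulting BF16 checkpoint can then be served with the same launch command used above by pointing --model-path at the output directory (the path below is a placeholder):

python3 -m sglang.launch_server --model-path /path/to/bf16_weights --port 30000 --tp 8 --trust-remote-code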

AMD and DeepSeek Collaboration: Day 0 Support Readiness:

With the release of DeepSeek-V3, AMD continues its tradition of fostering innovation through close collaboration with the DeepSeek team. This partnership ensures that developers are fully equipped to leverage the DeepSeek-V3 model on AMD Instinct™ GPUs from Day 0, providing a broader choice of GPU hardware and an open software stack, ROCm™, for optimized performance and scalability. AMD will continue optimizing DeepSeek-V3 performance with CK-tile-based kernels on AMD Instinct™ GPUs. AMD is committed to collaborating with open-source model providers to accelerate AI innovation and empower developers to create the next generation of AI experiences.

Acknowledgement:

We sincerely appreciate the exceptional support and close collaboration with the DeepSeek and SGLang teams. A special thanks to AMD team members Peng Sun, Bruce Xue, Hai Xiao, David Li, Carlus Huang, Mingtao Gu, Vamsi Alla, Jason F., Vinayak Gok, Wun-guo Huang, Caroline Kang, Gilbert Lei, Soga Lin, Jingning Tang, Fan Wu, George Wang, Anshul Gupta, Shucai Xiao, Lixun Zhang, Xicheng (AK) Feng A and everyone else who contributed to this effort.
