Introducing the First AMD 1B Language Models: AMD OLMo
Nov 04, 2024

Core Contributors: Jiang Liu, Jialian Wu, Prakamya Mishra, Zicheng Liu
Contributors: Sudhanshu Ranjan, Pratik Prabhanjan Brahma, Yusheng Su, Gowtham Ramesh, Peng Sun, Zhe Li, Dong Li, Lu Tian, Emad Barsoum
The ability to pre-train and fine-tune your own LLM helps incorporate domain-specific knowledge and ensures better alignment with unique use cases. This approach allows organizations to tailor the model’s architecture and training process to their requirements, striking a balance between scalability and specialization that off-the-shelf models may not provide. As demand for customized AI solutions continues to grow, the ability to pre-train LLMs unlocks unprecedented opportunities for innovation and product differentiation across industries.

Aligned with the goal of advancing accessible AI research, AMD has open-sourced its complete training details and released the checkpoints for the first series of AMD OLMo models. This initiative empowers a diverse community of users, developers, and researchers to explore, utilize, and train state-of-the-art large language models. By demonstrating the capabilities of AMD Instinct™ GPUs in demanding AI workloads, AMD aims to highlight their potential for running large-scale, multi-node LM training jobs with trillions of tokens, achieving improved reasoning and instruction-following performance compared to other fully open LMs of similar size. In addition, the community can run these models on AMD Ryzen™ AI PCs equipped with Neural Processing Units (NPUs), using AMD Ryzen AI software to enable easier local access without privacy concerns, efficient AI inference, and lower power consumption.
Unveiling AMD OLMo Language Models
AMD OLMo is a series of 1 billion parameter language models pre-trained with 1.3 trillion tokens on 16 nodes, each with four (4) AMD Instinct™ MI250 GPUs. Along with complete details to reproduce our results, we are releasing three (3) checkpoints corresponding to the various stages of training (a minimal loading sketch follows the list):
- AMD OLMo 1B: Pre-trained on a subset of Dolma v1.7 that consists of 1.3 trillion tokens.
- AMD OLMo 1B SFT: Supervised fine-tuned (SFT) on the Tulu V2 dataset (first phase) and then on the OpenHermes-2.5, WebInstructSub, and Code-Feedback datasets (second phase).
- AMD OLMo 1B SFT DPO: Aligned with human preferences using Direct Preference Optimization (DPO) on the UltraFeedback dataset.
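For readers who want to try the released checkpoints, the short sketch below shows one way to load and prompt them with the Hugging Face transformers library. It is a minimal example under stated assumptions, not AMD's official quick-start: the repository ID and prompt are illustrative, and the checkpoints are assumed to be available on the Hugging Face Hub.

```python
# Minimal inference sketch (not AMD's official example). It assumes the
# checkpoints are hosted on the Hugging Face Hub under the "amd" organization,
# e.g. "amd/AMD-OLMo-1B-SFT"; adjust the repo ID to match the actual release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/AMD-OLMo-1B-SFT"  # assumed repo ID; base and DPO checkpoints load the same way
device = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm GPUs appear as "cuda" in PyTorch

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)

prompt = "Explain what supervised fine-tuning does for a language model."
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```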
AMD OLMo 1B is based on the model architecture and training setup of the fully open-source 1 billion parameter version of OLMo, with some key differences. We pre-train with less than half the tokens used for OLMo-1B (effectively cutting the compute budget in half while maintaining comparable performance) and carry out post-training comprising a two-phase SFT and DPO alignment to enhance performance in general reasoning, instruction-following, and chat capabilities (OLMo-1B does not carry out any post-training steps). For the two-phase SFT, we create a data mix of high-quality, diverse, and publicly available instructional datasets. Overall, our training recipe produces a series of models that achieve better performance across various types of benchmarks compared to other similarly sized, fully open-source models trained on publicly available data.
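For readers unfamiliar with the alignment step, the sketch below spells out the core DPO preference loss in plain PyTorch. It is an illustration of the objective described above, not AMD's training code; the function and tensor names are hypothetical, and in practice a library such as Hugging Face TRL handles batching and the frozen reference model.

```python
# Illustrative DPO objective (not AMD's training code). Inputs are the summed
# log-probabilities of the chosen/rejected responses under the policy being
# trained and under a frozen reference model (typically the SFT checkpoint).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Log-ratios of policy vs. reference for preferred and dispreferred answers.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Encourage a larger margin between chosen and rejected, scaled by beta.
    logits = beta * (chosen_rewards - rejected_rewards)
    return -F.logsigmoid(logits).mean()
```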
Results
We compare AMD OLMo models with other similarly sized fully open-source models that have publicly released their data, model weights, and training code. The pre-trained baseline models used for comparison include: TinyLLaMA-v1.1 (1.1B), MobiLLaMA-1B (1.2B), OLMo-1B-hf (1.2B), OLMo-1B-0724-hf (1.2B), and OpenELM-1_1B (1.1B).
Using an end-to-end training pipeline running on AMD Instinct™ GPUs, consisting of a pre-training stage with 1.3 trillion tokens (half the pre-training compute budget of OLMo-1B), a two-phase supervised fine-tuning stage, and a DPO-based human preference alignment stage, AMD OLMo models are comparable to or outperform other similarly sized fully open models in general reasoning and chat capabilities, while performing on par on responsible AI benchmarks. The language models were also deployed on AMD Ryzen AI PCs, which can help enable a diverse set of edge use cases. Open-sourcing the data, weights, training recipes, and code is primarily aimed at helping developers reproduce our results and innovate further on top of them. AMD remains committed to providing the open-source community with a steady stream of new AI models and eagerly anticipates the innovations that will emerge from their collaborative efforts.
To dive deeper into the three stages of training and the AMD OLMo model results, please see the full article here: Introducing the First AMD 1B Language Models: AMD OLMo Fuels AI Advancements
