Introducing the First AMD 1B Language Models: AMD OLMo
Nov 04, 2024

Core Contributors: Jiang Liu, Jialian Wu, Prakamya Mishra, Zicheng Liu
Contributors: Sudhanshu Ranjan, Pratik Prabhanjan Brahma, Yusheng Su, Gowtham Ramesh, Peng Sun, Zhe Li, Dong Li, Lu Tian, Emad Barsoum
The ability to pre-train and fine-tune your own LLM helps incorporate domain-specific knowledge and ensures better alignment with unique use cases. This approach allows organizations to tailor the model’s architecture and training process to their requirements, striking a balance between scalability and specialization that off-the-shelf models may not provide. As demand for customized AI solutions continues to grow, the ability to pre-train LLMs unlocks unprecedented opportunities for innovation and product differentiation across industries.

Aligned with the goal of advancing accessible AI research, AMD has open-sourced its complete training details and released the checkpoints for the first series of AMD OLMo models. This initiative empowers a diverse community of users, developers, and researchers to explore, utilize, and train state-of-the-art large language models. By demonstrating the capabilities of AMD Instinct™ GPUs in demanding AI workloads, AMD aims to highlight their potential for running large-scale, multi-node LM training jobs with trillions of tokens, achieving improved reasoning and instruction-following performance compared to other fully open LMs of similar size. In addition, the community can run these models on AMD Ryzen™ AI PCs equipped with Neural Processing Units (NPUs), using AMD Ryzen AI software to enable easier local access without privacy concerns, efficient AI inference, and lower power consumption.
Unveiling AMD OLMo Language Models
AMD OLMo is a series of 1 billion parameter language models pre-trained with 1.3 trillion tokens on 16 nodes, each with four (4) AMD Instinct™ MI250 GPUs. Along with complete details to reproduce our results, we are releasing three (3) checkpoints corresponding to the various stages of training (a minimal loading sketch follows the list):
- AMD OLMo 1B: Pre-trained on a subset of Dolma v1.7 that consists of 1.3 trillion tokens.
- AMD OLMo 1B SFT: Supervised fine-tuned (SFT) on the Tulu V2 dataset (first phase) and then on the OpenHermes-2.5, WebInstructSub, and Code-Feedback datasets (second phase).
- AMD OLMo 1B SFT DPO: Aligned with human preferences using Direct Preference Optimization (DPO) on the UltraFeedback dataset.
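For readers who want to try the released checkpoints, the short sketch below shows one way to load and prompt them with the Hugging Face transformers library. It is a minimal example under stated assumptions, not AMD's official quick-start: the repository ID and prompt are illustrative, and the checkpoints are assumed to be available on the Hugging Face Hub.

```python
# Minimal inference sketch (not AMD's official example). It assumes the
# checkpoints are hosted on the Hugging Face Hub under the "amd" organization,
# e.g. "amd/AMD-OLMo-1B-SFT"; adjust the repo ID to match the actual release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/AMD-OLMo-1B-SFT"  # assumed repo ID; base and DPO checkpoints load the same way
device = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm GPUs appear as "cuda" in PyTorch

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)

prompt = "Explain what supervised fine-tuning does for a language model."
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```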
AMD OLMo 1B is based on the model architecture and training setup of the fully open-source 1 billion parameter version of OLMo, with some key differences. We pre-train with less than half the tokens used for OLMo-1B (effectively cutting the compute budget in half while maintaining comparable performance) and carry out post-training comprising a two-phase SFT and DPO alignment to enhance performance in general reasoning, instruction-following, and chat capabilities (OLMo-1B does not carry out any post-training steps). For the two-phase SFT, we create a data mix of high-quality, diverse, and publicly available instructional datasets. Overall, our training recipe produces a series of models that achieve better performance across various types of benchmarks compared to other similarly sized, fully open-source models trained on publicly available data.
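For readers unfamiliar with the alignment step, the sketch below spells out the core DPO preference loss in plain PyTorch. It is an illustration of the objective described above, not AMD's training code; the function and tensor names are hypothetical, and in practice a library such as Hugging Face TRL handles batching and the frozen reference model.

```python
# Illustrative DPO objective (not AMD's training code). Inputs are the summed
# log-probabilities of the chosen/rejected responses under the policy being
# trained and under a frozen reference model (typically the SFT checkpoint).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Log-ratios of policy vs. reference for preferred and dispreferred answers.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Encourage a larger margin between chosen and rejected, scaled by beta.
    logits = beta * (chosen_rewards - rejected_rewards)
    return -F.logsigmoid(logits).mean()
```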
Results
We compare AMD OLMo models with other similarly sized fully open-source models that have publicly released their data, model weights, and training code. The pre-trained baseline models used for comparison include: TinyLLaMA-v1.1 (1.1B), MobiLLaMA-1B (1.2B), OLMo-1B-hf (1.2B), OLMo-1B-0724-hf (1.2B), and OpenELM-1_1B (1.1B).
Using an end-to-end training pipeline running on AMD Instinct™ GPUs, consisting of a pre-training stage with 1.3 trillion tokens (half the pre-training compute budget of OLMo-1B), a two-phase supervised fine-tuning stage, and a DPO-based human preference alignment stage, AMD OLMo models are comparable to or outperform other similarly sized fully open models in general reasoning and chat capabilities, while performing on par on responsible AI benchmarks. The language models were also deployed on AMD Ryzen AI PCs, which can help enable a diverse set of edge use cases. Open-sourcing the data, weights, training recipes, and code is primarily aimed at helping developers reproduce our results and innovate further on top of them. AMD remains committed to providing the open-source community with a steady stream of new AI models and eagerly anticipates the innovations that will emerge from their collaborative efforts.
To dive deeper into the three stages of training and the AMD OLMo model results, please see the full article here: Introducing the First AMD 1B Language Models: AMD OLMo Fuels AI Advancements
