Scaling AI in Production with Vultr
Abstract
Explore how enterprises can scale AI from training to inference using AMD-powered infrastructure on Vultr. Through a deep dive into the University of Cambridge's Tessera model, learn how organizations can accelerate AI deployment, improve operational efficiency, and scale globally. The session also highlights real-world AI initiatives across healthcare, retail, finance, manufacturing, and hospitality.
July 23, 2026 4:00 PM - 4:45 PM PDT
Speakers
Presented By
Manager, Product Marketing | AMD
Session Type
Breakout Session
Related Product
Instinct
Related Sessions
-
How to Right-size Your Memory
Your finance team doesn't care about tokens per second. They care about predictable costs, compliance risk, and vendor lock-in. With agentic AI, the metrics for tracking success are even more complex. But benchmarks don't answer the question that actually matters: Should you undertake this effort and is it viable for your business? In this interactive technical discussion, we’ll break down the tradeoffs, work through the math, and pressure-test the strategy together.
July 23, 2026
-
Training at Scale with AMD Primus
Primus makes large-scale training on Instinct reliable, debuggable, and highly performant. It supports the latest OSS training frameworks and models, and is expanding support to new, cutting-edge model architectures, training techniques, and datatypes. State-of-the-art pre- and post-training performance with Primus, proven at scales of thousands of GPUs, positions AMD Instinct GPUs as a competitive solution for model development at frontier labs, enterprises, and AI startups.
July 23, 2026
-
Benchmarking AI Systems: from Model Metrics to Real-World Performance
AI benchmarking is evolving rapidly as enterprises scale from experimentation to deployment. This interactive session explores how to measure real-world performance across inference and training workloads. We will discuss the metrics that matter, throughput vs. latency tradeoffs, memory bandwidth, and open software ecosystems. Gain practical insights into evaluating AI infrastructure for performance, scalability, efficiency, and TCO in modern enterprise and developer environments.
July 23, 2026
-
Redefining Server Performance for AI and Cloud
Discover how MSI server platforms powered by AMD EPYC and DC-MHS modular architecture are advancing performance, efficiency, and scalability for modern data centers. Learn how this standardized modular design enables greater flexibility and expandability for faster integration, easier serviceability, and optimized resource utilization. The session will highlight higher VM density, improved throughput, expanded memory bandwidth, and greater power efficiency for AI, cloud, and enterprise workloads.
July 23, 2026
-
Agentic Kernel Performance Tuning with AMD ROCm
This session introduces an agentic kernel development workflow for optimizing AI and HPC workloads on AMD ROCm. Learn how a self-directing optimization loop can profile, analyze, optimize, validate, and generate production-ready kernel improvements with minimal manual tuning. The talk highlights how AMD is accelerating kernel engineering by condensing weeks of performance optimization effort into an automated, scalable workflow for developers and performance engineers.
July 23, 2026
-
Redefining Scalable AI Performance: OCI Supercomputing in the Cloud
Organizations building frontier AI models need infrastructure designed for performance at scale. This session shows how OCI combines AMD Instinct, AMD EPYC, and Pensando in Oracle Acceleron to enable ultra-low-latency networking for high-throughput distributed workloads, with practical guidance for designing infrastructure for large language, multimodal, and scientific AI models.
July 23, 2026
-
Accelerating Inference at Scale: Crusoe's Experience with AMD
As a customer and operator of AMD technology, Crusoe’s Managed Inference team has built a production inference stack designed for speed, efficiency, and scale. This session will show how AMD Instinct, including MI355X, helped shape its serverless inference offering and what teams can apply when building production AI services that balance performance, memory bandwidth, and cost.
July 23, 2026
-
Efficient LLM Serving at Scale with Unified Caching (vLLM+LMCache)
This advanced, hands-on workshop shows how TensorMesh and AMD enable efficient LLM serving through a unified caching layer. You will learn how tiered KV cache management brings out the benefits of cache-aware inference: improving throughput under interactive latency SLAs, reducing time to first token (TTFT) through KV cache reuse and offload, and enabling production-style distributed inference on Instinct GPUs.
July 23, 2026