OpenXLA provides flexibility for ML applications

Machine learning developers can now build and run their ML programs on the framework and hardware of their choice, thanks to the OpenXLA Project, which today announced the availability of its key open source components.

Data scientists and ML engineers often spend a lot of time optimizing their models for specific hardware targets. Whether working in a framework like TensorFlow or PyTorch and targeting GPUs or TPUs, there has been no way to avoid this manual effort, which consumes valuable time and makes it difficult to port applications later.

That’s the general problem targeted by the folks behind the OpenXLA Project, founded last fall and now counting Alibaba, Amazon Web Services, AMD, Apple, Arm, Cerebras Systems, Google, Graphcore, Hugging Face, Intel, Meta, and NVIDIA as members.

By creating a unified machine learning compiler that works with many ML development frameworks, hardware platforms, and runtimes, OpenXLA can accelerate the delivery of ML applications and provide greater code portability.

The group announced today that three open source tools are available as part of the project. XLA is an ML compiler for CPUs, GPUs, and accelerators; StableHLO is an operation set for high-level operations (HLO) in ML that provides portability between frameworks and compilers; and IREE (Intermediate Representation Execution Environment) is an end-to-end MLIR (Multi-Level Intermediate Representation)-based compiler and runtime for mobile and edge deployments. All three can be downloaded from the OpenXLA GitHub site.
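To make the layering concrete, here is a minimal sketch of how the pieces connect in practice: a JAX function compiled with `jax.jit` goes through XLA, and JAX's public lowering API exposes the StableHLO module that sits between the framework and the backend compiler. This is an illustrative example, not code from the OpenXLA announcement; it assumes the `jax` package is installed.

```python
# Sketch: inspecting the StableHLO that XLA compiles for a JAX program.
import jax
import jax.numpy as jnp

def predict(w, x):
    # A tiny "model": a linear layer followed by a ReLU.
    return jnp.maximum(x @ w, 0.0)

w = jnp.ones((4, 2))
x = jnp.ones((3, 4))

# jax.jit compiles through XLA; .lower() stops before backend compilation,
# and .as_text() prints the portable StableHLO (MLIR) representation.
lowered = jax.jit(predict).lower(w, x)
print(lowered.as_text())
```

The printed module contains `stablehlo.*` operations; any compiler that consumes StableHLO can pick up from there, which is the portability the project is describing.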

Initial frameworks supported by OpenXLA include TensorFlow, PyTorch, and JAX, a newer Google framework for transforming numerical functions that is described as combining a modified version of autograd with TensorFlow while following the structure and workflow of NumPy. Initial hardware targets and optimizations include Intel CPUs, NVIDIA GPUs, Google TPUs, AMD GPUs, Arm CPUs, AWS Trainium and Inferentia, Graphcore IPUs, and the Cerebras Wafer-Scale Engine (WSE). OpenXLA’s target-independent optimizer covers algebraic simplifications, op/kernel fusion, weight update splitting, whole-graph layout propagation, scheduling, and SPMD partitioning for parallelism.
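Op/kernel fusion, one of the optimizations listed above, can be sketched with a small example: XLA can combine chained elementwise operations into a single kernel rather than materializing intermediate arrays. This is an illustrative sketch using JAX (not code from the project); the actual fusion decisions are made by the compiler per backend.

```python
# Sketch: two elementwise ops that XLA can fuse into one kernel.
import jax
import jax.numpy as jnp

def scale_shift(x, a, b):
    # Without fusion, a * x would be written to memory, then read back
    # for the addition. XLA's fusion pass can emit one combined kernel.
    return a * x + b

fused = jax.jit(scale_shift)  # compiled through XLA
print(fused(jnp.arange(8.0), 2.0, 1.0))  # [1. 3. 5. 7. 9. 11. 13. 15.]
```

Because the fused program is expressed in a target-independent IR, the same optimization pipeline applies whether the backend is a CPU, GPU, or one of the accelerators listed above.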


OpenXLA’s compiler stack can be used for a variety of ML use cases, including full-scale training of massive deep learning models such as large language models (LLMs) and generative computer vision models like Stable Diffusion. It can also be used for inference: Waymo is already using OpenXLA for real-time inference in its self-driving cars, according to a post on Google’s open source blog today.

The OpenXLA compiler ecosystem provides portability between ML development tools and hardware targets (Image source: OpenXLA Project)

OpenXLA members have shown some early successes with the new compiler. Alibaba, for example, claims that with OpenXLA it trained a GPT-2 model 72% faster and a Swin Transformer 88% faster, both on NVIDIA GPUs.

Hugging Face, meanwhile, said it saw about a 100% speedup when pairing XLA with its text generation model written in TensorFlow. “OpenXLA promises standardized building blocks upon which we can build much-needed interoperability, and we look forward to following along and contributing!” said Morgan Funtowicz, head of machine learning optimization at the Brooklyn, New York-based company.
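In TensorFlow, the kind of XLA compilation behind Hugging Face’s reported speedup is enabled with the `jit_compile` flag on `tf.function`. The following is a minimal sketch with a toy computation, not Hugging Face’s actual text-generation code; it assumes the `tensorflow` package is installed.

```python
# Sketch: opting a TensorFlow function into XLA compilation.
import tensorflow as tf

@tf.function(jit_compile=True)  # compile this graph with XLA
def dense_relu(x, w):
    # A small stand-in for a model's forward pass.
    return tf.nn.relu(tf.matmul(x, w))

x = tf.ones((3, 4))
w = tf.ones((4, 2))
print(dense_relu(x, w))  # each element is 4.0
```

The decorator is the entire opt-in: the same Python function runs eagerly without it, which is why framework teams can offer XLA speedups with little user-facing change.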

According to PyTorch lead maintainer Soumith Chintala, Meta has achieved “significant performance improvements on important projects,” including by using XLA on PyTorch models running on cloud-based TPUs.

Chip startups are happy with XLA, which reduces the risk of customers adopting relatively new, unproven hardware. “Our IPU compiler has been using XLA since it was released,” said David Norman, director of software engineering at Graphcore. “XLA’s platform independence and stability provide an ideal platform for introducing novel silicon.”


“OpenXLA helps extend user reach and time-to-solution by providing the Cerebras Wafer-Scale Engine with a common interface to higher-level ML frameworks,” said Andy Hock, vice president and chief product officer at Cerebras. “We are extremely excited to make the OpenXLA ecosystem available for even wider community engagement, contribution and use on GitHub.”

AMD and Arm, which are vying with the major chipmakers for slices of the ML training and serving pies, are also enthusiastic members of the OpenXLA Project.

“We value projects that deliver open governance, flexible and broad applicability, cutting-edge features, and peak performance, and we look forward to continued collaboration to expand the open source ecosystem for ML developers,” said Alan Lee, AMD’s vice president of software development, in the blog post.

“The OpenXLA Project marks an important milestone on the road to simplifying ML software development,” said Peter Greenhalgh, vice president of technology and fellow at Arm. “We fully support the OpenXLA mission and look forward to leveraging OpenXLA’s stability and standardization across Arm’s Neoverse hardware and software roadmaps.”

Notably absent from the membership list are IBM, which continues to innovate in chips with its Power10 processor, and Microsoft, the world’s second largest cloud provider behind AWS.

Related articles:

Google announces the open source ML compiler project OpenXLA

AMD joins the New PyTorch Foundation as a founding member

Intel’s nGraph, a universal deep learning compiler
