The rumors are true: Microsoft has built its own custom AI chip that can be used to train large language models and potentially avoid costly reliance on Nvidia. Microsoft has also built its own Arm-based CPU for cloud workloads. Both custom chips are designed to power its Azure data centers and prepare the company and its enterprise customers for an AI-filled future.
Microsoft’s Azure Maia AI chip and Arm-powered Azure Cobalt CPU will arrive in 2024, on the back of increased demand this year for Nvidia H100 GPUs that are widely used to train and run generative image tools and large language models. These GPUs are in such high demand that some have even fetched over $40,000 on eBay.
“Microsoft actually has a long history in silicon development,” explains Rani Borkar, head of hardware systems and Azure infrastructure at Microsoft, in an interview with The Verge. Microsoft has been collaborating on silicon for Xbox for more than 20 years, and has also co-designed chips for Surface devices. “These efforts are built on that experience,” Borkar says. “In 2017, we began architecting our cloud hardware portfolio and started on that journey to put us on track to build our new custom chips.”
The new Azure Maia AI chip and Azure Cobalt CPU were built in-house at Microsoft, alongside an overhaul of the company’s entire cloud server stack to improve performance, power efficiency, and cost. “We’re rethinking cloud infrastructure for the age of AI, and we’re literally optimizing every layer of that infrastructure,” Borkar says.
The Azure Cobalt CPU, named after the blue pigment, is a 128-core chip built on an Arm Neoverse CSS design and customized by Microsoft. It’s designed to run general-purpose cloud services on Azure. “We put a lot of thought into not only making it highly performant, but also making sure we were mindful of power management,” Borkar explains. “We made some very intentional design choices, including the ability to control performance and power consumption per core and on every single virtual machine.”
Microsoft is currently testing its Cobalt CPU on workloads like Microsoft Teams and SQL Server, with plans to make virtual machines available to customers next year for a variety of workloads. While Borkar won’t draw direct comparisons to Amazon’s Graviton 3 servers available on AWS, there should be some noticeable performance gains over the Arm-based servers Microsoft currently uses for Azure. “Our initial testing shows our performance is up to 40 percent better than what’s currently in our data centers that use commercial Arm servers,” Borkar says. Microsoft isn’t sharing full system specifications or benchmarks just yet.
Microsoft’s Maia 100 AI accelerator, named after a bright blue star, is designed to run cloud AI workloads, such as large language model training and inference. It will be used to power some of the company’s largest AI workloads on Azure, including parts of the multibillion-dollar partnership with OpenAI under which Microsoft runs all of OpenAI’s workloads. The software giant has been collaborating with OpenAI on the design and testing phases of Maia.
“We were excited when Microsoft first shared its designs for the Maia chip, and we worked together to improve and test them with our models,” says Sam Altman, CEO of OpenAI. “Azure’s end-to-end AI architecture, now optimized down to silicon with Maia, paves the way to train more capable models and make those models cheaper for our customers.”
Manufactured on TSMC’s 5-nanometer process, Maia has 105 billion transistors — around 30 percent fewer than the 153 billion found on AMD’s own Nvidia competitor, the MI300X AI GPU. “Maia supports our first implementation of the sub-8-bit data types, MX data types, for co-designing hardware and software,” Borkar says. “This helps us support faster model training and inference times.”
Microsoft is part of a group including AMD, Arm, Intel, Meta, Nvidia, and Qualcomm that is working to standardize the next generation of data formats for AI models. Microsoft is building on the collaborative and open work of the Open Compute Project (OCP) to adapt entire systems to the needs of AI.
“Maia is Microsoft’s first complete liquid-cooled server processor,” Borkar reveals. “The goal here was to enable higher density of servers at higher efficiencies. Because we’re reimagining the entire stack, we purposely think through every layer, so these systems will actually fit in our current data center footprint.”
That’s key for Microsoft to spin up these AI servers more quickly without having to make room for them in data centers around the world. Microsoft built a unique rack to house Maia server boards, complete with a companion liquid chiller that acts like the radiator you’d find in your car or a fancy gaming PC to cool the surface of the Maia chips.
As well as sharing the MX data types, Microsoft is also sharing its rack designs with its partners so they can use them on systems with other silicon inside. The Maia chip designs, however, won’t be shared more widely; Microsoft is keeping those in-house.
Maia 100 is currently being tested on GPT-3.5 Turbo, the same model that powers ChatGPT, Bing AI, and GitHub Copilot workloads. Microsoft is still in the early stages of deployment, and much like with Cobalt, it isn’t ready to release exact Maia specifications or performance benchmarks just yet.
That makes it difficult to decipher exactly how Maia compares to Nvidia’s popular H100 GPU, the recently announced H200, or even AMD’s latest MI300X. Borkar didn’t want to discuss comparisons, instead reiterating that partnerships with Nvidia and AMD remain key to the future of Azure’s AI cloud. “At the scale at which the cloud operates, it’s really important to optimize and integrate every layer of the stack, to maximize performance, to diversify the supply chain, and frankly to give our customers infrastructure choices,” Borkar says.
That supply chain diversification is important to Microsoft, particularly as Nvidia is the key supplier of AI server chips right now and companies have been racing to buy up those chips. Estimates have suggested OpenAI needed more than 30,000 of Nvidia’s older A100 GPUs for the commercialization of ChatGPT, so Microsoft’s own chips could help lower the cost of AI for its customers. Microsoft also developed these chips for its own Azure cloud workloads, not to sell to others like Nvidia, AMD, Intel, and Qualcomm all do.
“I see this as complementing them, not competing with them,” Borkar insists. “We have both Intel and AMD in our cloud compute today, and similarly on AI, we’re announcing AMD where we already have Nvidia today. These partners are very important to our infrastructure, and we really want to give our customers choices.”
You may have noticed the Maia 100 and Cobalt 100 naming, which suggests Microsoft is already designing second-generation versions of these chips. “This is a series; it’s not just one chip... but we’re not going to share our roadmaps,” Borkar says. It’s not clear how often Microsoft will ship new versions of Maia and Cobalt yet, but given the pace of AI, I wouldn’t be surprised to see a Maia 100 successor arrive at a similar cadence to Nvidia’s H200 announcement (around 20 months).
The key now will be how quickly Microsoft puts Maia into action to speed up the rollout of its broad AI ambitions, and how these chips will impact the pricing of AI cloud services. Microsoft isn’t ready to talk about this new server pricing just yet, but we’ve already seen the company quietly launch Copilot for Microsoft 365 at a $30-per-month-per-user premium.
Copilot for Microsoft 365 is limited to only Microsoft’s largest customers for now, with enterprise users having to commit to at least 300 users to get on the list for the new AI-powered Office assistant. As Microsoft moves forward with rolling out more Copilot features this week and rebranding Bing Chat, Maia could soon help balance demand for the AI chips that power these new experiences.