No doubt anyone who has studied basic computer science has spent time devising a sorting algorithm – code that takes an unordered list of items and puts them in ascending or descending order. It’s an interesting challenge because there are so many ways to approach it and because people have spent a lot of time figuring out how to sort as efficiently as possible.
Sorting is so basic that the algorithms are built into most standard libraries of programming languages. And in the case of the C++ library used with the LLVM compiler, the code hadn’t been touched in over a decade.
But Google’s DeepMind AI group has now developed a reinforcement learning tool that can produce highly optimized algorithms without first being trained on examples of human code. The trick was to set it up to treat programming as a game.
It’s all a game
DeepMind, among other things, is known for developing software that teaches itself how to play games. That approach has proven highly effective, conquering games as diverse as chess, Go, and StarCraft. While the details vary depending on the game it’s tackling, the software learns by playing against itself and discovers options that allow it to maximize its score.
Because it has not been trained on games played by humans, the DeepMind system can figure out ways to play games that humans haven’t thought of. Of course, since it always plays against itself, there are instances where it has developed blind spots that humans can exploit.
This approach has clear relevance to programming. Large language models write effective code because they’ve seen plenty of human examples. But because of that, they are unlikely to develop something humans haven’t done before. If we’re looking to improve well-understood algorithms, such as sorting functions, basing something on existing human code will, at best, get you equivalent performance. So how do you get an AI to work out a genuinely new approach?
The folks at DeepMind took the same approach they did with chess and Go: They turned code optimization into a game. The AlphaDev system developed x86 assembly sorting algorithms that treated the code’s latency as a score and tried to minimize that score while ensuring that the code ran to completion without errors. Through reinforcement learning, AlphaDev gradually developed the ability to write highly efficient and robust code.
Inside AlphaDev
Saying that the system optimizes for latency is one thing; explaining how it works is quite another. Like most other complex AI systems, AlphaDev is made up of several distinct components. One is the representation function, which tracks the overall performance of the code as it is being developed. This includes the overall structure of the algorithm, as well as the use of x86 registers and memory.
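As a rough illustration, the state that the representation function summarizes can be thought of as the instructions chosen so far plus the simulated contents of registers and memory. The C++ sketch below is a hypothetical stand-in for that idea; the struct and field names are illustrative, not DeepMind’s actual data structures.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the kind of state a representation function
// could track: the instructions emitted so far plus simulated register
// and memory contents. Not DeepMind's actual data structures.
struct AssemblyState {
    std::vector<std::string> instructions;   // x86 instructions chosen so far
    std::vector<std::int64_t> registers;     // simulated register values
    std::vector<std::int64_t> memory;        // simulated memory values
};
```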
The system adds assembly instructions one at a time, selected by a Monte Carlo tree search – again, an approach borrowed from game-playing systems. The “tree” aspect of this approach allows the system to quickly narrow in on a limited area of the large range of possible instructions, while the Monte Carlo part adds a degree of randomness to the precise instruction that gets chosen from that branch. (Note that “instruction” in this context includes things like the specific registers chosen to create valid, complete assembly.)
The system then evaluates the assembly code for latency and validity, assigns it a score, and compares that score to the one for the previous version. Through reinforcement learning, it accumulates information about how the different branches of the tree perform, given the state of the program. Over time, it “learns” how to achieve a winning game condition – a completed sort – with a maximum score, which means minimum latency.
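A minimal C++ sketch of that scoring step might look like the following. It models a candidate program as a callable that attempts to sort a set of test inputs and uses instruction count as a stand-in for measured latency; the names and reward values are assumptions for illustration, not AlphaDev’s actual reward function.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// A candidate "program" is modeled as a callable that returns its attempt
// at sorting an input; instruction_count stands in for measured latency.
// Both are simplifications for illustration only.
using Candidate = std::function<std::vector<int>(std::vector<int>)>;

double score(const Candidate& run, std::size_t instruction_count,
             const std::vector<std::vector<int>>& test_inputs) {
    for (const auto& input : test_inputs) {
        std::vector<int> expected = input;
        std::sort(expected.begin(), expected.end());
        if (run(input) != expected) {
            return -1.0;  // an incorrect or incomplete program gets no reward
        }
    }
    // Correct programs: fewer instructions (a proxy for lower latency) score higher.
    return -static_cast<double>(instruction_count);
}
```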
The main benefit of this system is that its training does not have to include any code examples. Instead, the system generates its own code examples and then evaluates them. In the process, it picks up information about which combinations of instructions are effective at sorting.
Useful code
Sorting in complex programs can handle arbitrarily large collections of items. But at the level of the standard library, sorting is built from a large set of very specific functions that each handle only one situation or a few cases. For example, there are separate algorithms for sorting three items, four items, and five items. And there is another set of functions that can handle a variable number of items up to a maximum – meaning you can call one that sorts up to four items, but no more.
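To make that structure concrete, here is a hypothetical C++ sketch of how a family of fixed-size helpers and a “sort up to four items” dispatcher could fit together. The function names and implementations are assumptions; the real libc++ helpers are named and built differently.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-ins for the library's fixed-size helpers; the real
// libc++ functions have different names and signatures.
inline void sort2(int* a) { if (a[0] > a[1]) std::swap(a[0], a[1]); }
inline void sort3(int* a) { sort2(a); sort2(a + 1); sort2(a); }
inline void sort4(int* a) { sort3(a); sort3(a + 1); sort2(a); }

// "Sort up to four items": dispatch on the count, much as the existing
// library code does with a chain of tests.
void sort_up_to_4(int* a, std::size_t n) {
    switch (n) {
        case 2: sort2(a); break;
        case 3: sort3(a); break;
        case 4: sort4(a); break;
        default: break;  // zero or one items are already sorted
    }
}
```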
DeepMind set AlphaDev loose on each of these functions, but they operate very differently. For functions that sort a fixed number of items, it is possible to write code without any branches, in which the processor would otherwise execute different code depending on the state of a variable (the sketch below illustrates the idea). As a result, the performance of this code is generally proportional to the number of instructions required. AlphaDev was able to shave an instruction off sort-3, sort-5, and sort-8, and even more off sort-6 and sort-7. There was only one (sort-4) where it couldn’t find a way to improve on the human code. Repeatedly running the code on actual systems showed that fewer instructions resulted in better performance.
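The branch-free idea can be illustrated with compare-exchange operations built from min and max, which compilers typically turn into conditional moves rather than jumps. This C++ sketch shows the general technique for three items; it is not AlphaDev’s actual output.

```cpp
#include <algorithm>

// A branch-free compare-exchange: std::min and std::max typically compile
// to conditional moves rather than jumps, so a fixed-size sort built from
// them runs a fixed sequence of instructions.
inline void compare_exchange(int& x, int& y) {
    int lo = std::min(x, y);
    int hi = std::max(x, y);
    x = lo;
    y = hi;
}

// Sort three items with three compare-exchanges and no branches.
void sort3_branchless(int& a, int& b, int& c) {
    compare_exchange(a, b);
    compare_exchange(b, c);
    compare_exchange(a, b);
}
```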
Sorting a variable number of entries involves branching in the code, and different processors have different amounts of hardware dedicated to handling those branches. Therefore, the code was evaluated based on its performance across 100 different machines. Here again, AlphaDev found ways to squeeze out additional performance, and we’ll look at how it did so in one case: a function that sorts up to four items.
In the current implementation in the C++ library, the code runs a series of tests to see how many items it needs to sort and calls the dedicated sort function for that number of items. The revised code does something far stranger. It tests whether there are two items and calls a separate function to sort them if necessary. If the count is greater than two, the code calls sort-3 to sort the first three. If there are three items, the result of that sort is returned.
However, if there are four items to sort, it runs specialized code that is extremely efficient at inserting a fourth item into its appropriate place within an array of three sorted items. This seems like an odd approach, but it consistently outperformed the existing code.
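Here is a hedged C++ sketch of that control flow, using std::sort as a stand-in for the specialized sort-3 helper and a simple insertion for the fourth item; the actual library change differs in its details.

```cpp
#include <algorithm>
#include <cstddef>

// Sketch of the revised control flow described above; std::sort stands in
// for the dedicated sort-3 helper, and the actual library code differs.
void sort_up_to_4_revised(int* a, std::size_t n) {
    if (n < 2) return;  // zero or one items: nothing to do
    if (n == 2) {       // exactly two items: a single compare-and-swap
        if (a[0] > a[1]) std::swap(a[0], a[1]);
        return;
    }
    std::sort(a, a + 3);  // more than two items: sort the first three
    if (n == 3) return;   // exactly three items: done
    // Four items: insert a[3] into its place among the three sorted items.
    int x = a[3];
    std::size_t i = 3;
    while (i > 0 && a[i - 1] > x) {
        a[i] = a[i - 1];
        --i;
    }
    a[i] = x;
}
```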
In production
Because AlphaDev produced more efficient code, the team wanted to get it incorporated into LLVM’s standard C++ library. The problem here is that the code was in assembly rather than C++. So they had to work backwards and figure out which C++ code would produce the same assembly. Once that was done, the code was merged into the LLVM toolchain – the first time some of this code had been modified in over a decade.
As a result, the researchers estimate, AlphaDev’s code is now being executed trillions of times a day.
Nature, 2023. DOI: 10.1038/s41586-023-06004-9 (About DOIs).