Google's DeepMind group, which brought you the champion-playing AIs AlphaGo and AlphaGo Zero, is back with a new, improved, and more general version. Dubbed AlphaZero, this program taught itself three different board games (chess, Go, and shogi, a Japanese form of chess) in just three days, with no human intervention. A paper describing the achievement was just published in Science. "Starting from completely random play, AlphaZero gradually learns what good play looks like and forms its own evaluations of the game," said Demis Hassabis, CEO and co-founder of DeepMind. "In that sense, it is free from the constraints of the way humans think about the game."
Chess has long been an ideal testing ground for game-playing machines and the development of AI. The first chess programs were written at Los Alamos National Laboratory in the 1950s, and in the late 1960s, Richard D. Greenblatt's Mac Hack IV became the first program to play in a human chess tournament, and to win against a person in tournament play. Many other computer chess programs followed, each a bit better than the last, until IBM's Deep Blue computer defeated chess champion Garry Kasparov in May 1997. As Kasparov points out in an accompanying editorial in Science, these days the average smartphone chess app is far more powerful than Deep Blue. So in recent years AI researchers turned to creating programs that could master the game of Go, an enormously popular board game in East Asia dating back more than 2,500 years. It is a deceptively complicated game, much more difficult than chess, despite involving just two players and a simple set of basic rules. That makes it an ideal testing ground for AI.
AlphaZero is a direct descendant of DeepMind's AlphaGo, which made headlines worldwide in 2016 by defeating Lee Sedol, the reigning (human) world champion of Go. Not content to rest on its laurels, AlphaGo received a major upgrade last year and became capable of learning winning strategies with no need for human intervention. By playing itself over and over again, AlphaGo Zero (AGZ) trained itself to play Go from scratch in just three days and defeated the original AlphaGo 100 games to 0. The only input it received was the basic rules of the game.  The secret ingredient: reinforcement learning, in which playing millions of games against itself lets the program learn from experience. This works because AGZ is rewarded for the most useful actions (i.e., devising winning strategies). The AI does this by considering the most likely next moves and calculating the probability of winning for each of them. AGZ can do this in 0.4 seconds using just one neural network. (The original AlphaGo used two separate neural networks: one selected the next move, while the other calculated the probabilities of winning.) AGZ needed to play just 4.9 million games to master Go, compared to 30 million games for its predecessor.
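The self-play loop described above can be sketched in miniature. The toy below is a hypothetical illustration, not DeepMind's code: a single evaluation function stands in for AGZ's one network, returning both move priors (the policy) and a win-probability estimate (the value), and moves made by the eventual winner of each self-play game are reinforced.

```python
import random

# Hypothetical miniature of self-play reinforcement learning on a toy
# "subtraction game": take 1 or 2 tokens from a pile; whoever takes the
# last token wins. All names and logic here are illustrative only.

def legal_moves(pile):
    return [m for m in (1, 2) if m <= pile]

def evaluate(pile, weights):
    """Stand-in for AGZ's single network: one call yields both move
    priors (the policy) and an estimated win probability (the value)."""
    moves = legal_moves(pile)
    raw = [weights.get((pile, m), 1.0) for m in moves]
    total = sum(raw)
    priors = {m: w / total for m, w in zip(moves, raw)}
    return priors, max(priors.values())  # crude value estimate

def self_play(weights):
    """Play one game against itself; record (pile, move, player) steps."""
    pile, player, history = 7, 0, []
    while pile > 0:
        priors, _ = evaluate(pile, weights)
        move = random.choices(list(priors), weights=list(priors.values()))[0]
        history.append((pile, move, player))
        pile -= move
        player ^= 1
    return history, player ^ 1  # the player who took the last token won

def train(games=2000):
    """Reward the winner's moves, penalize the loser's, game after game."""
    weights = {}
    for _ in range(games):
        history, winner = self_play(weights)
        for pile, move, player in history:
            factor = 1.2 if player == winner else 0.8
            weights[(pile, move)] = weights.get((pile, move), 1.0) * factor
    return weights
```

Multiplying a move's weight up or down is a crude stand-in for the gradient updates a real network receives, but the shape of the loop is the same: play yourself, see who won, and shift future play toward what the winner did.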
"Instead of processing human instructions and knowledge at immense speed, AlphaZero generates its own knowledge."
AGZ was designed specifically for playing Go. AlphaZero generalizes this reinforcement learning approach to three different games: Go, chess, and shogi, a Japanese version of chess. According to an accompanying perspective written by former Deep Blue team member Murray Campbell, this latest version combines deep reinforcement learning (many layers of neural networks) with a general-purpose Monte Carlo tree search method.
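Monte Carlo tree search itself can be shown in a compact sketch. The version below is plain UCT with random rollouts on the same kind of toy subtraction game (take 1 or 2 tokens; taking the last token wins); AlphaZero instead guides this search with its network's priors and value estimates, so treat this only as the skeleton of the idea.

```python
import math
import random

# A plain UCT Monte Carlo tree search on a toy subtraction game.
# Illustrative only: AlphaZero replaces the random rollout below with
# its network's value estimate and biases selection with its priors.

def legal_moves(pile):
    return [m for m in (1, 2) if m <= pile]

class Node:
    def __init__(self, pile):
        self.pile = pile
        self.visits = 0
        self.value = 0.0      # cumulative value for the player to move here
        self.children = {}    # move -> child Node

def rollout(pile):
    """Random playout; returns 1.0 if the player to move at `pile` wins."""
    to_move = 0
    while pile > 0:
        pile -= random.choice(legal_moves(pile))
        to_move ^= 1
    # the pile is empty, so the *previous* player took the last token and won
    return 1.0 if to_move == 1 else 0.0

def simulate(node, c=1.4):
    """One search iteration; returns the value for the player to move."""
    if node.pile == 0:
        value = 0.0  # the opponent just took the last token: this player lost
    elif node.visits == 0:
        value = rollout(node.pile)  # new leaf: estimate by random playout
    else:
        if not node.children:
            node.children = {m: Node(node.pile - m) for m in legal_moves(node.pile)}

        def uct(child):
            if child.visits == 0:
                return math.inf
            # a child's stored value belongs to the opponent, so flip it
            exploit = 1.0 - child.value / child.visits
            explore = c * math.sqrt(math.log(node.visits) / child.visits)
            return exploit + explore

        move = max(node.children, key=lambda m: uct(node.children[m]))
        value = 1.0 - simulate(node.children[move], c)
    node.visits += 1
    node.value += value
    return value

def best_move(pile, iterations=3000):
    """Run the search, then pick the most-visited move at the root."""
    root = Node(pile)
    for _ in range(iterations):
        simulate(root)
    return max(root.children, key=lambda m: root.children[m].visits)
```

The final choice by visit count, rather than by raw value, mirrors how AlphaZero selects its moves: visit counts are a more stable signal once the search has concentrated on the strongest lines.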
"AlphaZero learned to play each of the three board games very quickly by using a large amount of processing power, 5,000 tensor processing units (TPUs), equivalent to a very large supercomputer," Campbell wrote.
"Instead of processing human instruction and knowledge at enormous speed, as all previous chess machines did, AlphaZero generates its own knowledge," Kasparov said. "It does so in a matter of hours, and the results have surpassed any known human or machine." Hassabis, who has long been passionate about chess, says the program has also developed its own new, dynamic style of play, a style Kasparov sees as much like his own.
There are some caveats. Like its immediate predecessor, AlphaZero's underlying algorithm really only works for problems in which there is a discrete set of actions to choose from. It also requires a strong model of its environment, that is, the rules of the game. In other words, Go is not the real world: it is a simplified, highly constrained version of the world, which makes it far more predictable.