
Move over AlphaGo: AlphaZero learned to play three different games



Starting from random play and knowing only the rules of the game, AlphaZero defeated a world-champion program in each of Go, chess, and Shogi (Japanese chess).

DeepMind Technologies, Ltd.

Google's DeepMind group is back with a new, improved, and more general version of its championship-winning AIs AlphaGo and AlphaGo Zero. Dubbed AlphaZero, the program taught itself three different board games (chess, Go, and Shogi, a Japanese form of chess) in just three days, with no human intervention.

A paper describing the achievement was just published in Science. "Starting from totally random play, AlphaZero gradually learns what good play looks like and forms its own evaluations about the game," said Demis Hassabis, CEO and co-founder of DeepMind. "In that sense, it is free from the constraints of the way humans think about the game."

Chess has long served as an ideal testbed for game-playing machines and the development of AI. The first chess programs were written at Los Alamos National Laboratory in the 1950s, and in the late 1960s Richard D. Greenblatt's Mac Hack IV became the first program to play in a human chess tournament, and to beat a person in tournament play. Many other computer chess programs followed, each a bit better than the last, until IBM's Deep Blue computer defeated chess grandmaster Garry Kasparov in May 1997.

As Kasparov points out in an accompanying editorial in Science, the average smartphone chess app these days is far more powerful than Deep Blue. So in recent years AI researchers turned to creating programs that could master the game of Go, a hugely popular board game in East Asia dating back more than 2,500 years. It is a deceptively complicated game, far more difficult than chess despite involving just two players and a simple set of basic rules. That makes it an ideal testbed for AI.

Starting from random play and knowing only the basic rules, AlphaZero defeated world-champion programs in Go, chess, and Shogi.

DeepMind Technologies, Ltd.

AlphaZero is a direct descendant of DeepMind's AlphaGo, which made headlines worldwide in 2016 by defeating Lee Sedol, the reigning (human) world champion of Go. Not content to rest on its laurels, DeepMind gave AlphaGo a major upgrade last year, enabling it to learn winning strategies with no human intervention. By playing itself over and over, AlphaGo Zero (AGZ) trained itself to play Go from scratch in just three days and defeated the original AlphaGo 100 games to 0. The only input it received was the basic rules of the game.

The secret ingredient is "reinforcement learning": playing millions of games against itself lets the program learn from experience. This works because AGZ is rewarded for the most useful actions (i.e., those that lead to winning strategies). The AI does this by considering the most likely next moves and calculating the probability of winning for each of them. AGZ can do this in 0.4 seconds using just one neural network. (The original AlphaGo used two separate neural networks: one picked the next move, while the other calculated the probabilities.) AGZ needed to play only 4.9 million games to master Go, compared to 30 million for its predecessor.
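The self-play loop described above can be sketched with a toy game. The snippet below is a hypothetical minimal illustration, not DeepMind's code: in place of a neural network and Go, it uses a simple win-rate table and one-pile Nim (take 1 to 3 stones per turn; whoever takes the last stone wins). The core idea is the same, though: play yourself many times, and reward the moves that appeared in winning games.

```python
import random

random.seed(0)

# Toy self-play learner for one-pile Nim: take 1-3 stones per turn,
# and whoever takes the last stone wins. We record, for each
# (pile, move) pair, how often the player who made that move went on
# to win, then play greedily against those estimates.

N = 10        # starting pile size
wins = {}     # (pile, move) -> (times the mover won, times tried)

def legal(pile):
    return [m for m in (1, 2, 3) if m <= pile]

def choose(pile, explore=0.1):
    """Mostly pick the move with the best observed win rate."""
    moves = legal(pile)
    if random.random() < explore:
        return random.choice(moves)            # occasional exploration
    def rate(m):
        won, tried = wins.get((pile, m), (0, 0))
        return won / tried if tried else 0.5   # optimistic prior
    return max(moves, key=rate)

for episode in range(20000):
    pile, history, player = N, [], 0
    while pile > 0:
        m = choose(pile)
        history.append((player, pile, m))
        pile -= m
        winner = player        # whoever moves last took the final stone
        player = 1 - player
    for p, s, m in history:    # credit every move made by the winner
        won, tried = wins.get((s, m), (0, 0))
        wins[(s, m)] = (won + (p == winner), tried + 1)

# Optimal play leaves the opponent a multiple of 4; from a pile of 5
# the learned greedy move should therefore be to take 1 stone.
best = choose(5, explore=0.0)
print(best)  # should print 1
```

A real system replaces the lookup table with a deep network that generalizes across positions, since games like Go have far too many states to tabulate, but the reward-the-winner feedback loop is the same.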

"Instead of processing human instructions and knowledge at tremendous speed, AlphaZero generates its own knowledge."

AGZ was designed specifically to play Go. AlphaZero generalizes this reinforcement learning method to three different games: Go, chess, and Shogi, a Japanese version of chess. According to an accompanying perspective written by Deep Blue team member Murray Campbell, this latest version combines deep reinforcement learning (many layers of neural networks) with a general-purpose Monte Carlo tree search method.
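The tree-search half of Campbell's description can be illustrated with a bare-bones version of the algorithm. This is a hedged sketch, far simpler than AlphaZero's search (it uses random playouts on one-pile Nim instead of a neural network evaluating Go or chess positions), but it shows the select, expand, simulate, and backpropagate cycle that Monte Carlo tree search is built on.

```python
import math
import random

random.seed(1)

# Bare-bones Monte Carlo tree search (UCT variant) on one-pile Nim:
# take 1-3 stones per turn, and taking the last stone wins.

def legal(pile):
    return [m for m in (1, 2, 3) if m <= pile]

class Node:
    def __init__(self, pile, parent=None, move=None):
        self.pile, self.parent, self.move = pile, parent, move
        self.children, self.visits, self.wins = [], 0, 0.0

    def fully_expanded(self):
        return len(self.children) == len(legal(self.pile))

def uct_child(node, c=1.4):
    """Pick the child maximizing win rate plus an exploration bonus."""
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(pile):
    """Random playout; True if the player to move ends up winning."""
    turn = 0
    while pile > 0:
        pile -= random.choice(legal(pile))
        winner, turn = turn, 1 - turn
    return winner == 0

def search(start_pile, iters=3000):
    root = Node(start_pile)
    for _ in range(iters):
        node = root
        # 1. Selection: descend while fully expanded and non-terminal.
        while node.pile > 0 and node.fully_expanded():
            node = uct_child(node)
        # 2. Expansion: add one untried move if the node is non-terminal.
        if node.pile > 0:
            tried = {ch.move for ch in node.children}
            m = random.choice([x for x in legal(node.pile) if x not in tried])
            child = Node(node.pile - m, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Simulation: estimate the value of the reached position.
        mover_wins = rollout(node.pile) if node.pile > 0 else False
        # 4. Backpropagation: flip the winner's perspective at each level.
        win_for_last_mover = not mover_wins
        while node is not None:
            node.visits += 1
            node.wins += win_for_last_mover
            win_for_last_mover = not win_for_last_mover
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

# From a pile of 5, the winning move is to take 1, leaving the
# opponent a multiple of 4.
print(search(5))  # should print 1
```

AlphaZero's key change to this recipe is step 3: instead of random playouts, a trained network supplies both the value estimate and prior move probabilities that guide the selection rule.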

"AlphaZero learned to play each of the three board games very fast by using a large amount of processing power, 5000 tensor processing units (TPUs), equivalent to a very large supercomputer," Campbell wrote.

"Instead of processing human instruction and knowledge at tremendous speed, as all previous chess machines did, AlphaZero generates its own knowledge," Kasparov said. "It does so in a matter of hours, and the results have exceeded any known human or machine." Hassabis, a longtime chess enthusiast, says the program has also developed its own new, dynamic playing style, one that Kasparov says resembles his own.

There are some caveats. Like its immediate predecessor, AlphaZero's underlying algorithm really only works for problems with a discrete set of actions to choose from. It also requires a strong model of its environment, i.e., the rules of the game. In other words, Go is not the real world: it is a simplified, highly constrained version of the world, which makes it far more predictable.

AlphaZero searches only a small fraction of the positions considered by traditional chess engines.

DeepMind Technologies, Ltd.

"[AlphaZero] is not going to put chess trainers out of business just yet," Kasparov writes. "But the knowledge it generates is information we can all learn from." David Silver, lead researcher on the AlphaZero project, has high hopes for future applications of that knowledge. "My dream is to see the same kind of system applied not just to board games but to all sorts of real-world applications, [such as] drug design, materials design, or biotechnology," he said.

Poker is one contender for future AIs to tackle. It is a game of imperfect information, which poses a challenge for existing AI. As Campbell notes, a few programs have proved capable of mastering heads-up no-limit Texas Hold'em, played when only two players remain in a tournament. Most poker games, however, involve eight to ten players per table. An even bigger challenge will be multiplayer video games such as Starcraft II or Dota 2. "They are partially observable and have very large state spaces and action sets, creating problems for AlphaZero-like reinforcement learning approaches," he writes.

One thing seems clear: chess and Go are no longer the gold standard for testing the abilities of AIs. "This work has effectively closed a multi-decade chapter in AI research," Campbell writes. "AI researchers need to look to a new generation of games to provide the next set of challenges."

DOI: Science, 2018. 10.1126/science.aar6404 (About DOIs).

