From January 28 to 30, 2018, EmTech China, the emerging-technology summit of MIT Technology Review, was held at the Beijing International Trade Center Hotel. Continuing the tradition of MIT Technology Review summits, EmTech China 2018 again focused on emerging technologies: artificial intelligence, blockchain, quantum computing, sustainable energy, biomedicine, and autonomous driving were the keywords of this conference.
Oriol Vinyals, a research scientist at Google DeepMind, delivered a speech titled "AI vs. StarCraft: What Are the Odds?". He stressed the importance of data and tasks in machine learning and artificial intelligence, introduced the breakthroughs in deep reinforcement learning built on AlphaGo, and explained that how machine learning can beat human players at StarCraft II is now an active research topic.
The following is the keynote speech by Google DeepMind research scientist Oriol Vinyals, edited and condensed:
First of all, I would like to introduce what scientists are doing when they study machine learning and artificial intelligence. What I want to share with you is not the algorithms but the data. Data and tasks are very important: we have to establish where the technological frontier is and what our task is, and only then can we find reasonable metrics for these problems.
There is a very interesting phenomenon: it did not take us long to make major technical breakthroughs, because such breakthroughs can often be achieved simply by finding a suitable algorithm. We have reached many milestones in speech recognition and image classification, and we have used machine translation to narrow the gap between humans and machines. Generative networks also have many applications, such as producing celebrity faces from photos, zebras from ordinary horses, winter scenes from summer landscapes, and more. In these areas of image interpretation we have already been very successful.
Next I would like to discuss deep reinforcement learning with you. Here is some of the groundbreaking research we have done around AlphaGo in the last few years.
Reinforcement learning still differs from supervised learning and from the way humans learn. For example, the algorithm's observations require an environment. Without adequate observation, early humanoid robots often fell when they encountered obstacles. So we asked: can we build a simulated environment to train these robots? With a good environment, we can train in simulation first. In other words, we need a sufficiently good environment to achieve our goal. To this end, we built a virtual scene and made its simulation as realistic as possible.
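The idea of training in a simulated environment first can be sketched in a few lines of Python. Everything below is a hypothetical toy illustration, not DeepMind's actual code: a one-dimensional grid world stands in for the robot simulator, and tabular Q-learning stands in for the real training algorithm.

```python
import random

class ToyWalkEnv:
    """Simulated environment: the agent starts at cell 0, the goal is cell
    `size`; stepping below 0 ends the episode with no reward."""
    def __init__(self, size=4):
        self.size = size
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):              # action is +1 (forward) or -1 (back)
        self.pos += action
        done = self.pos >= self.size or self.pos < 0
        reward = 1.0 if self.pos >= self.size else 0.0
        return self.pos, reward, done

def train(episodes=400, alpha=0.5, gamma=0.9, seed=0):
    """Off-policy Q-learning inside the simulator: the agent behaves
    randomly (pure exploration) while the Q-table learns the greedy policy."""
    rng = random.Random(seed)
    env = ToyWalkEnv()
    q = {}                               # (state, action) -> value estimate
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = rng.choice([-1, 1])
            nxt, reward, done = env.step(action)
            best_next = max(q.get((nxt, a), 0.0) for a in (-1, 1))
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
    return q

q = train()
# The learned values prefer moving toward the goal from the start cell.
```

The point of the sketch is the one Vinyals makes: all of the falling and failing happens inside the simulator, at no cost, before anything is deployed on a real robot.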
Only in such an environment can we make further progress. When we think of application scenarios, we often think of games. Game designers take great care to ensure players have an engaging experience. Go, for example, with its roughly 3,000-year history, is a challenging environment because no single solution guarantees the best result. Of course, we can also combine different capabilities so that one agent can play different games, for example by training it to learn chess as well.
We also have a dedicated algorithm for Go, where the goal and the gameplay both become more complicated. Currently, no machine can play this game well through search alone.
How does AlphaGo play this game? Through reinforcement learning. Our neural network learns features automatically from the data: we let it look at the board and at how humans play, with each position also labeled as a win or a loss. In other words, we do not need to expand the entire game tree to evaluate wins and losses; expanding only part of it already yields good estimates. This was a real breakthrough.
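This supervised stage, in which the network is shown board positions labeled with the moves humans actually played, can be illustrated with a toy sketch. The three-cell "boards", the move labels, and the linear softmax "network" below are all hypothetical stand-ins, not AlphaGo's real architecture:

```python
import numpy as np

# Toy dataset: each "board" is a feature vector; each label is the move
# a human expert actually played in that position (hypothetical data).
boards = np.array([[1., 0., 0.],
                   [0., 1., 0.],
                   [0., 0., 1.],
                   [1., 1., 0.]])
expert_moves = np.array([0, 1, 2, 1])       # index of the move played

W = np.zeros((3, 3))                        # weights: features -> move logits

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Cross-entropy gradient descent: push the predicted move distribution
# toward the move the human chose in each position.
for _ in range(2000):
    probs = softmax(boards @ W)
    probs[np.arange(len(expert_moves)), expert_moves] -= 1.0  # dL/dlogits
    W -= 0.5 * (boards.T @ probs) / len(boards)

pred = softmax(boards @ W).argmax(axis=1)
# After training, the policy imitates the expert on the training boards.
```

A real policy network replaces the linear layer with a deep convolutional network and the four toy positions with millions of recorded human games, but the training signal is the same: predict the human move.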
But this alone is not good enough, because we are learning from the human perspective, training on human datasets. So later we let the system play games on its own: after each game, AlphaGo learned from how the game went and adjusted the entire network, teaching itself to play.
These networks keep improving during play. AlphaZero starts from random moves; after a few days of training, it plays at the level of professional players.
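The self-play principle, starting from random moves and learning only from game outcomes, can be illustrated on a much smaller game. The sketch below uses a toy Nim variant (take 1 to 3 stones; whoever takes the last stone wins) with a tabular Monte-Carlo update; it illustrates the principle, not AlphaZero itself:

```python
import random

def train(episodes=3000, seed=0):
    """Self-play on 5-stone Nim: both 'copies' of the agent move at random,
    and the value table learns purely from who ends up winning."""
    rng = random.Random(seed)
    q, visits = {}, {}          # (stones_left, take) -> running mean outcome
    for _ in range(episodes):
        n, trajectory = 5, []
        while n > 0:
            take = rng.choice([a for a in (1, 2, 3) if a <= n])
            trajectory.append((n, take))
            n -= take
        # Whoever made the last move won. Walk back through the game,
        # flipping the outcome sign between the two self-play sides.
        outcome = 1.0
        for state, action in reversed(trajectory):
            key = (state, action)
            visits[key] = visits.get(key, 0) + 1
            old = q.get(key, 0.0)
            q[key] = old + (outcome - old) / visits[key]  # incremental mean
            outcome = -outcome
    return q

q = train()
best = max((1, 2, 3), key=lambda a: q[(5, a)])
# From 5 stones, taking 1 (leaving the losing count of 4) scores highest.
```

No human data enters the loop: the table starts empty, the agent plays itself, and the correct opening move emerges from outcomes alone, which is the same shape of process Vinyals describes for AlphaZero.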
So, our first version of AlphaGo defeated Fan Hui, and a later version played Lee Sedol in South Korea and won. We then trained the network further until it was three times stronger than before, and it beat Ke Jie and other professional players. We started from scratch, accumulated training bit by bit, and finally defeated professional chess players.
In addition, we are very interested in the game StarCraft II. It is also a very interesting and complex game: players build structures and units that compete with one another on the same map. In this game, even constructing a single building requires many decisions. Players must also constantly collect and use resources, construct different buildings, and keep expanding, so the whole game is very challenging.
The method we use for this game is still reinforcement learning. We want to mimic the way humans play it, but even mimicking human behavior in clicking the mouse and tapping the keyboard can be very difficult. To this end, we introduced a game engine.
The biggest difference from the Go task is that in Go you can see the entire board, whereas in StarCraft II we usually cannot see the whole map and need to send a unit out to scout. The game also runs continuously, and a whole game can involve more than 5,000 actions. For reinforcement learning, beyond ordinary up, down, left, and right movement, we found it very difficult to control the movement and behavior of different objects through the mouse-click interface. We have released this environment so everyone can participate, together with a related report; it is essentially an open-source platform on which anyone can test their algorithms.
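The fog-of-war point, that the agent sees only part of the map and must send a unit out to scout, can be sketched as follows (a hypothetical toy, not the released StarCraft II environment):

```python
FOG = "?"

def observe(world, pos, radius=1):
    """Return only the cells within `radius` of the scouting unit;
    everything else stays hidden as fog of war."""
    rows, cols = len(world), len(world[0])
    view = [[FOG] * cols for _ in range(rows)]
    r0, c0 = pos
    for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1)):
            view[r][c] = world[r][c]
    return view

# Toy map: '.' is empty ground, 'E' is an enemy base (hypothetical layout).
world = [list("....."),
         list("..E.."),
         list(".....")]

seen_at_start = observe(world, (0, 0))   # enemy at (1, 2) is still fogged
scouted = observe(world, (1, 2))         # after scouting, it is visible
```

Unlike Go, where the whole board is one observation, here the agent's input depends on where it has moved its units, which is exactly what makes the StarCraft II task partially observable.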
We have not solved the full game, but we have built seven important mini-tasks for StarCraft II, such as selecting a unit and moving it somewhere. On these, our algorithm performs roughly as well as human players. Other tasks, such as constructing buildings and collecting resources, remain harder: the algorithms we tested do better than random, but are still far from professional players.
Our first version was released on the Linux platform; I may be the first person to play StarCraft on Linux. Our reinforcement learning setup also renders the game, so we can watch it directly from a human perspective. As I just said, we can treat the map as, say, a 40 x 60 grid of pixels. Judging the game from those pixels actually helps us understand better how the machine plays, even though the machine cannot do it exactly the way humans do.