Editor's Note: Google recently added a neural network algorithm to its mobile search. To make search smarter, Google employs about 100 linguistics PhDs around the world, who work day and night labeling text data to train the neural networks. Although unsupervised learning has been a hot topic for some time, Google has yet to escape its dependence on manually labeled data.
"What is the world's fastest bird?"
Google will tell you: "Peregrine falcon. According to YouTube, the peregrine falcon has been recorded at speeds of up to 389 km/h."
This is indeed the correct answer, but it does not come from Google's database. When you enter this question, Google's search engine finds a YouTube video describing the world's five fastest birds. It then extracts information about only the single fastest bird and leaves out the other four.
This is the latest technological advance in Google Search. To answer questions like this, Google relies on deep neural networks. This AI technology is reshaping not only Google's search engine but the company's entire range of services, and it is having the same effect on other Internet giants such as Facebook and Microsoft.
A deep neural network is a pattern recognition system. It can analyze massive amounts of data and learn how to perform specific tasks. In this example, it learns how to find the relevant sentence or passage in a long text on the web and then extract the key points to present to you.
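One way to picture this kind of extraction, often called sentence compression, is as a keep-or-drop decision for every word in a sentence. The sketch below is only an illustration under that assumption, not Google's implementation: the function name `compress`, the example sentence, and the hand-written labels are all hypothetical, and in a trained system a neural network would predict the labels.

```python
from typing import List


def compress(tokens: List[str], keep: List[bool]) -> str:
    """Return the compressed sentence made of the tokens flagged as kept."""
    if len(tokens) != len(keep):
        raise ValueError("tokens and keep must have the same length")
    return " ".join(tok for tok, k in zip(tokens, keep) if k)


# Hypothetical input: a tokenized sentence plus hand-written keep/drop labels,
# of the kind a human annotator might produce. A model would predict these.
sentence = "The peregrine falcon , a bird of prey , can reach 389 km/h".split()
labels = [True, True, True,                         # The peregrine falcon
          False, False, False, False, False, False, # , a bird of prey ,
          True, True, True, True]                   # can reach 389 km/h

compressed = compress(sentence, labels)
print(compressed)  # The peregrine falcon can reach 389 km/h
```

Framing the task as deletion keeps the output grammatical more often than free-form rewriting would, because every word in the result already appeared, in order, in the source sentence.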
Google's mobile search has just launched these "sentence compression algorithms". The task is very simple for humans but has long been difficult for traditional machines, and AI systems are finally able to do it. This shows that deep learning is advancing natural language understanding (the ability to understand and respond to human language).
"You have to use neural networks," says David Orr, a product manager at Google R&D, "because it's the only way we've found to do it."
To train the neural networks, Google has hired about a hundred linguistics PhDs around the world to process and manually screen the data. In effect, Google's system learns from humans how to extract the useful information from long passages of text. This process has to be repeated over and over, which is a major limitation of deep learning. Hiring a large number of linguists to screen data is cumbersome and extremely expensive, but in the short term Google has no other way.
"Golden data" and "silver data"
Google also uses old news stories to train the AI question-answering system. This gradually teaches the AI how a headline summarizes the body of an article. But that does not mean Google can do without its many linguists. They not only demonstrate sentence compression but also label the different parts of a sentence to help the neural network understand how human language works. David Orr calls the data processed by Google's linguist team "golden data" and the old news stories "silver data". The "silver data" is useful because there is so much of it. But the "golden data" is the most valuable; it is the core of the AI training. Linne Ha, who leads the linguist team, says the team will continue to expand for the foreseeable future.
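To make the idea of "silver data" concrete, here is a minimal sketch of how a headline might be turned into automatic training labels without a human annotator. It assumes a very simple rule, that a word in the article's sentence is worth keeping if it also appears in the headline; this rule, the function name `silver_labels`, and the example texts are all hypothetical and are not described in the article.

```python
from typing import List, Tuple


def silver_labels(sentence: List[str], headline: List[str]) -> List[Tuple[str, bool]]:
    """Pair each sentence token with True (keep) if it also occurs in the headline."""
    headline_vocab = {tok.lower() for tok in headline}
    return [(tok, tok.lower() in headline_vocab) for tok in sentence]


# Hypothetical headline and opening sentence from an old news story.
headline = "Falcon clocked at 389 km/h".split()
first_sentence = "A peregrine falcon was clocked at 389 km/h during a dive on Monday".split()

pairs = silver_labels(first_sentence, headline)
kept = [tok for tok, keep in pairs if keep]
print(kept)  # ['falcon', 'clocked', 'at', '389', 'km/h']
```

Labels produced this way are noisy, which is one reason the smaller set of human-made "golden data" would still matter: it can correct the mistakes that a mechanical rule like this one makes.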
This kind of AI training, which depends on human labeling, is called "supervised learning", and at present it is how neural networks operate. Sometimes companies crowdsource the work, and sometimes it happens on its own. For example, Internet users around the world have added the "cat" tag to millions of cat photos, which makes it easy for a neural network to learn to recognize cats, because the training data has already been labeled. But in many cases, researchers have no choice but to label the data themselves, one item at a time.
In the long run, manually labeling data is not feasible, says Chris Nicholson, founder of the startup Skymind. "The future will not be like this," he said. "It is extremely boring work. I can't think of a more boring job for a PhD."
The shortcomings of supervised learning go well beyond this: unless Google hires linguists for every language, the system will not work in other languages. The team of linguists currently covers 20 to 30 languages. At some point Google will have to adopt a more automated form of AI training, namely "unsupervised learning".
By then, machines will be able to learn from data without manual labeling, and the vast amount of digital information on the Internet could be used directly for neural network training. Giants like Google, Facebook, and OpenAI have already begun research in this area, but practical use is still far off. For now, AI learning still depends on a large team of linguists behind the scenes.