Lei Feng Network news: according to papers recently published by researchers at MIT and Google, the two teams are training AI to match information across images, sound, and text.
In speech recognition, image recognition, and the game of Go (weiqi), AI is already good enough to rival or even surpass humans. But if an AI can use only one sense at a time and cannot connect what it sees with what it hears, it cannot fully understand the world around it. That is why the researchers at MIT and Google carried out these studies.
The researchers did not teach the algorithms anything new. Instead, they created a way to connect and coordinate the knowledge gained from multiple senses. This is critical.
One of the co-authors
To train the system, the MIT team first showed the neural network video frames paired with their audio. After the network finds the objects in the video and picks out the distinctive sounds, the AI tries to predict which object is associated with which sound. For example, can a wave make a sound?
Next, the researchers fed the algorithm captioned pictures in the same way, allowing it to match text with images. The network first has to identify all the objects in the picture and the relevant words separately, and only then match them.
Because AI can already recognize sounds, images, and text very well on their own, the network may not look especially impressive at first glance. But the researchers said that after they trained the AI on sound/image matching and on image/text matching, the system was able to associate sounds with words, a pairing it had never been trained on directly. This suggests that the neural network has formed a more objective view of what it sees, hears, or reads, and that this view does not depend entirely on the medium through which it took in the information.
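The idea of a shared representation can be illustrated with a small sketch. It assumes, purely for illustration, that images, sounds, and words have each been mapped to vectors in one common space, and that matching is done by cosine similarity. The vectors, the labels, and the function names `cosine`, `nearest`, and `sound_to_text` are hypothetical and hand-picked; they are not taken from the MIT or Google papers, and no real training is performed here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query, candidates):
    """Return the key in `candidates` whose vector is most similar to `query`."""
    return max(candidates, key=lambda name: cosine(query, candidates[name]))

# Toy, hand-picked vectors in a hypothetical shared 3-D space.
# In a real system these would come from trained encoders; the assumption is
# that sound/image training and image/text training both pull related items
# toward the same region of the space.
IMAGE = {"waves": [0.9, 0.1, 0.0], "car": [0.0, 0.9, 0.2], "dog": [0.1, 0.1, 0.9]}
SOUND = {"surf_crash": [0.8, 0.2, 0.1], "engine_hum": [0.1, 0.8, 0.3], "bark": [0.0, 0.2, 0.8]}
TEXT = {"ocean": [0.85, 0.15, 0.05], "automobile": [0.05, 0.95, 0.1], "puppy": [0.15, 0.05, 0.9]}

def sound_to_text(sound_name):
    """Match a sound to a word, a pairing never aligned directly in this sketch."""
    return nearest(SOUND[sound_name], TEXT)

if __name__ == "__main__":
    for name in SOUND:
        print(name, "->", nearest(SOUND[name], IMAGE), "->", sound_to_text(name))
```

Because the sounds sit near the matching images and the words sit near the same images, a sound lands next to the right word even though the sketch never pairs sounds with words explicitly. That is the intuition behind the transfer described above, not a reproduction of the researchers' method.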
By aligning how an object looks, how it sounds, and how it is described in text, the algorithm can automatically translate what it hears into visual images, which strengthens its understanding of the world.
Google reportedly carried out a similar study, but it places more emphasis on the new algorithm's ability to translate its input into other forms of media, although its accuracy falls short of that of single-purpose algorithms.