
Microsoft's Li Di: Why Is Painting Xiao Bing's Hardest Skill to Cultivate?

via: 博客园     time: 2019/6/24 16:50:34     reads: 122

Now that Xiao Bing has learned to paint, what skill comes next?

Text: Wang Jinwang

“Teaching Xiao Bing to paint is the longest and hardest project ever undertaken by the team I lead,” the Microsoft Xiao Bing R&D team said at the Microsoft Xiao Bing artificial intelligence creation media briefing in May this year.

The painting model that the R&D team regards as its longest and most difficult project gives Microsoft Xiao Bing its visual ability. It is Xiao Bing's third type of AI model, after text and speech. According to official data, the model studied the paintings of 236 human painters from the past 400 years of art history and can independently produce 100% original paintings at a level close to that of professional human painters.

This model also differs from the two earlier types of content creation model (text-based and speech-based), both in the technical complexity of its design and in its application and productization. It is not, of course, completely different.

On the similarities and differences among the three (text, voice, and visual models), Li Di, vice president of Microsoft (Asia) Internet Engineering Institute and head of Microsoft's Xiao Bing global product line, summed it up in an interview with Lei Feng.com: “The concept is extremely similar, and the details are completely different.”

So why did Microsoft build a painting model for Xiao Bing? Where do the technical difficulties of the painter Xiao Bing project lie? How does the painting model differ technically from the two earlier models? And what kind of logic does the young painter Xiao Bing follow?

With these questions in mind, Lei Feng.com visited the headquarters of Microsoft China R&D Group to seek answers from Li Di.


Li Di, vice president of Microsoft (Asia) Internet Engineering Institute and head of Microsoft's Xiao Bing global product line

The original idea behind the painter Xiao Bing

Microsoft Xiao Bing is an artificial intelligence system created by Microsoft. What sets it apart is how its models are built: Microsoft first finds a concrete need in industry, then launches a conceptual model to overcome the technical hurdles, and finally promotes a mass production model. The overall logic can be summarized as: industrial demand → conceptual model → mass production model.

Li Di told Lei Feng.com that Microsoft first saw demand for text generation in the financial field. The conceptual model (poet Xiao Bing) followed, and then a text generation model that was extended to related application fields. Likewise, demand in areas such as pattern design led to the painting model (painter Xiao Bing).

Lei Feng.com: Why did Microsoft choose to build a painting model for Xiao Bing? What was the original idea?

Li Di: In Microsoft Xiao Bing's artificial intelligence creation branch, what people see from the outside is that we start with a conceptual model, such as writing poetry. But that is often not the case.

In fact, we first find a mass production plan in real industry. For example, Microsoft saw the demand for financial text generation in industry and drew up a corresponding plan. At the same time, we look for a conceptual model in that field (poet Xiao Bing). In the process of conquering that model we accumulate a great deal of technology, which lets us build the mass production model (the financial text generation model).

Previously we carried out model design and industrialization for speech and text. This time, the painting model for vision was likewise derived backward from product demand: the demand came first, and the conceptual model and mass production model were built from it.

Lei Feng.com: According to data Microsoft released at the press conference, Xiao Bing built this painting model by studying the paintings of 236 human painters. What does this training data look like? How is it distributed by era?

Li Di: The works date from between 400 years ago and 200 years ago. We deliberately avoided contemporary artists.


A painting titled "One Person's Beijing", created by the girl painter Xiao Bing

Lei Feng.com: Xiao Bing's paintings lean toward the abstract. Why choose this style rather than the more popular contemporary painting?

Li Di: On the one hand, our choice of model is inseparable from industrial application and the content industry. On the other hand, art is not something artificial intelligence needs, but artificial intelligence offers high concurrency and stable quality, and those are exactly what the content industry needs.

Xiao Bing's painting styles range from classical to abstract. The reasoning is similar to why we chose modern poetry rather than classical poetry when we built the poet model. Classical poetry follows stricter rules, which would have been of limited value for the mass production models we needed at the time (such as lyric generation and financial text generation).

The painting model maps to industrial applications such as textile design. If textile designs were made in contemporary art forms, the volume would be too small to achieve economies of scale; that kind of design is better left to human artists. Abstract and classical forms of painting, by contrast, carry over more readily into textile pattern design.

Three models plus a traceability algorithm: the hard core of the Xiao Bing painting model

Any discussion of the hard core of the Xiao Bing painting model has to start with Xiao Bing's emotional computing framework, of which artificial intelligence creation is one branch. Li Di divides that branch in two: “One branch climbs the peak of artistic concepts, such as singing, writing poetry, and painting. The other is engineering mass production (with the emphasis on the content industry), such as financial text generation, radio programs, and audio books. For example, 90% of financial traders in China use our financial text generation model.”

The Xiao Bing painting model belongs to the former and is an AI model in the art field. As mentioned above, Microsoft's thinking is to take this trained AI model and scale it to meet the needs of multiple industries, similar in concept to a more complex general-purpose model.

In the field of AI painting models, Xiao Bing's is not the first. In October 2018, the AI-created painting Edmond Belamy ("Portrait of Edmond Belamy") was auctioned at Christie's. It was expected to fetch between $7,000 and $10,000 and actually sold for $432,500. Its creators, the team Obvious, have made 11 paintings using GANs (Generative Adversarial Networks).

One of Obvious's team members, Caselles-Dupré, previously said: “The system consists of two parts: a generator on one side and a discriminator on the other. We provided the system with a data set of 15,000 portraits painted between the 14th and 20th centuries. The generator produces a new image based on this data set, and the discriminator then tries to tell the difference between a portrait painted by a human and an image created by the generator. Our goal is to fool the discriminator into thinking that the new image is a real portrait. That is how we get such a painting.”
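The generator/discriminator game described in the quote can be made concrete with the standard GAN loss functions. The sketch below is a minimal illustration only; it is not Obvious's or Microsoft's code. The function names, and the choice of the common non-saturating generator loss, are assumptions made for the example. The inputs are the discriminator's estimated probabilities that an image is a real human-painted portrait.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator (hypothetical helper).

    d_real: probabilities assigned to real portraits (should approach 1).
    d_fake: probabilities assigned to generated images (should approach 0).
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss (an assumed variant).

    The loss is low when the discriminator is fooled, i.e. when it
    assigns generated images a high probability of being real.
    """
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# A discriminator that separates real from generated images well has a low loss.
sharp = discriminator_loss(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
# A discriminator that cannot tell them apart outputs 0.5 everywhere.
confused = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print(sharp < confused)
```

In training, the two losses are minimized in alternation: the discriminator improves at spotting generated images, and the generator improves at producing images the discriminator accepts as real.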


The AI painting that sold for $432,500 at Christie's on October 25, 2018

Lei Feng.com: What technical problems did the Xiao Bing painting model solve during R&D?

Li Di: The AI painting auctioned at Christie's in 2018 was generated with a GAN. Put simply, the Xiao Bing painting model uses a hybrid of several GAN models: one model generates specific elements, one completes the composition, and one handles the use of color and the interpretation of the theme. So in the paintings created by the Xiao Bing painting model, a bird or a horse in the picture is generated entirely by the painting model.

A GAN-based painting model transfers what it has learned from existing paintings into a new work. Creation with such a model is a quantitative problem: if the content of a painting does not look good enough, it may be because the model has not converged.

Xiao Bing's painting model addresses the convergence problem by merging the three models. Getting the three to integrate well is actually quite difficult.

Lei Feng.com: Introducing Xiao Bing's poetry model, Dr. Song Ruihua said that training Xiao Bing to write poetry required the modern poems of 519 poets, read 10,000 times, with a hierarchical recursive neuron model used to polish the poetic language. What adjustments were made in building the Xiao Bing painting model?

Li Di: Training the painting model and training the poetry model are very similar in form; even the number of training rounds is very close. The difference is that we added a judgment function: determining whether the style of each of Xiao Bing's paintings is traceable.

After the poetry model is trained, every poem Xiao Bing writes is simply a text. You do not need to work out whose style it follows, because her texts share a fairly uniform style. Painting is different. For about 30% of the paintings the model now produces, the style can be traced quite clearly (for example, to Monet or Rembrandt).

In other words, from the poets Xiao Bing learned what they have in common, whereas from each painter, because of the differences in the art of painting, she in effect learned that painter's individual technique. That raises the question of how to judge which painter's technique has been learned.

Lei Feng.com: Microsoft has now built models for text, voice, and vision and offers them externally as technology. What are the similarities and differences among the three?

Li Di: Put simply, the concept is very similar and the details are completely different.

When these three technologies are used for artificial intelligence creation, the details differ greatly. The details of painting and the details of singing are very different, and so are the specific problems to be solved, both modeling problems and engineering problems. But the concept is the same.

What they share includes the fact that all three need an excitation source.

A poetry model needs an excitation source, and the training process is about getting the model to produce an appropriate result for that source. A human poet starts from a theme, then creates and expresses feeling. Xiao Bing writes poetry with a picture as the excitation source, drawing enough information from the picture to stimulate creation. The painting model is similar: it creates from an input piece of text or some other source of information. Painting, writing poetry, and composing music all work this way. Each needs an excitation source.

The differences include the data types involved and the details of how the problems are solved.

In music, for example, what you solve for is fundamental frequency, degree of harmony, and the prediction of a syllable; it is a sequence problem. Painting uses a very different type of data, and the problems to solve concern color and spatial composition.

The logical thinking of the young painter Xiao Bing

A week after the official release of the Microsoft Xiao Bing painting model, the young painter Xiao Bing was launched as a skill, in the form of a mini program and an H5 page. During the 3-minute wait while she paints, the screen displays eight steps: “extracting imagery, stimulating creative inspiration, selecting a content theme, trying out a composition, drafting the line drawing, painting the base layer, deepening the picture, and repeatedly polishing the details.”

The basic theory of deep learning tells us that big data yields correlations, not causal relationships. In the creative process, an AI is largely a "black box". Any detailed understanding of how it works comes only after the model is built, when researchers reason backward from the results.

What kind of logic does the girl painter Xiao Bing follow?


The girl painter Xiao Bing shows her painting steps in the app

Lei Feng.com: After the release of the Xiao Bing painting model, Microsoft officially launched the girl painter Xiao Bing mini program, which can generate a painting in three minutes. During those three minutes of waiting, the screen displays eight steps, such as “extracting imagery,” “stimulating creative inspiration,” “selecting a content theme,” and “trying out a composition.” Is this the real running logic of the Xiao Bing painting model?

Li Di: I have to admit that part of it is real and part of it is there to make the product more fun. As I mentioned earlier, she has three models that handle composition, color, and certain intentions. Those are part of the real running logic of the Xiao Bing painting model.

Lei Feng.com: What is the actual logic by which the Xiao Bing painting model generates a work?

Li Di: In the girl painter Xiao Bing mini program, the steps appear to run serially. In fact the real logic is simple and crude: "one in, one out" (Lei Feng.com note: the model receives the excitation source, begins creating, and outputs a work), and the three models just mentioned work at the same time. But shown that way it would be dull, so we added some fun.

The algorithm needs that much run time, and some intermediate steps or results do form along the way, but they are not fit to be shown. When a human paints, each version builds on the previous one: the previous version has a background, say, and the next adds details on top of it. Xiao Bing does not work that way. One version of her painting can be completely different from the next. The logic behind this is the "black box" nature of deep learning, and there is really no other way to describe it.
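The contrast Li Di draws, between steps that are displayed one after another and models that actually run at the same time, can be sketched as follows. This is a toy illustration under stated assumptions, not Microsoft's implementation: the three model functions are hypothetical placeholders that return strings, and the eight labels are the waiting-screen steps quoted earlier in the article.

```python
from concurrent.futures import ThreadPoolExecutor

# Labels shown to the user during the wait (taken from the article's description).
DISPLAY_STEPS = [
    "extracting imagery",
    "stimulating creative inspiration",
    "selecting a content theme",
    "trying out a composition",
    "drafting the line drawing",
    "painting the base layer",
    "deepening the picture",
    "repeatedly polishing the details",
]

# Hypothetical stand-ins for the three models; real models would return image data.
def generate_elements(prompt):
    return f"elements for {prompt!r}"

def generate_composition(prompt):
    return f"composition for {prompt!r}"

def generate_color(prompt):
    return f"color for {prompt!r}"

def paint(prompt):
    """One input in, one work out: all three models start at the same time."""
    models = [generate_elements, generate_composition, generate_color]
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = [pool.submit(model, prompt) for model in models]
        # The displayed steps are decoupled from what the models are doing.
        shown = list(DISPLAY_STEPS)
        parts = [future.result() for future in futures]
    return shown, parts

shown, parts = paint("One person's Beijing")
print(len(shown), len(parts))
```

The point of the sketch is only that the user-facing sequence and the underlying computation need not share a structure: the progress labels are a presentation layer over a single concurrent call.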

Where does Xiao Bing's training data come from?

Both the data model and the intelligent search engine are masterpieces of the big data era. Big data made today's artificial intelligence possible. Xiao Bing, as one of the mainstream AI systems, has a strong R&D team behind it but also needs massive data. Lei Feng.com learned that the first generation of Xiao Bing's big data came from Microsoft's search engine; after Xiao Bing was rolled out, more of it came from interactions with users.

At the same time, Xiao Bing has not built hardware of its own, nor is it the primary voice assistant on many devices. Instead, under the Dual AI strategy, Xiao Bing's emotional computing framework is embedded into other brands' smart hardware and into partners' ecosystems; smart voice assistants such as Xiaomi's can already summon Xiao Bing. But does being accessed as a non-primary voice assistant affect Xiao Bing's training data set?

Lei Feng.com: Where does Xiao Bing's training data come from? Is it the search engine? Are there other sources?

Li Di: In the first and second years, the search engine was the main source of data for training Xiao Bing. From the second year on, by the time of the third official Xiao Bing conference, Microsoft stated officially that the training data was already half and half: half of it came from interactions between Xiao Bing and users. Now Xiao Bing is present in all QQ groups and on many other platforms. In addition, Xiao Bing has many aliases, and many third parties (such as radio hosts and singers) are powered by Xiao Bing.

So Xiao Bing's data does not depend especially on our search engine. We can obtain interaction data in many forms, and what we train from it is then used to serve a specific domain.

Lei Feng.com: What is Xiao Bing's thinking on deploying its voice capabilities externally?

Li Di: At last year's launch we put forward Xiao Bing's Dual AI strategy, that is, a dual AI ecosystem, and we are delivering on that commitment. In the future you may find that Xiao Bing has the widest circle of friends, and that Xiao Bing is so far the only artificial intelligence assistant you can find on every platform.

To some extent this is our choice. Microsoft has its technological advantages in China, but it also faces certain limits in the market. So we choose to keep developing through an ecosystem model that suits us better.

Lei Feng.com: Does this model put Microsoft Xiao Bing at a disadvantage in acquiring data or in being embedded into usage scenarios?

Li Di: These things have to be done step by step. In terms of data volume, Xiao Bing now has more than 100 million monthly active users. In interactive artificial intelligence, 90% of the world's interaction data is available to us. So, as things stand, we really do not depend on any particular device.

Lei Feng.com: How does the Xiao Bing painting model handle copyright for its training data? And are the works it outputs protected by copyright?

Li Di: There is no problem on the data side. The data consists of public works by artists from 400 years ago. As for Xiao Bing's own original visual works, we have copyright protection, and each of Xiao Bing's paintings carries a code. Because visual works are easier to judge, every one of our works can be traced. When Xiao Bing wrote poetry, we explicitly gave up the copyright on the poems. We will not give it up on paintings.

Lei Feng.com: Is that because, with painting, you will do more commercial projects externally?

Li Di: It is not about being more commercial; the copyright situation for paintings is simply different.

Xiao Bing's emotional computing framework and future development plan

Unlike many AI voice assistants and AI engines that focus on IQ, Microsoft Xiao Bing pays more attention to EQ (emotional intelligence). The hard core of Microsoft's Xiao Bing is its emotional computing framework. Microsoft's focus on Xiao Bing's EQ reflects, on the one hand, its strategic emphasis in artificial intelligence and, on the other, a kind of “detour” or “compromise” in the face of the real problems of AI development in today's market.


Microsoft Xiao Bing's emotional computing framework

Lei Feng.com: Why do you think demand for smart voice applications on today's smart speakers is not as strong as demand for social software, telephony, and cameras?

Li Di: There are many reasons. Personally, I think the main one is “suppression by the previous generation.”

The development of smartphones in the mobile Internet era, including social networks and all kinds of distributed apps, was so successful that it has held back the era that follows.

It is a bit like the time when DVDs were already very popular in China. Videotapes, which seemed outdated to us, not sharp enough and bulky, endured in the US and Japan, which directly held back the spread of DVDs in both countries. Why? Because the whole videotape-era industry chain there, including the playback equipment, was so mature that it directly curbed the later development of DVD.

For example, many people today try to use a complex artificial intelligence system to place an order. But what the user cares about is that it is less convenient than ordering in a mobile app, because a button never gets it wrong, whereas a conversation is bound to go wrong sometimes. So it is precisely because the mobile Internet era was so successful, and in China especially so mature, that it holds back the next era. It also means AI needs more time: the bar is higher and the time to maturity is longer.

Lei Feng.com: What do you think are the key breakthroughs needed, in technology and in product, for today's intelligent voice assistants or AI engines?

Li Di: There are still many shortcomings on the product side. Take smart speakers: there are at least two different concepts that we often confuse. Is it a smart speaker, or a cost-effective speaker, or a speaker in a new form? Is it selling hardware or AI capability? I sometimes attend smart speaker launch events. One third of the time is spent on sound quality, one third on content, and the remaining third on price.

So far, most AI voice assistants in smart speakers are designed to be fun but not very useful, or simply to be a better form of voice control. Using a smart speaker to turn the lights on and off is convenient, but if all you do is say "lights on" and "lights off" over and over, there is no emotion and no real communication. Then it is just a speaker with voice control.

When a smart speaker launch event can be spent introducing the product's AI features rather than its sound quality, content, and price, I think the problem will have been solved.

Lei Feng.com: So is that the reason (today's intelligence is not smart enough) that Microsoft Xiao Bing puts more emphasis on the emotional framework and EQ?

Li Di: The industry is developing slowly. Our view is this: if you could build an absolutely powerful AI engine, so that today's AI, whether a personal assistant or another application, were as wise as Einstein or could complete any task, then it would not need EQ and people would accept it. People can accept an Einstein with lower EQ. The problem is that you cannot build that, and if on top of that you have no EQ, the product has little value. So current plans look beautiful but do not really work that well. A good product needs a good "experience foundation", and for artificial intelligence that is EQ.

Lei Feng.com: Microsoft's Xiao Bing AI engine has moved from text to voice and now to vision in its technology R&D. What is the overall plan from here?

Li Di: Next, we will enrich both our mass production models and our conceptual models. We have now broken through in three areas: text, speech, and vision. The next step is to expand coverage within these three areas. The conceptual model for text will certainly not be limited to writing poetry, and vision will not stop at static paintings but will include dynamic work as well. Application areas will keep expanding, but more of that expansion will happen within these fields.

Note: The “model” (such as text, speech, and visual based models) in this article refers to the “content creation model”.
