Recently, at the "Modern Operations Research Development Seminar" hosted by the Research Institute for Interdisciplinary Sciences (RIIS) at Shanghai University of Finance and Economics and sponsored by Shanshu Technology Co., Ltd., Dr. Tong Zhang, director of Tencent AI Lab (Tencent Artificial Intelligence Laboratory), gave an excellent talk. A well-known scholar in machine learning, he began by showing that machine learning and operations research share many common optimization problems. He then surveyed the progress of optimization within machine learning and the research topics he finds most interesting. He closed by inviting operations research experts to communicate with machine learning researchers and advance the field together.
Leiphone AI Technology Review edited the transcript below without altering its original meaning, and the text was corrected and confirmed by Dr. Tong Zhang. Thanks also go to Assistant Professor Qi Deng of Shanghai University of Finance and Economics for his comments.
Thank you, Dongdong Ge, for inviting me. Today is Professor Yinyu Ye's birthday, and I am very happy to be here to celebrate and exchange ideas with you. My main research area is machine learning. There are many optimization problems in machine learning today, and progress on some of them could in turn advance machine learning itself. I will introduce recent progress in this area, and I hope we can work together to push the field forward.
Optimization in machine learning is comparatively narrow in scope: it is mainly concerned with data. The data fall into three main kinds:
The first is statistically independent, identically distributed data. Objectives over such data have a finite-sum or expectation structure, which appears in both supervised and unsupervised learning.
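Concretely, this finite-sum structure can be sketched on a hypothetical least-squares problem (the data and function names below are illustrative, not from the talk):

```python
import numpy as np

def empirical_risk(w, X, y):
    """Finite-sum objective: (1/n) * sum_i loss(w; x_i, y_i),
    the empirical approximation of the expectation E[loss]."""
    residuals = X @ w - y            # one residual per i.i.d. sample
    return np.mean(residuals ** 2)   # average over the n samples

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # 100 i.i.d. samples, 3 features
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                       # noiseless labels for the demo
risk = empirical_risk(w_true, X, y)  # zero at the true weights
```

Every term in the sum comes from one sample, which is exactly what makes stochastic (sampled) gradients natural for this class of problems.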
The second is data with a graphical-model structure. Here we care about the structure of the graph, and the objective again contains many summation terms.
The third is sequence data, whose most fundamental structure is likewise a sum.
Because so many problems are expressed as statistical expectations, stochastic optimization is a direction of great interest.
Stochastic optimization appeared in the 1950s and 1960s and belongs to traditional optimization; there are books devoted to it and researchers who specialize in it.
In fact, what is used in machine learning is almost entirely stochastic optimization; few people use deterministic optimization. Moreover, recent variance-reduction results prove that stochastic optimization can achieve better convergence rates, which is one more reason to use it.
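As an illustration of variance reduction, here is a minimal SVRG-style sketch on least squares (the step size, epoch count, and toy data are assumptions for the demo, not prescriptions from the talk):

```python
import numpy as np

def svrg(X, y, w0, step=0.02, epochs=50, seed=0):
    """SVRG sketch for (1/n) sum_i (x_i . w - y_i)^2: each epoch takes a
    full-gradient snapshot, then runs variance-reduced stochastic steps."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        full_grad = 2 * X.T @ (X @ w_snap - y) / n     # gradient at snapshot
        for _ in range(n):
            i = rng.integers(n)
            gi = 2 * X[i] * (X[i] @ w - y[i])            # sample grad at w
            gi_snap = 2 * X[i] * (X[i] @ w_snap - y[i])  # same sample at snapshot
            # unbiased estimate whose variance vanishes near the optimum
            w = w - step * (gi - gi_snap + full_grad)
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
w_true = np.array([0.7, -1.3])
y = X @ w_true
w_hat = svrg(X, y, np.zeros(2))
```

The correction term `gi - gi_snap + full_grad` is what buys the faster (linear) convergence rate compared with plain SGD's slow sublinear rate under a constant step size.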
Let me first introduce first-order stochastic optimization, where there is a great deal of research. The direction people find most interesting here may be non-convex optimization. Recently many computer-science theorists have been working on non-convex optimization, though I have seen few researchers from the optimization community in this area. In non-convex optimization, the most cited earlier work is Nesterov's cubic-regularized Newton method, which builds on Newton's method and achieves good results in the non-convex case. Some researchers are now building on the cubic Newton method and studying it more deeply.
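To make the idea concrete, here is a rough one-dimensional sketch of a cubic-regularized Newton step, with the cubic subproblem solved naively by grid search (the test function, the constant M, and the grid are all illustrative choices, not from any specific paper):

```python
import numpy as np

def cubic_newton_step(f, grad, hess, x, M=10.0):
    """One cubic-regularized Newton step: minimize the local model
    m(s) = f(x) + g*s + 0.5*h*s^2 + (M/6)*|s|^3 over the step s.
    The cubic term keeps the model bounded below even when h < 0."""
    g, h = grad(x), hess(x)
    s = np.linspace(-3.0, 3.0, 100001)   # naive grid for the subproblem
    model = f(x) + g * s + 0.5 * h * s**2 + (M / 6.0) * np.abs(s)**3
    return x + s[np.argmin(model)]

# Hypothetical nonconvex function: f(x) = x^4 - 3x^2, minima at +/- sqrt(1.5)
f = lambda x: x**4 - 3 * x**2
grad = lambda x: 4 * x**3 - 6 * x
hess = lambda x: 12 * x**2 - 6

x = 0.5   # starting point where the Hessian is negative (hess(0.5) = -3)
for _ in range(30):
    x = cubic_newton_step(f, grad, hess, x)
```

Unlike a plain Newton step, the cubic model still has a well-defined minimizer at points of negative curvature, which is what makes the method usable on non-convex problems.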
Another direction is second- and third-order stochastic optimization. Research here includes how to design the sampling process in special ways. This work is mainly within machine learning, though I know some people in the optimization community are also working on it now.
Another line of research you may find interesting concerns acceleration. Earlier methods in this area include the momentum method, the heavy-ball method, and Nesterov's accelerated algorithm for convex problems. What interests people now may be acceleration for non-convex problems: how do we accelerate in the non-convex case? There has been recent literature on this. Strictly speaking, acceleration is impossible in the general non-convex case, but the latest papers show that while running the algorithm one can detect local convexity, and switch to an accelerated scheme as soon as the problem behaves convexly.
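For the convex case mentioned above, a minimal sketch contrasting plain gradient descent with Nesterov-style acceleration on a hypothetical ill-conditioned quadratic might look like this (the step size and momentum constant assume the strong-convexity and smoothness constants are known):

```python
import numpy as np

def gd(grad, x0, step, iters):
    """Plain gradient descent."""
    x = x0.copy()
    for _ in range(iters):
        x = x - step * grad(x)
    return x

def nesterov(grad, x0, step, beta, iters):
    """Nesterov acceleration: the gradient is evaluated at an
    extrapolated ("look-ahead") point built from momentum."""
    x, x_prev = x0.copy(), x0.copy()
    for _ in range(iters):
        y = x + beta * (x - x_prev)     # momentum extrapolation
        x_prev, x = x, y - step * grad(y)
    return x

# Hypothetical quadratic with condition number kappa = 100
H = np.diag([1.0, 100.0])
grad = lambda x: H @ x
x0 = np.array([1.0, 1.0])
kappa = 100.0
beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)   # strongly convex schedule
x_gd = gd(grad, x0, step=0.01, iters=200)
x_nag = nesterov(grad, x0, step=0.01, beta=beta, iters=200)
```

After the same 200 iterations the accelerated iterate is orders of magnitude closer to the minimizer at the origin, reflecting the sqrt(kappa) dependence of the accelerated rate versus kappa for plain gradient descent.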
When people use the momentum method in practice, what should be done when its assumptions do not hold? This is more of an empirical question. The momentum method is widely applied; the Adam algorithm in deep learning uses its ideas, together with additional per-coordinate scaling.
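A bare-bones sketch of the Adam update just described, combining a momentum (first-moment) estimate with per-coordinate second-moment scaling (the hyperparameters are the commonly quoted defaults; the test problem is illustrative):

```python
import numpy as np

def adam(grad, x0, step=0.01, beta1=0.9, beta2=0.999, eps=1e-8, iters=2000):
    """Adam sketch: exponential moving averages of the gradient (momentum)
    and of its square (per-coordinate scaling), with bias correction."""
    x = x0.copy()
    m = np.zeros_like(x)   # first moment (momentum)
    v = np.zeros_like(x)   # second moment (scaling)
    for t in range(1, iters + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)   # bias correction for zero init
        v_hat = v / (1 - beta2**t)
        x = x - step * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Hypothetical badly scaled problem: per-coordinate scaling equalizes progress
grad = lambda x: np.array([1.0, 1000.0]) * x
x_adam = adam(grad, np.array([1.0, 1.0]))
```

The division by the square-root second moment makes the effective step size roughly scale-free per coordinate, which is why Adam tolerates badly scaled gradients in practice.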
There is also a question about acceleration that interests me personally. Acceleration itself is deterministic and does not combine well with stochastic optimization. In the stochastic setting, acceleration so far mainly helps by allowing larger mini-batches, so we are studying how to accelerate in the purely stochastic case. To this day, the problem remains unsatisfactory for stochastic mini-batch algorithms. As some of you may know, superposing acceleration on a stochastic algorithm to speed up convergence has to pass through a deterministic formulation along the way. Whether there is a better route, I am not sure at the moment.
We are also interested in special structures, such as composite loss functions and proximal structures (for example, sparsity and low rank). Special structures have been studied much less in non-convex problems than in convex ones.
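As one concrete example of a composite/proximal structure, here is a sketch of the proximal gradient method (ISTA) for an l1-regularized least-squares objective, where the proximal step has the closed-form soft-thresholding solution (the data and constants are illustrative):

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1: shrink toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_gradient_lasso(X, y, lam, step, iters=500):
    """Proximal gradient sketch for (1/2n)||Xw - y||^2 + lam*||w||_1:
    a gradient step on the smooth part, then the proximal (shrinkage)
    step that handles the nonsmooth sparsity-inducing part."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.array([2.0, 0.0, 0.0, -3.0, 0.0])   # sparse ground truth
y = X @ w_true
w_hat = prox_gradient_lasso(X, y, lam=0.1, step=0.2)
```

The proximal step sets small coordinates exactly to zero, so the iterates are genuinely sparse rather than merely small; this is the appeal of exploiting the composite structure instead of subgradient methods.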
Another topic of interest is hyperparameter optimization, which is actually quite complicated. The recent "learning to optimize" line of work is concerned with optimal hyperparameters: the optimization procedure is not derived by hand but is itself learned by machine learning, which is quite interesting. This approach is still at a primitive stage, so it is a problem worth considering.
The above is a series of studies on single-machine, single-core optimization. Another direction of interest is large-scale distributed and multi-core optimization. Even now, many optimization packages have no multi-core support.
Here, the first step may be multi-core computation, and the second step is distributed computation, which is a practical demand. From the theoretical side, we are more interested in the communication-computation trade-off in this setting: when different computing units must exchange information while running the algorithm, how much information is exchanged, how much computation time is spent, and how should the two be balanced? There are many related studies, both synchronous and asynchronous; in practice you probably use asynchrony more, and there is a series of work on asynchronous distributed optimization. In addition, some people are interested in decentralized optimization and low-precision optimization (for example, computing derivatives at low accuracy); these methods may even be combined with new low-precision hardware in the future. Another piece of research tied to hardware is model compression, which is also an optimization-related field.
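A toy sketch of the synchronous data-parallel pattern described above: each simulated worker computes a gradient on its own data shard, and one "communication round" per iteration averages them (everything here, shard layout included, is illustrative):

```python
import numpy as np

def distributed_gd(shards, w0, step=0.5, rounds=100):
    """Synchronous data-parallel sketch: per-worker local gradients,
    then one communication round that averages them (weighted by shard
    size, so the average equals the full-dataset gradient)."""
    w = w0.copy()
    sizes = np.array([len(y) for _, y in shards], dtype=float)
    for _ in range(rounds):
        # computation phase: one local least-squares gradient per worker
        local = [X.T @ (X @ w - y) / len(y) for X, y in shards]
        # communication phase: aggregate and take a global step
        w = w - step * np.average(local, axis=0, weights=sizes)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
w_true = np.array([1.0, -1.0, 2.0])
y = X @ w_true
# split the data across 4 hypothetical workers
shards = [(X[i::4], y[i::4]) for i in range(4)]
w_hat = distributed_gd(shards, np.zeros(3))
```

In this synchronous scheme every round costs one all-to-one exchange of a d-dimensional vector per worker; the communication-computation balance the talk mentions is about how often such rounds are worth paying for.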
Another direction is the study of practical non-convex optimization algorithms. Beyond deep neural networks, non-convex optimization can solve other problems too, but what most people care about is deep networks. Many researchers propose heuristics that are effective but have little theoretical foundation, including batch normalization. There is also the Adam algorithm, which is very popular in practice; it combines two convex-optimization ideas, the momentum method and the adaptive gradient (AdaGrad) method, with some parameter adjustments.
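For reference, here is a minimal training-mode forward pass of the batch normalization heuristic mentioned above (the shapes and the eps constant are conventional assumptions):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization, training-mode forward pass: normalize each
    feature over the mini-batch, then rescale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # standardized activations
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # badly scaled batch
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

With gamma = 1 and beta = 0 the output of each feature has mean zero and unit variance over the batch, which is precisely the re-conditioning effect credited with making deep networks easier to optimize.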
In addition, there has been recent theoretical progress. A number of young Chinese scholars have done comparatively solid work in this area, for example on how to escape saddle points during optimization and thereby reach a local minimum.
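The escape-saddle-point idea can be caricatured as gradient descent that injects a small random perturbation whenever the gradient is tiny. The sketch below is only in that spirit, not a faithful reproduction of any specific paper's algorithm; the threshold, noise scale, and test function are all illustrative:

```python
import numpy as np

def perturbed_gd(grad, x0, step=0.1, noise=0.01, iters=2000, seed=0):
    """Gradient descent plus random perturbations near stationary points,
    so the iterate can leave strict saddles along negative curvature."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-3:            # near-stationary: perturb
            x = x + noise * rng.normal(size=x.shape)
        else:                                   # otherwise: ordinary GD
            x = x - step * g
    return x

# Hypothetical function f(x, y) = x^2 + (y^2 - 1)^2:
# strict saddle at the origin, minima at (0, +1) and (0, -1)
grad = lambda p: np.array([2 * p[0], 4 * p[1] * (p[1]**2 - 1)])
x0 = np.zeros(2)   # start exactly at the saddle: plain GD would never move
x_hat = perturbed_gd(grad, x0)
```

Started exactly at the saddle, unperturbed gradient descent is stuck forever; the noise pushes the iterate onto the negative-curvature direction, after which plain gradient steps carry it to one of the local minima.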
Much of this is theoretical work, such as research on convergence complexity.
Another interesting study is global convergence of algorithms to the optimum on certain non-convex problems, where some structural information is required. Current work has obtained conclusions on particular problems; some use standard optimization algorithms, while others modify the algorithm or exploit additional structure.
A newer field that interests people is the saddle-point (min-max) problem. I am personally interested in this problem and have studied the relevant literature. Within optimization there is not yet much research on it. The best-understood case today is the convex-concave one that is linear in each variable, where the primal-dual coupling is bilinear; on these problems there are results, and the machine learning community has now begun to study this area as well. Once the coupling is no longer bilinear, the situation becomes much harder and some of the conclusions no longer generalize. In the non-convex case, or when the problem is neither convex nor concave, results are very scarce; even convergence has no general guarantee.
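On the well-understood bilinear case just mentioned, plain simultaneous gradient descent-ascent in fact fails to converge, while the classical extragradient method does; here is a sketch (the step size and matrix are illustrative):

```python
import numpy as np

def extragradient(A, x0, y0, step=0.3, iters=500):
    """Extragradient sketch for the bilinear saddle point
    min_x max_y x^T A y, whose unique saddle point is (0, 0).
    The extrapolation ("look-ahead") half step is what restores
    convergence where simultaneous descent-ascent spirals outward."""
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        # half step: extrapolate using current gradients
        x_half = x - step * (A @ y)        # descend in x
        y_half = y + step * (A.T @ x)      # ascend in y
        # full step: use the gradients at the half point
        x = x - step * (A @ y_half)
        y = y + step * (A.T @ x_half)
    return x, y

A = np.array([[1.0, 0.0], [0.0, 2.0]])     # bilinear coupling matrix
x, y = extragradient(A, np.array([1.0, 1.0]), np.array([1.0, -1.0]))
```

The same two-evaluation trick underlies many of the algorithms now being studied for the harder non-bilinear and non-convex-non-concave settings the talk refers to.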
Such problems do arise in practice: for example, some formulations of reinforcement learning can be written as saddle-point problems, and so can some generative models, such as generative adversarial networks. These are neither convex nor concave but have special structure, so interested researchers study them specifically. Reinforcement learning itself is also a direction close to optimization, closely related to the Markov decision processes studied in operations research. This direction is currently a research focus, and results are accumulating.
Although the optimization problems of machine learning are rather narrow, and many traditional optimization problems fall outside machine learning, there are, as I mentioned before, many interesting problems closely related to optimization. Machine learning has gone relatively deep on these problems and often produces theoretical work that goes beyond the optimization field itself.
That is all I would like to share. If you are interested, you are welcome to join in and study these problems together.