
LinkedIn Open-Sources Photon Machine Learning, with Support for Spark

via: 博客园     time: 2016/7/5 10:30:22     read: 1978

English original: Open Sourcing Photon ML: LinkedIn's Scalable Machine Learning Library for Spark

Machine learning is a key component of LinkedIn's relevance-driven products. LinkedIn uses machine learning to train the ranking algorithms for its feed, advertising, recommendation systems (such as People You May Know), email optimization, and search engines. For a closer look, see LinkedIn's write-up of how its feed is implemented [Part 1, Part 2], which covers how machine learning is applied to feed ranking.

These algorithms play an important role in improving the user experience, so LinkedIn needs to give its engineers a simple, easy-to-use machine learning tool with which they can build high-quality models and apply them quickly to large data sets. To meet this need, LinkedIn developed Photon Machine Learning, a machine learning library for Apache Spark. By combining Spark's ability to process massive amounts of data quickly with powerful model training and diagnostic tools, Photon Machine Learning gives research engineers more information when deciding which algorithms to use for a recommendation system.

Photon Machine Learning has delivered broad value to research engineers in many different areas, and it is now open source.

What is Photon Machine Learning?

Photon Machine Learning supports large-scale regression: linear regression, logistic regression, and Poisson regression, each with L1, L2, and elastic-net regularization. It provides optional model diagnostics, generating charts that help diagnose model fitting and optimization problems. It also includes an experimental implementation of generalized additive mixed effect (GAME) models, described in detail below.
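To make the three regularization options concrete, here is a minimal Python sketch of an elastic-net regularized logistic regression objective. It is an illustration only, not Photon Machine Learning's API: the function names and the `lam` (overall strength) and `alpha` (L1/L2 mix) parameters are assumptions chosen for this example. Setting `alpha` to 1 gives pure L1, and setting it to 0 gives pure L2.

```python
import math


def elastic_net_penalty(weights, lam, alpha):
    """Return lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


def logistic_loss(weights, rows, labels):
    """Mean negative log-likelihood of a logistic model, labels in {0, 1}."""
    total = 0.0
    for x, y in zip(rows, labels):
        z = sum(w * xi for w, xi in zip(weights, x))
        # log(1 + exp(z)) - y * z, written in a numerically stable form
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
    return total / len(rows)


def objective(weights, rows, labels, lam=0.1, alpha=0.5):
    """Data-fit term plus the elastic-net penalty (hypothetical defaults)."""
    return logistic_loss(weights, rows, labels) + elastic_net_penalty(
        weights, lam, alpha
    )
```

An optimizer would minimize `objective` over `weights`; the L1 part tends to push some weights to exactly zero, while the L2 part shrinks all weights smoothly.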

How is Photon Machine Learning used at LinkedIn?

A typical machine learning system is shown in the following flowchart. The first stage is data preprocessing: cleaning data from the online system, building tables, and extracting features. In the next stage, machine learning algorithms learn a good scoring function for the recommendation or search system, and the best model is selected. Finally, the best model is released to online A/B testing to measure its effect on the user experience.

Photon Machine Learning is the core of model training at LinkedIn, and it is hot-swappable with other machine learning libraries. In the flowchart above, circles represent actions and cylinders represent data sets.

How does Photon Machine Learning run on a cluster?

At LinkedIn, Photon Machine Learning runs in Spark on YARN mode, sharing the same cluster with other Hadoop MapReduce applications. Photon Machine Learning and traditional Hadoop MapReduce programs or scripts can easily be mixed in a single workflow. Migrating model training from Hadoop MapReduce to Spark on YARN yields a speedup of 10x to 30x. To make better use of Spark, the machine learning algorithms team contributed Spark support to Dr. Elephant.

Because Spark and Hadoop workflows share the same cluster, and because Photon Machine Learning supports LinkedIn's existing machine learning input and output formats, adoption inside LinkedIn has been much easier. Many teams use Photon Machine Learning for relevance, data science, and security applications, and some teams already use it online.

Development direction of Photon Machine Learning: GAME

By open-sourcing Photon Machine Learning, the authors hope to help others who build and apply industrial-grade machine learning. Although many open source machine learning libraries already exist, the authors believe Photon Machine Learning is an important addition, because it provides generalized additive mixed effect (GAME) models.

Currently, the GAME implementation in Photon Machine Learning supports generalized linear mixed effect models (GLMix). A GLMix model consists of a fixed effect model and multiple random effect models. The fixed effect model corresponds to a conventional generalized linear model, which assumes that the observations are independent. The random effects add extra parameters at several granularities (per user, per item, per segment) to capture additional heterogeneity. Regularization is generally used to avoid over-fitting. Random effects can also induce marginal dependence among the observations.
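The structure of a GLMix model can be sketched in a few lines of Python. This is a hedged illustration of the idea described above, not code from Photon Machine Learning: the function names, the dictionary-based feature representation, and the choice of a logistic link are all assumptions made for the example. The score is the sum of a global fixed effect term and one random effect term per granularity (here, per user and per item).

```python
import math


def dot(weights, features):
    """Sparse dot product; features missing from `weights` contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def glmix_score(features, user_id, item_id, fixed, per_user, per_item):
    """Fixed effect score plus per-user and per-item random effect scores."""
    score = dot(fixed, features)                        # shared by everyone
    score += dot(per_user.get(user_id, {}), features)   # user-specific offset
    score += dot(per_item.get(item_id, {}), features)   # item-specific offset
    return score


def predict_probability(score):
    """Logistic link, as in logistic regression (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients: one global model plus a correction for user "u1".
fixed = {"bias": 1.0, "f": 0.5}
per_user = {"u1": {"f": 0.5}}
per_item = {}
features = {"bias": 1.0, "f": 2.0}

known_user = glmix_score(features, "u1", "i9", fixed, per_user, per_item)
new_user = glmix_score(features, "u2", "i9", fixed, per_user, per_item)
```

A user or item with no random effect coefficients falls back to the fixed effect model alone, which is why `new_user` is lower than `known_user` in this example.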

GAME uses coordinate descent to solve for each effect in turn.

Coordinate descent is applied to the overall optimization problem: each step updates a single effect in sequence, using a suitable sub-optimizer. For the fixed effect coordinate, a distributed regression algorithm is applied to column-wise data. Each iteration takes advantage of data locality in Spark RDDs, so optimization is fast and needs no data shuffle. To solve the random effect coordinates efficiently, the data is partitioned by the random effect variable, and each partition is solved with a single-machine algorithm.
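The alternating scheme can be illustrated with a deliberately small Python sketch. It is not Photon Machine Learning's implementation and does not use Spark: it assumes a toy model with one global slope `w` (the fixed effect) and one L2-regularized intercept per user (the random effect), fitted with squared loss. The function name `fit_toy_glmix` and the parameters `lam` and `sweeps` are hypothetical. Each sweep solves the fixed effect with the random effects held constant, then solves every user's random effect independently on that user's partition of the data.

```python
from collections import defaultdict


def fit_toy_glmix(rows, lam=1.0, sweeps=20):
    """Fit y ~ w * x + b[user] by block coordinate descent.

    rows: list of (user_id, x, y) tuples.
    Minimizes sum((y - w*x - b[user])**2) + lam * sum(b[user]**2).
    """
    w = 0.0
    b = defaultdict(float)

    # Partition the data once by the random effect ID, so each user's
    # sub-problem can be solved on its own.
    by_user = defaultdict(list)
    for user, x, y in rows:
        by_user[user].append((x, y))

    for _ in range(sweeps):
        # Fixed effect coordinate: least squares for w on the residuals
        # left over after subtracting the current random effects.
        numerator = sum(x * (y - b[user]) for user, x, y in rows)
        denominator = sum(x * x for _, x, _ in rows)
        w = numerator / denominator if denominator else 0.0

        # Random effect coordinate: closed-form ridge solution per user,
        # using only that user's rows.
        for user, points in by_user.items():
            residual = sum(y - w * x for x, y in points)
            b[user] = residual / (len(points) + lam)

    return w, dict(b)


# Toy data generated from y = 2 * x + offset, with offset +1 for "a", -1 for "b".
rows = [("a", 1.0, 3.0), ("a", 2.0, 5.0), ("a", 3.0, 7.0),
        ("b", 1.0, 1.0), ("b", 2.0, 3.0), ("b", 3.0, 5.0)]
w, b = fit_toy_glmix(rows, lam=1e-6, sweeps=200)
```

Because each step exactly minimizes the objective over one block with the others fixed, the objective never increases from one step to the next. In a distributed setting the per-user loop is the part that parallelizes naturally, since no user's update depends on another user's rows.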

GAME models offer a level of precision that helps research engineers with targeting. The authors hope these techniques will improve recommendation algorithms across a broader range of systems. Internal A/B tests at LinkedIn show that GLMix models trained with Photon Machine Learning improved job recommendations by 15 to 30 percent and email recommendations by 10 to 20 percent (measured by click-through rate). Although these tests are still at an early stage, the results indicate that Photon Machine Learning can significantly improve recommendation quality.

The authors will continue to improve the robustness and ease of use of the GAME model training algorithms in Photon Machine Learning. Beyond generalized linear models, the authors have developed experimental code for factored random effect models, which use matrix factorization to capture interactions between random effects. In the future, the authors will continue to use the GAME framework to implement other machine learning algorithms.

Translator's introduction: Man days focuses on big data, machine learning, and mathematics, and shares related technical articles through the personal public account bigdata_ny.
