Google has recently open-sourced Embedding Projector, a web-based data visualization tool. The project is part of TensorFlow and can be used to visualize and analyze high-dimensional data. What follows is Lei Feng network's (WeChat public account: Lei Feng network) translation of content from Google Research; reproduction without permission is prohibited.
Machine learning has recently produced remarkable results, from image recognition and language translation to medical diagnosis. As machine learning sees widespread use, understanding how models interpret data is becoming increasingly important. But data are often represented in hundreds or even thousands of dimensions, so a specialized tool is needed to explore such high-dimensional data.
To make studying this data more intuitive, we have open-sourced our web-based data visualization tool, Embedding Projector. The tool is part of TensorFlow and can be used to visualize and analyze high-dimensional data. A standalone version is also available at projector.tensorflow.org; it runs directly in the browser without installing TensorFlow.
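The standalone projector loads embeddings from plain tab-separated files: one vector per line, plus an optional metadata file with one label per line. The sketch below writes such files for a few made-up word vectors (the words, dimensions, and values are illustrative assumptions, not real trained embeddings):

```python
import numpy as np

# Hypothetical example: three words with random 50-dimensional vectors
# (made-up data, standing in for real trained embeddings).
words = ["cat", "dog", "car"]
vectors = np.random.rand(3, 50)

# One embedding vector per line, values separated by tabs.
with open("vectors.tsv", "w") as f:
    for vec in vectors:
        f.write("\t".join(str(x) for x in vec) + "\n")

# One label per line, in the same order as the vectors.
with open("metadata.tsv", "w") as f:
    for word in words:
        f.write(word + "\n")
```

Both files can then be uploaded through the standalone projector's load dialog to explore the vectors interactively.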
The data we train on usually cannot be fed directly into a machine learning algorithm; data such as words, audio, and video must first be expressed in a form the machine can process. We use embeddings, that is, we represent each piece of data as a vector that captures its relevant properties. For example, in natural language, two words with similar meanings are mapped to two different points in the same vector space, but those points should lie close together.
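This "close together" notion is usually measured with cosine similarity. The toy example below uses made-up 3-dimensional vectors (real embeddings have far more dimensions) to show that semantically similar words score higher than unrelated ones:

```python
import numpy as np

# Toy "embeddings" with invented values, purely for illustration:
# similar words are given nearby vectors.
embeddings = {
    "king":  np.array([0.8, 0.3, 0.1]),
    "queen": np.array([0.7, 0.4, 0.1]),
    "apple": np.array([0.1, 0.1, 0.9]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# "king" and "queen" are closer to each other than either is to "apple".
print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low
```

Listing a word's nearest neighbors under this metric is exactly the kind of lookup the projector performs when you click a point.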
Embedding Projector is very simple to use: it renders data in 2D or 3D, and with the mouse you can rotate and zoom the view. Following the word2vec tutorial in TensorFlow, we trained a set of word vectors and visualized them with the tool. Clicking any point in the plot (a word vector) lists the words the algorithm finds semantically related, along with their positions in the vector space. This gives us a very useful way to probe how the algorithm behaves. The figure below shows the words semantically similar to "important" in the vector space.
Dimensionality reduction methods
Embedding Projector provides three common dimensionality reduction methods that make visualizing complex data simpler: PCA, t-SNE, and custom linear projections. PCA is useful for exploring the internal structure of the data and discovering its most important dimensions. t-SNE is useful for exploring local neighborhoods and determining which nearby points belong to the same cluster, helping ensure that the reduced vectors still preserve the meaning in the data. Custom linear projections can help discover meaningful directions in a data set (for example, in a language generation model, formal and informal tones carry different meanings), and accounting for such factors can make a machine learning model more adaptable.
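The contrast between the first two methods can be sketched with scikit-learn. This is a minimal illustration on random stand-in data, not the projector's own implementation; the array shapes and parameter values are assumptions chosen for the example:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Stand-in for real embeddings: 100 random 50-dimensional points.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 50))

# PCA: a linear projection onto the directions of greatest variance,
# good for seeing the overall structure and dominant dimensions.
pca_2d = PCA(n_components=2).fit_transform(data)

# t-SNE: a nonlinear method that preserves local neighborhoods,
# good for spotting clusters of nearby points.
tsne_2d = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(data)

print(pca_2d.shape, tsne_2d.shape)
```

PCA is deterministic and fast, while t-SNE is stochastic and sensitive to parameters such as perplexity, which is why the projector exposes those controls interactively.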
The figure below shows a data set of commonly used phrases drawn from 35K emails, visualized with a custom linear projection; the 100 points nearest to 'See attachments' are highlighted.
Several datasets are also preloaded on the Embedding Projector website, so you can go there and try the visualization tool yourself. Sharing your own training results is just as simple: click the "Publish" button in the tool and your results can be shared. We hope Embedding Projector proves helpful to researchers and engineers applying machine learning, and helps everyone better understand how machine learning algorithms interpret data. For more details, see here.