Why is Python growing so quickly? Python is used for a wide range of purposes, from web development to data science to DevOps, so it is worth examining which specific applications of Python have been growing recently. I'm a data scientist who uses R, so I'm particularly interested in how Python has grown in my field. In this article, I'll look at Stack Overflow data from a different angle to understand which applications of Python are growing and which kinds of companies and organizations use Python the most.
The analysis leads to two conclusions. First, the fastest-growing uses of Python are in data science, machine learning, and academic research. This is especially visible in the growth of the Pandas package, which is the fastest-growing Python-related tag on the site. Second, as for which industries use Python, we found that it is used more heavily in a few sectors: electronics, manufacturing, software, government, and especially universities. Overall, however, Python's growth is spread fairly evenly across industries. Taken together, these findings suggest that data science and machine learning are becoming common in many different types of companies, and that Python is a widely accepted choice for that work.
Our analysis uses data from countries classified as high-income by the World Bank.
Types of Python development
Python is a general-purpose language that is used for many kinds of tasks, such as web development and data science. So how can we untangle Python's recent growth across these fields?
To start, we can look at the best-known Python packages in each field and examine how visits to their tags have grown. We can compare the web development frameworks Django and Flask with data science packages such as NumPy, Matplotlib, and Pandas. (You can also use Stack Overflow Trends to compare the rate of questions asked rather than only the number of visits.)
Among visits to Stack Overflow from high-income countries, Pandas is clearly the fastest-growing Python package: it had only just appeared in 2011, and it now accounts for about 1% of Stack Overflow question traffic. Questions about NumPy and Matplotlib have also increased substantially over time. In contrast, traffic to Django questions has stayed roughly stable during this period, and although Flask has grown, its share is still relatively small. This suggests that Python's growth is largely due to data science rather than web development.
We consider only visits during the summer of 2017 (July and August), which reduces the influence of students and avoids the heavy computation required to work over a much longer period. We consider only registered users who visited at least 50 Stack Overflow questions in that period. We count someone as a Python user if they meet two conditions: (1) Python is their most-visited tag, and (2) Python accounts for at least 20% of the pages they visited.
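The filtering rules above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the function name `is_python_user`, the input shape (one list of tags per question visited), and the choice to treat a tie for the most-visited tag as satisfying the first condition are all assumptions made for the example.

```python
from collections import Counter

MIN_QUESTIONS = 50        # minimum questions visited in the period (from the text)
MIN_PYTHON_SHARE = 0.20   # minimum share of visited pages tagged python (from the text)


def is_python_user(visits):
    """Return True if a user counts as a Python user.

    `visits` is a list with one entry per question visited; each entry is
    the list of tags on that question (a hypothetical input format).
    """
    if len(visits) < MIN_QUESTIONS:
        return False

    # Count how many visited questions carry each tag (once per question).
    tag_counts = Counter(tag for tags in visits for tag in set(tags))
    if not tag_counts:
        return False

    python_count = tag_counts.get("python", 0)

    # Condition 1: python is the most-visited tag (ties accepted, an assumption).
    is_top_tag = python_count == max(tag_counts.values())

    # Condition 2: at least 20% of visited pages are Python-related.
    has_enough_share = python_count / len(visits) >= MIN_PYTHON_SHARE

    return is_top_tag and has_enough_share
```

A user with 30 visits to questions tagged python and 20 to questions tagged java would pass both conditions; a user with only 49 visits would be excluded regardless of tags.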
Which other tags do people who visit the Python tag tend to visit?
Further down the list, quite a few other technologies appear.
We've now seen that Python-related traffic on Stack Overflow can be roughly divided into a few topics. Next, we can analyze which of these topics are responsible for the large increase in Python traffic on Stack Overflow.
Imagine we are looking at a user's browsing history and find that Python is their most-visited tag. How can we tell whether they are a web developer, a data scientist, a system administrator, or something else? We would look at their second most visited tag, then the third, and so on, working down the list of tags they visit until we find one that belongs to one of the clusters above.
We came up with a simple way to assign each user to a topic: we categorize users according to which of nine frequently visited tags they visit the most.
This approach is not especially rigorous, but it is enough to give us a quick estimate of how much each kind of Python usage contributes to the growth. We also tried more rigorous approaches such as latent Dirichlet allocation, but the results were much the same.
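The tag-based categorization described above can be sketched as follows. The nine tags and their topic labels in `TOPIC_TAGS` are illustrative assumptions (the actual list is not reproduced in this text), as are the function name `categorize_user` and the input format. The sketch walks down a user's tags from most to least visited and returns the topic of the first tag that belongs to a known group.

```python
from collections import Counter

# Hypothetical tag-to-topic mapping, chosen for illustration only.
TOPIC_TAGS = {
    "pandas": "data scientist",
    "numpy": "data scientist",
    "matplotlib": "data scientist",
    "javascript": "web developer",
    "django": "web developer",
    "html": "web developer",
    "linux": "system administrator",
    "bash": "system administrator",
    "windows": "system administrator",
}


def categorize_user(tag_counts):
    """Assign a user to a topic from their per-tag visit counts.

    `tag_counts` maps tag name -> number of visits. Tags are examined from
    most to least visited; the first one found in TOPIC_TAGS decides the topic.
    """
    for tag, _count in Counter(tag_counts).most_common():
        topic = TOPIC_TAGS.get(tag)
        if topic is not None:
            return topic
    return "other"  # none of the mapped tags were visited


# Example with made-up visit counts: python is the top tag, pandas comes next.
example_user = {"python": 120, "pandas": 40, "django": 10}
topic = categorize_user(example_user)
```

Because only the highest-ranked mapped tag matters, a user who mostly visits Pandas questions but occasionally visits Django questions is counted once, as a data scientist.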
Which kinds of Python developers are becoming more common? Note that because we are now classifying users rather than visits, we show each category as a share of all registered Stack Overflow users, including those who never visit Python questions.
Over the last three years, the share of Python visitors associated with web development or systems administration has grown slowly and steadily, while the share associated with data science has grown very quickly. This suggests that the widespread adoption of Python for data science and machine learning is the main driver of its rapid growth.
This helps confirm that most of Python's growth is related to data science and machine learning. The colors of those clusters are shifting toward orange, which suggests that these tags are becoming a larger part of the Python ecosystem.
Another way to understand the growth of Python is to consider the kinds of companies the traffic comes from. This is a different question from the type of developer, since retail companies and media companies alike employ both data scientists and web developers.
We focus on two countries where Python has grown substantially: the United States and the United Kingdom. In these two countries, we can segment visits by industry (as we did when comparing AWS and Azure).
Academia, which is dominated by universities, tops the list of visits. Is that because Python is now the language taught in undergraduate programming courses?
That is part of the explanation, but not all of it. As we noted in a previous article, Python traffic from universities holds up well in the summer, not only in spring and fall. For example, both Python and Java are heavily visited from colleges and universities, but their seasonal patterns differ.
Looking at the share of traffic, Java drops off sharply every summer, because Java is very commonly taught in undergraduate courses. By contrast, Python keeps a much higher share of traffic during the summer. This suggests that much of the Python traffic from colleges and universities comes from academic researchers, who work year-round. It is further evidence, from another angle, that Python's growth comes from scientific computing and data analysis.
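One simple way to quantify this seasonal pattern is to compare a tag's average share of university traffic in the summer months with its average share during the rest of the year. The sketch below does that; the function name `summer_to_term_ratio` and the monthly shares are made up for illustration and are not figures from the analysis. Summer is taken to be July and August, matching the definition used earlier in the article.

```python
from statistics import mean

SUMMER_MONTHS = {7, 8}  # July and August, as defined earlier in the article


def summer_to_term_ratio(monthly_share):
    """Ratio of mean summer share to mean share in the other months.

    `monthly_share` maps month number (1-12) -> a tag's share of university
    traffic in that month. A ratio well below 1 indicates a summer drop-off;
    a ratio near 1 indicates year-round use.
    """
    summer = [share for month, share in monthly_share.items() if month in SUMMER_MONTHS]
    term = [share for month, share in monthly_share.items() if month not in SUMMER_MONTHS]
    return mean(summer) / mean(term)


# Hypothetical shares: one tag halves in summer, the other stays flat.
classroom_tag = {m: (0.05 if m in SUMMER_MONTHS else 0.10) for m in range(1, 13)}
research_tag = {m: 0.08 for m in range(1, 13)}

classroom_ratio = summer_to_term_ratio(classroom_tag)
research_ratio = summer_to_term_ratio(research_tag)
```

Under this measure, a language used mainly in coursework would show a low ratio, while one used by researchers throughout the year would stay close to 1.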
Python is heavily used in the government sector, where it is also growing very rapidly. It is also heavily used in the electronics and manufacturing industries. I'm not very familiar with these industries, so I'd be curious to learn why. Python is not yet heavily used in retail or insurance companies, and some investigation shows that Java remains dominant there.
The main aim of this article is to investigate the reasons for Python's growth. Is Python traffic growing much faster in some industries than in others?
At least in the United States and the United Kingdom, Python's growth over the last year has been spread fairly evenly across industries. In each industry, Python's share of traffic increased by about two to three percentage points. (Note that this implies a larger relative increase in industries where Python was less widely used, such as insurance and retail.)
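The distinction between absolute and relative growth can be made concrete with a small calculation. The shares below are hypothetical and chosen only to illustrate the arithmetic; they are not figures from the analysis, and `relative_growth` is a helper introduced for this example.

```python
def relative_growth(before, after):
    """Relative change in a share, e.g. 0.25 means a 25% increase."""
    return (after - before) / before


# Hypothetical shares of traffic, before and after the same absolute
# increase of 2.5 percentage points.
high_base_before, high_base_after = 0.10, 0.125   # industry where Python was common
low_base_before, low_base_after = 0.04, 0.065     # industry where Python was rare

high_base_growth = relative_growth(high_base_before, high_base_after)
low_base_growth = relative_growth(low_base_before, low_base_after)

# The same absolute gain is a much larger relative gain from the lower base.
print(f"high base: {high_base_growth:.1%}, low base: {low_base_growth:.1%}")
```

With these made-up numbers, the industry starting from the lower share sees more than twice the relative growth, even though both gained the same number of percentage points.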
According to the data so far in 2017, Java is still the most-visited tag in most industries, but Python has been gaining ground. For example, in the financial industry (a large contributor to Stack Overflow traffic), the Python tag rose from the fourth most visited in 2016 to the second most visited in 2017.
As a data scientist who used to work in Python and now uses R, should I switch back to Python after this analysis?
I don't think so. For one thing, R is also growing strongly: a previous article showed that it is second only to Python on the list of fastest-growing programming languages. For another, I prefer doing data analysis in R for reasons that have nothing to do with how widely it is used. I'm planning to write another article about my experience moving from Python to R, what I like about each language, and why I don't feel compelled to switch back.
In any case, data science is an exciting and rapidly developing field, and there is plenty of room for several languages to thrive. My main aim is to encourage people who are new to the field to consider building their skills in data science. It is clearly one of the fastest-growing parts of software development, and it has spread across many industries.