
Some Thoughts on Python

via: 博客园     time: 2017/2/15 12:30:30     views: 1197

Introduction: Performance has always been the most criticized aspect of Python. Many projects that started out in Python have even begun migrating to other languages, and Duolingo is one example. The most successful performance-oriented implementation in the Python community is PyPy, yet Dropbox, a heavy Python user, did not adopt it and instead built Pyston from scratch. Kevin Modzelewski has something to say about these familiar complaints about Python.

Sooner or later, debts must be repaid

Duolingo (多邻国) is a completely free language-learning tool dedicated to giving users around the world equal access to language education. It currently has more than 150 million users (the author is one of them). Its founder, Luis von Ahn, is a professor at Carnegie Mellon University, the inventor of CAPTCHA (the Turing-test-based challenge commonly known as the verification code), and a MacArthur Fellow.

Not long ago, Duolingo engineer Andrekhorie published an article on the company's official technology blog describing what the team learned from rebuilding this complex system. On Duolingo's back end, more than 88 courses are continuously optimized through machine learning before being served to users. The original system was written in Python and was later rewritten in Scala. The rewrite cost a great deal of effort (engineers had to put new development work on hold and spend several months rewriting the entire system), but it was worth it: system latency dropped from 750 ms to 14 ms, and uptime rose from 99.9% to 100%.

System refactoring

Without exception, every company accumulates technical debt during its early growth. Like money owed to a bank, it snowballs over time. As the saying goes, sooner or later debts must be repaid. The pain grows along with the business until you are forced to face it.

In the initial system design, because data had to be shared, Duolingo's architecture looked as shown below:

[Figure: Duolingo's original system architecture]

The problem with this architecture is that it has too many heavy dependencies, and the growing number of network requests keeps increasing system latency. To solve this, the redesign decouples dependencies as far as possible so that the system becomes simple and robust (as shown below). In the new architecture, the frequently shared course data is stored on AWS S3, and Duolingo only needs a lightweight cache in front of it.

[Figure: Duolingo's redesigned system architecture]
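The "lightweight cache" in front of S3 might look roughly like the following minimal sketch. It is not Duolingo's code: `CourseCache`, `fetch_course`, and the TTL value are hypothetical names chosen for illustration, and the fetcher is a stand-in for an S3 download so that the example runs without network access.

```python
import time
from typing import Callable, Dict, Tuple


class CourseCache:
    """Tiny in-process TTL cache in front of a slow fetch function (hypothetical)."""

    def __init__(self, fetch: Callable[[str], bytes], ttl_seconds: float = 300.0):
        self._fetch = fetch  # stand-in for "download the course file from S3"
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, bytes]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, course_id: str) -> bytes:
        now = time.monotonic()
        entry = self._entries.get(course_id)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]
        # Cache miss (or stale entry): pay the slow path once, then remember it.
        self.misses += 1
        data = self._fetch(course_id)
        self._entries[course_id] = (now, data)
        return data


def fetch_course(course_id: str) -> bytes:
    """Hypothetical fetcher; a real one would download the file from S3."""
    return f"course-data-for-{course_id}".encode()


cache = CourseCache(fetch_course, ttl_seconds=60.0)
first = cache.get("es-en")   # miss: goes to the fetcher
second = cache.get("es-en")  # hit: served from memory
```

In a design like this, only cache misses pay the download cost, so the miss rate largely determines the remaining latency.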

Rewrite the code

Duolingo's biggest decision was to rewrite the session generator in Scala. The back-end session generator was originally written in Python. Python's strengths need no elaboration, but its weaknesses are just as obvious.

  1. Performance: Python is much slower than C or Java.
  2. Memory management: Python's GIL (Global Interpreter Lock) lets only one thread run Python bytecode at a time, so using several cores usually means running several processes, which limits how efficiently developers can manage and use memory.
  3. Dynamic typing: in a complex system, many bugs surfaced only during deployment and had to be fixed repeatedly, which slowed the project down.
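The dynamic-typing point can be illustrated with a minimal sketch. The function name `average_score` and the sample inputs are hypothetical. The example shows that type hints are not enforced by CPython at run time, so a wrong argument type is only discovered when the offending line actually executes.

```python
from typing import List


def average_score(scores: List[float]) -> float:
    """Type hints document intent, but CPython does not enforce them at run time."""
    return sum(scores) / len(scores)


def try_average(scores):
    """Call average_score and report a TypeError instead of crashing."""
    try:
        return average_score(scores)
    except TypeError as exc:
        return f"TypeError: {exc}"


ok = average_score([0.5, 1.0])

# Strings (for example, values taken straight from a JSON payload) are accepted
# by the call itself and fail only when sum() tries to add them to an int.
bad = try_average(["0.5", "1.0"])
```

A statically typed language such as Scala rejects the second call at compile time, which is the kind of bug the list above says kept reappearing during deployment.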

As a functional programming language, Scala has many attractive features. It is concise, and the resulting code is more readable and easier to debug and maintain. Scala borrows the strengths of earlier programming languages and addresses pain points they could not overcome, with better fault-tolerance mechanisms and fewer bugs. In addition, because Scala runs on the Java Virtual Machine, the rich ecosystem of Java libraries is available. Scala is used in many big data projects whose complexity is comparable to Duolingo's, which made it look like a good choice. Most importantly, among languages suited to complex systems, Scala has a relatively gentle learning curve.

Duolingo's reflections

As part of this rewrite, Duolingo wrote a full set of test cases, which improved the stability of the system and also produced a large body of development documentation. Along the way, many independently developed components were integrated seamlessly.

The rewrite took less time than Duolingo expected, and the rebuilt system is more robust, with a more readable and maintainable codebase. During the rewrite, however, some Scala libraries lacked usable documentation, Java integration caused some trouble, and some Scala features were not well supported.

In terms of performance data, the uptime of the rebuilt system was 100% in its first few months, compared with 99% before. Most of the remaining latency came from cache misses, when course data files had to be downloaded from S3.

To those who keep complaining about Python: "you don't understand"

The future of Pyston has been widely discussed recently, and Pyston 0.6.1 has just been released. In benchmarks, the latest version is 95% faster than CPython, and it improves Dropbox's performance by 10%. (Many of Dropbox's internal projects are written in Python. Pyston is an open-source project started by Dropbox that aims to build a high-performance Python implementation using LLVM and modern JIT techniques.) Kevin Modzelewski, the core developer of the Pyston project, wrote an article setting out some of his views on Python.

Why a JIT, and why reinvent the wheel?

Back in 2013, 90% of the CPU capacity of Dropbox's web servers was consumed by handling requests, and the pace of server purchases alarmed everyone. It was generally believed at the time that Python's bottleneck was I/O, and the problem was tricky because it was spread throughout the system. There was no mature solution available: nobody could expect PyPy to be compatible with Dropbox's own millions of lines of code. Kevin Modzelewski remains convinced that the original choice was correct. People did not want to give up the Python ecosystem but still wanted better performance, so finding a way to improve Python itself was the best option.
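One simple way to check whether a workload is CPU-bound or I/O-bound is to compare CPU time with wall-clock time. The sketch below is only an illustration of that idea, not how Dropbox measured its servers. `cpu_fraction`, `cpu_bound`, and `io_like` are hypothetical names, and `time.sleep` stands in for waiting on a network or disk.

```python
import time


def cpu_fraction(work) -> float:
    """Return CPU time divided by wall-clock time for one call of work()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def cpu_bound():
    # Pure computation: the interpreter is busy the whole time.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def io_like():
    # Stand-in for waiting on a network or disk; uses almost no CPU.
    time.sleep(0.2)


cpu_ratio = cpu_fraction(cpu_bound)  # typically close to 1 on an idle machine
io_ratio = cpu_fraction(io_like)     # typically close to 0
```

A ratio near 1 suggests that a faster interpreter or a JIT could help. A ratio near 0 suggests the time is spent waiting, where a faster interpreter would make little difference.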

Another common question is: why not use PyPy, or CPython's code base? This is actually hard to answer. For Dropbox, compatibility came first, followed by a reasonable performance improvement. Dropbox's needs run counter to PyPy's design philosophy and technical implementation, and in Kevin Modzelewski's view this is also the bottleneck keeping PyPy from greater success. Small fixes and patches would certainly improve PyPy, but issues such as memory usage, C extension support, and performance stability are baked into PyPy's architecture and cannot be solved by tinkering. Kevin Modzelewski doubts that a PyPy modified to suit Dropbox could still be called PyPy.

As for CPython, that was more of a pragmatic decision: Dropbox's goal was to reuse CPython as much as possible. Kevin Modzelewski also acknowledges that today 90% of Pyston's code base is CPython code. In that sense, Pyston is clearly a CPython-based implementation. In the project's early stages, however, Dropbox's first task was to verify that its strategy (using LLVM to build a high-performance Python implementation) was sound, and CPython was not suited to that kind of experiment. Kevin Modzelewski thinks part of the value for Dropbox is that the team now understands the technology much better. Despite some detours and reinvented wheels, that experience makes him believe they could do better if a similar situation came up again.

What some people miss

In Kevin Modzelewski's view, those who have not investigated have no right to speak, yet some people are keen to voice unrealistic opinions. For example, the V8 engine made JavaScript run faster in the browser, and Kevin Modzelewski asks how Python could get a similar speedup. He asks why Python is slower than other dynamic languages, rather than saying "look, Lua is faster, so just use Lua." He wants people to understand that Python is slow because of its rich object model, not because of its dynamic typing or dynamic scoping. The issue is that every Python operation has many points at which its behavior can be overridden, and because these features are so widely used, users overlook what they cost.
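The override points in Python's object model can be illustrated with a small sketch. All class names here (`Meters`, `Lazy`, `DuckMeta`, `Duck`, `Robot`) are hypothetical. The example shows three ordinary-looking operations (`+`, attribute access, and `isinstance`) that each dispatch to user-defined code which an implementation has to look for every time.

```python
class Meters:
    """Hypothetical value type: `+` dispatches to __add__ or __radd__."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        other_value = other.value if isinstance(other, Meters) else other
        return Meters(self.value + other_value)

    # Used for `3 + Meters(2)`: int.__add__ returns NotImplemented first,
    # then the interpreter tries the right-hand operand's __radd__.
    __radd__ = __add__


class Lazy:
    """Attribute access falls back to __getattr__ when normal lookup fails."""

    def __getattr__(self, name):
        return f"computed:{name}"


class DuckMeta(type):
    """Even isinstance() can be overridden, via the metaclass."""

    def __instancecheck__(cls, obj):
        return hasattr(obj, "quack")


class Duck(metaclass=DuckMeta):
    pass


class Robot:
    def quack(self):
        return "beep"


total = 3 + Meters(2)                   # runs Meters.__radd__
attr = Lazy().anything                  # runs Lazy.__getattr__
is_duck = isinstance(Robot(), Duck)     # runs DuckMeta.__instancecheck__
```

None of the three lines at the bottom looks dynamic, yet each one has to check for user overrides before it can do anything else.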

Examples include inspecting a frame's local variables after the frame has exited, or modifying a function in place. These features are widely used and therefore have to be supported, yet languages with less dynamic machinery, such as JS or Lua, have nothing comparable. People clearly overlook that.
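Both behaviors can be reproduced in a few lines of standard Python. This is only a sketch to make the point concrete, and the function names (`failing`, `capture_frame`, `greet`, `replacement`) are hypothetical.

```python
import sys


def failing(order_id):
    retries = 3
    raise ValueError("boom")


def capture_frame():
    """Return the frame object of failing() after that call has already exited."""
    try:
        failing(42)
    except ValueError:
        tb = sys.exc_info()[2]
        while tb.tb_next is not None:  # walk to the innermost frame
            tb = tb.tb_next
        return tb.tb_frame


# The frame's locals are still readable after the function has exited.
# Post-mortem debuggers rely on exactly this.
frame = capture_frame()
locals_after_exit = dict(frame.f_locals)


def greet():
    return "hello"


def replacement():
    return "goodbye"


# Mutate the function object in place: every existing reference sees the change.
alias = greet
greet.__code__ = replacement.__code__
result = alias()
```

An implementation that wants to optimize `failing` or `greet` aggressively still has to keep both of these behaviors working.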

On the other hand, Python compatibility is not what everyone assumes it to be. For example, Kevin Modzelewski found that Dropbox's code base is so large that compatibility problems were unavoidable. Switching from reference counting to a tracing garbage collector, or even switching to ordered dictionaries, caused countless breakages. In the end, Dropbox had to go back and redo both implementations to match CPython's behavior.
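The sketch below shows the kind of implicit dependency involved. It is not code from Dropbox or Pyston, and `Connection` and both helper functions are hypothetical names. The first function depends on a CPython-specific behavior: with reference counting, an object is finalized as soon as its last reference disappears. The second depends on dictionary iteration order.

```python
import weakref


class Connection:
    """Hypothetical resource whose cleanup the caller expects to be prompt."""


def finalized_right_after_del() -> bool:
    events = []
    conn = Connection()
    weakref.finalize(conn, events.append, "closed")
    del conn  # the last reference is gone
    # CPython's reference counting runs the finalizer immediately. A tracing
    # collector may instead run it at some later collection.
    return events == ["closed"]


def keys_in_insertion_order() -> list:
    settings = {}
    settings["zeta"] = 1
    settings["alpha"] = 2
    # Insertion order is a language guarantee since Python 3.7. Code that
    # relies on some other order breaks when the dictionary implementation changes.
    return list(settings)


prompt_cleanup = finalized_right_after_del()
key_order = keys_in_insertion_order()
```

Code that silently assumes prompt cleanup or a particular key order keeps working on CPython and breaks on an implementation that behaves differently, even though neither assumption appears anywhere in the code.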

Memory usage is another of Python's most criticized weaknesses, especially in web applications. Part of the problem is the GIL: multi-process parallelism performs much like multi-threading but inevitably uses more memory. Whatever the reasons, Python also makes it hard to share memory between processes. Memory usage is rarely discussed in the Python world (MicroPython being an exception), and this is another reason PyPy does not suit Dropbox. Many parts of Dropbox's system are in fact limited by memory, and one of the key metrics is "requests per second per GB of memory". Kevin Modzelewski once thought that a 50% increase in request throughput at the cost of twice the memory would be acceptable. In fact, for a memory-bound service it is not.
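The arithmetic behind that last point is easy to check. In the sketch below, the baseline figures (1000 requests per second, 4 GB) are made-up numbers for illustration. Only the 50% speedup and the doubling of memory come from the text.

```python
def requests_per_second_per_gb(requests_per_second: float, memory_gb: float) -> float:
    """The metric named in the article: throughput normalized by memory."""
    return requests_per_second / memory_gb


# Hypothetical baseline: one worker serving 1000 req/s in 4 GB.
baseline = requests_per_second_per_gb(1000.0, 4.0)

# Same worker made 50% faster, but using twice the memory.
faster = requests_per_second_per_gb(1000.0 * 1.5, 4.0 * 2)

# Ratio of the new metric to the old one: 1.5 / 2 = 0.75.
relative = faster / baseline
```

On a machine limited by RAM, the "faster" runtime therefore serves about 25% fewer requests per GB, so it is a net loss.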

Kevin Modzelewski concludes that, despite the problems above, the choices made at the time were carefully considered, and that if he had to do it again he believes he could do better. Dear reader, has this refreshed your understanding of Python?
