Dual-camera phones are now mainstream, and the portrait mode and multiple zoom levels they enable have become "standard" features on many smartphones. Yet some manufacturers still insist on a single-camera configuration.
Google, known for its software, is a prime example: from the first-generation Pixel (released in 2016) to today's Pixel 3 / 3 XL, it has stuck with a single rear camera for over two years.
Interestingly, the Pixel is not only a frequent visitor to the top of the DxOMark rankings; its camera has remained competitive with flagship rivals, and it even produces a better portrait mode than many dual-camera phones.
All of this comes down to software algorithms and AI neural networks. Yesterday, Google explained the algorithm behind the Pixel 3's "Portrait Mode" on the Google AI Blog, and it is clear they have put serious work into it.
Google's algorithmic principles
If you're new to Google's camera algorithms, let's first review the principle behind the Pixel 2's portrait algorithm.
Last year, Google used an AI (neural network) algorithm to let the Pixel 2 / 2 XL produce background bokeh comparable to a dual-camera phone, using only a single camera.
▲ Image from: Google
In this set of comparison images Google published last year, we can quickly see the difference in background blur between the HDR+ shot on the left and the "Portrait Mode" shot on the right.
As you can see, in Portrait Mode the background behind the subject is blurred in software. Compared with the normal HDR+ shot, it has a stronger visual impact and keeps the background from distracting from the subject.
According to Google's AI Blog post last year, the Pixel 2's camera first takes a burst of HDR+ photos and merges the frames to improve the dynamic range, detail retention, and highlight performance of the final image.
The HDR+ comparison below shows the difference before and after the feature is enabled (note the foreground exposure and the floorboard detail in the upper right corner).
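The multi-frame idea behind HDR+ can be illustrated with a toy sketch (this is not Google's pipeline; the real HDR+ also aligns the burst and merges frames in a noise-aware way): averaging several noisy captures of the same scene suppresses noise roughly in proportion to the square root of the frame count.

```python
import numpy as np

def merge_burst(frames):
    """Average a burst of already-aligned frames to suppress noise.
    A toy stand-in for HDR+ merging, which is far more sophisticated."""
    return np.stack(frames).astype(np.float64).mean(axis=0)

# Simulate a burst of 8 noisy captures of the same flat gray scene.
rng = np.random.default_rng(0)
scene = np.full((16, 16), 100.0)
burst = [scene + rng.normal(0.0, 10.0, scene.shape) for _ in range(8)]

merged = merge_burst(burst)
single_err = np.abs(burst[0] - scene).mean()
merged_err = np.abs(merged - scene).mean()
# The merged frame is noticeably closer to the true scene than any single frame.
```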
▲ Left: before HDR+; right: after HDR+. Image from: Google
To take a "Portrait Mode" photo, the camera then runs the HDR+ result through a TensorFlow-based neural network that separates the pixels belonging to the person from the pixels belonging to the background.
Google's comparison chart on the AI Blog shows this more intuitively:
On the left is the original HDR+ image; on the right, the black region is the background identified by the AI, and the white region is the outline of the recognized subject (including the person's facial features and any objects inside the outline).
Interestingly, the final image shows that the cookies on the table, although classified as "non-person" by the AI, are not blurred in the end. That is because, besides the subject and the background, the system also recognizes objects near the subject: the AI leaves the objects just below the person unblurred, treating them as close-up foreground rather than part of the background. The effect, however, is not perfect.
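The segmentation step can be pictured as producing a per-pixel mask and compositing a sharp image with a blurred one. This is a minimal sketch under assumed names and shapes; Google's network of course outputs a much richer mask than this binary toy:

```python
import numpy as np

def composite_portrait(sharp, blurred, person_mask):
    """Where the mask marks the person (1.0), keep the sharp pixels;
    elsewhere (0.0) use the software-blurred background pixels."""
    m = person_mask.astype(np.float64)
    return m * sharp + (1.0 - m) * blurred

# Tiny grayscale example: the subject occupies the middle column.
sharp = np.array([[10., 20., 30.],
                  [10., 20., 30.]])
blurred = np.full_like(sharp, 5.0)
mask = np.array([[0, 1, 0],
                 [0, 1, 0]])

out = composite_portrait(sharp, blurred, mask)
# Middle column stays sharp (20.0); the rest takes the blurred value (5.0).
```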
Although last year's Pixel 2 and this year's Pixel 3 series lack dual cameras, Google has never been a company that wins on hardware; it is better at solving problems with software and algorithms.
▲ Image from: Google
Although there is no dual camera in hardware, the Pixel's camera features PDAF (dual-pixel phase-detection autofocus), which lets Google effectively split the one camera into two at the pixel level:
The view from the left half of the lens is offset from the view from the right half by a baseline of roughly 1 mm. If the phone is held in portrait orientation, the lens is instead split into upper and lower halves.
After shooting, the system compares the pixels captured through the two halves of the lens, uses Google's own Jump Assembler stereo algorithm to derive a depth map, and then applies a bilateral solver to upsample the depth map to high resolution.
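The stereo step can be illustrated with a toy 1-D disparity search. Google's Jump Assembler and bilateral solver are far more sophisticated; this sketch only shows the underlying principle that nearby features shift more between the two half-lens views:

```python
import numpy as np

def disparity_1d(left, right, max_shift=3):
    """For each pixel in the left view, find the horizontal shift that
    best matches the right view. A larger shift means closer to the lens."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            best_d, best_err = 0, float("inf")
            for d in range(max_shift + 1):
                if x + d >= w:
                    break
                err = abs(left[y, x] - right[y, x + d])
                if err < best_err:
                    best_err, best_d = err, d
            disp[y, x] = best_d
    return disp

# A bright feature seen 2 px apart between the two half-lens views.
left  = np.array([[0., 0., 0., 9., 0., 0., 0., 0.]])
right = np.array([[0., 0., 0., 0., 0., 9., 0., 0.]])
d = disparity_1d(left, right)
# The feature pixel at x=3 is assigned a disparity of 2.
```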
▲ Figures 1 and 2 are the views from the upper and lower halves of the lens; Figure 3 shows the difference between them. Image from: Google
The left side of the figure above is the depth map captured and computed via PDAF: the brighter an area, the closer it is to the lens. The right side encodes how much each pixel should be blurred: black marks the "no blur" range and red marks the "to be blurred" range, with deeper red meaning stronger background blur.
▲ final rendering
Finally, the system merges the segmentation mask from step 2 with the depth map from step 3. With the AI's object recognition, the system can estimate the distances from the close-up cookies and the porcelain plate to the in-focus subject (the person) and blur them accordingly. The result is a more complete and natural portrait photo than the rough treatment of step 2 alone.
Comparing the final renders of steps 2 and 3, you can see that the cookies in the foreground are now also properly blurred. With software algorithms, the blur region can be "shaped" into almost any form.
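Putting the two maps together can be sketched as follows (the names and the linear blur model are illustrative assumptions, not Google's actual formula): blur strength grows with distance from the focal plane, while pixels inside the person mask are never blurred.

```python
import numpy as np

def blur_strength(depth, focus_depth, person_mask, gain=1.0):
    """Blur radius grows with |depth - focus_depth|; subject pixels stay sharp."""
    strength = gain * np.abs(depth - focus_depth)
    strength[person_mask.astype(bool)] = 0.0
    return strength

# Depths: a cookie in the foreground (1.0), the person (2.0), a wall behind (5.0).
depth = np.array([[1.0, 2.0, 5.0]])
mask  = np.array([[0, 1, 0]])
s = blur_strength(depth, focus_depth=2.0, person_mask=mask)
# The cookie gets a mild blur (1.0), the person none (0.0), the wall a strong blur (3.0).
```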
How does Google train neural networks?
Knowing how the Pixel 2's portrait mode works, the Pixel 3's optimization is not hard to understand.
Through software algorithms, the camera system can roughly estimate the distance between subject and background and blur accordingly. In handheld shooting, however, the phone inevitably shakes slightly, which affects the final blur. As a result, many users ran into depth-estimation errors on the Pixel 2 series.
According to the Google AI Blog post, on the Pixel 3 Google exploits the way neural networks learn: it tries to fix the recognition errors in "Portrait Mode" by feeding the AI system additional recognition hints and training the neural network on them.
For example, the number of pixels an object occupies hints at its distance from the lens, giving the AI a more accurate distance estimate; and the sharpness of areas inside versus outside the focal plane provides a defocus cue.
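The pixel-count cue works because an object's apparent area shrinks with the square of its distance. A hedged sketch of this relationship (the function and parameter names are illustrative, not from Google's post):

```python
import math

def distance_from_pixel_count(pixel_count, ref_count, ref_distance):
    """If the same object covers k-times fewer pixels (by area) than at a
    known reference distance, it is roughly sqrt(k)-times farther away."""
    return ref_distance * math.sqrt(ref_count / pixel_count)

# An object that covered 400 px at 1 m now covers only 100 px:
d = distance_from_pixel_count(100, ref_count=400, ref_distance=1.0)
# -> roughly 2.0 m
```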
"Franken Phone" is the rig Google uses to train the TensorFlow-based neural network system. It consists of five Pixel 3 phones connected over Wi-Fi.
During testing, Google used the five phones in the Franken Phone to shoot simultaneously from slightly different angles, producing a depth map synthesized from multiple viewpoints with stereo algorithms. This simulates the variation of handheld shooting and trains the neural network to accurately distinguish subjects (near) from backgrounds (far) in complex scenes.
▲ Figure 1 shows Google Franken Phone. Image from: Google
Of course, if you are interested in Google's algorithm, you can also explore it yourself. The Google AI Blog notes that after taking a "Portrait Mode" photo with version 6.1 of the Google Camera app, you can view the photo's depth map in Google Photos.
Alternatively, you can extract the depth map with third-party software to see how the scene was segmented under the AI neural network's optimization.