Google Camera AI: How does GCam produce such good pictures?
The Google Pixel smartphones have always been characterized by an extremely capable camera despite dated hardware. The reason for this is Google's well-known camera app, the Google Camera (GCam).
Naturally, one wonders how this is achieved. Using a Google Pixel 4 as the demonstration device, this post presents two key Google algorithms that improve the results in a unique way.
Jump to section
- 1 Short camera test - how good are the results really?
- 2 Super Res Zoom - We at Google only need 12 megapixels
- 3 LiveHDR Plus - aggressive dynamic range already in the viewfinder
- 4 Summary
Short camera test - how good are the results really?
On the hardware side, Google still uses the Sony IMX363 main sensor known from the Pixel 2, with optical image stabilization. While the P50 Pro already features a one-inch sensor, the Pixel sticks with the classic 1/2.55-inch format at 12.2 megapixels.
First of all, when looking at the results, it is noticeable how well Google handles dynamic range: the Pixel copes well with the bright sun, and the colors remain pleasantly realistic. In general, I like the color rendering a lot, because it strikes the perfect compromise between natural and vivid.
The image sharpness remains the standout. Here, Google plays a league above the competition with fewer megapixels, and without the manual over-sharpening that usually looks bad. The same goes for the class-leading digital zoom: at 2x magnification without a dedicated lens, the picture reaches the level of a smartphone with optical zoom.
But how can such good images come out of such "bad" hardware? That question brings us to the topic of this article.
Super Res Zoom - We at Google only need 12 megapixels
The high sharpness and the good digital zoom are thanks to the "Super Res Zoom" technology. This isn't just stitching multiple images together; it goes much deeper. What is special is that Google turns the natural shaking of the human hand, normally the real difficulty in mobile photography, into the crux of the solution. That was unexpected. In the following, I explain exactly what is meant. The first problem to solve is lost color information.
Color recognition on a camera sensor
An image sensor by itself cannot detect colors, only the intensity of the light hitting it. To capture the colors in a scene, cameras place a Bayer color filter in front of the sensor, so each pixel receives exactly one color: red, green, or blue. The problem is that each pixel therefore records only a single color.
The camera's processing pipeline then has to reconstruct the real colors from this partial information: the missing color values of each pixel are estimated from nearby pixels. Two thirds of a typical RGB image is thus reconstruction. This whole process is called demosaicing.
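To make the idea concrete, here is a minimal, naive demosaicing sketch in Python/NumPy. It assumes an RGGB Bayer layout and estimates each pixel's colors as the average of the nearby samples of that color in a 3x3 neighborhood; real camera pipelines (Google's included) use far more sophisticated, edge-aware methods.

```python
import numpy as np

def box3_sum(a):
    """Sum over each pixel's 3x3 neighborhood (zero padding at the borders)."""
    p = np.pad(a, 1)
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def bilinear_demosaic(mosaic):
    """Naive demosaicing of an RGGB Bayer mosaic: each output pixel's
    color channels are estimated as the mean of the 3x3-neighborhood
    samples that actually measured that color."""
    h, w = mosaic.shape
    r = np.zeros((h, w), bool); r[0::2, 0::2] = True   # red sites
    b = np.zeros((h, w), bool); b[1::2, 1::2] = True   # blue sites
    g = ~(r | b)                                       # green checkerboard
    out = np.empty((h, w, 3))
    for c, mask in enumerate((r, g, b)):
        sampled = np.where(mask, mosaic, 0.0)
        counts = box3_sum(mask.astype(float))
        out[..., c] = box3_sum(sampled) / np.maximum(counts, 1.0)
    return out

# Sanity check: a uniformly gray scene must come back as uniform gray.
print(np.allclose(bilinear_demosaic(np.full((8, 8), 0.5)), 0.5))  # → True
```

Even this toy version shows why artifacts arise: two thirds of every output value is guessed from neighbors, so fine detail near edges is easily smeared or miscolored.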
Of course, modern smartphones use far more sophisticated techniques than this simple scheme, but demosaicing still produces incomplete results and artifacts. Even large DSLR cameras struggle with this. Google found a good solution.
Solution for artifacts in reconstruction
The Drizzle technique has been known from astronomy for more than a decade: images are recorded from many slightly different perspectives, or rather positions, and combined. In ideal lighting conditions, this yields a resolution equivalent to two to three times optical zoom.
The technique built on this is the "Multi-Frame Super-Resolution" algorithm. Put simply, you take many low-resolution images from slightly different perspectives. These are placed on a high-resolution grid, and a high-resolution image is obtained by averaging, i.e., from the pixels that are covered in every perspective.
If we scale the left picture above, with its two equally large high-resolution circles, down to a low resolution, we see that two different low-resolution versions of the circle emerge. Another small demonstration: I positioned a high-resolution image of the letter A at slightly different offsets in several image files, then scaled all of them down to an extremely low resolution. Here, too, you get many different versions of the letter:
I then loaded all the images into an image editing program, in my case Adobe Photoshop. Averaging all the images shows: the A is high-resolution again.
I carried out the same procedure once on the sample image provided by Google and got the same result here as well.
Intentional camera shake?
But how do you get recordings from slightly different perspectives? Large cameras and astronomy have dedicated mechanisms for this, but mobile photography has a very natural advantage: the unconscious trembling of the human hand while taking a picture. These slight movements yield many pictures from slightly different angles, while optical image stabilization compensates for excessively strong movements.
But what if you use a tripod and there is no camera shake at all: is the technique at its end? No, and Google's solution here is even more interesting. They make the optical image stabilization module itself move slightly, so a stabilizer is used to create shake. The movement is tiny and does not interfere with photography, but you can still see it when you maximize the Pixel's digital zoom.
Why only Google?
Above I described a very smooth and apparently simple scheme, which has of course been simplified for clarity. If every step really worked that perfectly, every smartphone would probably already have this technology. There are reasons why only Google has pulled it off so far.
First, a smartphone has a limited sensor size. Even in good light, every frame can contain a lot of noise because of the short exposure times.
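Incidentally, the mean-value merge described above is also what tames this noise: averaging K statistically independent frames reduces the noise standard deviation by roughly the square root of K. A quick illustration with synthetic Gaussian noise (the specific values are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
truth = np.full(10_000, 0.5)                            # flat gray patch
frames = truth + rng.normal(0, 0.1, (16, truth.size))   # 16 noisy shots

single_std = (frames[0] - truth).std()
merged_std = (frames.mean(axis=0) - truth).std()
print(f"noise std: single {single_std:.3f}, 16-frame mean {merged_std:.3f}")
```

With 16 frames, the residual noise drops to about a quarter of a single exposure, which is why burst merging can get away with a small, noisy sensor.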
Second: while the picture is being taken, the subject can move in addition to the hand, for example leaves on a tree, cars, or passers-by. All of this disrupts "Multi-Frame Super-Resolution". Moreover, the movements of the hand and of the subjects are random, so even when the images are well matched, the image information can be distributed unevenly. To keep this within scope, I'll leave the explanation at that. How does the Google Pixel solve these problems?
Avoid movement in the picture
First, the software selects the sharpest of the many frames as a reference. The other images are then compared against it and analyzed. The comparison is not done pixel by pixel but over larger tiles; the aim is to find, in each frame, the regions that correspond to the reference image. Only then is the information combined.
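A common way to implement this kind of tile-based comparison is block matching: for a tile of the reference image, search a small window of candidate offsets in another frame and keep the offset with the smallest sum of squared differences. The sketch below is an assumed, simplified version of that idea, not Google's actual (much faster, coarse-to-fine) alignment.

```python
import numpy as np

def align_tile(ref_tile, frame, top, left, radius=4):
    """Find the offset (dy, dx) within +-radius at which `ref_tile`
    (cut from the reference at (top, left)) best matches `frame`,
    by minimizing the sum of squared differences (SSD)."""
    th, tw = ref_tile.shape
    best, best_off = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + th > frame.shape[0] or x + tw > frame.shape[1]:
                continue                      # candidate falls off the frame
            cand = frame[y:y + th, x:x + tw]
            ssd = np.sum((cand - ref_tile) ** 2)
            if ssd < best:
                best, best_off = ssd, (dy, dx)
    return best_off

# Simulate hand shake: a second frame shifted by (2, -3) pixels.
rng = np.random.default_rng(1)
ref = rng.random((64, 64))
moved = np.roll(ref, (2, -3), axis=(0, 1))
tile = ref[16:32, 16:32]
print(align_tile(tile, moved, 16, 16))        # → (2, -3)
```

Once every tile's offset is known, the frames' pixels can be merged on a common grid, exactly as in the mean-value scheme described earlier.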
And that is not all: the Google Pixel's software also detects the edges of objects in the image and merges pixels along these edges in particular. Not only are details retained, the edges also become clearer.
Given the steps above, it is now easy to see why Google can capture more detail at the same resolution.
LiveHDR Plus - aggressive dynamic range already in the viewfinder
When you open the camera app on a Pixel smartphone, LiveHDR Plus is already visible in the viewfinder; behind it lies the "HDR burst" technology. The software continuously takes pictures at the same deliberately short exposure time and keeps them in a cache, a temporary memory.
These images are combined and rendered into a temporary "linear RGB image" with 14-bit color depth and an extremely high level of detail. Straight out of this merge, however, much of that detail is not yet visible because of the limited dynamic range of the display. Therefore, pixels of particularly low or high brightness are remapped to new values, which makes their details visible. In the end, the HDR result shows detail both in the shadows and in the bright areas of the image.
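That remapping step is known as tone mapping. As a hedged illustration of the principle, here is a classic global Reinhard-style curve (not Google's actual, locally adaptive tone mapper): dark linear values pass through almost unchanged, while very bright values are compressed into the displayable range instead of clipping.

```python
import numpy as np

def reinhard_tone_map(linear):
    """Global Reinhard operator x / (1 + x): maps linear HDR values
    from [0, inf) into [0, 1). Shadows keep roughly their value,
    highlights are compressed instead of being clipped to white."""
    return linear / (1.0 + linear)

# A synthetic linear "image": deep shadows next to a very bright sky.
hdr = np.array([0.02, 0.1, 1.0, 8.0, 16.0])
print(np.round(reinhard_tone_map(hdr), 3))
```

Note how the two brightest inputs (8.0 and 16.0) end up close together just below 1.0: highlight detail is squeezed but preserved, which is exactly the "details in shadows and light areas" effect described above.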
Up to the Pixel 3, it was not yet possible to process these images in real time at 30 frames per second, so they could not be shown in the viewfinder. That is why Google introduced LiveHDR Plus with the Pixel 4, which, put simply, is HDR Plus with a real-time preview.
The two brightness sliders are also new. While the upper slider, marked with a sun icon, adjusts the brightness of the entire image as before, the lower slider controls the shadows, brightening or darkening only them.
So this is the Google Camera, or at least a relevant part of it. With a light tap on the shutter, the Pixel applies artificial intelligence and machine learning behind the scenes; it is perhaps the most impressive camera software out there. Now it is no longer surprising how this software gets so much out of old hardware. Perhaps it is even a little mockery of the other Android manufacturers: "You need such extreme hardware because you can't get the software under control at all."