An embedded-friendly PNG decoder
I wrote my own imaging codecs many years ago for all of the 'standard' file formats available. Over the last couple of years I've been dusting off that code to give it a new life as open source libraries for embedded/Arduino. I wrote blog posts about my JPEG and GIF decoders so I thought it would be useful to write about my new PNG decoding library.
What is PNG?
The PNG (Portable Network Graphics) specification was created not long after Unisys started enforcing its LZW patent to collect licensing fees from the use of GIF images. PNG was specifically designed not to infringe any patents and offered the benefit of supporting many more pixel types along with an alpha channel. The alpha channel was a feature not found in JPEG and far more capable than GIF's single transparent color. PNG uses the DEFLATE compression scheme (the same one used by ZIP), which is freely implementable and unencumbered by patents. DEFLATE by itself isn't drastically better than the LZW compression used in GIF files. What allows PNG files to compress images so well is the addition of a pre-filtering step which exploits the correlation between neighboring pixels to make the data more compressible. For several years now GIF hasn't been hampered by any active patents and could, in principle, take advantage of the same trick to shrink its files, but alas, it's been marked "deprecated" in the eyes of many. It's highly unlikely GIF will see any updates to its spec.
There are 4 filter types (not including 'none') used to predict pixels on the current line from neighboring pixels. Filters essentially replace each pixel with the difference between it and its best-matching neighbor. For areas of constant color or constant gradients, this replaces the data with 0's (or the gradient delta). The compression scheme used on the filtered data can do a better job when there are fewer unique symbols to compress. Imagine an image composed of vertical stripes of color. If the "up" filter is applied, the entire image would be converted to 0's except for the first line. LZW or DEFLATE would still do a decent job with the original color bars, but can do a much better job when most of the data is 0's. The filter type can be different on each line of the image and is stored as the first byte of each line of the compressed data. If an 8-bit palette image is 320 pixels wide, then there will be 321 bytes per line of data in the file. PNG encoders that try to compress the data as small as possible will test each filter on each line and choose the one that results in the lowest 'entropy'. Some encoders simply choose the 'Paeth' filter for every line since it generally performs the best. To decode a (non-interlaced) PNG image you need to buffer at least the previous line of pixels in order to be able to de-filter the image data. This sets the minimum amount of memory that my library uses - zlib (32K + 7K), file buffer (2K), current and previous lines (5K by default).
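To make the filtering idea concrete, here is a sketch in plain C of the Paeth predictor from the PNG specification and the per-byte de-filtering step that uses it (the function and buffer names are my own, not the library's):

```c
#include <stdlib.h>

/* Paeth predictor from the PNG spec: pick whichever neighbor
 * (left, above, upper-left) is closest to the linear estimate a + b - c. */
static unsigned char paeth(unsigned char a, unsigned char b, unsigned char c)
{
    int p = a + b - c;
    int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
    if (pa <= pb && pa <= pc) return a;
    if (pb <= pc) return b;
    return c;
}

/* De-filter one Paeth-filtered line in place: recon = filt + predictor (mod 256).
 * bpp = bytes per pixel; prev = the reconstructed previous line
 * (all zeros for the first row of the image). */
static void defilter_paeth(unsigned char *line, const unsigned char *prev,
                           int len, int bpp)
{
    for (int x = 0; x < len; x++) {
        unsigned char a = (x >= bpp) ? line[x - bpp] : 0; /* left */
        unsigned char b = prev[x];                        /* up */
        unsigned char c = (x >= bpp) ? prev[x - bpp] : 0; /* upper-left */
        line[x] = (unsigned char)(line[x] + paeth(a, b, c));
    }
}
```

This is also why the previous line must stay buffered: every filter except 'none' and 'sub' reads the reconstructed pixels directly above the current one.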
The MCU Challenge
Working with images on memory-constrained devices is always challenging; pixels take a lot of space. Consider the case where you have an ATSAMD51 (192K internal RAM) and want to display a 320x240 32-bpp PNG image. If you were to decode the whole image into RAM, you would need 320x240x4 = 300K bytes, plus the 39K needed by the zlib inflate() code. This would normally be impossible if not for displays designed specifically for "Arduino"-class devices. These low-cost displays have an internal frame buffer and receive commands+data over an SPI bus. One of the design goals of this library was to handle this use case. PNG decoding is already designed to be done a line at a time, so I added an optional callback function so that only a single line of pixels needs to be buffered before being transmitted to an external LCD. You can also decode the full image into memory in one shot if you have enough space. Part 2 of this challenge is managing all of the possible pixel types. PNG supports 1/2/4/8-bit palette images as well as grayscale and truecolor, all with an optional alpha channel. I decided it would be better to deliver the native pixel format to the callback function (faster than forcibly converting it to another format) and then provide a separate function to convert any of the pixel types into RGB565 for those inexpensive LCD displays. This design gives the best combination of speed and flexibility.
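The one-line-at-a-time idea can be sketched as follows - note these struct and function names are hypothetical illustrations of the pattern, not PNGdec's actual API, and the sketch assumes 4 bytes per pixel:

```c
#include <string.h>

/* Line-callback pattern: the decoder holds only the current and previous
 * lines and hands each finished line to the caller, who can push it
 * straight to an SPI display instead of buffering the whole image. */
typedef void (*line_cb)(int y, const unsigned char *pixels, int width, void *user);

typedef struct {
    unsigned char cur[640 * 4];   /* one line of up to 640 RGBA pixels */
    unsigned char prev[640 * 4];  /* reconstructed previous line (for filters) */
    int width, height;
    line_cb draw;
    void *user;
} line_decoder;

/* After a line has been inflated and de-filtered into d->cur, deliver it
 * to the callback, then keep a copy as the "previous" line. */
static void emit_line(line_decoder *d, int y)
{
    d->draw(y, d->cur, d->width, d->user);
    memcpy(d->prev, d->cur, (size_t)d->width * 4);
}

/* Example callback: just counts the pixels delivered. A real sketch
 * would write the line to the display here. */
static long g_pixels_seen;
static void count_cb(int y, const unsigned char *pixels, int width, void *user)
{
    (void)y; (void)pixels; (void)user;
    g_pixels_seen += width;
}
```

With this shape, peak RAM is two line buffers plus zlib's state, independent of the image height.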
Another challenge with PNGdec was removing dynamic memory calls (malloc/free). The reason I like my Arduino libraries to avoid malloc and free is to allow the code to be built on the simplest possible embedded system with no C runtime library. I used zlib to handle inflating the compressed data, and it offers alloc/free callback hooks so you can provide your own memory management. The only problem is that it makes multiple calls to allocate different-sized blocks of memory. This could be handled, but I thought it would be easier to remove the calls entirely and have it work from a single block of memory. This way, the caller can decide how to provide that memory - by reserving a static block or dynamically allocating it in one shot.
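For context, the "multiple calls could be handled" route would look something like the bump allocator sketched below, which satisfies zlib-style (opaque, items, size) allocation callbacks out of one caller-provided block. This is an illustration of the alternative, not what PNGdec actually ships:

```c
#include <stddef.h>

/* Carve all allocations out of a single caller-provided block.
 * zlib's inflateInit lets you supply zalloc/zfree callbacks with this
 * general shape, so its internal allocations can be served with no
 * malloc/free and "freed" all at once by resetting the pool. */
typedef struct {
    unsigned char *base;
    size_t size, used;
} bump_pool;

static void *bump_alloc(void *opaque, unsigned items, unsigned size)
{
    bump_pool *p = (bump_pool *)opaque;
    size_t n = ((size_t)items * size + 7) & ~(size_t)7; /* 8-byte align */
    if (p->used + n > p->size) return NULL;             /* pool exhausted */
    void *ptr = p->base + p->used;
    p->used += n;
    return ptr;
}

static void bump_free(void *opaque, void *ptr)
{
    (void)opaque; (void)ptr; /* individual frees are no-ops */
}
```

Removing the calls entirely, as the library does, avoids even this bookkeeping and makes the memory requirement a single fixed number.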
Where Can I use it?
I decided to set a maximum image width of 640 pixels at 32-bpp (equivalently 2560 pixels at 8-bpp, since the line buffers are sized in bytes). The total memory required for decoding images of up to this width (and any length) is around 46K. This seemed like a reasonable balance, but unfortunately it excludes a lot of smaller embedded devices. That couldn't be avoided because the zlib structures require 39K of RAM by themselves. The most likely targets are ESP32, Cortex-M and RISC-V systems with at least 64K of RAM. You can certainly run this code on a PC or Linux-class machine, but why would you? A CPU with enough memory to run an operating system has enough memory to run the full version of libpng. I included a Makefile and Linux test app for convenience, but I don't expect anyone to use it that way.
How Fast is it?
PNG decoding is all about inflating the compressed data and then de-filtering it using the various filter functions. These two steps should take nearly 100% of the decode time and are split about 50/50 on non-SIMD implementations. The overhead of reading the data from the file and managing the output pixels can also take time (and does with the pngle library). My library does its best not to add any additional latency on top of the zlib+filter time, so it ends up being relatively fast. I also disable zlib's internal Adler-32 checksum by default, which speeds things up by 10-20%. PNG has multiple levels of integrity checking - the zlib stream carries an Adler-32 checksum that is verified against the output as it inflates, and every PNG chunk (including the IDAT chunks holding that compressed data) carries its own CRC-32. The Adler-32 check is disabled by default and I didn't add code to verify the chunk CRC values. If you're not sure of the integrity of the file you're decoding, then enable the zlib-level checking.
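For reference, the Adler-32 checksum from the zlib format (RFC 1950) is simple enough to sketch in a few lines. It's cheaper than CRC-32 but still costs a full pass over the decompressed data, which is where that 10-20% comes from:

```c
#include <stddef.h>

/* Adler-32 per RFC 1950: two running sums modulo 65521 (the largest
 * prime below 2^16), concatenated as (b << 16) | a. The real zlib code
 * defers the modulo for speed; this naive version is just for clarity. */
static unsigned long adler32_ref(const unsigned char *buf, size_t len)
{
    unsigned long a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}
```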
Below is a simple benchmark test I wrote which decodes 3 versions of a 240x200 image and either delivers the native pixels or converts the output to RGB565. The conversion from palette colors to RGB565 could be sped up with a lookup table, but that will require more memory.
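The RGB565 conversion itself is just bit packing; here is a sketch of it, along with the palette lookup-table idea mentioned above (hypothetical helper names - a 256-entry table costs only 512 extra bytes):

```c
#include <stdint.h>

/* Pack 8-bit RGB into RGB565: keep the top 5/6/5 bits of each channel. */
static uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}

/* Palette speed-up: convert the (up to 256) palette entries once,
 * then every palette-image pixel becomes a single table lookup
 * instead of three shifts and two ORs. */
static void build_rgb565_lut(const uint8_t *pal /* 3 bytes per entry */,
                             int count, uint16_t *lut)
{
    for (int i = 0; i < count; i++)
        lut[i] = rgb565(pal[i * 3], pal[i * 3 + 1], pal[i * 3 + 2]);
}
```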
With the "trifecta" of my JPEG, GIF and PNG decoders, embedded devices now have an efficient way to decode a wide range of standard image files. PNG specifically fills a need because of its high compression ratios, wide range of pixel types and support of an alpha channel. I hope it enables people to create brand new projects or allows existing ones to look prettier. As always, if you have any questions or comments, please leave them below.
My GitHub Sponsors page