Displaying Unicode with MCUs
Intro
I grew up in the era of 8-bit computers (late 1970s). At that time, nearly everything done on a computer was written in English, and text was stored as 7-bit ASCII characters in 8-bit bytes. It wasn't until a few years later that IBM and Microsoft added support for accented characters to be more inclusive of European languages. The initial solution was to shoehorn a set of extra characters and symbols into the "unused space" of values 128 to 255. This became known as Extended ASCII. There were multiple de facto standards, but Microsoft's seemed to win; they called it codepage 1252. This arrangement stuck for a while, until the various countries got together to create "one character set to rule them all" and called it Unicode. In this new world, the old 7-bit ASCII set was slotted into the first 128 positions, followed by the accented characters and symbols, with Asian and Middle Eastern characters coming after that. There are literally thousands of symbols in the whole set, but it's mostly a sparse array - there are a lot of gaps and empty sections. To make Unicode strings take up less space, a variable-length encoding called UTF-8 was created. It allows ASCII text to still fit in one byte per character, while 'extended' characters require 2, 3 or 4 bytes to encode, arranged so that no stray zero bytes appear which would be interpreted as string terminators.
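The multi-byte scheme described above can be sketched as a small decoder. This is a minimal illustration of how the 1- to 4-byte sequences unpack into a Unicode codepoint; it skips validation of malformed input, which real code should handle:

```c
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s. Returns the Unicode
 * codepoint and stores the number of bytes consumed in *len.
 * Minimal sketch: continuation bytes are not validated. */
uint32_t utf8_decode(const uint8_t *s, int *len)
{
    if (s[0] < 0x80) {                      /* 1 byte: plain ASCII */
        *len = 1;
        return s[0];
    }
    if ((s[0] & 0xE0) == 0xC0) {            /* 2 bytes: U+0080..U+07FF */
        *len = 2;
        return ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    }
    if ((s[0] & 0xF0) == 0xE0) {            /* 3 bytes: U+0800..U+FFFF */
        *len = 3;
        return ((uint32_t)(s[0] & 0x0F) << 12) |
               ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    }
    *len = 4;                               /* 4 bytes: U+10000..U+10FFFF */
    return ((uint32_t)(s[0] & 0x07) << 18) |
           ((uint32_t)(s[1] & 0x3F) << 12) |
           ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
}
```

Note how every byte of a multi-byte sequence has its high bit set, which is exactly why no zero bytes can appear mid-string.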
The MCU Challenge
When working on constrained devices, certain concessions must be made. The small amount of RAM and relatively slow CPU generally limit what kind of data you can work with. Displaying Unicode strings brings several challenges - you can either implement a TrueType rendering engine (lots of code and relatively large data sets) or work with bitmap fonts at a fixed point size (much less code and usually smaller data sets). For the bitmap font approach, there have been some partially working solutions built on Adafruit_GFX. I took this a step further.
The Practical Solution
The challenge of working with Unicode on MCUs is two-fold - working with a large, sparse array of characters (Unicode) and translating those codes to match a reduced set of glyphs (graphics) in order to draw them. I came up with a two-part solution. First, I modified the fontconvert tool to map codepage 1252 onto the Unicode characters in the TrueType font, so that indices 128 to 255 of the output contain the codepage 1252 characters and symbols. Second, I added a UTF-8 (Unicode) to CP1252 translation function in the print functions of the display library. This allows the user to type text into their Arduino sketch or Linux program using standard UTF-8 encoding and have it display correctly on their MCU project without having to do anything strange in the project code. Below is a screenshot of some Arduino code which prints Unicode characters. The strings are encoded as UTF-8 by most text editors. Below that is a photo of the output on an SSD1306 OLED display.
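The second half of that solution - translating UTF-8 codepoints down to CP1252 bytes in the print path - might look something like the sketch below. The function name and table are my own illustration, not the library's actual code. The 32-entry table covers CP1252 slots 0x80-0x9F, where Microsoft placed characters like € and ™ that live at much higher codepoints in Unicode; everything else in CP1252 matches Unicode's Latin-1 layout directly:

```c
#include <stdint.h>

/* Unicode codepoints assigned to CP1252 slots 0x80-0x9F (0 = unused slot). */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Map a Unicode codepoint to its CP1252 byte; returns '?' for
 * characters the 256-glyph font cannot represent.
 * Hypothetical helper - the library's internal function may differ. */
uint8_t unicode_to_cp1252(uint32_t cp)
{
    if (cp < 0x80)
        return (uint8_t)cp;                 /* ASCII passes through */
    if (cp >= 0xA0 && cp <= 0xFF)
        return (uint8_t)cp;                 /* Latin-1 range matches 1:1 */
    for (int i = 0; i < 32; i++)            /* search the 0x80-0x9F slots */
        if (cp1252_high[i] == cp)
            return (uint8_t)(0x80 + i);
    return '?';                             /* no glyph for this character */
}
```

Combined with a UTF-8 decoder, the print function can walk the incoming string, decode each codepoint, and index straight into the 256-entry glyph table produced by the modified fontconvert.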
By creating the solution at both ends (TrueType Unicode -> CP1252 bitmap at font-conversion time, and UTF-8 -> CP1252 at print time), the problem fades into the background and you can just get to work.
You can try this functionality in my OneBitDisplay and bb_spi_lcd libraries. Both are available in the Arduino library manager.