How to speed up your project with DMA
Intro
DMA (direct memory access) is a topic that's similar to pointers in C - it's not easy for everyone to visualize how it works. My goal for this blog post is to explain, in the simplest way possible, how it works and why your project can benefit from proper use of it.
What's it all about?
DMA is a useful feature for a CPU/MCU to have because it means that data can move around without the CPU (your code) having to do the work. In other words, DMA can move a block of data from memory-to-memory, peripheral-to-memory or memory-to-peripheral independently of the CPU. For people used to programming multi-core CPUs with a multi-threaded operating system, that may not sound very special. For those of us familiar with programming slow, low-power, single-threaded, embedded processors, it can make quite a difference. Here's a practical example - sending data to an SPI device (e.g. a small LCD display):
Without DMA
<prepare data1 - 10ms > <send data1 to SPI - 10ms > <prepare data2 - 10ms > <send data2 to SPI...>
With DMA
<prepare data1 - 10ms > <send data1 to SPI - 0ms > <prepare data2 - 10ms> <send data2 to SPI...>
Notice the difference in the send step? With DMA, the CPU can start working on the next set of data without having to wait for the SPI transmission to complete. In this fictional scenario, the effective throughput of your device has doubled: instead of 10ms to prepare plus 10ms to send, your machine can be preparing and sending data at the same time! Obviously there's no magic happening to make something take no time at all; the transmission still takes 10ms on the wire, but once you start the DMA going, your code can immediately get back to work. In the example above, if the data preparation were to take less time than the data transmission, the CPU would have to wait for the previous DMA transmission to complete before it could start the next (sending data2). Many systems have a way to chain DMA transactions together so that when one completes, the next will immediately start.
Where is it used?
- Reading samples from a sensor (SPI/I2C/ADC) at a fixed rate
- Sending and receiving blocks of data to DAC/I2C/I2S/SPI/UART devices
- Creating a repeating output pattern (e.g. signal generator)
A Common Pitfall
One of the most common points of failure for people new to DMA is accidental 'memory corruption'. This occurs when data is given to the DMA hardware to transmit and then the user's code returns to preparing new data in the same buffer and then oops! The output somehow gets corrupted. When the user disables DMA, everything works correctly - hmm... The problem in this case is that passing data to the DMA hardware means that the data will be transmitted in the background. That block of data can't be modified before the DMA transaction completes or you will be changing/corrupting the output before it gets sent. The fix is to manage multiple buffers so that you can keep working while old data is being sent. A common way of handling this is called "ping-pong" or double-buffering. The idea is that you work in one buffer, pass it to the DMA hardware, then switch to working on new data in another buffer. Each time you send the current data, you swap to working in the other buffer. This is usually the most practical way to leverage DMA without using tons of memory to queue future transactions.
Example Project
One of my more popular open source libraries is AnimatedGIF. It allows you to play GIF animations on MCUs with all types of displays. One of its primary design features is that it decodes one line at a time and passes it to the user code in a callback function called GIFDraw(). I designed it this way to allow large images to be decoded by MCUs with tiny internal memories. The MCU only needs to pass these pixels to the display (usually an SPI LCD with its own frame buffer). If transmitted with SPI DMA enabled, the GIF decoder can be decoding the next line while the current line is being sent to the display. Depending on how fast the CPU can decode each line, this can potentially make the SPI transmit time effectively zero. I've written an Arduino sketch to demonstrate how this works:
https://github.com/bitbank2/CYD_Projects/tree/main/gif_example
In the photo above, you can see it running on the "original" Cheap Yellow Display. The 240x240 "Hyperspace" GIF can animate (unthrottled) at 22 frames per second without DMA (about 45ms per frame) and 31 FPS with DMA enabled (about 32ms per frame). The speed isn't doubled by enabling DMA, so this indicates that preparing the pixels takes more time than sending them to the LCD. The SPI transmit time has basically been eliminated from the animation loop. This is an ILI9341 240x320 LCD at a clock speed of 40MHz. A device with a slower LCD interface would benefit more from enabling DMA. For this example, double-buffering isn't needed because by the time the pixels are ready to pass to the GIFDraw callback function, the last line has already finished transmitting.
How do I use DMA in Arduino?
This is an excellent question. At the time of Arduino's inception, the MCU they chose (Atmel ATmega328) didn't include DMA hardware. Atmel's (and other vendors') newer MCUs do include DMA, but Arduino hasn't added support for it to their official API. I can't really blame Arduino for this omission; DMA hardware can vary greatly and making a simple API to access it isn't trivial. Adafruit made a separate DMA library for the ATSAM MCUs. For this project, I'm using an ESP32 MCU and I used the ESP-IDF functions to access SPI+DMA instead of the Arduino SPI class. The good news is that the complete ESP-IDF API is also present in the ESP32 Arduino board support, so this code can work inside and outside of the Arduino environment. It's also fortunate that Espressif's SPI (and DMA) API is easy to use and there are plenty of example projects. The ESP32 (depending on the model) can control 2 or more SPI buses on (mostly) any GPIO pins. Here's how to initialize it for our project:
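spi_bus_config_t buscfg;
spi_device_interface_config_t devcfg;
esp_err_t ret;
memset(&buscfg, 0, sizeof(buscfg));
buscfg.mosi_io_num = MOSI_PIN; // MOSI_PIN and SCK_PIN are placeholders for your board's wiring
buscfg.sclk_io_num = SCK_PIN;
buscfg.miso_io_num = buscfg.quadwp_io_num = buscfg.quadhd_io_num = -1; // unused signals
// The bus numbering varies by chip type; for the original ESP32, VSPI is the one we want
ret = spi_bus_initialize(VSPI_HOST, &buscfg, SPI_DMA_CH_AUTO); // assumed here: let the driver pick a DMA channel
assert(ret == ESP_OK);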