Maximum I2C IMU Sample Rate?

Intro

For a client project I was asked to collect samples from an IMU at a high rate. For practical reasons, we chose the Seeed Studio Xiao nRF52840 Sense as a prototyping platform. It has a Nordic nRF52840 (64MHz Cortex-M4) with an STMicro LSM6DS3 6-axis IMU and a digital microphone on board. The cost, size, and ready-made hardware made it a natural choice for getting the project started. The IMU consists of a 3-axis accelerometer and a 3-axis gyroscope. For our use case, we only need one accelerometer axis for vibration detection. The IMU has a built-in 8K FIFO capable of collecting samples at a high rate (up to 3200 samples/sec). This sample rate was most likely designed for use with the SPI interface (we'll soon see why).

I2C Speed Limit?

The I2C interface of the LSM6DS3 can probably run much faster than its stated maximum of 400Kb/s, but with the nRF52 series we're basically stuck with that as the top speed because, according to Nordic's documentation, the I2C clock can't be set any higher. The actual throughput of I2C devices is usually significantly lower than the bit rate. The reason is the overhead of the de facto standard used by most I2C sensors: a register-based protocol. The sensor data lives at specific internal addresses (registers). Reading a specific sensor value (e.g. an accelerometer axis) usually involves writing the desired register address to the device, then reading back its value. Many I2C devices auto-increment the register address for you if you read more than one byte at a time, and the LSM6DS3 is one of them. Here is what it looks like to read a single 16-bit value from the IMU:

- write 8 bits (device address + write bit)
- read ACK/NACK bit
- write 8-bit register address
- read ACK/NACK bit
- write 8 bits (device address + read bit)
- read ACK/NACK bit
- read 2 x 9 bits (data + ACK)

A total of 24 bits need to be written and 21 bits need to be read to get a single 16-bit register value from the sensor: 45 bit times on the bus for 16 bits of payload. This doesn't count the delays in switching directions and initiating new transactions. Suffice it to say that an I2C speed of 400Kb/s yields a much lower effective data rate.
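To put a rough number on that overhead, the sketch below counts bus clocks for the transaction listed above and converts the result into a best-case read rate. The function names are made up for illustration, and the model assumes an ideal bus: no START/STOP time, no clock stretching, and no software gaps between transactions, so real figures will be lower.

```cpp
#include <cassert>
#include <cstdint>

// Bus clocks for one register read of `dataBytes` bytes:
// address+W, register address and address+R (8 bits + ACK each),
// then 9 clocks (8 data bits + ACK/NACK) per data byte.
inline uint32_t registerReadClocks(uint32_t dataBytes) {
    assert(dataBytes > 0);
    return 9 * (3 + dataBytes);
}

// Best-case number of such reads per second at a given I2C clock.
inline double maxReadsPerSecond(uint32_t sclHz, uint32_t dataBytes) {
    return static_cast<double>(sclHz) / registerReadClocks(dataBytes);
}

// Fraction of bus clocks that actually carry sample bits.
inline double payloadEfficiency(uint32_t dataBytes) {
    return (8.0 * dataBytes) / registerReadClocks(dataBytes);
}
```

Under these assumptions `registerReadClocks(2)` is 45, so a 400 kHz bus tops out a little under 8,900 single-value reads per second, and only about 36% of the clocks carry sample data.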

How fast can the data leave the device?

Before we hit the limits of what's possible with the IMU, we need to consider how to get the data off the device and into your PC. Luckily the nRF52840 has native USB support, so it's able to provide a CDC-Serial interface. Translation - it provides a data transport that pretends to be a UART over USB. I forgot this fact when I made my first attempt at getting data off the device, and it didn't perform well at all. I had a simple loop which read 16-bit Z-axis values and wrote them two bytes at a time with the Serial.write() method. This would be a reasonable way to do it for a real serial port, but that's not how USB works.

USB devices are normally polled by the PC at a fixed rate. This means that data will only leave your device when the PC checks to see if a new event occurred. That doesn't sound terrible, right? The Serial class should cache the data and send it all the next time the PC requests an update, right? Wrong 😞. Each write to the Serial class is treated as an event, so the best throughput you can get is the USB polling rate times the number of bytes you send in each write().

I checked the Arduino driver code and the USB spec - the maximum packet size for the bulk endpoints of a full-speed (12Mb/s) USB device is 64 bytes. I tested saturating the interface with fake data, and by writing 64 bytes at a time it can push 450K bytes per second to the PC. This is pretty impressive for a microcontroller and much higher than we could ever use with actual IMU samples. The TL;DR - whatever happens on the IMU side, we need to cache the samples on the MCU and write them 32 at a time (64 bytes) to the USB interface.
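A minimal sketch of that batching idea follows. The class name `SampleBatcher` and its `Sink` callback are hypothetical, not part of any Arduino API. On the device the sink would presumably wrap a single `Serial.write(buf, len)` call; `std::function` is used here only so the logic can be exercised on a PC, and a plain function pointer may be a lighter choice on the MCU. Sending the low byte first is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Collects 16-bit samples and hands them to a sink in 64-byte blocks,
// so each write to the USB serial layer carries a full packet.
class SampleBatcher {
public:
    static constexpr std::size_t kSamplesPerPacket = 32;  // 32 x 2 bytes = 64 bytes
    using Sink = std::function<void(const uint8_t*, std::size_t)>;

    explicit SampleBatcher(Sink sink) : sink_(std::move(sink)) {
        assert(sink_ && "sink must be callable");
    }

    // Appends one sample, low byte first, and flushes when the block is full.
    void push(int16_t sample) {
        const uint16_t raw = static_cast<uint16_t>(sample);
        buf_[used_++] = static_cast<uint8_t>(raw & 0xFF);
        buf_[used_++] = static_cast<uint8_t>(raw >> 8);
        if (used_ == sizeof(buf_)) flush();
    }

    // Sends whatever is buffered (for example when capture stops).
    void flush() {
        if (used_ == 0) return;
        sink_(buf_, used_);
        used_ = 0;
    }

private:
    Sink sink_;
    uint8_t buf_[kSamplesPerPacket * 2] = {};
    std::size_t used_ = 0;
};
```

With this arrangement the sink is called once per 32 samples rather than once per sample, which is the behavior the paragraph above argues for.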

The I2C register model gets in the way

The meme image above should give you a clue as to where this is headed. The register-based protocol of the IMU prevents getting data off of it quickly. For my application, I just wanted to read a single accelerometer axis at the maximum possible rate. Here's how my experiments and thoughts progressed:

First Attempt
The IMU has a large internal FIFO - enable it and let it collect samples at a high rate. When we hit a good threshold (e.g. 32 samples), generate an interrupt and read out the samples. This sounds like a common-sense way to handle the situation: the samples are written into the internal memory at the correct rate and we just grab them periodically. On STMicro parts (and those of other vendors), however, the FIFO can't be read quickly because of two problems:

- There are only two sequential registers to read data out of the FIFO. You must write the register address and read two bytes over and over to get the data out. This gums up the works quite a bit.

- The FIFO stores all three axes of the sensor you enable (Accel or Gyro). This means that if you only care about a single axis, you must read the other two as well. Reading only the Z axis from the FIFO goes like this:
        Write FIFO level register address
        Read FIFO level value (sample count)
        While (samples to read) {
            Write FIFO data register address
            Read 1 FIFO sample (2 bytes of X)
            Write FIFO data register address
            Read 1 FIFO sample (2 bytes of Y)
            Write FIFO data register address
            Read 1 FIFO sample (2 bytes of Z - we keep this one)
        }
Now, calculate the maximum rate of Z values you can read...not great. Worse still is that I've seen many implementations of this code that access the LOW and HIGH byte of the FIFO data as separate transactions - cutting the data rate in half again. Here's an example from Seeed Studio's LSM6DS3 library:

int16_t LSM6DS3::fifoRead(void) {
    //Pull the last data from the fifo
    uint8_t tempReadByte = 0;
    uint16_t tempAccumulator = 0;
    readRegister(&tempReadByte, LSM6DS3_ACC_GYRO_FIFO_DATA_OUT_L);
    tempAccumulator = tempReadByte;
    readRegister(&tempReadByte, LSM6DS3_ACC_GYRO_FIFO_DATA_OUT_H);
    tempAccumulator |= ((uint16_t)tempReadByte << 8);

    return tempAccumulator;
}

Even if I wanted to capture all three axes instead of just one, the FIFO design kills the performance. The X/Y/Z sample registers, on the other hand, are sequential in the register list, so I can request all six bytes in a single read transaction. This leads me to believe that the FIFO was made for a system that is super busy with other tasks and only gets to read the data once in a while. The data rate will be slow due to all of the time spent fighting the register interface, but the sensor can be ignored for extended periods of time without losing any data.
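For a rough comparison of the access patterns discussed in this section, the sketch below counts bus clocks per sample for each one. It uses the same idealized accounting as the single-read breakdown earlier (9 clocks per byte on the wire, nothing for START/STOP, the FIFO level read, or software overhead), and the helper names are invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Clocks for one "write register address, read N bytes" transaction.
inline uint32_t readClocks(uint32_t dataBytes) {
    assert(dataBytes > 0);
    return 9 * (3 + dataBytes);
}

// FIFO pattern from the pseudocode: three 2-byte reads (X, Y, Z) per Z value.
inline uint32_t fifoClocksPerZ() { return 3 * readClocks(2); }

// Same pattern with the low and high bytes fetched as separate 1-byte reads.
inline uint32_t fifoBytewiseClocksPerZ() { return 6 * readClocks(1); }

// X, Y and Z read straight from the output registers in one 6-byte burst.
inline uint32_t directBurstClocksPerXyz() { return readClocks(6); }

// Best-case samples per second for a given clock cost.
inline double perSecond(uint32_t sclHz, uint32_t clocksPerSample) {
    return static_cast<double>(sclHz) / clocksPerSample;
}
```

At 400 kHz this idealized model gives about 2,960 Z values per second for the FIFO loop, about 1,850 when each byte is its own transaction, and about 4,940 complete X/Y/Z samples per second for the direct burst. In other words, all three axes read directly cost fewer clocks than one axis read through the FIFO this way.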

Second Attempt
Without the FIFO to store the samples with perfect timing, we'll need to use the interrupt pin to tell us when each sample is ready to read. Each sample must then be read immediately, otherwise it will be overwritten by the next one. A sad feature of the STMicro IMUs (and probably others) is that it's not sufficient to read out the data when an interrupt is triggered; the status register must also be read in order to reset the interrupt status. Here's the pseudocode for reading one to three axes at the maximum possible rate:

while (1) {
    if (bIRQTriggered) { // this can be done in an ISR
        Write status register address
        Read 1 byte status (clears IRQ flag)
        Write Accel/Gyro data register start address
        Read 1 to 3 axes (2 to 6 bytes)
        Store the sample (Do something with the data)
    }
}
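The pseudocode above could be fleshed out roughly as follows for the Z-axis-only case. This is a sketch, not the code used for the measurements: `RegReader`, `g_dataReady` and `serviceDataReady` are hypothetical names, the register addresses are my reading of the LSM6DS3 datasheet and should be checked against it, and the bus hook stands in for whatever I2C calls your platform provides (on Arduino, presumably a `Wire` write of the register address followed by a `requestFrom`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Register addresses as I read them in the LSM6DS3 datasheet (verify before use).
constexpr uint8_t kStatusReg = 0x1E;  // STATUS_REG
constexpr uint8_t kOutZLowXl = 0x2C;  // OUTZ_L_XL, followed by OUTZ_H_XL

// Hypothetical bus hook: one "write register address, then read len bytes"
// transaction. Returns false on a bus error.
using RegReader = std::function<bool(uint8_t reg, uint8_t* dst, std::size_t len)>;

// Set by the data-ready interrupt handler, cleared below.
volatile bool g_dataReady = false;

// Services one data-ready event. Returns true and fills `z` if a sample was read.
bool serviceDataReady(const RegReader& readRegs, int16_t& z) {
    assert(readRegs);
    if (!g_dataReady) return false;
    g_dataReady = false;

    // Status read first; per the text this is what resets the interrupt.
    uint8_t status = 0;
    if (!readRegs(kStatusReg, &status, 1)) return false;

    // Both bytes of Z in one burst, relying on register auto-increment.
    uint8_t raw[2] = {0, 0};
    if (!readRegs(kOutZLowXl, raw, 2)) return false;
    z = static_cast<int16_t>(static_cast<uint16_t>(raw[0] | (raw[1] << 8)));
    return true;
}
```

Each serviced interrupt costs exactly two bus transactions here, a 1-byte status read and a 2-byte data read; storing the sample (for example into a 32-sample buffer for USB) is left to the caller.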

With the method above, I was able to read the Z axis of the accelerometer reliably at a rate of 833 samples per second. I was also able to write the data in groups of 32 samples to the USB CDC-Serial driver without losing any samples. The IMU only supports a fixed set of sample rates, and the next one up (1660 samples/s) did not work reliably on the nRF52840 at a 400Kb/s I2C clock. I'm sure the IMU can handle a higher I2C rate (e.g. 1Mb/s), but the nRF52840 doesn't support higher clock values in an accessible way. An ESP32 might be a good MCU for trying a higher rate.

Conclusion

This was definitely a useful exercise for me. I was reminded how USB CDC-Serial is implemented underneath, and that knowing how it works is key to getting the maximum throughput. I also resolved any doubt I had about IMU FIFOs: they don't help you achieve higher sample rates; they merely help you not miss samples when your CPU is busy with other tasks. A non-standard I2C IMU could potentially provide a 'streaming mode' where you could read samples at a much higher rate, but I'm not aware of any vendor selling such a part. If you need to read samples from a low-cost IMU at its maximum rate (typically they offer 2000-3000 samples per second), you'll need to use the SPI interface at a much higher clock rate.
