Posts

Showing posts from July, 2018

Mind the cache (reference to "mind the gap" in the UK tube)

Read and write performance may be identical as far as the physical memory is concerned, but CPU manufacturers have designed dedicated hardware to handle the very different situations of reading from and writing to memory (from a program's perspective).

When reading from memory, your program normally needs the data as fast as possible, and the CPU will often stall until the read completes. The cache logic is designed to improve the effective speed of memory by optimizing access to the current working set (a subset of the total memory). Intel (and more recently ARM) have designed smart cache logic which can predict read patterns and prefetch memory into the cache in anticipation of it being needed.
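To make the idea of a "read pattern" concrete, here is a minimal sketch in C. The function names `sum_sequential` and `sum_strided` are made up for illustration and do not come from the post. Both functions read exactly the same elements and return the same total; only the order of the reads differs. The first walks consecutive addresses, which is the kind of pattern prefetch logic can typically anticipate. The second jumps by a caller-chosen stride, so with a large enough stride each access lands on a different cache line.

```c
#include <stddef.h>
#include <stdint.h>

/* Sequential walk: consecutive addresses, so every cache line that is
 * fetched gets fully used before the loop moves on to the next one. */
uint64_t sum_sequential(const uint32_t *data, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Strided walk: visits the same n elements, but `stride` elements apart
 * on each step, making one pass per starting offset. With a stride of 16
 * or more 32-bit values (assuming 64-byte cache lines, which is common
 * but not universal), each read touches a different line. */
uint64_t sum_strided(const uint32_t *data, size_t n, size_t stride)
{
    uint64_t total = 0;
    if (stride == 0)
        stride = 1; /* treat 0 as a plain sequential walk */
    for (size_t start = 0; start < stride && start < n; start++)
        for (size_t i = start; i < n; i += stride)
            total += data[i];
    return total;
}
```

Timing the two functions on a buffer much larger than the cache is one way to observe the effect on a given machine; the actual difference depends on the CPU, the stride, and the buffer size, so no numbers are claimed here.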

When writing to memory, there normally isn't any hurry to complete the write, so the write buffer can collect a few write requests without stalling the CPU. The data will work its way through the write buffer and cache logic and only stall the CPU when a write occur…

SIMD

The dead horse that I like to keep beating: everyone should use SIMD (single instruction, multiple data) in their software wherever possible. It's basically free performance with no downside. Every PC (x86 machine) sold in the last 10 years has it, and so have mobile/embedded machines for at least the last 5 years. What is SIMD? It's a set of powerful instructions which make use of extra-wide registers (typically 128 bits) and can do multiple operations in parallel. For example, a regular CPU instruction performs a single math operation (e.g. integer addition), while a SIMD instruction can do 2, 4, 8, or 16 separate additions in parallel in the same amount of time. That means your program can do its work many times faster. Many programmers are already aware of the existence of SIMD, but assume that by enabling the "auto-vectorization" option of their C compiler, their program will magically contain beautifully crafted SIMD code. You can already guess from my last sent…
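As an illustration of the "several additions in one instruction" idea described in this excerpt, here is a minimal sketch using the x86 SSE2 intrinsics from `<emmintrin.h>`. The function name `add_i32` is hypothetical and not taken from the post. It assumes 128-bit registers holding four 32-bit integers each, and falls back to a plain scalar loop when SSE2 is not available at compile time.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC and Clang define __SSE2__ when SSE2 is enabled; MSVC defines
 * _M_X64 for 64-bit x86 targets, where SSE2 is always present. */
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h> /* SSE2 intrinsics */
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* out[i] = a[i] + b[i] for n 32-bit integers (wrapping on overflow). */
void add_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;
#if HAVE_SSE2
    /* Main loop: each iteration performs four additions with a single
     * _mm_add_epi32. Unaligned loads/stores are used so the caller's
     * buffers need no special alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
#endif
    /* Scalar tail (or the whole array when SSE2 is unavailable). The
     * unsigned casts give the same wrapping behavior as the SIMD path. */
    for (; i < n; i++)
        out[i] = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);
}
```

Calling `add_i32(a, b, out, n)` on three arrays of length `n` gives the same result on either path; the only difference is how many elements each instruction handles.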