Mind the cache (a reference to "mind the gap" on the London Underground)
As far as the physical memory is concerned, read and write performance may be identical, but CPU manufacturers have designed dedicated hardware to handle the very different situations of reading from and writing to memory (from a program's perspective). When reading from memory, your program normally needs the data as soon as possible, and the CPU will often stall until the read completes. The cache logic is designed to improve the effective speed of memory by optimizing access to the current data set (the working set, a subset of the total memory). Intel (and, more recently, ARM) have designed smart cache logic that can predict read patterns and prefetch memory into the cache in anticipation of it being needed. When writing to memory, there normally isn't any hurry to complete the write, so the write buffer can collect a few write requests without stalling the CPU. The data will work its way through the write buffer and cache logic and only stall the CPU when a write o...
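To make the read side concrete, here is a minimal sketch in C, not a rigorous benchmark. It sums the same array in address order (stride 1, the kind of pattern a prefetcher can usually predict) or in large jumps that tend to defeat the cache. The names `sum_strided` and `time_pass` are made up for this illustration, and timing with `clock()` is only a rough measure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Sum every element of data exactly once, visiting elements `stride`
 * apart on each sweep. With stride == 1 this is a plain sequential
 * read. With a large stride, consecutive reads land on different
 * cache lines, so less of each fetched line is used before eviction. */
uint64_t sum_strided(const uint32_t *data, size_t n, size_t stride) {
    assert(stride > 0);
    uint64_t total = 0;
    for (size_t start = 0; start < stride; start++) {
        for (size_t i = start; i < n; i += stride) {
            total += data[i];
        }
    }
    return total;
}

/* Time one full pass with the given stride and return CPU seconds.
 * The sum is written to *sum_out so the compiler cannot discard
 * the reads as dead code. */
double time_pass(const uint32_t *data, size_t n, size_t stride,
                 uint64_t *sum_out) {
    clock_t begin = clock();
    *sum_out = sum_strided(data, n, stride);
    clock_t end = clock();
    return (double)(end - begin) / CLOCKS_PER_SEC;
}
```

To try it, allocate an array much larger than your last-level cache (64 MiB is an assumed, not measured, choice), fill it, and call `time_pass` once with a stride of 1 and once with a stride of a few thousand elements. Both calls return the same sum. On most machines the strided pass takes noticeably longer, but the ratio depends entirely on the hardware.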