When working with SIMD intrinsics, it helps to have a thorough understanding of computer memory. In this post, I hope to shed some light on a really simple but essential operation: figuring out whether memory is aligned on a 16-byte boundary.

When writing an SSE loop that transforms or reads an array, one would start by making sure the data is aligned on a 16-byte boundary. If it is not, a single “warmup pass” of the algorithm is usually performed to prepare for the main loop. Many programmers use a variant of the following line to find out whether the array pointer is adequately aligned.

if (((intptr_t)array_pointer & 0xF) != 0) {
    /* unaligned pre-pass */
}


So what is happening? I will use theoretical 8-bit pointers to explain the operation. Let's illustrate using pointers to the addresses 16 (0x10) and 92 (0x5C), where 16 is aligned and 92 is not.

16          92
0001 0000 | 0101 1100

We first cast the pointer to an intptr_t (whether one should use the unsigned uintptr_t instead is up for debate). Both types come from <stdint.h>. The cast turns the pointer into an integer, which allows us to use bitwise operations on the address itself.
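For readers who prefer the unsigned variant, here is a minimal sketch of the same check wrapped in a function. The name is_aligned_16 is hypothetical and not part of any library; the only assumption is a C99 compiler that provides uintptr_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when p sits on a 16-byte boundary. */
static bool is_aligned_16(const void *p)
{
    /* uintptr_t is unsigned, so the AND never involves a sign bit. */
    return ((uintptr_t)p & 0xF) == 0;
}
```

With this in place, the condition from the start of the post could be written as if (!is_aligned_16(array_pointer)).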

Next, we bitwise AND the address with 15 (0xF). This operation clears all the higher bits of the memory address and keeps only the last 4, like so.

16          92
0001 0000 | 0101 1100
0000 1111 | 0000 1111  & 0xF
---------------------
0000 0000 | 0000 1100
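The same arithmetic can be reproduced in a few lines of C. This is only a sketch: the pretend 8-bit addresses are modelled as uint8_t values, and low_nibble is a hypothetical name chosen for illustration.

```c
#include <stdint.h>

/* Keep only the low 4 bits of a pretend 8-bit address. */
static uint8_t low_nibble(uint8_t address)
{
    return (uint8_t)(address & 0x0F);
}
```

Calling low_nibble(0x10) gives 0, so address 16 is aligned. Calling low_nibble(0x5C) gives 12 (0xC), so address 92 lies 12 bytes past the boundary at 80 and 4 bytes short of the next one at 96.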

The cryptic if statement now becomes clear and intuitive. We simply mask off the upper portion of the address and check whether the lower 4 bits are zero. If they aren’t, the address isn’t 16-byte aligned and we need to pre-heat our SIMD loop.
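To show how the check might drive the warmup pass, here is a rough sketch that sums a float array. The names warmup_count and sum_floats are hypothetical, and the middle loop is plain C standing in for the place where an aligned load such as _mm_load_ps could be used. It assumes 4-byte floats and a pointer that is at least float-aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of leading floats to handle in the scalar
   warmup pass before the pointer reaches a 16-byte boundary.
   Assumes sizeof(float) == 4 and that p is at least 4-byte aligned. */
static size_t warmup_count(const float *p, size_t n)
{
    size_t misalign = (size_t)((uintptr_t)p & 0xF);  /* bytes past the boundary */
    size_t count = misalign ? (16 - misalign) / sizeof(float) : 0;
    return count < n ? count : n;                    /* never run past the array */
}

/* Sum an array: scalar warmup, aligned blocks of four, scalar tail. */
static float sum_floats(const float *p, size_t n)
{
    size_t i = 0;
    size_t head = warmup_count(p, n);
    float total = 0.0f;

    for (; i < head; ++i)        /* unaligned pre-pass */
        total += p[i];

    for (; i + 4 <= n; i += 4)   /* p + i is 16-byte aligned here */
        total += p[i] + p[i + 1] + p[i + 2] + p[i + 3];

    for (; i < n; ++i)           /* leftover elements */
        total += p[i];

    return total;
}
```

The pre-pass runs at most three times for floats, since a float pointer can only be 4, 8 or 12 bytes past a 16-byte boundary.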