This is the slice of code I never did post in the DELTA topic, because it went unused once I found out about MMX and switched to inline assembly (which, by the way, is many times faster than raw C++).
I finally found this thing again in a random text file and thought I'd post it for all to see.
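// Two sample 0xRRGGBB pixels.
unsigned long int srcpx = 0x12AC5B;
unsigned long int dstpx = 0xFC1145;
// Add the low 7 bits of each channel; a carry ends up in that channel's bit 7.
unsigned long int sum = ((dstpx & 0x7F7F7F)+(srcpx & 0x7F7F7F));
// Add the top bits and the carries; bits 8, 16 and 24 flag an overflowed channel.
unsigned long int ovf = ((dstpx & 0x808080)+(srcpx & 0x808080)+(sum & 0x808080));
// Move each overflow flag into bit 7 of its own channel, then smear it down to 0xFF.
unsigned long int msk = ((ovf>>1) & 0x808080);
msk |= (msk>>1);
msk |= (msk>>2);
msk |= (msk>>4);
// Combine: true top bit, low 7 bits, and 0xFF for any overflowed channel.
dstpx = ((ovf & 0x808080) | (sum & 0x7F7F7F) | msk);
//Result of calculation should be: 0xFFBDA0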
What this does is add together two RGB pixels, saturating each channel at 255 if its addition overflows, all without conditional jumps. The only catch is that the alpha value gets clobbered in the process (the masks drop it, so it always comes out as zero), which was by design.
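A minimal sketch of the same trick wrapped in a function, with a plain per-channel version to check it against. The names add_saturate_rgb, add_saturate_ref and check_all_channel_pairs are made up for this example, and it assumes 0xRRGGBB pixels held in 32-bit unsigned values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: branch-free saturating add of two 0xRRGGBB pixels.
// Bits 24..31 (alpha) of the result are always zero.
std::uint32_t add_saturate_rgb(std::uint32_t dst, std::uint32_t src)
{
    // Add the low 7 bits of each channel; a carry lands in that channel's bit 7.
    std::uint32_t sum = (dst & 0x7F7F7F) + (src & 0x7F7F7F);
    // Add the top bits plus the carries: bit 7 is the true top bit of the sum,
    // and the bit just above each channel (8, 16, 24) is set on overflow.
    std::uint32_t ovf = (dst & 0x808080) + (src & 0x808080) + (sum & 0x808080);
    // Move each overflow flag into bit 7 of its own channel, then smear to 0xFF.
    std::uint32_t msk = (ovf >> 1) & 0x808080;
    msk |= msk >> 1;
    msk |= msk >> 2;
    msk |= msk >> 4;
    return (ovf & 0x808080) | (sum & 0x7F7F7F) | msk;
}

// Straightforward per-channel reference with a branch, for comparison only.
std::uint32_t add_saturate_ref(std::uint32_t dst, std::uint32_t src)
{
    std::uint32_t out = 0;
    for (int shift = 0; shift < 24; shift += 8) {
        std::uint32_t c = ((dst >> shift) & 0xFF) + ((src >> shift) & 0xFF);
        if (c > 0xFF) c = 0xFF;
        out |= c << shift;
    }
    return out;
}

// Compares both versions for every pair of channel values,
// with the same value replicated across R, G and B.
void check_all_channel_pairs()
{
    for (std::uint32_t a = 0; a < 256; ++a) {
        for (std::uint32_t b = 0; b < 256; ++b) {
            std::uint32_t d = a * 0x010101, s = b * 0x010101;
            assert(add_saturate_rgb(d, s) == add_saturate_ref(d, s));
        }
    }
}
```

Calling add_saturate_rgb(0xFC1145, 0x12AC5B) gives 0xFFBDA0, the same result as the snippet above. The channels never interfere with each other because each overflow flag lands in a bit position (8, 16 or 24) that the masked operands leave empty.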
The same thing in MMX assembly is roughly:
#This code accounts for Magic Pink (which results in adding nothing).
#This processes two pixels at a time, as well.
# Register esi is the source pixel pointer.
# Register edi is the destination pixel pointer.
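# mm7 == 00FF00FF 00FF00FF (Magic Pink in both pixel slots)
movq (%%esi), %%mm1      # load two source pixels
movq %%mm7, %%mm3        # copy the Magic Pink constant
pcmpeqd %%mm1, %%mm3     # mm3 = all ones in each slot where source == Magic Pink
pandn %%mm1, %%mm3       # mm3 = ~mm3 & mm1: the source pixel, or 0 where it was Magic Pink
movq (%%edi), %%mm2      # load two destination pixels
paddusb %%mm3, %%mm2     # per-byte unsigned saturating add
movq %%mm2,(%%edi)       # store the result back to the destination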
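For readers who don't speak MMX, here is a rough scalar C++ model of what that sequence computes. It is only a sketch: the names kMagicPink, paddusb32, blend_pixel and blend_two are invented for this example, it uses an ordinary conditional instead of being branch-free, and it assumes 32-bit pixels where Magic Pink is exactly 0x00FF00FF.

```cpp
#include <cstdint>

// Assumed colour key, matching the mm7 constant (0x00FF00FF per pixel).
constexpr std::uint32_t kMagicPink = 0x00FF00FF;

// Per-byte unsigned saturating add over all four bytes of a 32-bit pixel,
// which is what paddusb does for each byte lane.
std::uint32_t paddusb32(std::uint32_t a, std::uint32_t b)
{
    std::uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        std::uint32_t c = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF);
        if (c > 0xFF) c = 0xFF;
        out |= c << shift;
    }
    return out;
}

// Hypothetical scalar model of the MMX sequence for a single pixel.
std::uint32_t blend_pixel(std::uint32_t dst, std::uint32_t src)
{
    std::uint32_t eq = (src == kMagicPink) ? 0xFFFFFFFFu : 0u; // pcmpeqd
    std::uint32_t add = ~eq & src;                              // pandn
    return paddusb32(dst, add);                                 // paddusb
}

// Two pixels per call, mirroring the 64-bit movq load and store.
void blend_two(std::uint32_t* dst, const std::uint32_t* src)
{
    dst[0] = blend_pixel(dst[0], src[0]);
    dst[1] = blend_pixel(dst[1], src[1]);
}
```

Two things stand out when it is written this way. Unlike the C++ version, paddusb adds the alpha byte too (with saturation) rather than zeroing it, and a source pixel only counts as Magic Pink when its alpha byte is 0. Also, because MMX shares its registers with the x87 FPU, real code needs an emms after the loop before any floating-point work.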