Thanks, I have read this application note and applied it to arrive at the optimized version in my original post.
I can still move to 8-byte loads to improve performance, and I think that should be fast enough for my purposes. However, I'd still like to know whether anything in the C64x+ instruction set could help optimize the byte shuffling I am doing. Right now I'm extracting bytes, masking them, and shifting them; someone more experienced might be able to point out a better way.
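For anyone following along, here is a rough portable C sketch of the extract/mask/shift approach described above. The packing layout is an assumption for illustration only (two 12-bit samples per 3 bytes, low bits first), and the function name `unpack12to16` is hypothetical; neither is taken from the original post. The idea is to gather the bytes into one wider word first, so that each sample costs a single shift and mask instead of separate per-byte operations.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMED layout (hypothetical, not from the original post):
 * every 3 input bytes hold two 12-bit samples, low bits first:
 *   b0 = s0[7:0]
 *   b1 = s1[3:0] << 4 | s0[11:8]
 *   b2 = s1[11:4]
 */
static void unpack12to16(const uint8_t *in, uint16_t *out, size_t pairs)
{
    for (size_t i = 0; i < pairs; ++i) {
        /* Combine 3 bytes into a 24-bit word (byte 0 in the low bits). */
        uint32_t w = (uint32_t)in[0]
                   | ((uint32_t)in[1] << 8)
                   | ((uint32_t)in[2] << 16);

        out[0] = (uint16_t)(w & 0x0FFFu);         /* low 12 bits  */
        out[1] = (uint16_t)((w >> 12) & 0x0FFFu); /* high 12 bits */

        in  += 3;
        out += 2;
    }
}
```

With wider loads the same pattern would apply to a 64-bit word holding five samples plus leftover bits carried into the next iteration. On the TI compiler, intrinsics such as `_extu` (bit-field extract) and `_pack2` might replace some of the shift-and-mask pairs, but I have not verified that this is faster on C64x+.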
Forum Post: RE: Help me optimize this 12bit to 16bit decoder