Dear Atsushi,
That explanation makes perfect sense. However, shouldn't stack accesses show up as memory reads and writes in Trace? They don't seem to, which is what threw me off on this issue for some time. Instead, stack accesses seem to show up as "Load Address", similar to code accesses. That led me to think that stack accesses go through the L1P cache rather than the L1D cache, but that is not the case. Can you confirm my understanding?
Thanks,
Manu