Hi JK,
Can you please confirm which simulator configuration you are using? If it is the C64x+ CPU configuration, it does not model any memory-specific latency, so the reported numbers contain only CPU cycles.
Also, the CPU model supported in CCS is a very accurate model, so I would expect similar or worse numbers on the board, since the board adds the memory latency that the CPU-only configuration does not account for.
Note: Also make sure you are running an optimized build of the application.
BR/Abhilash