__x128_t is not a native, built-in type on which all operations are supported (as they are for, say, float). It's a "container type" that only allows vector data to be stored, which is why you're getting the error. The _get32f_128 intrinsic is the proper way to extract values. In most cases, these "get" intrinsics compile to no instruction at all or to a single move instruction.
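As a rough illustration of the extraction described above, here is a minimal sketch that sums the four float lanes of an __x128_t. The helper name sum_lanes is hypothetical, and the #else branch is only a host-side stand-in (my assumption, not TI's definition) so the snippet can be compiled off-target. On the device, the real type and intrinsic come from the compiler, and I'm assuming the usual c6x.h header and the _TMS320C6600 predefined macro.

```c
#ifdef _TMS320C6600
#include <c6x.h>   /* on target: real __x128_t and _get32f_128 */
#else
/* Host-only stand-ins (assumptions) so the sketch builds off-target. */
typedef struct { float w[4]; } __x128_t;
#define _get32f_128(v, i) ((v).w[(i)])
#endif

/* Hypothetical helper: add up the four 32-bit float lanes of a
 * 128-bit container. Each lane is read through the "get" intrinsic
 * with a constant index, since the container itself has no
 * arithmetic or subscript operators. */
static float sum_lanes(__x128_t v)
{
    return _get32f_128(v, 0) + _get32f_128(v, 1)
         + _get32f_128(v, 2) + _get32f_128(v, 3);
}
```

The arithmetic is done on ordinary float values after extraction; the __x128_t itself is only passed around and handed to intrinsics.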
Regarding the suboptimal results: can you post the C/C++ code of your loop or function? In some cases, if a loop is not .M unit bound (i.e., it isn't limited by multiply bandwidth), using qmpysp can make things worse because of increased register constraints. But there could be many other things going on, so it would be best if you could post the code.
-Todd