Please see this wiki article. It explains that you need to change --opt_level=off (you wrote the equivalent -Ooff) to --opt_level=2 or higher, and that you also need to use --opt_for_speed=3 or higher.
Changing to those options alone does not result in NEON instructions for your simple example, because the code gives the compiler too little opportunity to optimize. So I changed the code to this:
void tstfn(float *x, float *y, float * restrict z, float *k, int length)
{
    int i;
    for (i = 0; i < length; i++)
        z[i] = (x[i] + k[i]) * y[i];
}
That change, along with the option changes, results in NEON instructions.
Why the restrict on the z pointer? This wiki article describes restrict in detail. In this case, it tells the compiler that the memory written through z is accessed only through z, so a store to z[i] cannot change any element of x, y, or k. That allows the compiler to reorder the memory accesses, for example loading several input elements before storing any results, and thus arrange things so that NEON instructions can be used.
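To make the aliasing concern concrete, here is a small sketch in standard C. It is not taken from the wiki article, and the names add_one, add_one_restrict and overlap_demo are made up for illustration. It shows the kind of call the compiler has to allow for when restrict is missing: the destination overlaps the source, so each iteration reads what the previous one wrote, and the loop cannot simply process several elements at once.

```c
#include <assert.h>

/* No restrict: dst is allowed to overlap src, so every store to dst[i]
   might change a value that a later iteration loads from src. */
void add_one(const float *src, float *dst, int length)
{
    int i;
    for (i = 0; i < length; i++)
        dst[i] = src[i] + 1.0f;
}

/* With restrict: the caller promises that dst does not overlap src, so
   the compiler may load several src elements before storing results. */
void add_one_restrict(const float *src, float * restrict dst, int length)
{
    int i;
    for (i = 0; i < length; i++)
        dst[i] = src[i] + 1.0f;
}

/* Hypothetical demo of both cases. */
void overlap_demo(void)
{
    float buf[5] = {0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
    float out[4];

    /* Legal without restrict: dst is src shifted by one element, so
       each iteration reads the value the previous iteration wrote. */
    add_one(buf, buf + 1, 4);
    assert(buf[1] == 1.0f && buf[2] == 2.0f);
    assert(buf[3] == 3.0f && buf[4] == 4.0f);

    /* Disjoint arrays: the only valid way to call the restrict version. */
    add_one_restrict(buf, out, 4);
    assert(out[0] == 1.0f && out[3] == 4.0f);
}
```

Calling add_one_restrict with overlapping pointers, as in the first call above, would break the promise that restrict makes and the behavior would be undefined. So only add restrict when you know the buffers never overlap.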
Thanks and regards,
-George