Previous variant was basically derived from C and MMX implementations.
Now new variant makes use of 'vmax' instruction, which is available in
NEON and can do this job faster. The same method for calculating scale
factors is also used in 'sbc_calc_scalefactors_j_neon'.
Benchmarked without joint stereo on ARM Cortex-A8: