Convolution.vi could be made much faster.
First, it does not seem to take advantage of multiple processor cores. Convolution could be parallelized by cutting one input array into pieces, convolving each piece with the other input on a different core, and adding the partial results together where they overlap. It would be nice if the VI did this automatically.
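Since a block diagram can't be pasted as text, here is a rough Python sketch of the splitting idea. It relies only on the linearity of convolution: each piece of `x` is convolved with all of `h`, and the partial results are added back at the piece's starting offset. The names `direct_convolve`, `chunked_convolve`, and `n_chunks` are made up for illustration and say nothing about how Convolution.vi works internally. The thread pool only shows the structure; in CPython, pure-Python threads will not give a real multi-core speedup, so an actual implementation would need native code or separate processes.

```python
from concurrent.futures import ThreadPoolExecutor


def direct_convolve(x, h):
    """Full linear convolution by the direct (sum-of-products) method."""
    if not x or not h:
        return []
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def chunked_convolve(x, h, n_chunks=4):
    """Convolve x with h by splitting x into up to n_chunks pieces.

    Hypothetical helper: each piece is convolved independently, then the
    partial results are summed at the piece's offset in the output.
    """
    if not x or not h:
        return []
    size = -(-len(x) // n_chunks)  # ceiling division: length of each piece
    starts = range(0, len(x), size)

    # Each worker handles one piece of x against the whole of h.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(
            pool.map(lambda s: direct_convolve(x[s:s + size], h), starts)
        )

    # Neighbouring partial results overlap by len(h) - 1 samples,
    # so they are added together rather than concatenated.
    y = [0] * (len(x) + len(h) - 1)
    for s, part in zip(starts, partials):
        for k, v in enumerate(part):
            y[s + k] += v
    return y
```

Calling `chunked_convolve(x, h, n_chunks=2)` should give the same array as `direct_convolve(x, h)`; only the way the work is divided changes.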
Second, the VI only works on floating-point values (doubles). A polymorphic version that used integer arithmetic (with the "direct" method) instead of floating point would be faster when integer arrays are being convolved.
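To make the integer case concrete, here is a minimal Python sketch of the direct method restricted to integer inputs. The function name `convolve_int` is hypothetical, and the sketch makes no claim about speed in LabVIEW; it only shows that the direct method needs nothing but integer multiplies and adds, so no conversion to doubles is involved at any step.

```python
def convolve_int(x, h):
    """Direct-method convolution of two integer sequences.

    Illustrative only: every intermediate value and every output
    sample stays an integer.
    """
    if not all(isinstance(v, int) for v in list(x) + list(h)):
        raise TypeError("convolve_int expects integer inputs")
    if not x or not h:
        return []
    n_out = len(x) + len(h) - 1
    y = [0] * n_out
    for n in range(n_out):
        # Only indices k with 0 <= k < len(x) and 0 <= n - k < len(h)
        # contribute to output sample n.
        k_lo = max(0, n - len(h) + 1)
        k_hi = min(n, len(x) - 1)
        acc = 0
        for k in range(k_lo, k_hi + 1):
            acc += x[k] * h[n - k]
        y[n] = acc
    return y
```

For example, `convolve_int([1, 2, 3], [0, 1, 2])` returns `[0, 1, 4, 7, 6]`, with every element an integer.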