I am not sure why you are carrying the kernel as DBL, but maybe you need fractional numbers for it (would SGL be OK?). In that case, you should replace the "To I32" in the center of the big loop with a "To DBL". Right now you are converting the 3x3 U16 array to I32 and then immediately to DBL (notice the grey coercion dot on the multiply!). The middle step is just extra work and will not change the result. (You could even simply delete the "To I32", because the coercion will automatically convert it to DBL for the multiplication.)
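Since a LabVIEW diagram cannot be pasted as text, here is a rough Python sketch of why the intermediate "To I32" changes nothing. It uses ctypes types as stand-ins for U16, I32 and DBL; the function names are made up for illustration and are not part of your VI.

```python
import ctypes

def u16_to_dbl_direct(v):
    # U16 -> DBL in a single step
    return ctypes.c_double(ctypes.c_uint16(v).value).value

def u16_to_dbl_via_i32(v):
    # U16 -> I32 -> DBL, the two-step path currently on the diagram
    as_i32 = ctypes.c_int32(ctypes.c_uint16(v).value).value
    return ctypes.c_double(as_i32).value

# Every possible U16 value fits exactly in both I32 and DBL,
# so both paths should agree for the whole range.
all_match = all(
    u16_to_dbl_direct(v) == u16_to_dbl_via_i32(v) for v in range(65536)
)
print(all_match)
```

Both paths give the same DBL for every U16 input, so the extra conversion only costs time.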
It depends on the size of the convolution area vs. the total image, but if they are similar, you should insert a "To DBL" right between the "Perfect Target" control and the loop. This way, each element is converted exactly once. If you do it inside the loop as described in the first paragraph, you do up to 9x more conversions, since each element can fall into up to nine 3x3 windows. (Converting inside the loop will only be OK if you typically convolve significantly less than 10% of the total image area. Do the math!)
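To put rough numbers on "do the math", here is a small Python sketch that counts conversions for both placements. It assumes a 3x3 window, one window per pixel of the convolved area, and it ignores border effects; the 640x480 image size is only an example, not taken from your VI.

```python
def conversions_up_front(image_pixels):
    # "To DBL" before the loop: every image element is converted once
    return image_pixels

def conversions_in_loop(region_pixels, kernel_size=3):
    # "To DBL" inside the loop: each window converts kernel_size^2 elements
    return region_pixels * kernel_size * kernel_size

image_pixels = 640 * 480  # hypothetical image size
for fraction in (0.05, 0.10, 0.50, 1.00):
    region_pixels = int(image_pixels * fraction)
    print(
        fraction,
        conversions_up_front(image_pixels),
        conversions_in_loop(region_pixels),
    )

# Break-even: in-loop conversion is cheaper only below 1/9 of the image
break_even_fraction = 1 / 9
```

Under these assumptions the break-even point is 1/9 (about 11%) of the image area, which is where the 10% rule of thumb comes from.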
If this is called often, you should also pre-compute the formula and embed the 9 values as a diagram constant. They never change! (I assume that the Filter Kernel input varies.) See attached (LabVIEW 7.1).
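In text form, the pre-computation idea looks roughly like the Python sketch below. The weight formula here is purely a stand-in (I am not reproducing the one on your diagram), and the names WEIGHTS and apply_weights are made up for illustration.

```python
import math

def weight(dx, dy):
    # Hypothetical stand-in formula; substitute the real one from the diagram
    return math.exp(-(dx * dx + dy * dy) / 2.0)

# Evaluated once when the module loads, like a diagram constant
WEIGHTS = tuple(
    tuple(weight(dx, dy) for dx in (-1, 0, 1)) for dy in (-1, 0, 1)
)

def apply_weights(window):
    # Per-call work is only the multiply-and-sum over the 3x3 window;
    # the formula itself is never re-evaluated here
    return sum(WEIGHTS[r][c] * window[r][c] for r in range(3) for c in range(3))
```

The point is only that the nine values are computed once, outside anything that runs per call.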
I am still not sure why you need duplicate inputs for the perfect target. Are they different when called?