Calculating Euclidean distance faster

alexderjuengere · ‎05-23-2016

@altenbach wrote:
Ok, I implemented some of my ideas and got a version that does 512x512 with one big circle in under 3 seconds. (This is really the worst case scenario, for example if there are a couple of smaller circles it is significantly faster.)

Did you implement the idea you suggested earlier?

"I would probably do a lookup table of relative coordinate distances. You should never have to compare more than about one circle per pixel."

@joelsa wrote:
I got the time down to ~150s for the 512x512 image on my quad-core processor. Merging the inner loops alone saved 60 seconds.

That's about the same as I can achieve with a dbl distance map.

dbl distance
128:   0.521s
256:   8.023s
512:   147.686s

With a sgl distance map it gets a bit faster. Note that there is no noticeable difference in the grayscale output if you change from dbl to sgl.

sgl distance
128:   0.519s
256:   7.641s
512:   123.346s
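To illustrate why switching from dbl to sgl should not visibly change the grayscale output, here is a small Python sketch (not LabVIEW, and not taken from anyone's VI). It assumes the distance map is scaled linearly to 8-bit gray levels, which is my assumption; the helper names `to_single` and `to_gray` are hypothetical.

```python
import math
import struct

def to_single(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def to_gray(distance: float, max_distance: float) -> int:
    """Scale a distance linearly to an 8-bit gray level (assumed mapping)."""
    return round(255.0 * distance / max_distance)

size = 512
max_distance = math.hypot(size - 1, size - 1)

# Largest gray-level difference between the dbl and sgl version of
# every pixel-offset distance that can occur in a size x size image.
worst = 0
for dx in range(size):
    for dy in range(dx, size):  # symmetry: (dx, dy) and (dy, dx) give the same distance
        d_dbl = math.hypot(dx, dy)
        d_sgl = to_single(d_dbl)
        diff = abs(to_gray(d_dbl, max_distance) - to_gray(d_sgl, max_distance))
        worst = max(worst, diff)

print("largest gray-level difference:", worst)
```

The relative rounding error of a single is far smaller than one gray step, so at most an occasional value sitting right on a rounding boundary can shift by one level.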

Using a different distance transform is likely to speed this up. It will affect the appearance of the grayscale output, but that may still be acceptable.

But it will still take a double-digit number of seconds to calculate (I did some testing).
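The post does not say which alternative transform was tested. As one possible example, here is a Python sketch of the classic two-pass city-block (Manhattan) distance transform. Choosing this particular transform is my assumption, and the function name `city_block_distance_map` is hypothetical. It touches every pixel only twice, at the cost of diamond-shaped rather than circular contours in the output.

```python
def city_block_distance_map(image):
    """Two-pass city-block distance to the nearest black pixel.

    image: list of rows; truthy = white (object), falsy = black (background).
    Returns a list of rows of integer distances (0 for black pixels).
    If the image has no black pixel, all values stay at or above height + width.
    """
    height, width = len(image), len(image[0])
    big = height + width  # larger than any city-block distance inside the image
    dist = [[big if image[y][x] else 0 for x in range(width)]
            for y in range(height)]

    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(height):
        for x in range(width):
            if y > 0:
                dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
            if x > 0:
                dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)

    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in range(height - 1, -1, -1):
        for x in range(width - 1, -1, -1):
            if y < height - 1:
                dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
            if x < width - 1:
                dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist

# 3x3 white block with a one-pixel black border
demo = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
for row in city_block_distance_map(demo):
    print(row)
```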

So, altenbach, I am really curious how to make this faster. I would be delighted if you attached your VI or gave us some clues on how to use a lookup table in this context.

Regards,
Alex

altenbach · ‎05-23-2016

@joelsa wrote:

@altenbach Sharing your VI would be much appreciated as I understand the correlations you are talking about, but do not have a clue on how to implement those. So if you would share it, I could learn how an experienced programmer would do this. (Looking at other people's code is actually the best method to learn IMHO)

Do you really only have LabVIEW 2010 or did you downgrade for posting? (I would prefer to attach in a more recent version if possible. Does 2010 have parallel FOR loops, concatenating tunnels, and subVI inlining?)

Anyway, here's a quick project of the faster version (<3s on my laptop. I will try on my 32 core Xeon later :D). There are some notes on the block diagram explaining the algorithm. The lookup table is generated in a subVI (the LUT generation is included in the timing, but if this is called often in the same session, you could cache the results, speeding things up by a fraction of a second). I have also included a demo VI (play LUT), that shows how the values are arranged in the LUT.
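For readers without LabVIEW, here is a rough Python sketch of how a lookup table of relative coordinate offsets, sorted by distance, can drive the search. It is my reading of the description in this thread, not a transcription of the attached VI, and the names `build_lut` and `distance_map` are hypothetical. The `lru_cache` decorator stands in for the "cache the results" idea mentioned above.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def build_lut(max_offset: int) -> tuple:
    """All (distance, dx, dy) offsets within +/-max_offset, nearest first."""
    entries = [
        (math.hypot(dx, dy), dx, dy)
        for dy in range(-max_offset, max_offset + 1)
        for dx in range(-max_offset, max_offset + 1)
    ]
    entries.sort()  # sorts by distance first, then dx, dy
    return tuple(entries)

def distance_map(image):
    """Euclidean distance from each white pixel to the nearest black pixel.

    image: list of rows; truthy = white (object), falsy = black (background).
    Black pixels get 0.0; if the image has no black pixel, the result is inf.
    """
    height, width = len(image), len(image[0])
    lut = build_lut(max(height, width) - 1)
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue  # black pixel: distance stays 0
            result[y][x] = math.inf
            # Walk outward in order of increasing distance; the first
            # black pixel found is by construction the nearest one.
            for dist, dx, dy in lut:
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height and not image[ny][nx]:
                    result[y][x] = dist
                    break
    return result

# 3x3 white block with a one-pixel black border
demo = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
for row in distance_map(demo):
    print(row)
```

Each row of the image is independent of the others in this sketch, which is what makes the outer loop a natural candidate for parallelization.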

I am sure there are a couple of things that could be tweaked. This is just a first draft, so modify as needed. There could also be bugs. (It might be better to do the LUT offsets as a cluster of integer xy points.)

Anyway, here's a ~~2010~~ and 2015 version of the same code. Let me know if you have any questions.

(The 2010 version was broken because of unsupported new features. Sorry. I attached a modified 2010 version below.)

LabVIEW Champion.

altenbach · ‎05-23-2016

altenbach wrote:
(It might be better to do the LUT offsets as a cluster of integer xy points.)

Nah, it only gains about 1%. Might still be worth it because the LUT uses fewer bytes in memory.

(looking at my code, one bug is misspelling my name in the comments 😮

That's why I program graphically!!! :D. I might swap out with a correctly spelled version later...)

LabVIEW Champion.

altenbach · ‎05-23-2016

@altenbach wrote:
Do you really only have LabVIEW 2010 or did you downgrade for posting? (I would prefer to attach in a more recent version if possible. Does 2010 have parallel FOR loops, concatenating tunnels, and subVI inlining?)

Sorry, the downconversion to 2010 does not really work, because it breaks the LUT subVI. (It substitutes a shift register for the concatenating tunnel, thus disallowing parallelization. You should probably replace it with a regular tunnel and reshape to 1D before the sort operation, for example.)

Here's a version that hopefully works correctly in 2010. (~same speed).

LabVIEW Champion.

joelsa · ‎05-23-2016

Thank you very much, this is really appreciated. Now I am going to find out how this magic works. 😄 I got LabVIEW 2009-2012 at home (used it a little bit way back then), but also got 2013-2015 at work. Just used 2010 for posting, so no need to downgrade.

Edit: Works well for LabVIEW 2010. 3.129 seconds. Roughly 0.2 seconds for 256x256 image.

altenbach · ‎05-23-2016

@joelsa wrote:

Edit: Works well for LabVIEW 2010. 3.129 seconds. Roughly 0.2 seconds for 256x256 image.

Yes, the speed is of course related to the number of pixels, but even more to the size of the white area (i.e. the average distance to the nearest black pixel, which is at its worst with one big filled circle as we have here). For example, if you placed four circles of half the size, one in each quadrant of the 512x512 image, it would be much faster. Let me know if you have any questions.
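To make that dependence concrete, here is a small Python sketch that counts how many lookup-table entries a distance-sorted search has to visit for one big circle versus four half-size circles. It assumes the search simply walks a distance-sorted offset list until it hits a black pixel, which is my reading of the approach rather than the VI itself. The helper names and the 64x64 test size are hypothetical, chosen so it runs quickly.

```python
def build_offsets(max_offset):
    """All (dx, dy) offsets within +/-max_offset, sorted by squared distance."""
    offsets = [(dx, dy)
               for dy in range(-max_offset, max_offset + 1)
               for dx in range(-max_offset, max_offset + 1)]
    offsets.sort(key=lambda o: o[0] * o[0] + o[1] * o[1])
    return offsets

def filled_circles(size, circles):
    """size x size image, True (white) inside any (cx, cy, r) circle."""
    return [[any((x - cx) ** 2 + (y - cy) ** 2 <= r * r for cx, cy, r in circles)
             for x in range(size)]
            for y in range(size)]

def count_probes(image, offsets):
    """Total number of offsets examined over all white pixels."""
    size = len(image)
    probes = 0
    for y in range(size):
        for x in range(size):
            if not image[y][x]:
                continue
            for dx, dy in offsets:
                probes += 1
                nx, ny = x + dx, y + dy
                if 0 <= nx < size and 0 <= ny < size and not image[ny][nx]:
                    break  # nearest black pixel found
    return probes

size = 64
offsets = build_offsets(size - 1)
one_big = filled_circles(size, [(32, 32, 28)])
four_small = filled_circles(size, [(16, 16, 14), (48, 16, 14),
                                   (16, 48, 14), (48, 48, 14)])

big_probes = count_probes(one_big, offsets)
small_probes = count_probes(four_small, offsets)
print("one big circle:   ", big_probes)
print("four half circles:", small_probes)
```

Both images have about the same white area, but the number of offsets visited per pixel grows roughly with the square of the distance to the nearest black pixel, so halving the radius should cut the total work to somewhere near a quarter.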

LabVIEW Champion.

altenbach · ‎05-24-2016

Just as an additional datapoint:

Setting the FOR loop to 32 parallel instances and running on my Dual Xeon E5-2687W (32 virtual cores), drops the 512x512 time down to ~504ms 😄

(1024x1024 is under 7 seconds)

LabVIEW Champion.
