LabVIEW


Algorithm used for Sort 1D Array

I ran some benchmarks based on some data I'm using.

Sorted array sizes and timings (four test cases):

                      Case 1   Case 2   Case 3   Case 4
First array size        1200   160000      200   160000
Second array size     160000     1200   160000      200
Merge time (msec)         32       33     31.5     30.3
Sort time (msec)          79     37.6     80.6     37.7

 

In general, the merge routine (grab the first element from either array and build a new array) was faster than the sort method (concatenate the two arrays, then use Sort 1D Array).
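For reference, here is roughly what the two approaches look like in a text language. This is only a Python sketch of the idea, not my LabVIEW code; the names are illustrative.

```python
def merge_presorted(a, b):
    """Merge two already-sorted lists by repeatedly taking the smaller
    head element -- the 'merge routine' approach."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])   # at most one of these still has elements left
    out.extend(b[j:])
    return out

def concat_then_sort(a, b):
    """Concatenate the two arrays and re-sort everything -- the
    'Sort 1D Array' approach being benchmarked against."""
    return sorted(a + b)
```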

 

For the sort method, the order of the arrays mattered. When I put the shorter array ahead of the longer array, it took about twice as long as when I put the longer array first. That may be because there are more moves: the shorter array is the newer data that should remain near the end.

 

Overall, merging 2 presorted arrays seems to be faster than concatenating them and sorting the combined array. It might be a little faster or a lot faster (2x or more), depending on how many moves the sort method needs to make. Putting the larger array at the beginning and the smaller one at the end is faster than putting the small array first (though that might be because the smaller array contains the later timestamp data that belongs at the end).

 

The merge method seems consistently quicker no matter the size or order of the arrays.  So I'm going to work with that.

 

I found that merging in a larger "small" array did not take much longer than merging in a smaller one. A larger array represents a larger chunk of time data, which means I don't have to call the merge as often, and it doesn't really take any longer to process. So I have a better chance of my processing not falling behind.

 

So I think my best method for merging the two data streams is to take each packet of data and put it into separate buffers on the host side, then use the merge method to combine 2 more-or-less equally sized chunks and place the result in the DMA FIFO. Taking a small section of one stream and merging it into a larger buffer, then later taking a large section of the other stream into a large buffer, just takes too long: the side of the process working on the small sections falls behind quickly, drags everything else behind, and the backlog of data grows exponentially. The new approach should better balance the processing while also keeping the combining closer to the source of the data, just one step removed from where I have it now.
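The outline of that flow, again as a hedged Python sketch: the buffer handling, chunk size, and dma_fifo_write call are all placeholders for the actual host-side VI, and merge_presorted is the routine sketched earlier.

```python
CHUNK = 1024                       # assumed chunk size per stream; tune to keep both sides balanced
buffer_a, buffer_b = [], []        # host-side buffers, one per data stream

def on_packet(stream, packet):
    """Accumulate each incoming packet in its own buffer; once both buffers
    hold a full chunk, merge the two presorted chunks and ship them out."""
    (buffer_a if stream == "A" else buffer_b).extend(packet)
    if len(buffer_a) >= CHUNK and len(buffer_b) >= CHUNK:
        merged = merge_presorted(buffer_a[:CHUNK], buffer_b[:CHUNK])
        del buffer_a[:CHUNK]
        del buffer_b[:CHUNK]
        dma_fifo_write(merged)     # hypothetical stand-in for the DMA FIFO write
```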

 

I still don't know exactly what method the Sort 1D Array primitive uses.  But using a straight up merge of presorted data seems to be consistently quicker.

Message 21 of 33

@RavensFan wrote:

I ran some benchmarks based on some data I'm using.


 

In general, the merge routing *routine  (grab first from either away *array, and build into new array)

 

 

Overall, the merging of 2 presorted arrays seems to be faster than combining then sorting those 2 arrays  *the combined array.

 

So I think my best method to the merging of two data streams is to combine take each packet of data and put it into separate buffers on the host side.  Then use the merge method to combine 2 *more or less equally sized chunks together then place that in tthe DMA FIFO.  Taking a small section of one stream and merging into a larger buffer, then later taking a large section of the other into a large buffer, just takes too long and the side of the process working on the small sections falls behind quickly and winds up making everything fall behind and just causes the backlog of data to grow exponentially.  It should better balance the processing, but also keep the combining closer to the source of data, but just one step off of where I have it now.

 


Rereading the message after the fact, I found some errors I had missed earlier.

Message 22 of 33

 

Hello RavensFan.

This is a very interesting topic, because I'm doing something very similar and I learned a lot.

I use one FIFO to transfer an array of positions and an array of digital patterns to the FPGA. The digital pattern is a mix for two independent devices, which would benefit greatly from two separate FIFOs. I'll have a look at it.

 


@RavensFan wrote:

The FPGA card I am using only has 3 DMA's.  And I'm already using the other 2, 1 for grabbing analog inputs, the other for grabbing a string of data coming from digital inputs.  That leaves just 1 for the analog outputs.  I wish I had another DMA.  If I did the whole problem would go away, FPGA could then read both independently.


Just one idea: Is there any chance you can mix 'analog input' and 'string from digital input' in one FIFO and split them apart in the host?

e.g.:

- put 1 analog sample and then 1 string

- put 10 analog samples and then 3 strings

- insert a flag to the data to distinguish between analog sample and string

- ...

 

I do this with a Target-Host FIFO. I put 1 I32 Position and 3 I16 analog input samples together as an array of 5 I16 values in the FIFO. In the host I split the array to the original data types.
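A small Python sketch of that packing and host-side splitting, just to make the layout concrete (the real code is LabVIEW; the function names and the big-endian byte order here are assumptions):

```python
import struct

def pack_element(position_i32, samples_i16):
    """Pack one I32 position plus three I16 samples into five 16-bit words."""
    hi, lo = (position_i32 >> 16) & 0xFFFF, position_i32 & 0xFFFF
    words = [hi, lo] + [s & 0xFFFF for s in samples_i16]
    return struct.pack(">5H", *words)

def unpack_element(raw):
    """Host side: split the five 16-bit words back into the original types."""
    hi, lo, *samples = struct.unpack(">5H", raw)
    position = struct.unpack(">i", struct.pack(">I", (hi << 16) | lo))[0]
    to_i16 = lambda w: struct.unpack(">h", struct.pack(">H", w))[0]
    return position, [to_i16(s) for s in samples]
```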

 

UliB

Message 23 of 33

You can definitely mix together different data and use flags to indicate what the data represents and break things apart accordingly in the host.

 

The key things for FIFOs are identifying the basic communication element, identifying how many of those make up a larger packet of data, and identifying how smaller pieces of data can be packed together within that larger packet.

 

For my FPGA to Host communication, I have one FIFO dedicated to reading the 8 analog input channels (each an I16) and sending them to the host along with a 64-bit timestamp from the clock running on the FPGA.  I've broken it up into 4 U64 elements I write out in order.  The host will read 4 elements at a time (or multiples of 4) and know how to break them apart using the flags I set.

 

First U64: the first 4 bits are a 0000 flag, and the remaining 60 bits carry the 64-bit timestamp (with its top 4 bits masked off, which they'll never reach for millennia).

Second U64: the first 16-bit word contains a flag of 01xx xxxx xxxx xxxx (the x's are zeroes, but I don't care about them at the moment; they could be anything), indicating this is data group 1. The next three 16-bit words are the I16 values from analog channels 0-2.

Third U64: the first 16-bit word contains a flag of 10xx xxxx xxxx xxxx, indicating this is data group 2. The next three 16-bit words are the I16 values from analog channels 3-5.

Fourth U64: the first 16-bit word contains a flag of 11xx xxxx xxxx xxxx, indicating this is data group 3. The next two 16-bit words are the I16 values from analog channels 6-7. The final 16-bit word is all zeroes and unused at this time.

 

This way on the host, I read 4 U64 elements at a time.  Then in a For Loop, I check the first two bits of each element to decide what to do with the remainder of the data.    That works well for me.
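A rough Python model of that framing, just to make the bit layout concrete (the real code is G on the FPGA and host; the helper names here are invented, and sign conversion of the I16 words is omitted for brevity):

```python
MASK60 = (1 << 60) - 1

def pack_frame(timestamp_u64, ch):
    """Build the 4-element U64 frame: timestamp first, then three channel groups."""
    def group(flag_word, a, b, c):          # flag in the top 16-bit word, 3 I16 values below it
        return (flag_word << 48) | ((a & 0xFFFF) << 32) | ((b & 0xFFFF) << 16) | (c & 0xFFFF)
    return [
        timestamp_u64 & MASK60,             # top 4 bits 0000, 60-bit timestamp
        group(0x4000, ch[0], ch[1], ch[2]), # flag 01xx...: group 1, channels 0-2
        group(0x8000, ch[3], ch[4], ch[5]), # flag 10xx...: group 2, channels 3-5
        group(0xC000, ch[6], ch[7], 0),     # flag 11xx...: group 3, channels 6-7, last word unused
    ]

def decode_frame(frame):
    """Host side: read 4 U64s and branch on the top two bits of each element."""
    decoded = {}
    for element in frame:
        tag = element >> 62                 # 0 = timestamp, 1-3 = channel group number
        if tag == 0:
            decoded["timestamp"] = element & MASK60
        else:
            decoded[f"group{tag}"] = [(element >> s) & 0xFFFF for s in (32, 16, 0)]
    return decoded
```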

 

Then I process and analyze the data and send some of it back to the FPGA.

 

My other FPGA to Host FIFO is reading timestamps based on triggers from 1-4 digital inputs.  It has some flags being mixed in with that data.  Processing and analysis algorithms are going on in the host.  The results of that are also being sent back to the FPGA.

 

For the Host to FPGA FIFO, I'm doing essentially the same things to pack the data. But the data comes from the two different processing loops mentioned above and usually does not have the same timestamps, and I want to funnel it through the FIFO in order. The FPGA reads 4 U64's at a time and waits until the timestamp has occurred before updating the analog output channels. An additional piece of data in that group is a flag that tells which of the channels should be updated and which should not. (Because I only want source A to update channel 1 and not 2, and I only want source B, which yields different timestamps as a result of its processing, to update channel 2 and not 1.)
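The output-side handling boils down to something like this (a Python sketch with invented names; the per-channel ownership bits are my reading of the flag described above):

```python
def maybe_update_outputs(timestamp, channel_mask, values, now, write_analog_out):
    """Apply one decoded output group only after its timestamp has passed,
    and only to the channels selected by its update flag."""
    if now < timestamp:
        return False                        # not time yet; keep the group queued
    for ch, value in enumerate(values):
        if channel_mask & (1 << ch):        # e.g. source A owns channel 1, source B owns channel 2
            write_analog_out(ch, value)     # stand-in for the FPGA analog output write
    return True
```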

 

In theory I could merge the 2 input FIFOs into one and create 2 different output FIFOs. It would certainly help the problem I'm trying to solve with outputting the two different data streams. But I'd have to go back and rewrite a lot of the input code, and I'm really hesitant to do that. The digital input data stream is actually the most important of everything I've got going on. The second of the two output streams was a late add to the project and should only be a temporary situation: something we are looking at while doing some prototype testing, but that will go away in the final revision.

 

So rather than changing the head end of the system and risking messing up everything already developed and the downstream processing, I was looking at adding this secondary stream within the primary output stream I already had. I came up with an idea of how to merge the two streams and implemented it, only to find out it wasn't working. I came up with an improvement, thought it was working, only to find out the modifications were taking longer in processing and falling behind. With what I learned in the last couple of days, I think I have a new way of solving it and will be trying it out. If that fails, then I'll start weighing the pros and cons of doing what you suggested, but the cons of making modifications at the top end of the process and having them trickle down still seem pretty heavy right now.

 

Thanks Uli, Ben, Bob, Blokk, and everyone else for being a sounding board and providing suggestions.  At the moment, I feel like I'm just around the corner from getting this working the way I want.

 

Message 24 of 33

@RavensFan wrote:

Hello Nathan,

 

Your code resembles what I've drawn by quite a bit.

 

Here is what I've done.  I've just created my datasets to test it out.

 

 

 


You always compare the 1st elements ...

/Y

G# - Award winning reference based OOP for LV, for free! - Qestit VIPM GitHub

Qestit Systems
Certified-LabVIEW-Developer
Message 25 of 33

Thanks. You are right. I discovered that immediately after I tested it for the first time. At the time I posted it, I had just finished it but hadn't yet tried it to find the dumb mistakes I made. :)

 

I also needed to implement some code that Nathan showed in his example  (thanks, that saved me from having to work out the logic) for picking the proper elements once it reached the end of an array.

 

Attached is the updated code.  (I dropped an instance on itself to just show the icon I made symbolizing what it is doing.)

 

Message 26 of 33

A bit off-topic but I have to ask. Why use a pre-allocated array and a shift register rather than an auto-indexing tunnel? While I haven't benchmarked it, I would expect them to be equally efficient (LabVIEW will pre-allocate the array at the tunnel output for you). I think the auto-indexing output results in cleaner code (fewer primitives, one less wire running across the loop) so I'd be interested to hear a preference for the shift register.

Message 27 of 33

That is a good point nathand.

 

The size of the output buffer should be the same as the number of iterations, and in modern versions of LV the auto-indexing is slightly faster.

 

Ben

Retired Senior Automation Systems Architect with Data Science Automation LabVIEW Champion Knight of NI and Prepper LinkedIn Profile YouTube Channel
Message 28 of 33

@nathand wrote:

A bit off-topic but I have to ask. Why use a pre-allocated array and a shift register rather than an auto-indexing tunnel? While I haven't benchmarked it, I would expect them to be equally efficient (LabVIEW will pre-allocate the array at the tunnel output for you). I think the auto-indexing output results in cleaner code (fewer primitives, one less wire running across the loop) so I'd be interested to hear a preference for the shift register.


I commented on that earlier. I think in this case it doesn't matter at all, because the final size of the array is known either way. Normally I do just use auto-indexing output tunnels. But because preallocating is generally recommended to prevent memory reallocations on an ever-growing array (more for other array-building situations, probably not this one), I decided to lean toward the pre-allocated array.
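As a loose text-language analogy of the two patterns being compared (LabVIEW's compiler behaves differently, so this only illustrates the trade-off, not the actual G benchmark):

```python
import timeit

N = 160_000

def preallocated():
    """Analogue of Replace Array Subset into a pre-allocated array in a shift register."""
    out = [0] * N
    for i in range(N):
        out[i] = i * 2
    return out

def auto_indexed():
    """Analogue of letting the loop's auto-indexing output tunnel build the array."""
    return [i * 2 for i in range(N)]

print("preallocated:", timeit.timeit(preallocated, number=50))
print("auto-indexed:", timeit.timeit(auto_indexed, number=50))
```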

 

I'll try the other way as well and report back.

Message 29 of 33

@RavensFan wrote:

...

 

I'll try the other way as well and report back.


Please do!

 

I am betting on the auto-indexing being just a tad faster.

 

Ben

Retired Senior Automation Systems Architect with Data Science Automation LabVIEW Champion Knight of NI and Prepper LinkedIn Profile YouTube Channel
Message 30 of 33