
Missing sample detection

I have data that is sampled at roughly regular intervals.  It's coming in from a network and I time stamp each packet.  The jitter is small (< 0.1·dt typically), but every now and then it spikes (to roughly 0.8·dt) or some packets are never sent.  The data is packed such that I can create a timestamp (LabVIEW RT) on the UDP packet when received, and the sample data (Multiple channels) are packed into each packet.  That allows me to align the timestamps perfectly to the data.  The network sender clock and LabVIEW RT clock are sync'd via IRIG.  dt is usually around 0.01 s.

 

I'd like to be able to detect where missing packets are in the recorded data stream and fill those in with the last known value and the expected timestamp.

 

My first thought is attached.  It takes in the timestamp data (Packed into a WFM with the dt set to what the incoming data was sent at), then determines the sample-to-sample delta (Jitter).  Anything > dt·(1+threshold) is considered to have missed sample(s) preceding it.  This 'kinda' works, but it seems like a gross oversimplification of how to approach this, especially considering the rare cases where the jitter is large but no samples are actually missing.  Are there any VIs that might be useful here?
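In rough Python-style pseudocode (just to spell out the logic; the real implementation is the attached VI, and the function name and the 0.5 default threshold are only placeholders), the approach is:

import numpy as np

def flag_gaps(timestamps, dt, threshold=0.5):
    # Sample-to-sample spacing of the received packets (the "jitter").
    deltas = np.diff(np.asarray(timestamps, dtype=float))
    # Any interval noticeably longer than dt is assumed to have missed
    # sample(s) in front of the packet that follows it.
    suspect = deltas > dt * (1.0 + threshold)
    # Indices (into the original array) of packets that follow a suspected gap.
    return np.nonzero(suspect)[0] + 1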

 

Thanks,

 

XL600

Message 1 of 11

Thank you for attaching some code.  I see you are using LabVIEW 2016, and gather that you are running LabVIEW RT on some remote Target to which you connect via TCP/IP.  Is this correct?  Is the Target on the same network?  What is your sampling speed?  Are you sampling continuously, or in bursts?  [I'm trying to get an idea of the data stream, how big, how much, etc.].

 

I absolutely would not use UDP in conjunction with LabVIEW RT.  You have no control over the packets, and are more-or-less forced to use some post-processing (like "Order by Timestamps", which, of course, only works if your data have timestamps).

 

I had a fairly modest data acquisition routine that saved data continuously: 24 16-bit "analog" channels collected at 1 kHz on a PXI platform (I put "analog" in quotes, as several channels were derived from digital values that were also updated at 1 kHz and saved as though they were analog data).  I used Network Streams to "stream" the data to the Host PC (which was busy showing me the data as it "passed through" on its way to disk, and also providing Front Panel control so I could Pause, change the view of the data, or do other "occasional" things).  I put in a "Clock" signal to detect if I ever missed saving data -- I never did.  I relied on the RT side, which had the accurate crystal clock, to ensure that the data were "time-accurate".  As I recall, I sent data across the network 50 points at a time (or, for 24 channels, 2400 bytes at a time), which made for a very convenient 20-updates-per-second refresh of my displays (I only displayed the first point, or sometimes the average, as who can "see" a 1 kHz signal?).

 

Bob Schor

Message 2 of 11

Three computers, actually.  One is a special device that outputs UDP packets at 100 Hz (Or whatever it's configured to).  The second is my RT PXIe system (8135RT controller), which picks up the UDP packets on a dedicated interface.  I then connect to the RT box with a laptop, but all the laptop does is post-process the data captured by the RT system.  The RT box just timestamps and logs the UDP packets whenever they arrive.

 

I do have timestamps and there is no network congestion (Point-to-point network).  The UDP scheme cannot be changed because the source of the packets is a pretty major piece of hardware that has been in use for over 20 years.  It's not LabVIEW.

 

My RT code does not miss data (In any testing I've done so far).  All received packets are captured with both their embedded timestamps and the timestamp I put on them when I receive them.  I compare both to ensure my clock skew isn't excessive (It never is).  As a result, I generate the sample timestamp array from the source's embedded timestamps, since they don't include the UDP transport latency (Which is generally 50 µs-70 µs).

 

So, what I wind up with is a 'sample' stream which can usually be considered a WFM with a steady dt (t0 is just the first packet's timestamp).  But when the source skips packets, which it does rarely, those missing packets appear as a compression in the assumed WFM if they aren't re-inserted with some sort of NaN or replicated data.  Detecting the position and calculating the number of missing samples is the basic question (In the presence of the jitter).
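One way to make that concrete (a rough Python-style sketch of the kind of thing I'm after, not my actual code; the names are placeholders) is to snap each embedded timestamp onto a fixed-dt grid anchored at the first packet, so per-packet jitter mostly washes out in the rounding.  The rare spikes approaching 0.8·dt are exactly where this can still guess wrong, which is the part I'm unsure about:

import numpy as np

def missing_by_grid(timestamps, dt):
    t = np.asarray(timestamps, dtype=float)
    # Nearest grid slot for each received packet; IRIG sync keeps the grid
    # itself from drifting, so only per-packet jitter enters the rounding.
    idx = np.rint((t - t[0]) / dt).astype(int)
    # Number of samples missing in front of each received packet (0 = none).
    n_missing = np.diff(idx) - 1
    gap_positions = np.nonzero(n_missing > 0)[0] + 1
    return gap_positions, n_missing[gap_positions - 1]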

Message 3 of 11

EDIT: everything below the line was written before I saw the OP's response in msg #3.  That posting addressed pretty much all the advice and caveats I wrote up.  I'm leaving it below in case some other reader can benefit from the general advice, but it's clear to me that the OP doesn't need it.

 

---------------------------------------------------------------

 

It helps that you start with a system that syncs the sender and receiver clocks via IRIG.  Given that, timing information should be inserted into the data packet at the *sending* end.  In general, timing info should attach to data as near to the data source as possible rather than somewhere else downstream.  (In general, one also needs a way to either sync or at least correlate the sender's and receiver's times, but that's already covered for your situation.)

 

If you cannot make the sender put timing info in the packet, there's a limit to what you can do at the receiving end.  When it appears that a sample was missed, it will be at best difficult to know in the moment whether or not one actually was.  And when you do miss one it'll be impossible to know (for sure) which one.

 

Mainly, I'm emphasizing that efforts on the sending end will pay much greater dividends much more easily than efforts on the receiving end, IF that's an available option.  Else, limit your expectations.  You won't be able to prove the correctness of whatever tweaks you do to the data or timestamps anyway, so keep that in mind as you decide how much effort you put into the estimation algorithm.

 

 

-Kevin P

CAUTION! New LabVIEW adopters -- it's too late for me, but you *can* save yourself. The new subscription policy for LabVIEW puts NI's hand in your wallet for the rest of your working life. Are you sure you're *that* dedicated to LabVIEW? (Summary of my reasons in this post, part of a voluminous thread of mostly complaints starting here).
Message 4 of 11

Are you saying that the *embedded* timestamp has significant jitter?

 

If not, it should be pretty easy to detect when samples are missed and how many.

 

If so, then *perhaps* you could consider treating the embedded timing info as another stream of variable data.  Instead of assuming constant dt (despite contrary evidence in the embedded timing info) and forcing the data into a waveform, you could instead pass along the timing info as a distinct data stream.   So for every 100 data samples you pass along, you also pass along the 100 embedded timestamps.  Defer dealing with any discrepancies or missing data until you get downstream from the RT system, maybe even wait for long-after-the-fact post-processing.
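All I mean is something like this hypothetical record (Python notation only, field names made up): the embedded timestamps ride along as data next to the samples instead of being collapsed into a dt.

from dataclasses import dataclass
import numpy as np

@dataclass
class TimedBlock:
    # One logged/streamed message: samples and their embedded timestamps
    # travel together, untouched, until post-processing decides what to do
    # about the irregular spacing.
    timestamps: np.ndarray   # shape (n,), embedded source timestamps
    samples: np.ndarray      # shape (n, n_channels)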

 

This is analogous to things I've done when doing variable interval measurements with counters while saving to TDMS.  Variable timing doesn't fit into a waveform datatype, so instead I write the variable timestamp info as a separate datastream.

 

 

-Kevin P

Message 5 of 11

@Kevin_Price wrote:

Variable timing doesn't fit into a waveform datatype, so instead I write the variable timestamp info as a separate datastream.

 


That works, until you lose some part of the timestamp stream!

 

We don't know much about the data being sent.  One thing you can try, particularly if you are streaming multiple channels at once, is to add a "clock" channel, something that increments each "tick" and thus counts 1, 2, 3, 4, and so on.  Now, if a packet goes missing and you see 34, 35, 42, 43, 44, you know "where the hole in the data" is located, and how big a hole it is.  [There's probably some other good reason this scheme won't work, I suppose ...].  Hmm -- this sounds like adding enough "extra" information that you can recover, or at least identify, data loss.  Sounding like Checksums, RAID, error-detection encoding ...
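In rough pseudocode (Python just for illustration, names made up), detecting holes from such a counter channel is trivial, which is the whole appeal:

import numpy as np

def holes_from_counter(counter):
    # e.g. [34, 35, 42, 43, 44] -> one hole after the packet whose count is 35,
    # with six packets (36..41) missing.
    jumps = np.diff(np.asarray(counter, dtype=np.int64))
    hole_after = np.nonzero(jumps > 1)[0]      # index of last packet before each hole
    return hole_after, jumps[hole_after] - 1   # where, and how many are missing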

 

Bob Schor

     

Message 6 of 11

No, just some jitter.  When data isn't missing, it's usually in the µs range, in fact.

 

I do use the timing stream as a separate stream, but I have a lot of other code which merges the UDP-based WFM data with other data coming from multiple DAQmx cards (AIN channels).  The entire design really does need the UDP data to look like regular-dt WFM data for ease of post-processing (I don't want to have to spread unique handling throughout the entire design).

 

I wasn't actually attempting to embed the correction into the RT process itself.  I log everything to TDMS files in their raw state (Including the timing data) on the RT side.  Then I use a pile of non-RT code to read out the TDMS files.  That read process is where the correction takes place, so that the output of the read is just a WFM no matter what the source data type was.  Most of the time, users want to just plot and process the WFM data without regard to the timing data.  But every now and then, someone wants to do detailed timing analysis using the actual timing data.  I'm thinking that if I can't be 100% sure the missing sample injection process is perfect, then I will have to just tell them they can't do timing analysis if there are any detected missing samples.  There are ways to prevent missing samples, but they restrict how we operate the source equipment.  If the sample injection process is pretty good (Resulting in alignment within +/- a few samples), that's probably fine for almost all of the testing we would normally do.
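The injection step itself, in rough Python-style pseudocode (placeholder names, not my actual code; it assumes the gap positions and counts were already detected from the timing data), is basically a last-value fill:

import numpy as np

def fill_last_value(samples, gap_positions, n_missing):
    # gap_positions: index of the first received sample after each hole.
    # n_missing:     how many samples were skipped in that hole.
    # Repeats the last received value so the result is a steady-dt WFM again;
    # substitute np.nan for the repeat if the fills should stay visible.
    out, prev = [], 0
    for pos, n in zip(gap_positions, n_missing):
        out.append(samples[prev:pos])                          # good data
        out.append(np.repeat(samples[pos - 1:pos], n, axis=0)) # filled samples
        prev = pos
    out.append(samples[prev:])
    return np.concatenate(out, axis=0)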

Message 7 of 11

A source counter may be a possibility (I'll have to consult the local gurus).  I was trying to use just the source timestamps, but that darn jitter and non-integer math makes it tricky.

Message 8 of 11

Just thinking out loud here, quick musings...

 

It seems useful at *some* stage to have a defined method to tag the distinction between "true" waveforms (such as the DAQmx AI channels) and "kinda sorta" waveforms (such as your UDP stream).   For example, this could be embedded as a TDMS channel property.
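For example (illustrative only -- this is Python with the npTDMS package, and the property names are made up; in LabVIEW the equivalent would be a TDMS Set Properties call when the channel is written):

import numpy as np
from nptdms import TdmsWriter, ChannelObject

data = np.zeros(100)   # stand-in for a gap-filled UDP channel
channel = ChannelObject("UDP", "Chan0", data,
                        properties={"gap_filled": 1, "n_gaps_filled": 3})
with TdmsWriter("tagged.tdms") as writer:
    writer.write_segment([channel])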

 

There will be pros and cons, which aren't mine to evaluate, to filling in missing data via interpolation, NaN values, or some other algorithm.  But I do think it may be important to tag the channels that have been manipulated so that downstream processors *can* be forewarned.  (Some may choose to ignore the warning, but that'll be on them.)

 

At some point, I recall using a waveform datatype to hold a stream of variable-interval timestamps too, also in service of uniform datatypes for logging and messaging.  t0 played its normal role; I don't recall how I set dt -- I probably used either NaN or a negative number to help forcibly indicate that it wasn't real -- and then Y was an array of doubles that represented relative seconds from t0.  Then, in *many* respects, the variable-interval time channel could be handled the same way as any other constant-interval waveform.
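Something like this, in Python just for notation (the field names and values are mine from memory, not a real datatype):

from dataclasses import dataclass
import numpy as np

@dataclass
class TimeChannel:
    t0: float          # absolute start time, as in a normal waveform
    dt: float          # deliberately NaN (or negative) => "not a real interval"
    Y: np.ndarray      # relative seconds from t0, one entry per sample

time_chan = TimeChannel(t0=0.0, dt=float("nan"),
                        Y=np.array([0.0, 0.0101, 0.0199, 0.0302]))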

 

Overall, if some users want to do detailed timing analysis, I think you'd better pass through the original raw values from the packets -- both timing info and data.  For convenience you can also construct the cleaner, mostly redundant, but partly fictional datastreams.  By retaining the true raw data, you can defer the processing methodology to those users who care most and who may have different preferences than you or each other.  Meanwhile, you've also served the majority of use cases by constructing the convenience waveforms (and tagging them as being manipulated).

 

I'm a big proponent of retaining raw data whenever feasible.  Go ahead and do some inline processing for convenience, but keep the raw data too.

 

 

-Kevin P

Message 9 of 11

That's exactly what I already do.  In fact, I create an index TDMS (Not a TDMS index...) which describes all the channels, their WFM properties, base data types, and system data types (Origins) for all TDMS files (Including spanned data sets lasting days).  It's monstrously complex but has been working really well.

 

dt for this UDP stuff is actually calculated by the source system (It takes an average over time), which then sends it to my RT system.  I could calculate the same thing of course, but it was already doing it.
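If I ever do calculate it locally, something like the median of the packet-to-packet deltas (rough Python sketch, placeholder name) would be reasonably tolerant of both the jitter and the occasional gap:

import numpy as np

def estimate_dt(timestamps):
    # Median of the deltas: insensitive to jitter and to the rare long gap,
    # unlike a plain average of (t_last - t_first) / (N - 1).
    return float(np.median(np.diff(np.asarray(timestamps, dtype=float))))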

 

The users 'request' data in specific forms from me.  I can export the raw data or processed data in various formats (csv, tdms, or a few proprietary ones).  I'd like to add .mat output, but I don't yet have a super reliable way to do it (Working slowly on that).  Sounds like we're pretty well in sync ;)

Message 10 of 11