How can you crossfade (fade in/out) arrays?

madgreek · ‎03-30-2007

Kevin, you are 100% correct about the first 2 points...

About "If the 256-sample chunk evaluates as voiced, then an evaluation of period length and a compression/expansion factor determines how to sub-divide and merge those samples. Whatever is done, the result will still be a 256-sample output, at least for the case of pitch-shifting alone.":

The compression/expansion factor does not play any role in the sub-division of the chunk right now. The sub-division has only to do with the size of the period in that chunk, and yes, the result would still be 256 (or very close) after the sub-division. (There is a small remainder at the end of each chunk after it has been sub-divided, which is what I am trying to correct right now.)

compression/expansion comes into play later on when we try to merge the sub-chunks with the windowing.

About Milq's code: some VIs changed in LabVIEW 8.2 compared with my version, and I am having a hard time rebuilding it (I don't know exactly what result I should expect, so I can't be sure I did it 100% correctly).

Kevin_Price · ‎03-30-2007

Homing in ever closer...

The compression/expansion factor does not play any role in the sub-division of the chunk right now. The sub-division has only to do with the size of the period in that chunk

Understood. Sub-division size (and thus also the # of sub-divisions) is determined only by the "period" in that chunk.

and yes, the result would still be 256 (or very close) after the sub-division. (There is a small remainder at the end of each chunk after it has been sub-divided, which is what I am trying to correct right now.)

But the desired final end result will be 256 samples, right? If the calculated period doesn't divide evenly into the 256 samples (which will probably be very typical), I don't see any real problems with trimming or padding a few samples either way. You just tell me the # of samples you'd like *ideally*, even if that # is something like 31.4

compression/expansion comes into play later on when we try to merge the sub-chunks with the windowing.

Understood. First either delete or duplicate sub-chunks, then use something like windowing to perform the cross-fade where they overlap. At the end of all this windowing / merging stuff, the final result will be 256 samples.
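A minimal sketch of the cross-fade step described above, written in Python rather than LabVIEW so it can be shown as text. The function name `crossfade`, the `overlap` parameter, and the raised-cosine fade shape are illustrative assumptions, not taken from any VI in this thread.

```python
import math


def crossfade(a, b, overlap):
    """Join a and b, blending the last `overlap` samples of a
    with the first `overlap` samples of b (hypothetical helper)."""
    if overlap < 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must fit inside both arrays")
    head = len(a) - overlap
    out = list(a[:head])
    for i in range(overlap):
        # Raised-cosine fade: fade_in + fade_out is 1 at every sample,
        # so a steady signal keeps its level across the joint.
        fade_in = 0.5 - 0.5 * math.cos(math.pi * (i + 0.5) / overlap)
        fade_out = 1.0 - fade_in
        out.append(fade_out * a[head + i] + fade_in * b[i])
    out.extend(b[overlap:])
    return out


joined = crossfade([1.0] * 8, [1.0] * 8, overlap=4)
ramp = crossfade([0.0] * 4, [1.0] * 4, overlap=4)
```

The output length is `len(a) + len(b) - overlap`, so the overlap sizes have to be chosen so that the merged sub-chunks add up to the 256-sample target.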

About Milq's code: some VIs changed in LabVIEW 8.2 compared with my version, and I am having a hard time rebuilding it (I don't know exactly what result I should expect, so I can't be sure I did it 100% correctly).

Yeah, a bunch of the icons changed for built-in functions. Please do as much as you can, and highlight (arrows, comments, circles, whatever) the regions where you aren't sure you translated right. That'll give me a great head start!

-Kevin P.

ALERT! LabVIEW's subscription-only policy came to an end (finally!). Unfortunately, pricing favors the captured and committed over new adopters -- so tread carefully.

madgreek · ‎03-30-2007

Now we are on the same page...

About the sub-division: right now I am trying to trim those extra samples (I had zero-padded them in my code so that every chunk would be exactly 256, but the padding introduces noise at the end) and add them to the next chunk, so that I only have to deal with an integer number of sub-chunks.

What I mean is: if, let's say, one chunk divides into 5 sub-chunks and 15 samples are left at the end, I am trying to add those 15 samples at the beginning of the next chunk, giving a chunk of (256+15) samples, and divide that by the period length (whatever that is). The remaining samples (if any) of the new chunk are moved to the next chunk, and so forth...
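The carry-over idea can be sketched in Python as follows. This is only an illustration under stated assumptions: `split_with_carry` is a hypothetical name, every chunk is assumed to be voiced, and each period is assumed to be a positive whole number of samples.

```python
def split_with_carry(chunks, periods):
    """Split each chunk into whole periods; samples left over at the end
    of a chunk are prepended to the next chunk (hypothetical helper)."""
    carry = []
    result = []
    for chunk, period in zip(chunks, periods):
        if period <= 0:
            raise ValueError("period must be a positive number of samples")
        data = carry + list(chunk)
        n_full = len(data) // period
        # Only complete periods become sub-chunks, so nothing is zero-padded.
        subs = [data[k * period:(k + 1) * period] for k in range(n_full)]
        carry = data[n_full * period:]
        result.append(subs)
    return result, carry


# Two 256-sample chunks with assumed periods of 48 and 50 samples.
chunks = [list(range(256)), list(range(256, 512))]
sub_chunks, leftover = split_with_carry(chunks, [48, 50])
```

Here the first chunk gives 5 sub-chunks with 16 samples left over, and those 16 samples lead the second chunk, which is then 272 samples long before it is divided.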

I have attached Milq's code in 7.0. I think I have found pretty much every VI except one (I made a note), plus some different-colored wires, but I don't think those are important.

madgreek · ‎03-30-2007

Kevin

I have the 2D array of clusters you asked me for this morning. Remember, though, the slight problem I am working on with the remaining number of samples...

madgreek · ‎03-31-2007

Kevin

If you tried to run my Td-Psola VI, it is wrong.

I rechecked the way I break down the voiced chunks according to the period, and sometimes it pads the last sub-chunk and sometimes it does not. I don't know why this is happening. I am trying to fix this, but it's harder than it sounds.

Kevin_Price · ‎04-01-2007

Uh oh, confusion again...

1. You posted "pp_array.vi" which contained 97 elements. To span the entire set of samples in 'hijacked.wav', these 97 elements must each characterize a chunk with size 512 rather than size 256 as the recent posts have specified. Ok, easy enough to adjust to that.

2. You posted "Untitled.vi" which has an array of clusters which looks way different than what I was expecting.

A. Data size is back to 97 array elements by 256 data samples per array element. This only spans 1/2 the samples in 'hijacked.wav'.
B. The label "Hanning{X}" suggests that you have already applied a Hanning window - which I specifically did *NOT* want.
C. When I graphed the data, it appears to be freq domain rather than time domain. This might explain why you applied a Hanning window first.
D. I will further guess that the reduction of chunk size from 512 to 256 is due to a single-sided FFT being applied.
E. The fact you're giving me freq domain data when I expected time domain data tells me there's a crucial misunderstanding here. I have been thinking all along that the sub-chunks and cross-fade / merging would be applied on time domain data.

3. I think the FFT stuff is needed by your algorithm to determine the fundamental freq in each 512-sample chunk, which in turn is used to break 512-sample chunks into an appropriate # of sub-chunks.
Big Tip: note that the .wav sound data is centered around 128 rather than around 0. To the FFT algorithm, this looks like a DC offset. I'd highly recommend that you subtract off this DC offset before doing your FFT. (You should probably calculate the mean rather than assuming it's exactly 128.)

4. I *think* that I can just use "pp_array.vi" as a map that characterizes each of the 97 chunks containing 512 samples each from the 'hijacked.wav' file. Have you fully verified that your algorithm correctly distinguishes voiced from unvoiced? And for "voiced" chunks, have you verified that it correctly picks out the "pitch period" or whatever it is that determines how to split into sub-chunks?
I don't think I can do much until I'm sure I can count on those values being correct. A lot depends on the algorithm you run on the freq domain data, and the fact that so far you've left such a large DC offset in there makes the results much more suspect.

Do you have any other examples with both an input .wav file and "known good" result (or set of results for different pitch ratios) to compare against?

-Kevin P.


madgreek · ‎04-01-2007

Hey Kevin

A) There are only 256 samples for each chunk because of the FFT. I am only using half the spectrum since the other half is a mirror.

B) I forgot to take out the Hanning window before posting the code the last time. I just used it to emphasize the chunks and make them easier to visualize.

C) It is in the time domain that the merging will take place. I just used the frequency domain to apply the pitch detection algorithm, which uses the FFT, and get the pitch periods. Once those are estimated, we can move on to the segmentation of the chunks in the time domain. (Isn't this what I have done?)

3) You are right about the first part. Can you please explain your tip a bit more? What is the 128?

4) Yes, I already verified that. I have even tried a few other signals and I get the same results. I have gone through every single chunk, and I even manually counted the number of samples in the period length. (I don't know how you ran my code, but if you right-click the run button, go to mechanical action and choose "latch when pressed", you will see that I break down segments that have a clear peak, which is the pitch period for that chunk.)

I just need to know more about this DC offset: what exactly it is, and how to get rid of it if I have to. I am sorry, Kevin, but I am very much a novice here. Could you explain it some more, if that's possible?

I don't have any examples that use this exact method, only different ones. In the middle of this page, http://www.bernsee.com/dspdimension.com/index.html, you can hear some sounds after they have been modified, if that helps a little.

Kevin_Price · ‎04-02-2007

C) It appears to me that what you did was to apply an FFT to the 512-sample chunks and used that to figure out the "pitch periods". Then you used the pitch periods to decide how to further subdivide those chunks for the pitch-shifting algorithm. The problem (from my vantage point) is that the data you sub-divided was NOT the original time-domain data but was the freq domain result of the FFT. (Plus the data size got cut in half, which would eventually prove to be a problem as well.)

3. The magic "128" is what appears to be the DC offset of the original 'hijacked.wav' data. In other words, all the sound data appears to oscillate about the value 128 rather than oscillating about 0.

The main purpose of your FFT is to identify a distinguishing frequency. You don't care about DC levels, you don't even want them considered. However, the very first element of your FFT result is a measure of DC offset, and will tend to dominate all other apparent freq content.

What you want to do is to first find the average value from your entire dataset. (For hijacked.wav, it'll probably turn out to be somewhere near 128.) You'll want to get yourself into floating-point DBL's here since that's what the FFT wants. Then you should subtract this average from the dataset, leaving you only oscillations about 0. Only after doing those things would you apply your Hanning window and do your FFT. The result is the freq domain data you can use to determine where to subdivide the 512-sample data chunks. But the sub-division should be carried out on the TIME domain data, not on the FREQ domain data.

4. Ok, good. I saw in the meantime that the values in "pp_array.vi" seemed to match the # of rows of (wrong) freq domain data present in "Untitled.vi". The remaining question for me is whether the DC offset in your data may have influenced the results of your pitch detection algorithm.
Early results while using milqman's code to merge the sub-chunks give me the impression that some voiced parts may have been mis-identified as unvoiced. The overall end result sound was quite choppy, including both pitch-shifted tones and some original un-shifted tones. The presence of unshifted tones seems to me to point back to the algorithm used to distinguish voiced and unvoiced.
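The DC-offset steps from point 3 can be sketched in Python as follows: convert to floating point, subtract the mean, apply the Hanning window, then transform. This is only an illustration on synthetic data. The 64-sample test tone stands in for 'hijacked.wav', the naive DFT stands in for LabVIEW's FFT, and the helper names are made up.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of the first len(x)//2 bins (naive DFT, demo only)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]


def hann(n):
    """Periodic Hann (Hanning) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]


n = 64
# Synthetic stand-in for unsigned 8-bit audio: a tone at bin 5 around 128.
raw = [int(round(128 + 40 * math.sin(2 * math.pi * 5 * t / n)))
       for t in range(n)]

samples = [float(v) for v in raw]        # work in floating point
mean = sum(samples) / n                  # measured offset, not a fixed 128
centered = [v - mean for v in samples]   # now oscillates about 0
window = hann(n)

with_offset = dft_magnitudes([w * v for w, v in zip(window, samples)])
without_offset = dft_magnitudes([w * v for w, v in zip(window, centered)])

peak_with = max(range(n // 2), key=with_offset.__getitem__)
peak_without = max(range(n // 2), key=without_offset.__getitem__)
```

With the offset left in, the largest bin is bin 0 (the DC term). After the mean is subtracted, the largest bin is bin 5, where the tone actually is.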

-Kevin P.


Ben · ‎04-02-2007

Kevin,

When you are done with this thread, could you please post a "trip report" summarizing this adventure?

Ben

Retired Senior Automation Systems Architect with Data Science Automation, LabVIEW Champion, Knight of NI and Prepper

madgreek · ‎04-02-2007

Hello Kevin. Hope you had a good weekend.

Basically, what you are telling me to do is this:

1) Somehow estimate the mean DC offset (which would be around 128) and take it out of the original data (which I am still not sure how to do).

2) Then use the algorithm to find voiced/unvoiced chunks, as it has been used until now.

3) Instead of breaking down the 256-sample voiced chunks I get from the FFT result, I should take the original chunks of 512 as they come from the sound "Hijacked" and use the result from the FFT to break them down? If I understood this correctly, this is easily done just by feeding the segmentation part the 512-sample chunks from before the FFT is applied to them.

4) I will go through the identification process again and check it again.
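The steps above can be sketched in Python as follows. This is only an illustration: the thread does not show the real pitch detection algorithm, so `estimate_period` uses a stand-in (the strongest non-DC bin of a naive DFT), and both helper names are hypothetical. The point it shows is that the frequency-domain result is used only to get a period, while the splitting is done on the original 512-sample time-domain chunk.

```python
import cmath
import math


def estimate_period(chunk):
    """Stand-in pitch estimate: period in samples taken from the
    strongest non-DC bin (an assumption, not the thread's algorithm)."""
    n = len(chunk)
    mean = sum(chunk) / n
    centered = [v - mean for v in chunk]  # step 1: remove the DC offset
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        mag = abs(sum(centered[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                      for t in range(n)))
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return round(n / best_bin)


def split_time_domain(chunk, period):
    """Step 3: cut the original time-domain chunk into whole periods."""
    n_full = len(chunk) // period
    subs = [chunk[k * period:(k + 1) * period] for k in range(n_full)]
    return subs, chunk[n_full * period:]


n = 512
# Synthetic voiced chunk: a 64-sample period riding on an offset of 128.
chunk = [128 + 40 * math.sin(2 * math.pi * t / 64) for t in range(n)]
period = estimate_period(chunk)
subs, remainder = split_time_domain(chunk, period)
```

For this synthetic chunk the estimated period is 64 samples, giving 8 sub-chunks and no remainder. Real speech will usually leave a remainder to carry into the next chunk.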

Regards

Greek
