04-02-2007 03:51 PM
126.96 sounds about right, so yes, subtract this from the samples before doing windowing or FFT. Be sure the raw sample values are expressed as floating-point values (not as unsigned integers) going into the subtraction. So,
A = raw 8-bit unsigned integer values from hijacked.wav
B = values from hijacked.wav converted to 8-byte floating-point DBLs
C = result after subtracting the average DC offset level, 126.96, from B
Now, perform windowing and FFT on result C. Evaluate as before to create something like pp_array.vi containing the 97 values. Save as defaults and repost. (Thanks).
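In numpy-ish pseudocode (the real work is in LabVIEW, and the Hann window and use of scipy here are just my assumptions for illustration), the A/B/C steps and the per-chunk processing would look roughly like this:

import numpy as np
from scipy.io import wavfile

rate, A = wavfile.read("hijacked.wav")   # A: raw 8-bit unsigned integers (uint8)
B = A.astype(np.float64)                 # B: same values as 8-byte floats (DBL)
C = B - 126.96                           # C: average DC offset removed

chunk = C[0:512]                         # one 512-sample chunk
windowed = chunk * np.hanning(512)       # window first (Hann assumed here)...
spectrum = np.fft.rfft(windowed)         # ...then the FFT
magnitudes = np.abs(spectrum)            # magnitude spectrum for the pitch evaluation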
Now then, most of the voiced chunks will want to be sub-divided into lengths that don't divide evenly into 512. Suppose one comes up with a desired length of 29.1, for example. What exactly should be done in such cases? Either you can perform this step for me, describe it clearly, or I can take my best guess. FYI, here's my best guess:
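For concreteness, here's the flavor of thing I mean, purely as a sketch (the integer_subchunk_sizes helper below is made up, and this may not be the right approach at all): round each sub-chunk to an integer length and carry the fractional remainder forward so the running total stays on track.

def integer_subchunk_sizes(desired, total):
    # e.g. desired = 29.1, total = 512: round each sub-chunk, carrying the
    # fractional error forward so the running total tracks desired * count
    sizes, target = [], 0.0
    while sum(sizes) + desired <= total + 0.5:
        target += desired
        sizes.append(int(round(target)) - sum(sizes))
    return sizes

print(integer_subchunk_sizes(29.1, 512))   # mostly 29's with an occasional 30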
It may be a few days from whenever you re-post the array of sub-chunk lengths until I get a chance to noodle around with this stuff. With limited time, I won't be taking great pains to optimize the solution for memory or speed, so you can expect some opportunities there.
It's a neat app, and hopefully I'll remember to go back and learn more about your pitch-detection algorithm from your early postings and the links. Or maybe you can post the code that does the Windowing/FFT/characterization algorithm? Fairly long ago I tried very naively and very unsuccessfully to invent my own method for doing things like pitch-shifting and time-shifting. It'd be neat to finally bring it all full circle.
-Kevin P.
04-04-2007 04:06 PM
B) I'm reluctant to get too opinionated about what's the "best" way to handle things. I know LabVIEW pretty well, but I don't know speech-processing theory. Still, for what it's worth, I'll give you my thoughts as a "man on the street."
I agree with you that zero-padding just *feels* wrong. Your idea to save the extra samples and treat them as part of the next chunk seems intuitively sensible to me, though with a couple of possible subtle issues.
1. Passing 30 or so voiced samples into the next unvoiced chunk unchanged. All the other voiced samples from that chunk were pitch-shifted, but these aren't. What will be the audible effect? [Offhand guess -- probably not significant. The original breakdown into chunks with a constant size of 512 samples was somewhat arbitrary and chosen for FFT calculation efficiency. There's no special reason that time-domain processing must proceed in such constant-sized chunks.]
2. Passing 30 or so voiced samples into the next voiced chunk, where the dominant frequency of each is different. Offhand guess on the effect: very similar to the previous case.
I think the main downside of what I proposed is that it may slightly restrict the available range of compression / expansion. Probably only a few percent in theory, and that may come out in the wash due to the inherent integer nature and implied rounding involved in some of these steps.
One last key question for clarification. I referred back to "Pitch and Time scale.png" that you posted early in this thread. That appears to show that the original division into sub-chunks should account for 50% overlap. Based on the formula I proposed recently, if the sub-chunks were supposed to have size = 60, I would define 16 regions which overlap one another by 50%, i.e., 30 samples. Your posting just now referred to 8 sub-chunks of size 60 that span 480 samples, thus implying *no overlap*.
Isn't it necessary to start from 50% overlap? Then sub-chunk ranges can be either deleted or duplicated, causing as little as 0% or as much as 75% overlap?
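Just to show the arithmetic behind those two numbers (plain Python, nothing LabVIEW-specific; the floor((N - size)/hop) + 1 counting is the relationship I'm assuming):

# 50%-overlapped sub-chunks of size 60 within one 512-sample chunk
size, hop, N = 60, 30, 512
n_overlapped = (N - size) // hop + 1            # 16 regions
last_end = (n_overlapped - 1) * hop + size      # last region ends at sample 510
print(n_overlapped, last_end)                   # 16 510

# versus non-overlapping sub-chunks of size 60
n_plain = N // size                             # 8 regions
print(n_plain, n_plain * size)                  # 8 480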
-Kevin P.
P.S. It doesn't really matter much when you subtract the DC offset. I'd probably just do it once on the entire array of samples before breaking anything down, but the end result will be the same either way.
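A quick numpy-style check of that claim (whole-array vs. per-chunk subtraction; the random data here is just a stand-in for the real samples):

import numpy as np
x = np.random.randint(0, 256, size=2048).astype(np.float64)   # fake 8-bit samples
whole_first = (x - 126.96).reshape(-1, 512)    # subtract the offset, then split into chunks
chunk_first = x.reshape(-1, 512) - 126.96      # split into chunks, then subtract
print(np.allclose(whole_first, chunk_first))   # True -- order doesn't matter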