That's the usual trade-off: time vs. memory consumption…
(I was surprised that it's even faster when allocating more array wires in between.)
Curiously, interlacing is faster than my typecasting (~19 ms, or about 8 ms when the loop is parallelized).
I am sure we can squeeze a little bit more out of it, but going from 5 minutes to 10 ms is quite good, IMHO ;))
The first estimate of getting 1000x improvement was low. We got about 30000x! 😮
Not getting much faster, but there is also a DecimateArray for each InterleaveArray:
At least it fits the postage stamp… 😄
Preallocating the array and not reshaping seems to help quite a bit.
It surprised me that the concatenating tunnel is about 20x slower, even though the compiler could probably figure out exactly what to do based on the build array with two scalars inside the loop.
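For readers without LabVIEW at hand, here is a rough text-language analogy of the two approaches, sketched in Python. It is only an illustration: the function names are made up for this example, the naive concatenation is a deliberate worst case, and it says nothing about what the LabVIEW compiler actually does with a concatenating tunnel. Timings in Python will not match the numbers quoted above.

```python
def interleave_preallocated(xs, ys):
    """Allocate the full output once, then write each pair in place."""
    n = len(xs)
    out = [0] * (2 * n)          # single allocation, final size known up front
    for i in range(n):
        out[2 * i] = xs[i]       # even slots take the first input
        out[2 * i + 1] = ys[i]   # odd slots take the second input
    return out


def interleave_concatenating(xs, ys):
    """Grow the output by building a new, longer list on every iteration."""
    out = []
    for x, y in zip(xs, ys):
        out = out + [x, y]       # copies everything accumulated so far
    return out
```

Both functions return the same interleaved sequence; the difference is only in how often memory is allocated and copied, which is the effect that preallocating and skipping the reshape avoids.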
Here's what I probably would do in the end: