Number of DSP Slices for wide multiplications

tobiy · ‎01-31-2021

It is well known that wide FXP multiplications (i.e. wider than 18x25 bit for DSP48E1 units) have to be split across several DSP slices. The cascaded calculation then takes longer, so it needs either pipelining or a slower SCTL clock.

I ran the following test:

One SCTL containing two FXP controls, one multiplication node (set to truncate, wrap, and coerce to a 48 bit output width), and a single FXP indicator with the same 48 bit width.

I set the bit widths of the operands going into the multiplication to various sizes and compiled each variant. The compilation target is the sbRIO-9607, which has DSP48E1 units with 18x25 multipliers. However, the compilation results report inconsistent numbers of DSP slices (see below). Is there a list somewhere of the largest possible multiplication for a given number of DSP slices? More generally, how can the unexpected results below be explained?

18x25 bit -> 1 DSP (fine)

20x25 bit -> 1 DSP (should be 2)

25x25 bit -> 2 DSP (fine)

25x35 bit -> 2 DSP (fine)

17x40 bit -> 2 DSP (fine)

18x40 bit -> 2 DSP (fine)

25x36 bit -> 3 DSP (should be 2)

18x44 bit -> 3 DSP (should be 2)

17x47 bit -> 3 DSP (should be 2)

18x47 bit -> 3 DSP (should be 2)

18x48 bit -> 3 DSP (should be 2)

18x49 bit -> 3 DSP (should be 2)

17x50 bit -> 3 DSP (should be 2)
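For comparison, here is a rough Python sketch of a naive tiling estimate. It is not what the LabVIEW FPGA or Xilinx tools actually do. It assumes signed operands and a signed 25x18 multiplier, where only the topmost piece of an operand keeps the sign bit and every lower piece is unsigned and has to be padded with a zero sign bit, so it carries one bit less than the port width. The function names `chunks` and `naive_dsp_estimate` are made up for illustration.

```python
from math import ceil


def chunks(width, port_bits):
    """Pieces needed to feed a signed operand of `width` bits through a
    signed multiplier port of `port_bits` bits (assumed model).

    The top piece keeps the sign and uses the full port. Each lower piece
    is unsigned and needs a zero sign bit, so it carries port_bits - 1 bits.
    """
    if width <= port_bits:
        return 1
    return 1 + ceil((width - port_bits) / (port_bits - 1))


def naive_dsp_estimate(a_bits, b_bits, ports=(25, 18)):
    """Smaller of the two ways to assign the operands to the two ports."""
    p, q = ports
    return min(
        chunks(a_bits, p) * chunks(b_bits, q),
        chunks(a_bits, q) * chunks(b_bits, p),
    )


# Operand widths taken from the compile tests listed above.
for a_bits, b_bits in [(18, 25), (20, 25), (25, 25), (25, 35), (17, 40),
                       (18, 40), (25, 36), (18, 44), (17, 47), (18, 47),
                       (18, 48), (18, 49), (17, 50)]:
    print(f"{a_bits}x{b_bits} bit -> {naive_dsp_estimate(a_bits, b_bits)} DSP")
```

Under these assumptions the sketch gives 3 slices for 25x36 and 17x50, which matches the compile results, but 2 slices for 20x25 and for 18x44 through 18x49, so it does not account for all of the numbers above.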

vardanium · ‎03-11-2021

Hi. Have you looked at this document at NI? I had a quick look. It explains some of this, but I'm not sure it covers the specific calculations your application needs.

tobiy · ‎03-11-2021

Hi vardanium,

I don't think this document addresses the question. My question is specifically about non-pipelined, single-cycle multiplication using "tiling" of DSP slices.

However, that document is interesting and shows some workarounds. The DSP48 Node is interesting because it makes it possible to implement the tiling at a low level, possibly also avoiding some other overhead.
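To make clear what I mean by tiling, here is a minimal Python sketch of the arithmetic only, not of the DSP48 Node itself. It assumes a signed 25 bit operand `a` and a signed 35 bit operand `b`, and splits `b` into an unsigned 17 bit low piece and a signed 18 bit high piece so that each partial product fits a signed 25x18 multiplier. The name `tiled_multiply` is hypothetical.

```python
def tiled_multiply(a, b, low_bits=17):
    """Compute a * b from two partial products (assumed 25x35 signed case)."""
    assert -(1 << 24) <= a < (1 << 24), "a must fit in 25 signed bits"
    assert -(1 << 34) <= b < (1 << 34), "b must fit in 35 signed bits"

    # Low piece: unsigned, 0 .. 2**17 - 1. With a zero sign bit on top it
    # occupies a signed 18 bit port.
    b_lo = b & ((1 << low_bits) - 1)
    # High piece: the arithmetic shift keeps the sign, so it fits 18 signed bits.
    b_hi = b >> low_bits

    p_lo = a * b_lo  # first 25x18 multiply
    p_hi = a * b_hi  # second 25x18 multiply

    # Recombine: the high partial product is weighted by 2**17.
    return (p_hi << low_bits) + p_lo


print(tiled_multiply(-12345678, 9876543210) == -12345678 * 9876543210)
```

The split is exact because `b == (b_hi << 17) + b_lo` holds for negative values as well. In a single-cycle implementation, both multiplies and the final addition have to fit into one clock period.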

On a separate note, the document shows how much overhead some implementations built from standard functions produce compared to e.g. the Xilinx IP functions. Unfortunately, I have quite an aversion to such high-level "black box" blocks, whose inner workings could change slightly with every patch.
