
Number of DSP Slices for wide multiplications

It is well known that wide fixed-point (FXP) multiplications (i.e., wider than 18x25 bit for DSP48E1 units) have to rely on several DSP slices. As a result, the cascaded calculation takes longer and so needs either pipelining or a slower clock for the Single-Cycle Timed Loop (SCTL).
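To make the tiling idea concrete, here is a minimal sketch in Python (not LabVIEW FPGA code) of how a wide multiplication can be built from narrow partial products that are shifted and summed. It assumes unsigned operands and chunk widths of 17 and 24 bits, on the assumption that a signed 18x25 multiplier offers that much unsigned range per port. The names `split_chunks` and `tiled_multiply` are hypothetical and only serve the illustration.

```python
def split_chunks(value, width, chunk_bits):
    """Split an unsigned `width`-bit value into little-endian chunks."""
    count = -(-width // chunk_bits)  # ceiling division
    mask = (1 << chunk_bits) - 1
    return [(value >> (k * chunk_bits)) & mask for k in range(count)]


def tiled_multiply(a, a_width, b, b_width, a_chunk=17, b_chunk=24):
    """Multiply two unsigned integers using only narrow partial products.

    Returns (product, number_of_partial_products). Each partial product
    stands in for one narrow multiplier in this simplified model.
    """
    a_parts = split_chunks(a, a_width, a_chunk)
    b_parts = split_chunks(b, b_width, b_chunk)
    total = 0
    for i, a_part in enumerate(a_parts):
        for j, b_part in enumerate(b_parts):
            # Every chunk pair contributes at its own bit offset.
            total += (a_part * b_part) << (i * a_chunk + j * b_chunk)
    return total, len(a_parts) * len(b_parts)


a = (1 << 17) - 1   # largest unsigned 17-bit value
b = (1 << 40) - 1   # largest unsigned 40-bit value
product, tiles = tiled_multiply(a, 17, b, 40)
print(product == a * b, tiles)
```

The sum of the shifted partial products is what has to settle within one clock cycle in a non-pipelined design, which is why more tiles tend to lengthen the critical path.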

 

I made the following tests:

One SCTL containing two FXP controls, one Multiply function (set to truncate, wrap, and coerce to a 48-bit output width), and a single FXP indicator with the same 48-bit width.

 

I set the bit widths of the operands going into the multiplication to various sizes and compiled. The compilation target is the sbRIO-9607, which has DSP48E1 units with 18x25 multipliers. However, the compilation results give me inconsistent numbers of DSP slices (see below). Is there a list somewhere of the largest possible multiplications for a given number of DSP slices? In general, how can the unexpected results below be explained?

 

18x25 bit -> 1 DSP (fine)

20x25 bit -> 1 DSP (should be 2)

 

25x25 bit -> 2 DSP (fine)

25x35 bit -> 2 DSP (fine)

17x40 bit -> 2 DSP (fine)

18x40 bit -> 2 DSP (fine)

 

25x36 bit -> 3 DSP (should be 2)

18x44 bit -> 3 DSP (should be 2)

17x47 bit -> 3 DSP (should be 2)

18x47 bit -> 3 DSP (should be 2)

18x48 bit -> 3 DSP (should be 2)

18x49 bit -> 3 DSP (should be 2)

17x50 bit -> 3 DSP (should be 2)
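One way to sanity-check expectations like these is a naive rectangular-tiling estimate. The Python sketch below is only a first-order model and rests on several assumptions: the operands are signed, only the most significant chunk of an operand can use the full signed port width (18 or 25 bits), lower chunks are treated as unsigned and therefore limited to 17 or 24 bits, and the tool neither uses fabric logic for leftover bits nor picks a non-rectangular decomposition. The function names are hypothetical, and the actual synthesis strategy may differ.

```python
def chunks_needed(width, signed_port_bits):
    """Chunks needed to cover a signed operand on one multiplier port."""
    if width <= signed_port_bits:
        return 1
    unsigned_bits = signed_port_bits - 1  # assumption: lower chunks lose the sign bit
    remaining = width - signed_port_bits
    return 1 + -(-remaining // unsigned_bits)  # ceiling division


def estimate_dsp_tiles(a_width, b_width, ports=(18, 25)):
    """Naive tile count: try both operand-to-port assignments, keep the smaller."""
    narrow, wide = ports
    option_1 = chunks_needed(a_width, narrow) * chunks_needed(b_width, wide)
    option_2 = chunks_needed(a_width, wide) * chunks_needed(b_width, narrow)
    return min(option_1, option_2)


for a_width, b_width in [(18, 25), (20, 25), (25, 36), (18, 44), (17, 50)]:
    tiles = estimate_dsp_tiles(a_width, b_width)
    print(f"{a_width}x{b_width} bit -> {tiles} tile(s) in this model")
```

Under these assumptions the estimate gives 3 tiles for 25x36 and 17x50, but 2 tiles for 20x25 and 18x44, so it does not reproduce every compiled result in the list above.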

Message 1 of 3

Hi. Have you looked at this document from NI? I had a quick look; it explains some of this, but I'm not sure whether it covers the specific calculations your application needs.

Message 2 of 3

Hi vardanium,

 

I don't think this document addresses the question. My question is specifically about non-pipelined, single-cycle multiplication using "tiling" of DSP slices.

 

However, that document is interesting and shows some workarounds. The DSP48 Node is interesting because it allows the tiling to be implemented at a low level, possibly also avoiding some other overhead.

 

On a separate note, the document shows how much overhead some implementations built from standard functions produce compared to, e.g., the Xilinx IP functions. Unfortunately, I have a strong aversion to such high-level "black box" blocks, whose inner workings could change slightly with every patch.

Message 3 of 3