I ran into a problem while implementing a real-time cross-correlation on an FPGA, using a large number of DSP48E1 slices.
I already implemented this with an adder tree, but the resource cost is too high: almost every addition needs a set of SR, and I end up at 95% SR and 99% DSP usage. That is not an option, since I need to add more functions to this device. (The DSP usage is OK for me; I can live with that.)
So I checked the documentation Xilinx provides and noticed that a DSP48E1 can actually do the multiply and the addition at the same time. That would save a lot of resources, since the additions would be done in the DSP48E1 instead of in LUTs.
This is one eighth of the function (I get 8 samples from the ADC each cycle), and I'm really sure that the DSP PCIN is connected to DSP PCOUT in every
LabVIEW FPGA: The compilation failed due to a Xilinx error.
Details: ERROR: [Place 30-365] The following macros could not be placed: SmallBlockWindow/theVI/n_Timed_Loop_43_Diagram/n_SubVI_WholeXCrossUsingXIP_systolic_vi_8217/n_SubVI_B_8xCrossSingleAIValue_vi_5360/n_SubVI_B_8xCross_vi_771/n_DSP48E1_First_2238_Diagram/GenDSP48E1.dsp48e1Instance (DSP48E1) SmallBlockWindow/theVI/n_Timed_Loop_43_Diagram/n_SubVI_WholeXCrossUsingXIP_systolic_vi_8217/n_SubVI_B_8xCrossSingleAIValue_vi_5437/n_SubVI_B_8xCross_vi_771/n_DSP48E1_First_2238_Diagram/GenDSP48E1.dsp48e1Instance (DSP48E1)
The total BRAM utilization is 2.075, the total DSP utilization is 99.74 and the total URAM utilization is 0 A possible reason is high utilization of BRAMs, DSPs, URAMs, or RPMs. Please check user constraints to make sure design is not over-utilized in the constraint areas (if any) keeping in mind some macros require a number of consecutively available sites
I did some searching, and it might be because I'm using too many DSP48s. But my question is: why am I able to use the same number of DSP48s for multiplication alone, yet the compilation fails when I use them in multiply-add mode?
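
For reference, the two structures being compared can be sketched behaviorally in Python. This is only a software model under simplifying assumptions (no pipelining, no bit widths, no timing), not the LabVIEW FPGA design itself. The function names `tree_sum` and `cascade_sum` and the sample values are hypothetical, chosen just for illustration.

```python
def tree_sum(products):
    """Adder-tree model: products are summed pairwise, level by level.

    Returns the total and the number of separate adders used; in the
    adder-tree design these adders sit outside the multipliers.
    Assumes a non-empty input.
    """
    level = list(products)
    adders = 0
    while len(level) > 1:
        nxt = []
        # Add neighbouring pairs; each pair costs one adder.
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] + level[i + 1])
            adders += 1
        # An odd element is passed through to the next level unchanged.
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0], adders


def cascade_sum(a, b):
    """Cascade model: each stage computes P = A * B + PCIN and hands
    its result to the next stage, mimicking a PCOUT -> PCIN chain."""
    pcin = 0
    for x, y in zip(a, b):
        pcout = x * y + pcin  # multiply-add inside one stage
        pcin = pcout          # feeds the next stage in the chain
    return pcin


# Hypothetical data: 8 samples per cycle against 8 reference values.
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]

tree_result, fabric_adders = tree_sum([x * y for x, y in zip(a, b)])
chain_result = cascade_sum(a, b)
print(tree_result, chain_result, fabric_adders)
```

Both functions produce the same sum. The difference is where the additions happen: the tree needs 7 separate adders for 8 products, while the chain does every addition inside the multiply stage but forces each stage to be wired directly to the next one.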