elsayed3
04-17-2012 04:18 PM
Some examples include CLIPs. Obviously, I cannot read the .ngc files. It would be nice if the actual HDL files were made available with the examples. The examples are meant as educational tools, not as protected IP. I am not looking to convert to an EDIF file; the actual HDL code is what matters for learning in my case.
I am looking specifically at the 10-Tap 8-bit Camera with DRAM example, and I would like to see the source code of the CLIPs used.
04-18-2012 06:32 PM
What is your reason for needing to see the HDL files?
04-18-2012 06:46 PM
I want to know exactly what the author of the example had in mind. What is this "packer" VI that converts 80-bit data to 256-bit data? Also, the example is designed to write to all of the DRAM, using it as a FIFO. I want to change that: I would like the DRAM to act more like addressable memory instead of a FIFO, so that it isn't just a buffer.
04-19-2012 04:01 PM
Hello elsayed3,
We can't release the VHDL associated with LabVIEW FPGA code. This would not be the preferred method of modifying the code anyway. It would be much better to just modify the LabVIEW FPGA code in this example and re-compile it.
Why do you want the FPGA code to act as memory instead of a circular buffer? The purpose of this example is to stream image data from the FPGA to the host VI. The packing that converts data widths around the DRAM is only there to make the most efficient use of memory.
If you wanted, for example, to save a single image on the FPGA and do some processing on it, then you would need to set up the FPGA code to stop overwriting data in the DRAM.
04-19-2012 05:06 PM
You guessed right: I want to do some image processing on the FPGA. What I am doing now is saving another copy of the image in block memory (not using the DRAM) so that I can access specific pixels for a Sobel convolution. However, I need the code to be as efficient as possible because I am trying to do real-time image processing at high fps, and keeping duplicates is not very efficient. The two paths run in parallel, so there shouldn't be much effect on timing, but I will be adding (many) more features, and perhaps even some instances of PicoBlaze processors, so I don't want to use more space on the FPGA than necessary.
04-20-2012 07:22 PM
If you're doing image processing on the FPGA, keep in mind that it isn't well suited to storing and processing large arrays. It is best at point-by-point image processing, which doesn't require heavy data shifting or large amounts of buffered data. So it depends on exactly what kind of image processing you need to do on the FPGA. If it's something heavy, such as a Fourier transform where a single operation needs to look at many pixels of the image, you might be better off doing it off the FPGA.
05-21-2012 02:44 PM
Reason behind my question (skip if you do not care):
I do want to handle large amounts of data, which is why I want to use the DRAM to store it. The FPGA code itself will use very small arrays (at most 9 U8 elements), since I want to do a 3x3 convolution. So I want to store the image in memory. When enough of the image is in memory, I want to start reading from it, but at specific addresses rather than reading everything back FIFO-style. In other words, I would store 3 lines of the image, then start reading 3x3 areas from the stored image. I expect that if I switch the DRAM from a FIFO to addressable memory, this would be doable. What I mentioned earlier about using block memory to store parts of the image did not work: there is simply too much to store and not enough block memory on the FPGA. The DRAM, however, is plentiful.
The example does a good job of simply streaming the data, but I want to analyze the data on the FPGA. I realized that there is no real need for the DRAM to act as a buffer. I tried wiring the outputs of Pack 80 to 256 directly to the inputs of Pack 256 to 64, eliminating the DRAM completely, and it works perfectly. I guess the DRAM was added to the example just for demonstration. So now I can stream the image directly to the DMA FIFO, bypassing the DRAM. What I want to do now is store the image in the DRAM for analysis. I can't really analyze it if it is stored FIFO-style; I want it to be addressable. But I want this to be as efficient and fast as possible, so I want to use the full 128-bit width of each DRAM bank by writing and reading 256 bits at a time. That is why I still want to write the data from the Pack 80 to 256 CLIP to the DRAM, but I need to know how the image would be laid out there. If I used the final output of the Pack 256 to 64 CLIP, I would only be writing and reading 64 bits at a time, which would not be the most efficient, especially since a memory read takes more than one clock cycle. Now that I have clarified my reasoning, can someone please answer the actual question below? I do give Kudos!
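To make "reading 3x3 areas at specific addresses" concrete, here is a minimal sketch of the address arithmetic, assuming (hypothetically) that the image is stored row-major starting at byte 0, one U8 pixel per byte, in 32-byte (256-bit) memory words. The function name `window_addresses` is illustrative only and not part of the NI example.

```python
def window_addresses(x, y, width, word_bytes=32):
    """Byte and memory-word addresses for the 3x3 window centred on pixel (x, y).

    Assumptions (hypothetical layout): row-major image starting at byte 0,
    one U8 pixel per byte, memory words word_bytes wide (32 bytes = 256 bits).
    """
    # Byte address of each of the 9 neighbours, top row first, left to right
    byte_addrs = [(y + dy) * width + (x + dx)
                  for dy in (-1, 0, 1)
                  for dx in (-1, 0, 1)]
    # Distinct 256-bit words that must be read to obtain those 9 bytes
    words = sorted({a // word_bytes for a in byte_addrs})
    return byte_addrs, words


byte_addrs, words = window_addresses(1, 1, 40)
print(byte_addrs)  # [0, 1, 2, 40, 41, 42, 80, 81, 82]
print(words)       # [0, 1, 2]
```

Under these assumptions a single 3x3 window needs at least three word reads (one per image row) and more when the window straddles a 32-byte boundary, which is why per-window random access tends to cost several multi-cycle DRAM reads.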
The actual question:
I want to know what the Pack 80 to 256 CLIP does. What I deduced is that it takes 80 bits from the FIFO as input, waits some clock cycles until it has enough data (stored in an internal buffer), then outputs 256 bits of data. What happens if there are not enough bits? For example, I could take just a single 40x1 image. That is 40*8 = 320 bits, which is not divisible by 256. So the first time "Pack 80 to 256\Output valid" is asserted, 256 bits would be read from the CLIP. But what happens to the remaining (320-256=) 64 bits? Is there some way for the CLIP to "know" to feed data only through "Pack 80 to 256\Data Out 0"? What about the other 3 outputs? How does the next CLIP (Pack 256 to 64) know to ignore those 3 outputs, which are either floating or carry garbage data?
I need to know which Data Out contains which pixels, in what order, at which clock cycles / loop iterations, etc. I don't need the VHDL code, and I do not want to modify it in any way, but I would like a more detailed description. I still want to treat the CLIP as a black box, but I want to know its inputs and outputs in detail, sort of like a spec sheet. I understand that no such thing probably exists, which is why I initially asked for the code itself.
Thanks
Wordimont
05-30-2012 09:51 AM
If there is not enough input data to create a whole output word then the packer keeps waiting until there is. So, like you said, if you input a 40-byte image then the 80 to 256 packer will output one 32-byte word (256 bits) and will still have the remaining 8 bytes inside of it. When you input the next 40-byte image, it will output a second 32-byte word and will have 16 bytes remaining inside of it. After the third input image it will have 24 bytes remaining in it. Then finally, on the fourth image, it will output two 32-byte words, since the first 8 bytes of the input image will complete the partial word already stored in the packer, then the remaining 32 bytes of the input image will be output as another word. So the basic rule is that the amount of input data needs to be a multiple of the packer's output size (32 bytes for the 80 to 256 packer), otherwise data will be waiting in the packer.
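The bookkeeping described above can be checked with a short byte-count model. This is a sketch that only tracks how many bytes are held inside the packer (it ignores clocking and handshaking); `packer_occupancy` is a hypothetical name, and the 32-byte output size is taken from the 80 to 256 packer.

```python
def packer_occupancy(input_sizes, out_bytes=32):
    """For each input burst, return (words emitted, bytes left inside the packer).

    Models only the rule stated above: the packer emits a word as soon as it
    holds out_bytes bytes and otherwise keeps waiting for more input.
    """
    held = 0
    results = []
    for size in input_sizes:
        total = held + size
        emitted = total // out_bytes  # complete output words produced
        held = total % out_bytes      # partial word that stays buffered
        results.append((emitted, held))
    return results


print(packer_occupancy([40, 40, 40, 40]))
# [(1, 8), (1, 16), (1, 24), (2, 0)]
```

The residue returns to zero only once the cumulative input is a multiple of 32 bytes, which is the same rule as stated in the reply.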
As far as the data ordering goes, suppose you input the following 80-bit values in the order shown:
Upper 16   Lower 64
0x0004     0x0003000200010000
0x0009     0x0008000700060005
0x000E     0x000D000C000B000A
0x0013     0x001200110010000F
0x0018     0x0017001600150014
0x001D     0x001C001B001A0019
0x0022     0x00210020001F001E
...
The packer will output the following 256-bit values in the order shown:
Data Out 3          Data Out 2          Data Out 1          Data Out 0
0x000F000E000D000C  0x000B000A00090008  0x0007000600050004  0x0003000200010000
0x001F001E001D001C  0x001B001A00190018  0x0017001600150014  0x0013001200110010
...
The other packers behave similarly.
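One reading that reproduces the tables above is that the packer concatenates its inputs least-significant-bit first (the oldest data ends up in the low bits) and then splits each 256-bit word into four 64-bit lanes, with Data Out 0 holding bits 0 to 63. The model below is a sketch built on that assumption, checked only against the values shown here, not taken from the CLIP itself; `pack_80_to_256` and `make_counter_inputs` are hypothetical names.

```python
MASK80 = (1 << 80) - 1
MASK64 = (1 << 64) - 1
MASK256 = (1 << 256) - 1


def pack_80_to_256(words80):
    """Model: concatenate 80-bit inputs LSB-first, emit 256-bit words as 4 lanes.

    Returns (outputs, leftover_bits); each output is [Data Out 0, 1, 2, 3].
    """
    acc, nbits, outputs = 0, 0, []
    for w in words80:
        acc |= (w & MASK80) << nbits  # newer data goes above older data
        nbits += 80
        while nbits >= 256:
            word = acc & MASK256
            acc >>= 256
            nbits -= 256
            outputs.append([(word >> (64 * i)) & MASK64 for i in range(4)])
    return outputs, nbits  # leftover bits wait inside the packer


def make_counter_inputs(n_words):
    """80-bit inputs holding 16-bit counters 0, 1, 2, ... (five per word)."""
    words = []
    for k in range(n_words):
        w = 0
        for j in range(5):
            w |= (5 * k + j) << (16 * j)
        words.append(w)
    return words


outputs, leftover = pack_80_to_256(make_counter_inputs(7))
for lanes in outputs:
    print(" ".join(f"0x{v:016X}" for v in reversed(lanes)))  # Data Out 3..0
print(leftover)  # 48 bits still held after 7 inputs
```

With 8-bit pixels the same assumption would put pixel n of the stream in output word n // 32, lane (n % 32) // 8, byte n % 8 of that lane.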
Behavior of the packers aside, if you want to do Sobel convolution then it's usually much more efficient in block RAM because you normally only have to buffer a few image lines in order to obtain the 3x3 window of pixels needed for the convolution. There is an example of a Sobel edge detector here. However, if I remember correctly, this example cheats a little bit on the math and doesn't saturate properly. If you just want image buffering for getting the 3x3 window, there's an example of that here. You can use the DRAM to do the same thing, but it adds complexity and isn't necessary unless you need to access a large amount of memory.
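The line-buffer approach can be sketched as follows. Two row buffers (2 x width bytes, the part that would live in block RAM) plus a 3x3 shift-register window produce one output per incoming pixel in raster order. This is an illustrative model, not the linked NI example: it uses the common |Gx| + |Gy| approximation of the gradient magnitude and clamps the result to 255 so it saturates instead of wrapping. The name `sobel_stream` is hypothetical.

```python
def sobel_stream(pixels, width):
    """Streaming 3x3 Sobel over U8 pixels arriving in raster order.

    Yields (row, col, magnitude) for each pixel whose full 3x3 neighbourhood
    is available; magnitude = |Gx| + |Gy|, saturated to 255.
    """
    line1 = [0] * width                # previous row (y - 1)
    line2 = [0] * width                # row before that (y - 2)
    win = [[0] * 3 for _ in range(3)]  # win[r][c]; r = 0 is the oldest row
    for i, p in enumerate(pixels):
        y, x = divmod(i, width)
        col = (line2[x], line1[x], p)  # new window column, top to bottom
        line2[x], line1[x] = line1[x], p  # age the line buffers
        for r in range(3):             # shift window left, insert new column
            win[r][0], win[r][1], win[r][2] = win[r][1], win[r][2], col[r]
        if y >= 2 and x >= 2:          # window now covers rows y-2..y, cols x-2..x
            gx = (win[0][2] + 2 * win[1][2] + win[2][2]) - (win[0][0] + 2 * win[1][0] + win[2][0])
            gy = (win[2][0] + 2 * win[2][1] + win[2][2]) - (win[0][0] + 2 * win[0][1] + win[0][2])
            yield y - 1, x - 1, min(abs(gx) + abs(gy), 255)


edge = [0, 0, 255, 255] * 3                 # 4x3 image with a vertical edge
print(list(sobel_stream(edge, 4)))          # [(1, 1, 255), (1, 2, 255)]
print(list(sobel_stream([10] * 9, 3)))      # [(1, 1, 0)]
```

Only two image lines are ever stored, which is why block RAM is usually sufficient for a 3x3 kernel even at large image widths.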