3D Separable Kernel: some questions

Hi,

The SDK examples include a 2D separable convolution kernel. I now want to extend it to a 3D separable kernel. I have tried many different approaches, but have not found a solution yet.

This is the first time I'm working with CUDA this much, and some questions have come up.

Is it possible to extend the 2D separable kernel? I think so.

I have already written a benchmark, so I can see that my code and the CUDA version differ.

The row convolution is no problem, because it reads the data row by row: 1 2 3 4 5 6 7 8 9 10, etc.

The column convolution is the problem, because it treats the data of the next slice along the Z-axis as if it lay underneath the current slice. I will explain:

Slice 1      Slice 2

1 2 3 4      3 4 7 9
4 5 6 7      4 8 3 1
8 9 1 2      4 6 8 1

I want CUDA to do a separate column convolution on every slice, not treat the slices as one image.

Question time: I can split the data along DATA_Z, but would I then need to launch more kernels? Would I lose much of the speedup because of this? I can't test it because I don't have a working 3D separable convolution yet.
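To make the goal concrete, here is a minimal CPU sketch of a per-slice column convolution. It assumes the volume is stored slice after slice with each slice row-major, and that taps outside a slice count as zero. The names `voxelIndex` and `convolveColumnsPerSlice` are made up for illustration and are not part of the SDK sample.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: flat index of voxel (x, y, z) in a volume stored
// slice after slice, each slice row-major.
inline std::size_t voxelIndex(int x, int y, int z, int width, int height) {
    return (static_cast<std::size_t>(z) * height + y) * width + x;
}

// Column (Y-axis) convolution applied to every slice independently.
// Taps that fall outside the current slice are skipped (treated as zero),
// so values never leak from one slice into the next.
std::vector<float> convolveColumnsPerSlice(const std::vector<float>& src,
                                           int width, int height, int depth,
                                           const std::vector<float>& kernel) {
    assert(kernel.size() % 2 == 1);
    assert(src.size() == static_cast<std::size_t>(width) * height * depth);
    const int radius = static_cast<int>(kernel.size() / 2);
    std::vector<float> dst(src.size(), 0.0f);
    for (int z = 0; z < depth; ++z) {
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                float sum = 0.0f;
                for (int k = -radius; k <= radius; ++k) {
                    const int yy = y + k;
                    if (yy < 0 || yy >= height) continue;  // stay inside this slice
                    sum += src[voxelIndex(x, yy, z, width, height)] * kernel[radius - k];
                }
                dst[voxelIndex(x, y, z, width, height)] = sum;
            }
        }
    }
    return dst;
}
```

With the two 4 x 3 slices above and a kernel of {1, 1, 1}, the bottom-left value of Slice 1 becomes 4 + 8 = 12. If the slices were treated as one tall image, the 3 from the top of Slice 2 would be added as well, which is exactly the behaviour I want to avoid.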

At the moment my program treats Slice 1 and Slice 2 as one slice:

Slice 1

1 2 3 4
4 5 6 7
8 9 1 2
3 4 7 9
4 8 3 1
4 6 8 1

I was thinking of using the Z dimension, but looking at the code of the separable convolution, I think that is impossible. Correct me if I'm wrong.
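One way the Z dimension could still be used with a two-dimensional grid is to stack the slices along the grid's y axis and recover the slice index inside the kernel. The sketch below only shows that index arithmetic on the host. The names `TileCoord`, `divUp`, `foldedGridY` and `unfoldBlockY` are hypothetical and not taken from the SDK sample, and I have not checked this against the SDK kernel.

```cpp
#include <cassert>

// Hypothetical result type: which tile inside the slice, and which slice.
struct TileCoord {
    int tileY;  // tile row within one slice
    int z;      // slice index
};

// Number of tiles of size b needed to cover a elements (rounds up).
inline int divUp(int a, int b) { return (a + b - 1) / b; }

// Grid height when the tiles of all slices are stacked along the grid's y axis.
inline int foldedGridY(int height, int tileH, int depth) {
    return divUp(height, tileH) * depth;
}

// Inside a kernel, blockY would be blockIdx.y of the folded grid.
inline TileCoord unfoldBlockY(int blockY, int height, int tileH) {
    assert(blockY >= 0 && height > 0 && tileH > 0);
    const int tilesPerSlice = divUp(height, tileH);
    return TileCoord{blockY % tilesPerSlice, blockY / tilesPerSlice};
}
```

The column kernel would then add `z * width * height` to its base address and clamp its apron reads to the current slice, so that all slices are handled by a single launch instead of one launch per slice.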

Can someone explain why the column convolution needs fewer thread IDs than the row convolution?

blockGridRows x threadBlockRows ==> (2,256) * (152,1,1) = 77824

blockGridColumns x threadBlockColumns ==> (16,6) * (16,8) = 12288
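For reference, the totals above are just the number of blocks in the grid multiplied by the number of threads per block. A small host-side sketch of that arithmetic, where `Dim3` is a stand-in for CUDA's `dim3` and `totalThreads` is a name made up for illustration:

```cpp
#include <cassert>

// Minimal stand-in for CUDA's dim3 so the arithmetic runs on the host.
struct Dim3 {
    unsigned x, y, z;
};

// Total threads in a launch = blocks in the grid * threads per block.
inline unsigned long long totalThreads(Dim3 grid, Dim3 block) {
    assert(grid.x && grid.y && grid.z && block.x && block.y && block.z);
    const unsigned long long blocks = 1ULL * grid.x * grid.y * grid.z;
    const unsigned long long threadsPerBlock = 1ULL * block.x * block.y * block.z;
    return blocks * threadsPerBlock;
}
```

Calling `totalThreads(Dim3{2, 256, 1}, Dim3{152, 1, 1})` gives 77824 and `totalThreads(Dim3{16, 6, 1}, Dim3{16, 8, 1})` gives 12288. My guess is that each thread in the column launch handles more than one pixel, which would explain the smaller total, but I would like that confirmed.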

Best regards,

Jorn