Hi,
In the SDK examples there is a 2D separable convolution kernel. Now I want to extend it to a 3D separable kernel. I have tried many different ways, but still have no solution :unsure:
This is the first time I'm working with CUDA this much, and some questions have come up.
Is it possible to extend the 2D separable kernel? I think so.
I have already programmed a benchmark, so I can see that there is a difference between my code and the CUDA version.
Row convolution is no problem, because it reads the data row by row: 1 2 3 4 5 6 7 8 9 10 … etc.
Column convolution is the problem, because it treats the columns of the next slice along the Z-axis as if they continued below the previous slice's data. :wacko: I will explain:
Slice 1 – – Slice 2
1 2 3 4 – – 3 4 7 9
4 5 6 7 – – 4 8 3 1
8 9 1 2 – – 4 6 8 1
===> I want CUDA to do a separate column convolution on every slice, not on one combined image.
Question time: I can separate the data along DATA_Z, but do I then need to launch more kernels? Would I lose much of the speedup because of this? I can't test it because I don't have a 3D separable convolution kernel yet.
At the moment my program treats slice 1 and slice 2 as one slice:
Slice 1
1 2 3 4
4 5 6 7
8 9 1 2
3 4 7 9
4 8 3 1
4 6 8 1
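To make the intended behaviour concrete, here is a minimal CPU-side sketch in plain C++ (not the SDK code). It assumes the volume is stored slice after slice, row-major inside each slice, and that samples outside a slice count as zero. The function name `convolveColumnsPerSlice` and its parameters are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column (Y-direction) convolution applied to every Z slice on its own.
// Assumed layout: index = z * (width * height) + y * width + x.
// Assumption: samples outside the current slice are treated as zero.
std::vector<float> convolveColumnsPerSlice(const std::vector<float>& src,
                                           int width, int height, int depth,
                                           const std::vector<float>& kernel,
                                           int radius)
{
    assert(static_cast<int>(kernel.size()) == 2 * radius + 1);
    assert(src.size() == static_cast<std::size_t>(width) * height * depth);

    std::vector<float> dst(src.size(), 0.0f);
    const int sliceSize = width * height;

    for (int z = 0; z < depth; ++z) {
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                float sum = 0.0f;
                for (int k = -radius; k <= radius; ++k) {
                    const int yy = y + k;
                    // Stop at the edge of this slice instead of
                    // running on into the rows of the next slice.
                    if (yy < 0 || yy >= height) continue;
                    sum += src[z * sliceSize + yy * width + x] * kernel[radius - k];
                }
                dst[z * sliceSize + y * width + x] = sum;
            }
        }
    }
    return dst;
}
```

The bounds check uses `height` of a single slice, not `height * depth`. Treating the volume as one tall image corresponds to using the larger bound, which is what lets the last rows of slice 1 mix with the first rows of slice 2.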
I was thinking of using the Z dimension, but looking at the code of the separable convolution, I think it's impossible to do. Correct me if I'm wrong.
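One possible way to bring in the Z dimension without one kernel launch per slice, assuming the grid is limited to two dimensions, is to fold the slices into the grid's Y dimension and decode the slice from the block index. The sketch below is plain C++ that only shows the index arithmetic. `TileCoord`, `decodeBlockY` and `foldedGridY` are hypothetical names, and `blockIdxY` stands in for what would be `blockIdx.y` inside a kernel.

```cpp
#include <cassert>

// Result of decoding a folded Y block index.
struct TileCoord {
    int tileY;  // tile row inside the slice
    int slice;  // Z index of the slice
};

// Grid height when every slice contributes tilesPerSlice rows of blocks.
int foldedGridY(int tilesPerSlice, int depth)
{
    assert(tilesPerSlice > 0 && depth > 0);
    return tilesPerSlice * depth;
}

// Recover (tile row, slice) from the folded Y block index.
TileCoord decodeBlockY(int blockIdxY, int tilesPerSlice)
{
    assert(tilesPerSlice > 0 && blockIdxY >= 0);
    return { blockIdxY % tilesPerSlice, blockIdxY / tilesPerSlice };
}
```

For example, with 6 tile rows per slice and 2 slices the grid would be 12 blocks high, and block row 7 would map to tile row 1 of slice 1. Each block can then add `slice * width * height` to its addresses and clamp its rows against the height of a single slice.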
Can someone explain why the column convolution needs fewer thread IDs?
blockGridRows x threadBlockRows ==> 77824 = (2,256) * (152,1,1)
blockGridColumns x threadBlockColumns ==> 12288 = (16,6) * (16,8)
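As a check of the arithmetic, the total thread count of a launch is the product of all grid and block dimensions. The sketch below is plain C++. `Dim3` is a stand-in for CUDA's `dim3` so it compiles without the CUDA headers, and `totalThreads` is a made-up helper.

```cpp
#include <cassert>

// Stand-in for CUDA's dim3; unused dimensions are set to 1.
struct Dim3 {
    int x, y, z;
};

// Total number of threads launched = blocks in the grid * threads per block.
long long totalThreads(Dim3 grid, Dim3 block)
{
    assert(grid.x > 0 && grid.y > 0 && grid.z > 0);
    assert(block.x > 0 && block.y > 0 && block.z > 0);
    const long long blocks = 1LL * grid.x * grid.y * grid.z;
    const long long threadsPerBlock = 1LL * block.x * block.y * block.z;
    return blocks * threadsPerBlock;
}
```

With the values above, `totalThreads({2, 256, 1}, {152, 1, 1})` gives 77824 and `totalThreads({16, 6, 1}, {16, 8, 1})` gives 12288, so the row pass launches about 6.3 times as many threads. One possible reading, not confirmed here, is that each thread in the column pass computes more than one output pixel.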
Best regards,
Jorn