Hi again,
I’ve compared CUDA texturing performance against DX11 texturing performance:
CUDA: 14 fps (RGBA32F, 512x512 pixels, 3200 texture lookups per pixel per frame, 200 writes per frame)
DX11: 30 fps (same settings)
Note: each thread/shader invocation reads (almost) the same pixel location 16 times. Executables are attached, plus the CUDA source (a modified simpleD3D11Texture sample).
Note 2: I’ve tried cudaFuncSetCacheConfig(), with no improvement.
Note 3: CUDA is also slower with R32F, but the gap is smaller.
Note 4: Same results when the texture is allocated in DX11 and shared with CUDA (through a cudaArray).
Note 5: GTX 570, Win7 x64, CUDA 4.0.13, driver 270.61
Is this normal?
Octavian
DX11TexturingTest.rar (154 KB)
simpleD3D11Texture.rar (235 KB)
UPDATE: My test was biased because I was only using the first component of each sample in the DX version. The original loop:
float val;
for(int i=0;i<16;i++)val += t0.SampleLevel(s0lr, tc+float2(i*.00001f,0),0)/16.f;
return val;
It had to be replaced with:
float4 val = float4(0,0,0,0);
for(int i=0;i<16;i++)val += t0.SampleLevel(s0lr, tc+float2(i*.00001f,0),0)/16.f;
return val;
With this change, DX and CUDA run at about the same speed (14-15 fps). My bad :P
But my actual problem is ping-pong performance with CUDA textures. I thought I had narrowed it down to texture lookups, but apparently not!
Will post code soon!
Ping-pong test:
512x512 RGBA32F ctexa0, ctexa1;
for(int i=0;i<200;i++)
{
ctexa1 = ctexa0*.9998f;
ctexa0 = ctexa1*.9998f;
}
CUDA: 58 fps
DX11: 80 fps
PS: I’m working on a Navier-Stokes fluid sim (2D and 3D) and I’m wondering whether CUDA will give me the fastest solution!
Octavian
DX11PingPong.rar (327 KB)
simpleD3D11Texture_PingPongCuda.rar (474 KB)