Newbie question: compute capability <1.1 or == 1.3 ? overlapping kernel execution with device/hos

Hi there,

Sorry if this question seems stupid, but I am new to CUDA. I have been trying
the examples in the SDK, and I am a bit confused.

Here is what ./deviceQuery tells me:

There is 1 device supporting CUDA

Device 0: “GeForce GTX 280”
Major revision number: 1
Minor revision number: 3
Total amount of global memory: 1073479680 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.30 GHz
Concurrent copy and execution: Yes


Press ENTER to exit…

OK, that looks fine.
But then I ran ./simpleStreams:

memcopy: 22.09
kernel: 18.53
non-streamed: 39.16 (40.62 expected)
8 streams: 34.93 (21.29 expected with compute capability 1.1 or later)


Press ENTER to exit…

According to the result (34.93), I should have compute capability < 1.1.
But ./deviceQuery tells me 1.3.
Any ideas?

Thanks in advance.

I think you’re misinterpreting the output. It said “1.1 or later”. 1.3 is later/larger than 1.1.

Hi there,
thanks for your reply. I know that I have 1.3, which is later than 1.1.

The result tells me:

8 streams: 34.93 (21.29 expected with compute capability 1.1 or later)

I read that as: the measured value is 34.93, but with compute capability 1.1 or later (which I have), I should be getting 21.29.

No. It means that at least 21.29 is expected with 1.1 or later. Hence the test PASSED. There’s nothing wrong with any of the output that you posted.
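
For what it's worth, the two "expected" figures in the posted output can be reproduced from the memcopy and kernel timings alone. The sketch below is inferred from those numbers, not taken from the simpleStreams source, so treat the formulas as an assumption: the non-streamed figure looks like the plain sum of the two, and the streamed figure looks like the kernel time plus one stream's share of the copy, i.e. the best case with full overlap. The function names are hypothetical.

```c
#include <assert.h>

/* Serial case: the copy and the kernel run back to back,
   so the expected time is simply their sum. */
static double expected_non_streamed(double memcopy, double kernel)
{
    return memcopy + kernel;
}

/* Ideal overlapped case (assumed formula, inferred from the posted
   numbers): only one stream's share of the copy cannot be hidden
   behind kernel execution. */
static double expected_streamed(double memcopy, double kernel, int nstreams)
{
    assert(nstreams > 0);  /* guard against division by zero */
    return kernel + memcopy / (double)nstreams;
}
```

With the values from the post, expected_non_streamed(22.09, 18.53) gives 40.62, and expected_streamed(22.09, 18.53, 8) gives about 21.29. On this reading, 21.29 is the floor for perfect overlap, and the measured 34.93 sits between that floor and the serial 39.16.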

Ah, thank you very much. But maybe they should write it
this way: (>= 21.29 expected …
Thanks again.

Oh, I have another question: maybe you could help me with that, too:

In the “simpleStreams” example in the SDK, there is
this line in the kernel:

g_data[idx] += *factor; // non-coalesced on purpose, to burn time

Why is that memory access non-coalesced?

Thanks in advance.