Unable to access the entire allocated space

IGNORE THIS POST. It was a stupid bug in my code :(

I will try explaining the algorithm again. I would like to write a pattern to all the memory that I can get.

To do that, I allocate as many 1MB pieces of memory as I can. Then I create a 512-thread block in which each thread writes 4 bytes at a time, so one thread block covers 512*4 = 2048 bytes and it takes 1MB/(512*4) = 512 thread blocks to finish writing one 1MB piece. Since there are many 1MB pieces of memory, I chose to write 50 of them in a single kernel launch.

So my grid is (50, 1MB/(512*4 bytes)) = (50, 512) and my thread block is (512).
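
The launch geometry described above can be sanity-checked on the host without a GPU. This is only a sketch in plain C++, not the kernel from the original code: the constant names and the `wordIndex` mapping (grid y picks a 2048-byte chunk, the thread index picks a 4-byte word inside it) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Values taken from the description above; the names are hypothetical.
constexpr std::size_t kPieceBytes      = 1024 * 1024; // one 1MB allocation
constexpr std::size_t kThreadsPerBlock = 512;         // threads per thread block
constexpr std::size_t kBytesPerThread  = 4;           // each thread writes 4 bytes
constexpr std::size_t kPiecesPerLaunch = 50;          // grid x dimension

// Grid y dimension: thread blocks needed to cover one 1MB piece.
constexpr std::size_t chunksPerPiece() {
    return kPieceBytes / (kThreadsPerBlock * kBytesPerThread);
}

// Assumed mapping from (blockIdx.y, threadIdx.x) to a 4-byte word
// inside the piece selected by blockIdx.x.
inline std::size_t wordIndex(std::size_t blockY, std::size_t threadX) {
    return blockY * kThreadsPerBlock + threadX;
}

// Emulates every (blockIdx.y, threadIdx.x) pair for one piece and checks
// that each 4-byte word is written exactly once and nothing is out of range.
inline bool coversPieceExactlyOnce() {
    const std::size_t words = kPieceBytes / kBytesPerThread;
    std::vector<int> hits(words, 0);
    for (std::size_t by = 0; by < chunksPerPiece(); ++by) {
        for (std::size_t tx = 0; tx < kThreadsPerBlock; ++tx) {
            const std::size_t idx = wordIndex(by, tx);
            if (idx >= words) return false; // would write past the piece
            ++hits[idx];
        }
    }
    for (int h : hits) {
        if (h != 1) return false; // a word was skipped or written twice
    }
    return true;
}

// The grid works out to (50, 512) with 512 threads per block.
static_assert(chunksPerPiece() == 512, "1MB / (512 * 4) should be 512");
```

Calling `coversPieceExactlyOnce()` from a small `main` confirms that, under this assumed mapping, a (50, 512) grid of 512-thread blocks touches every word of each 1MB piece once and never writes outside it.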

Regards,
Suresh Kumar.

Can you guys do me a small favour? Compile the code on your machine and post the results. By the way, if your system crashes during the run, please rerun the code with a lower numOfThreadBlocksForExecution (look for the variable numOfThreadBlocksForExecution and set it to a lower value such as 10 or even 1). I can comfortably run with 100, and I’m using 50 in the code. With a value of 200 my system blue-screens because of Vista’s 2-second rule (the display driver timeout for long-running kernels).
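
Splitting the work into several short launches is what keeps each kernel under the timeout. The sketch below shows one way to compute the inclusive start/end range of 1MB pieces for each launch. The function name `launchRanges` is hypothetical, and clamping the last range to the number of allocated pieces is an assumption, since the original code's handling of a final partial batch is not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Returns inclusive [start, end] piece indices for each kernel launch.
// numAllocated: how many 1MB pieces were successfully allocated.
// perLaunch:    pieces handled per launch (numOfThreadBlocksForExecution).
inline std::vector<std::pair<std::size_t, std::size_t>>
launchRanges(std::size_t numAllocated, std::size_t perLaunch) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (perLaunch == 0) return ranges; // nothing sensible to launch
    for (std::size_t start = 0; start < numAllocated; start += perLaunch) {
        // Clamp the final batch so it never refers to an unallocated piece.
        const std::size_t end = std::min(start + perLaunch, numAllocated) - 1;
        ranges.emplace_back(start, end);
    }
    return ranges;
}
```

With 721 allocated pieces and 50 per launch this gives 15 ranges: 0 to 49, 50 to 99, and so on, with the last one covering only pieces 700 to 720.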

The results I got are:

Name: GeForce GTX 260
Total Global memory: 939524096
Total constant memory: 65536
Shared Memory Per block: 16384
Register Per Block: 16384
Warp Size: 32
MultiProcessorCount: 27
Device Major and Minor: 1 3
Num of allocated blocks 721 Block size 1048576
Block Start 0 Block End 49
Block Start 50 Block End 99
Block Start 100 Block End 149
Block Start 150 Block End 199
Block Start 200 Block End 249
Block Start 250 Block End 299
Block Start 300 Block End 349
Block Start 350 Block End 399
Block Start 400 Block End 449
Block Start 450 Block End 499
Block Start 500 Block End 549
cudaThreadSync Failed: writepattern 30unknown error

Thanks. I found the bug in my code.