Hi,
I am trying to improve the performance of my program using Nsight Compute. For one particular kernel, it recommends increasing the block size to a multiple of 32 between 128 and 256. When I run the program with a block size of 32, it runs without issues. But when I increase the block size to 256, the program crashes after a few iterations. How can I resolve this issue?
The kernel I am trying to improve is find_ba_max_pd in do_abstract_all in
When I launch with a block size of 32:
find_ba_max_pd[math.ceil(len(nz_ba_pre_hor)/32),32](nz_ba_pre_hor_d,ba_size_pre_hor_d,bound_data_ordered_d,ba_max_pd_pre_d,shape_d)
it doesn’t crash.
But when I launch with a block size of 256:
find_ba_max_pd[math.ceil(len(nz_ba_pre_hor)/256),256](nz_ba_pre_hor_d,ba_size_pre_hor_d,bound_data_ordered_d,ba_max_pd_pre_d,shape_d)
it crashes.
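
In case it helps with diagnosing this, here is a small sketch of the launch arithmetic for the two configurations. The item count 1030 is a made-up stand-in for len(nz_ba_pre_hor), and launch_config is a hypothetical helper, not part of my program. With the larger block size the last block can contain many more threads that fall past the end of the input. So one possibility, which is only a guess since the kernel body is not shown here, is that the kernel has no bounds check on its thread index and those extra threads access memory out of range.

```python
import math

def launch_config(n_items, block_size):
    """Return (blocks, total_threads, idle_threads) for a 1D launch.

    Hypothetical helper: mirrors the grid size computed in the launches
    above, i.e. math.ceil(n_items / block_size) blocks.
    """
    blocks = math.ceil(n_items / block_size)
    total_threads = blocks * block_size
    # Threads whose global index is >= n_items have no element to process.
    idle_threads = total_threads - n_items
    return blocks, total_threads, idle_threads

# 1030 is an assumed size, chosen only to illustrate the difference.
n_items = 1030
for block_size in (32, 256):
    blocks, total, idle = launch_config(n_items, block_size)
    print(f"block size {block_size}: {blocks} blocks, "
          f"{total} threads, {idle} past the end of the input")
```

If that turns out to be the cause, I assume the kernel would need a guard such as comparing the thread index against the array length before doing any reads or writes, but I would appreciate confirmation.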