I have about 10,000 8x8 matrices to invert. I implemented the function first and am now working on optimizations. The first optimization I am looking at is changing from generic mallocs to cudaMallocPitch. I seem to have gotten something wrong with cudaMallocPitch, because the way I tried it doesn't output the correct results. First question: how do you lay out host memory so that it is in the correct form for a cudaMemcpy2D copy to the device? Second: if I am going to have to copy the array from global memory to shared memory, what is the best way to structure the shared memory to optimize performance? Any help would be greatly appreciated.
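For reference on the first question, here is a minimal sketch of one way the round trip can be set up. It is not taken from the code described above. It assumes `float` elements and treats the batch as one tall 2D array of 8-element rows (80,000 rows in total). The names `roundTripPitched` and `scaleRows` are made up for illustration, and the kernel only doubles each element as a stand-in for the real inversion. The key point it tries to show: the host buffer can stay an ordinary densely packed array, and its pitch passed to `cudaMemcpy2D` is simply the row width in bytes, while the device pitch is whatever `cudaMallocPitch` returns.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

constexpr int N = 8;             // matrix dimension (8x8), from the question
constexpr int NUM_MATS = 10000;  // number of matrices, from the question

// Placeholder kernel (assumption: stands in for the real inversion).
// Each thread doubles one element; rows are located through the pitch.
__global__ void scaleRows(float* d, size_t pitch, int rows) {
    int row = blockIdx.x * blockDim.y + threadIdx.y;
    int col = threadIdx.x;
    if (row < rows && col < N) {
        // Pitch is in bytes, so step through rows with a char pointer.
        float* r = (float*)((char*)d + (size_t)row * pitch);
        r[col] *= 2.0f;
    }
}

// Hypothetical helper: returns 0 if the data survives the pitched round trip.
int roundTripPitched() {
    const int rows = NUM_MATS * N;               // all matrix rows stacked
    const size_t rowBytes = N * sizeof(float);   // width of one row in bytes

    // Host memory: a plain dense array, no padding between rows.
    std::vector<float> h((size_t)rows * N);
    for (size_t i = 0; i < h.size(); ++i) h[i] = (float)(i % 97);

    float* d = nullptr;
    size_t pitch = 0;  // device row stride in bytes, chosen by the runtime
    if (cudaMallocPitch((void**)&d, &pitch, rowBytes, rows) != cudaSuccess) return -1;

    // Host to device: destination uses the device pitch, source uses rowBytes.
    if (cudaMemcpy2D(d, pitch, h.data(), rowBytes, rowBytes, rows,
                     cudaMemcpyHostToDevice) != cudaSuccess) { cudaFree(d); return -2; }

    dim3 block(N, 32);                           // 8 columns x 32 rows per block
    int grid = (rows + block.y - 1) / block.y;
    scaleRows<<<grid, block>>>(d, pitch, rows);
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFree(d); return -3; }

    // Device to host: the pitches swap roles.
    std::vector<float> out(h.size());
    if (cudaMemcpy2D(out.data(), rowBytes, d, pitch, rowBytes, rows,
                     cudaMemcpyDeviceToHost) != cudaSuccess) { cudaFree(d); return -4; }
    cudaFree(d);

    int mismatches = 0;
    for (size_t i = 0; i < h.size(); ++i)
        if (out[i] != 2.0f * h[i]) ++mismatches;
    return mismatches;
}
```

To try it, call `roundTripPitched()` from a `main` and build with `nvcc`. A common cause of wrong results with pitched memory is indexing the device buffer as `d[row * N + col]` instead of stepping by `pitch` bytes per row, or passing the device pitch as the host pitch in `cudaMemcpy2D`.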