I intend to implement singular value decomposition in CUDA. I have completed the C code, but I am running into many issues implementing it in CUDA. To start off with, I wish to do the foll…

```
for (i = k; i <= Row - 1; i++)
{
    p = i * Row + j;
    Q[p] = Q[p] - 2.0 * t * R[i * Col + k];
}
```

Just as a test, I wrote a kernel to compute `Q[p] = Q[p] - 2.0 * t * R[i * Col + k];` on the device and then performed a cudaMemcpy from the device to the host. But it doesn't work: the values do not change. I even tampered with my idx values, but that doesn't seem to help either.
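For reference, a minimal sketch of the kind of kernel and host code described above might look like the following. It is only an illustration under stated assumptions: `Q` is taken to be a `Row` x `Row` array and `R` a `Row` x `Col` array of `double`, both row-major, and the names `updateQColumn`, `updateQOnDevice`, and `ok` are hypothetical, not taken from the original code. Every CUDA call is checked, because a failed allocation, copy, or kernel launch otherwise goes unreported and the host array simply comes back unchanged.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per row i in [k, Row): Q[i*Row + j] -= 2.0 * t * R[i*Col + k].
__global__ void updateQColumn(double *Q, const double *R, double t,
                              int Row, int Col, int j, int k)
{
    int i = k + blockIdx.x * blockDim.x + threadIdx.x;  // rows start at k
    if (i < Row) {                                      // guard the last block
        int p = i * Row + j;
        Q[p] = Q[p] - 2.0 * t * R[i * Col + k];
    }
}

// Hypothetical helper: report a CUDA error and return false on failure.
static bool ok(cudaError_t e, const char *what)
{
    if (e != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(e));
        return false;
    }
    return true;
}

// Hypothetical host wrapper (assumed layout: Q is Row x Row, R is Row x Col).
// Copies both arrays to the device, runs the update, copies Q back.
// Returns 0 on success, -1 if any CUDA call failed.
int updateQOnDevice(double *hQ, const double *hR, double t,
                    int Row, int Col, int j, int k)
{
    double *dQ = NULL, *dR = NULL;
    size_t qBytes = (size_t)Row * Row * sizeof(double);
    size_t rBytes = (size_t)Row * Col * sizeof(double);

    bool good = ok(cudaMalloc(&dQ, qBytes), "cudaMalloc Q")
             && ok(cudaMalloc(&dR, rBytes), "cudaMalloc R")
             && ok(cudaMemcpy(dQ, hQ, qBytes, cudaMemcpyHostToDevice), "copy Q to device")
             && ok(cudaMemcpy(dR, hR, rBytes, cudaMemcpyHostToDevice), "copy R to device");

    if (good && k < Row) {
        int threads = 256;
        int blocks = (Row - k + threads - 1) / threads;  // enough threads for rows k..Row-1
        updateQColumn<<<blocks, threads>>>(dQ, dR, t, Row, Col, j, k);
        good = ok(cudaGetLastError(), "kernel launch")
            && ok(cudaDeviceSynchronize(), "kernel execution");
    }

    // Copy from the device pointer dQ back into the host array hQ.
    good = good && ok(cudaMemcpy(hQ, dQ, qBytes, cudaMemcpyDeviceToHost), "copy Q to host");

    cudaFree(dQ);
    cudaFree(dR);
    return good ? 0 : -1;
}
```

Under these assumptions, the host code would call `updateQOnDevice(Q, R, t, Row, Col, j, k)` in place of the loop. The kernel must receive the device pointers (`dQ`, `dR`), not the host arrays, and the copy back must use `cudaMemcpyDeviceToHost` with the host array as the destination.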

Could someone please help me out here?