Hi,
I am running the same code on the same GTX 1080 Ti, but the performance varies a lot between Linux and Windows. Is that normal? Or how can I improve the performance on Windows?
Windows:
           Type  Time(%)      Time  Calls       Avg       Min       Max  Name
GPU activities:   34.24%  90.191ms      5  18.038ms  17.822ms  18.511ms  GsDenoise(float*, float*, int, int)
                  25.24%  66.492ms      5  13.298ms  3.5114ms  19.926ms  findMinMaxRow(float*, unsigned short*, int, int)
                  23.67%  62.348ms      5  12.470ms  9.7317ms  15.070ms  [CUDA memcpy HtoD]
                  14.52%  38.238ms      5  7.6476ms  7.4774ms  7.8828ms  rawToGray(unsigned short*, float*, int, int)
                   1.18%  3.0995ms      5  619.90us  531.51us  777.41us  nomalization(float*, unsigned short*, float*, int, int)
                   1.15%  3.0195ms      5  603.91us  495.99us  907.68us  GsFit(unsigned short*, float*, float*, int, int)
                   0.00%  10.144us      6  1.6900us     896ns  2.6560us  [CUDA memcpy DtoH]
Linux:
           Type  Time(%)      Time  Calls       Avg       Min       Max  Name
GPU activities:   79.02%  22.419ms      5  4.4838ms  4.2993ms  4.5552ms  [CUDA memcpy HtoD]
                   9.35%  2.6529ms      5  530.59us  517.90us  544.94us  findMinMaxRow(float*, unsigned short*, int, int)
                   8.24%  2.3386ms      5  467.72us  465.81us  469.87us  GsDenoise(float*, float*, int, int)
                   1.59%  451.82us      5  90.364us  90.051us  90.978us  nomalization(float*, unsigned short*, float*, int, int)
                   1.12%  317.32us      5  63.464us  60.354us  74.754us  rawTypeTransform(unsigned short*, float*, int, int)
                   0.33%  94.595us      6  15.765us  4.4480us  21.601us  [CUDA memcpy DtoH]
                   0.33%  94.243us      5  18.848us  18.209us  19.168us  GsFit(unsigned short*, float*, float*, int, int)
                   0.01%  2.1760us      1  2.1760us  2.1760us  2.1760us  EvaluateError(float*, float*, int, int)