 # Computing GFLOPs

Hello,

I am doing a study of the performance of Matrix-Vector and Matrix-Matrix multiplication on the NVIDIA GPU and wanted to know if I am correctly computing the GFLOPs for my card because I am getting LARGE values for simple Matrix-Vector multiplication. The strange GFLOPs come into play somewhere between a matrix dimension of 2048x2048 and 4096x4096. For simplicity, the matrix is square and vector is a column vector (no compression format is being used).

The code that I am using for Matrix-Vector follows:

```
unsigned int timer = 0;
cutilCheckError(cutCreateTimer(&timer));

for (unsigned int i = 0; i < MAX; i++) {
    // Start Timer:
    cutilCheckError(cutStartTimer(timer));

    // Execute the kernel
    matrixMul<<< grid, threads >>>(d_C, d_A, d_B, WA, WB);

    // Stop Timer and Collect current TOTAL GPU Time:
    cutilCheckError(cutStopTimer(timer));
    totalGPUTime += cutGetTimerValue(timer);
}

// Destroy Timer:
cutilCheckError(cutDeleteTimer(timer));

totalGPUTime = totalGPUTime / (double)MAX;
printf("GPU Processing Time: %f (ms)\n", totalGPUTime);

double GFLOPs = (double)(2.0 * WA * HB * HB) / (double)(1024 * 1024 * 1024);
GFLOPs /= totalGPUTime;

printf("GPU GFlops = %f\n", GFLOPs);
```

MAX is 1, WA is the width of matrix A, and HB is the height of vector B.

Also, any matrix dimension greater than 4584x4584 causes the program to crash. I assume this is because the number of threads exceeds what my GPU can handle?

The hardware configuration of my NVIDIA GPU follows:

# GPU NVIDIA:

GL Vendor: NVIDIA Corporation
GL Version: 3.0.0
Video Memory Installed: 256 MB
Interface Type: PCIe x16
Technology: DDR2 SDRAM 64-bit
Max. Resolution (external): 2560x1600 / 60 Hz
RAMDAC Clock Speed: 350 MHz
Driver Version:
ALU Instructions: 16384
TEX Instructions: 16384
TEX Indirections: 16384
MAX_TEXTURE_IMAGE_UNITS: 32
MAX_TEXTURE_COORDINATES: 32

Thanks for any hints/information :D

Try the following code to measure elapsed time:

```
unsigned int timer = 0;
cutilCheckError(cutCreateTimer(&timer));

// Start Timer:
cutilCheckError(cutStartTimer(timer));

for (unsigned int i = 0; i < MAX; i++) {
    // Execute the kernel
    matrixMul<<< grid, threads >>>(d_C, d_A, d_B, WA, WB);
}

// Kernel launches are asynchronous: wait for them to finish
// so the timer measures execution, not just launch overhead.
cudaThreadSynchronize();

// Stop Timer and Collect current TOTAL GPU Time:
cutilCheckError(cutStopTimer(timer));
totalGPUTime = cutGetTimerValue(timer);

// Destroy Timer:
cutilCheckError(cutDeleteTimer(timer));

totalGPUTime = totalGPUTime / (double)MAX;
printf("GPU Processing Time: %f (ms)\n", totalGPUTime);

double GFLOPs = (double)(1.0 * WA * HB * HB) / (double)(1024 * 1024 * 1024);
GFLOPs /= totalGPUTime / 1000.0;
```

Remark: the total number of MAD (multiply-add) operations is N^3, not 2*N^3.

How do you set up "grid" and "threads"?