I was working on the third program in “The PGI Accelerator Programming Model on NVIDIA GPUs Part 1”, but I can only use either the NVIDIA GPU accelerator or the host CPU, since I have a 32-bit system.
I tried running it on the host only and looking at the code in PGPROF. That worked, but I noticed that it was executing on only one core of my 4-core machine, which seems strange. The best comparison would be to run on all four cores and compare that speed to the speed of execution on the GPU accelerator.
How do I get the program to run with four cores? What compiler command should I use?