Hi,
I bought a Jetson AGX Xavier, installed Ubuntu 18.04 LTS, and compared fp32 and fp64 performance using 1 giga (about 10^9) additions. My kernel function is below: there are 1000 kernel calls, and each call performs about 1 million fp32 or fp64 adds. I made two runs, one with “vtype” as float and one with “vtype” as double. I checked the PTX code, and the only difference is add.f32 vs. add.f64. The fp32 run takes about 3.16 seconds, and the fp64 run takes about 31.55 seconds. So fp64 add is about 10x slower than fp32 add, but according to the Jetson AGX Xavier spec the fp32/fp64 GFLOPS ratio is 1:2.
Is there any option to enable faster fp64 performance?
Thanks
Henry
--------- kernel function
__global__ void
vectorAdd(vtype* const A)  // --> vtype is float or double
{
    const auto i = blockDim.x * blockIdx.x + threadIdx.x;
    auto a = A[i];
    // Loop body is unrolled by hand: 10 dependent adds per iteration.
    for (auto j = numAddsPerThread / 10; j > 0; --j) {
        a += 1.;
        a += 1.;
        a += 1.;
        a += 1.;
        a += 1.;
        a += 1.;
        a += 1.;
        a += 1.;
        a += 1.;
        a += 1.;
    }
    A[i] = a;
}