RAM Performance of the TegraX2 (T186) on Feature Extraction

I’m currently running a series of tests on the TegraX2, using the board to extract features with the first layers of a pre-trained DNN.

The test scenario is the following:
*pre-trained VGG16 model
*input image size of 224x224x3
*TensorFlow 2.0 as the backend.

The first layer produces 12 MB of data, and so does the second. The third layer, a pooling layer, reduces the amount of data to 3 MB.
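The figures above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes float32 activations and the standard VGG16 block-1 output shapes (224x224x64 for the two convolutions, 112x112x64 after pooling); the helper name `activation_bytes` is made up for illustration.

```python
# Rough size of one activation tensor, assuming float32 (4 bytes per value).
def activation_bytes(height, width, channels, bytes_per_value=4):
    return height * width * channels * bytes_per_value

MIB = 1024 * 1024

# Standard VGG16 block-1 output shapes for a 224x224x3 input (assumed here).
layers = {
    "block1_conv1": (224, 224, 64),
    "block1_conv2": (224, 224, 64),
    "block1_pool": (112, 112, 64),
}

sizes_mib = {name: activation_bytes(*shape) / MIB for name, shape in layers.items()}

for name, size in sizes_mib.items():
    print(f"{name}: {size:.2f} MiB")
```

Under these assumptions the pooling layer outputs a quarter of the data of each convolution, which matches the 12 MB versus 3 MB figures.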

I noticed that performance is strongly correlated with the amount of data produced: running the network up to the third layer takes one-third of the time needed to run only the first two layers.

Is there any information available on the memory access time, the memory latency and the bus specification of the TegraX2 that I could use to understand this behavior?

Thanks,

GCM

Hi,

The access time depends on the type of memory TensorFlow uses.

For general memory behavior, you can check the following CUDA samples:
/usr/local/cuda-10.0/samples/1_Utilities/bandwidthTest
/usr/local/cuda-10.0/samples/1_Utilities/p2pBandwidthLatencyTest
/usr/local/cuda-10.0/samples/1_Utilities/UnifiedMemoryPerf
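Once bandwidthTest reports a figure for your board, you can roughly estimate how long moving each layer's output would take. The sketch below is illustrative only: `transfer_time_ms` is a hypothetical helper, the 20 GB/s value is a placeholder and not a measurement, and the tensor sizes assume float32 VGG16 block-1 outputs. It also ignores latency and any overlap with compute.

```python
# Estimate copy time from a measured bandwidth (simple bytes / bandwidth model).
def transfer_time_ms(num_bytes, bandwidth_gb_per_s):
    seconds = num_bytes / (bandwidth_gb_per_s * 1e9)
    return seconds * 1e3

# Placeholder only: replace with the value printed by bandwidthTest on your board.
MEASURED_BANDWIDTH_GB_PER_S = 20.0

# Assumed float32 output sizes: 224x224x64 (conv) and 112x112x64 (pool).
conv_output_bytes = 224 * 224 * 64 * 4
pool_output_bytes = 112 * 112 * 64 * 4

conv_ms = transfer_time_ms(conv_output_bytes, MEASURED_BANDWIDTH_GB_PER_S)
pool_ms = transfer_time_ms(pool_output_bytes, MEASURED_BANDWIDTH_GB_PER_S)

print(f"conv output copy: {conv_ms:.3f} ms")
print(f"pool output copy: {pool_ms:.3f} ms")
```

If the measured timings differ a lot from such an estimate, the time is probably going somewhere other than raw memory bandwidth, and the profiler is the better tool to find out where.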

For more precise information about your use case, you can profile the app directly with our Nsight profiler.
https://developer.nvidia.com/nsight-systems

Thanks.