Hi,
I'm trying to benchmark a portion of my network on the DLA engine, and I fail to understand the profiler's output.
Any idea why the "business logic" of the network is so fast:
Layer [2]: [{(Unnamed Layer* 0) [Convolution],(Unnamed Layer* 1) [Scale],(Unnamed Layer* 2) [Activation],(Unnamed Layer* 3) [Pooling],(Unnamed Layer* 4) [Convolution]}]: 0.005312ms
while the "output" takes so long? Is it just an error in the profiler output?
Layer [4]: [output from nvm]: 13.2621ms
This is the relevant output:
--------------- Layers running on DLA:
(Unnamed Layer* 0) [Convolution], (Unnamed Layer* 1) [Scale], (Unnamed Layer* 2) [Activation], (Unnamed Layer* 3) [Pooling], (Unnamed Layer* 4) [Convolution],
--------------- Layers running on GPU:
--------------- Timing {(Unnamed Layer* 0) [Convolution],(Unnamed Layer* 1) [Scale],(Unnamed Layer* 2) [Activation],(Unnamed Layer* 3) [Pooling],(Unnamed Layer* 4) [Convolution]}(31)
Tactic 548835008419 is the only option, timing skipped
0: [(Unnamed Layer* 0) [Convolution]], type: kCONVOLUTION, precision: kHALF, inputs: 1, outputs: 1
Convolution: Dims: 3 x 3, getNbOutputMaps: 64, Stride: 1x1, Padding: 1x1, Dilation: 1x1
Input tensor: input, kFLOAT, Dims: 3[64, 512, 512]
Output tensor: (Unnamed Layer* 0) [Convolution]_output, kFLOAT, Dims: 3[64, 512, 512]
1: [(Unnamed Layer* 1) [Scale]], type: kSCALE, precision: kFLOAT, inputs: 1, outputs: 1
Mode: 0(0, 1, 2)
Shifts: kFLOAT, count: 0
Scales: kFLOAT, count: 1
Powers: kFLOAT, count: 0
Input tensor: (Unnamed Layer* 0) [Convolution]_output, kFLOAT, Dims: 3[64, 512, 512]
Output tensor: (Unnamed Layer* 1) [Scale]_output, kFLOAT, Dims: 3[64, 512, 512]
2: [(Unnamed Layer* 2) [Activation]], type: kACTIVATION, precision: kFLOAT, inputs: 1, outputs: 1
Type: 0 (Relu)
Alpha: 1.123, Beta: 4.213
Input tensor: (Unnamed Layer* 1) [Scale]_output, kFLOAT, Dims: 3[64, 512, 512]
Output tensor: (Unnamed Layer* 2) [Activation]_output, kFLOAT, Dims: 3[64, 512, 512]
3: [(Unnamed Layer* 3) [Pooling]], type: kPOOLING, precision: kFLOAT, inputs: 1, outputs: 1
Pooling:
Type : kMAX
Padding Mode: kEXPLICIT_ROUND_DOWN
Window : 3, 3
Stride : 2, 2
Padding : 1, 1
Pre Padding : 2: 1, 1,
Post Padding: 2, 1, 1,
getBlendFactor: 0
getAverageCountExcludesPadding: 1
Input tensor: (Unnamed Layer* 2) [Activation]_output, kFLOAT, Dims: 3[64, 512, 512]
Output tensor: (Unnamed Layer* 3) [Pooling]_output, kFLOAT, Dims: 3[64, 256, 256]
4: [(Unnamed Layer* 4) [Convolution]], type: kCONVOLUTION, precision: kHALF, inputs: 1, outputs: 1
Convolution: Dims: 3 x 3, getNbOutputMaps: 2, Stride: 2x2, Padding: 1x1, Dilation: 1x1
Input tensor: (Unnamed Layer* 3) [Pooling]_output, kFLOAT, Dims: 3[64, 256, 256]
Output tensor: output, kFLOAT, Dims: 3[2, 128, 128]
[RVLayerProfiler - …]: Layer [1]: [input to nvm]: 6.11952ms
[RVLayerProfiler - …]: Layer [2]: [{(Unnamed Layer* 0) [Convolution],(Unnamed Layer* 1) [Scale],(Unnamed Layer* 2) [Activation],(Unnamed Layer* 3) [Pooling],(Unnamed Layer* 4) [Convolution]}]: 0.005952ms
[RVLayerProfiler - …]: Layer [3]: [input copy finish]: 0ms
[RVLayerProfiler - …]: Layer [4]: [output from nvm]: 13.2621ms
[RVLayerProfiler - …]: Layer [5]: [output copy finish]: 0.003808ms
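If it helps, here is a rough back-of-envelope check I did, assuming the "output from nvm" step moves the 2 x 128 x 128 kFLOAT output tensor shown above (that assumption may be wrong; the step could include synchronization or other work):

```python
# Rough check: implied transfer rate of the "output from nvm" step,
# assuming it copies only the 2 x 128 x 128 kFLOAT output tensor.
elems = 2 * 128 * 128          # output tensor elements (from the log above)
bytes_out = elems * 4          # kFLOAT = 4 bytes per element
ms = 13.2621                   # profiler time for "output from nvm"
mb_per_s = (bytes_out / 1e6) / (ms / 1e3)
print(f"{bytes_out} bytes in {ms} ms -> {mb_per_s:.1f} MB/s")
```

That comes out to roughly 10 MB/s for about 128 KB of data, which seems far too slow for a real copy, so I suspect the time is dominated by something other than the data movement itself.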
I can send test code if relevant.
Thanks
Eyal