[0x562861cde620]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 168 idx: 30 time: 8.47e-07
-------------- The current device memory allocations dump as below --------------
[0]:34359738368 :HybridGlobWriter in reserveRegion: at optimizer/common/globWriter.cpp: 246 idx: 14 time: 8.8563e-05
[0x302000000]:16777216 :HybridGlobWriter in reserveRegion: at optimizer/common/globWriter.cpp: 246 idx: 9 time: 0.000347891
[0x7fced4000000]:4831839232 :DeviceActivationSize in reserveNetworkTensorMemory: at optimizer/common/tactic/optimizer.cpp: 4603 idx: 8 time: 0.007254
[WARNING] Requested amount of GPU memory (34359738368 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[WARNING] Skipping tactic 4 due to insufficient memory on requested size of 34359738368 detected for tactic -4420849921117327522.
Try decreasing the workspace size with IBuilderConfig::setMaxWorkspaceSize().
[ERROR] 10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node conv1/convolution.)
[ERROR] Unable to create engine
2022-09-29 19:10:38,263 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
[0x55b83cb5d2e0]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 15 time: 1.15e-07
[0x55b83f1783b0]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 22 time: 1.64e-07
[0x55b8417a7380]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 25 time: 8.4e-08
[0x55b83be4eeb0]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 6 time: 7.8e-08
-------------- The current device memory allocations dump as below --------------
[0]:34359738368 :HybridGlobWriter in reserveRegion: at optimizer/common/globWriter.cpp: 246 idx: 14 time: 9.3234e-05
[0x302000000]:16777216 :HybridGlobWriter in reserveRegion: at optimizer/common/globWriter.cpp: 246 idx: 9 time: 0.000369565
[0x7fdf28000000]:4831839232 :DeviceActivationSize in reserveNetworkTensorMemory: at optimizer/common/tactic/optimizer.cpp: 4603 idx: 8 time: 0.00733354
[WARNING] Requested amount of GPU memory (34359738368 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[WARNING] Skipping tactic 4 due to insufficient memory on requested size of 34359738368 detected for tactic -4420849921117327522.
Try decreasing the workspace size with IBuilderConfig::setMaxWorkspaceSize().
[ERROR] 10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node conv1/convolution.)
[ERROR] Unable to create engine
2022-09-30 09:49:48,633 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one.
Thanks
Can you set -b to 1 as well?
Also, could you try fp16 mode?
BTW, the RTX 5000 has 16 GB of GPU memory, so it cannot satisfy the 34359738368-byte (32 GiB) allocation requested in the log.
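As a quick sanity check on the numbers above, the requested allocation from the log can be converted to GiB with shell arithmetic (1 GiB = 2^30 bytes):

```shell
# Requested allocation taken from the warning in the log, in bytes
requested=34359738368
# Convert to GiB: divide by 2^30
echo "$((requested / (1 << 30))) GiB"
# prints: 32 GiB -- twice the 16 GB of VRAM on an RTX 5000
```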
root@df5f93481c11:/workspace# converter -h
usage: converter [-h] [-e ENGINE_FILE_PATH]
[-k ENCODE_KEY] [-c CACHE_FILE]
[-o OUTPUTS] [-d INPUT_DIMENSIONS]
[-b BATCH_SIZE] [-m MAX_BATCH_SIZE]
[-w MAX_WORKSPACE_SIZE] [-t DATA_TYPE]
[-i INPUT_ORDER] [-s] [-u DLA_CORE]
input_file
Generate TensorRT engine from exported model
positional arguments:
input_file Input file (.etlt exported model).
required flag arguments:
-d comma-separated list of input dimensions (not required for TLT 3.0 new models).
-k model encoding key.
optional flag arguments:
-b calibration batch size (default 8).
-c calibration cache file (default cal.bin).
-e file the engine is saved to (default saved.engine).
-i input dimension ordering -- nchw, nhwc, nc (default nchw).
-m maximum TensorRT engine batch size (default 16). If you hit an out-of-memory issue, decrease the batch size accordingly.
-o comma separated list of output node names (default none).
-p comma separated list of optimization profile shapes in the format <input_name>,<min_shape>,<opt_shape>,<max_shape>, where each shape has `x` as delimiter, e.g., NxC, NxCxHxW, NxCxDxHxW, etc. Can be specified multiple times if there are multiple input tensors for the model. This argument is only useful in dynamic shape case.
-s TensorRT strict_type_constraints flag for INT8 mode (default false).
-t TensorRT data type -- fp32, fp16, int8 (default fp32).
-u use DLA core N for layers that support DLA (default -1, meaning no DLA core is used for inference; note that GPU fallback is always allowed).
-w maximum workspace size of the TensorRT engine (default 1<<30). If you hit an out-of-memory issue, increase the workspace size accordingly.
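Putting the suggestions in this thread together, an invocation with batch size 1, fp16, and an explicit workspace size might look like the following sketch. The model file, encoding key, and workspace value are placeholders; substitute your own:

```shell
# Hypothetical model file and key -- replace with your own .etlt and encode key.
# -b/-m 1 keep the calibration and engine batch sizes minimal,
# -t fp16 roughly halves activation memory versus fp32,
# and -w caps the workspace at 2 GiB (2^31 bytes).
cmd="converter model.etlt -k <encode_key> -b 1 -m 1 -t fp16 -w $((1 << 31)) -e saved.engine"
echo "$cmd"
```

Note that -w bounds only the tactic workspace; activation and weight memory are allocated on top of it, so a model that fundamentally needs more VRAM than the card has will still fail.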