I just tried TensorRT 8.6 EA and the previous problem goes away, although I couldn’t find a corresponding fix in the TensorRT 8.6 release notes. However, with TensorRT 8.6 the build now fails with a different error (verbose log below):
[05/05/2023-18:30:11] [TRT] [V] =============== Computing costs for
[05/05/2023-18:30:11] [TRT] [V] *************** Autotuning format combination: -> Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1), Float(64,1) ***************
[05/05/2023-18:30:11] [TRT] [V] --------------- Timing Runner: {ForeignNode[transformer.layers.0.attention.rotary_emb.inv_freq...Cast_11544]} (Myelin[0x80000023])
[05/05/2023-18:30:15] [TRT] [V] Skipping tactic 0 due to insufficient memory on requested size of 257698037760 detected for tactic 0x0000000000000000.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[05/05/2023-18:30:15] [TRT] [V] {ForeignNode[transformer.layers.0.attention.rotary_emb.inv_freq...Cast_11544]} (Myelin[0x80000023]) profiling completed in 4.39308 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[05/05/2023-18:30:16] [TRT] [V] Deleting timing cache: 1 entries, served 27 hits since creation.
[05/05/2023-18:30:16] [TRT] [E] 10: Could not find any implementation for node {ForeignNode[transformer.layers.0.attention.rotary_emb.inv_freq...Cast_11544]}.
[05/05/2023-18:30:16] [TRT] [E] 10: [optimizer.cpp::computeCosts::3873] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[transformer.layers.0.attention.rotary_emb.inv_freq...Cast_11544]}.)
The requested size (257698037760 bytes, i.e. 240 GiB) is far more than the GPU's memory. How can I fix this? This is actually a popular language model.
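
For reference, here is a small Python sketch of how I read that number and of what the hint in the log seems to suggest. The helper names (`requested_bytes`, `to_gib`, `cap_workspace`) are my own, not part of TensorRT. `cap_workspace` assumes the TensorRT Python bindings are installed and that `config` is an `IBuilderConfig` created from a builder; I have not confirmed that lowering the workspace limit actually resolves this error.

```python
import re

# Matches the size in the "Skipping tactic ... requested size of N" log message.
_SIZE_RE = re.compile(r"requested size of (\d+)")


def requested_bytes(log_line):
    """Return the requested allocation size in bytes, or None if the line has none."""
    match = _SIZE_RE.search(log_line)
    return int(match.group(1)) if match else None


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


def cap_workspace(config, limit_bytes):
    """Hypothetical helper: apply the log's hint through the Python API.

    Assumes `config` is a tensorrt.IBuilderConfig. The import is done lazily
    so the rest of this snippet runs without TensorRT installed.
    """
    import tensorrt as trt

    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, limit_bytes)


line = (
    "[05/05/2023-18:30:15] [TRT] [V] Skipping tactic 0 due to insufficient "
    "memory on requested size of 257698037760 detected for tactic "
    "0x0000000000000000."
)
size = requested_bytes(line)
print(f"{size} bytes = {to_gib(size):.0f} GiB")  # 257698037760 bytes = 240 GiB
```

With a real builder config, the call would look something like `cap_workspace(config, 8 * 2**30)` for an 8 GiB limit, where the value is just an example and should be chosen to fit the GPU.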