FP16 builder does not work, DLA does not accept anything: how can I accelerate deep learning inference?

Hello,

I can summarize my problem as follows: I cannot convert any layers to FP16, because the kSTRICT_TYPES and kFP16 builder flags appear to have no effect unless the precision is also set explicitly on individual layers (as sketched below), as discussed here: Mixed precision + kSTRICT_TYPES, which type is chosen? - #7 by spolisetty
Since I cannot convert layers to FP16, I also cannot run any layers on the DLA, not just shuffle layers but convolution layers as well. I am trying to put every possible layer on the DLA rather than specific layers, yet every attempted DLA assignment falls back to the GPU. (Not all of the fallbacks are caused by the failed FP16 conversion; some come from errors like: DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer Add_259.)
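My understanding from that thread is that the precision only takes effect when it is set explicitly on each layer, roughly like the sketch below (this is only my reading of the thread; the loop and variable names are mine, not verified code):

for (int i = 0; i < network->getNbLayers(); ++i)
{
    nvinfer1::ILayer* layer = network->getLayer(i);
    // Request FP16 compute for this layer; with kSTRICT_TYPES this is supposed to be honored.
    layer->setPrecision(nvinfer1::DataType::kHALF);
    // Also request FP16 on every output tensor of the layer.
    for (int j = 0; j < layer->getNbOutputs(); ++j)
        layer->setOutputType(j, nvinfer1::DataType::kHALF);
}

For reference, my current builder setup is: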

builder->allowGPUFallback(true);
config->setFlag(BuilderFlag::kFP16);
builder->setFp16Mode(true);
config->setDefaultDeviceType(DeviceType::kDLA);
config->setDLACore(useDLACore);
config->setFlag(BuilderFlag::kSTRICT_TYPES);
mEngine = std::shared_ptr<nvinfer1::ICudaEngine>(builder->buildEngineWithConfig(*network, *config), InferDeleter());
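If assigning layers one by one is what is actually required, I assume the DLA side would look roughly like this (again only a sketch; I have not confirmed it changes the fallback behaviour):

for (int i = 0; i < network->getNbLayers(); ++i)
{
    nvinfer1::ILayer* layer = network->getLayer(i);
    // Put the layer on the DLA only if TensorRT reports it can run there,
    // otherwise leave it on the GPU.
    if (config->canRunOnDLA(layer))
        config->setDeviceType(layer, nvinfer1::DeviceType::kDLA);
    else
        config->setDeviceType(layer, nvinfer1::DeviceType::kGPU);
}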

Is there any documentation on which layer types are suited to running in FP16 (half precision) mode?