Is it possible to generate .tfrecords for TLT training directly, without using the intermediate KITTI format?

Actually, you can, but it takes more effort. First, write a script to dump the tfrecords you have already generated so you can see which features they contain. Then, based on your own data, generate new tfrecords that contain the same features.
So I suggest using tlt-dataset-convert instead.
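If you still want to try the direct route, the "dump the tfrecords" step could look roughly like the sketch below. It is not part of TLT: it assumes the file uses the standard TFRecord framing and that each record is a serialized tf.train.Example, and it only lists feature names and their list type. The function names are made up for illustration. With TensorFlow installed you would more likely parse each record with tf.train.Example, which also gives you the values.

```python
import struct

# Field numbers of the "kind" oneof inside a tf.train.Feature message.
KINDS = {1: "bytes_list", 2: "float_list", 3: "int64_list"}


def read_varint(buf, pos):
    """Decode one protobuf varint at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, wire_type, value) for each field of a message."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field, wire = tag >> 3, tag & 7
        if wire == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire == 2:  # length-delimited (strings, sub-messages)
            size, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + size], pos + size
        elif wire == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError("unsupported wire type %d" % wire)
        yield field, wire, value


def feature_kinds(example_bytes):
    """Map each feature name in a serialized tf.train.Example to its list type."""
    kinds = {}
    for field, wire, features in iter_fields(example_bytes):
        if field != 1 or wire != 2:          # Example.features
            continue
        for f2, w2, entry in iter_fields(features):
            if f2 != 1 or w2 != 2:           # one entry of the Features.feature map
                continue
            name, kind = "", "empty"
            for f3, w3, value in iter_fields(entry):
                if f3 == 1 and w3 == 2:      # map key: feature name
                    name = value.decode("utf-8")
                elif f3 == 2 and w3 == 2:    # map value: Feature message
                    for f4, _, _ in iter_fields(value):
                        kind = KINDS.get(f4, "unknown")
            kinds[name] = kind
    return kinds


def iter_records(path):
    """Yield the raw payload of each record in a TFRecord file (CRCs are not checked)."""
    with open(path, "rb") as f:
        while True:
            header = f.read(12)              # uint64 length + uint32 CRC of the length
            if len(header) < 12:
                return
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length)
            f.read(4)                        # uint32 CRC of the payload
            yield payload
```

Calling feature_kinds(next(iter_records(path))) on one of the tfrecords produced by tlt-dataset-convert should show which feature names and types your own records would have to reproduce.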
With tlt-dataset-convert, the only effort on your side is converting your labels to KITTI format. The TLT training pipeline only uses the class and the bbox coordinates. Take the typical KITTI label below as an example:
car 0.00 0 0.00 587.01 173.33 614.12 200.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00

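Only the class name (car here) and the box coordinates x1, y1, x2, y2 (the fifth to eighth fields) are needed when you write a script to convert to KITTI format; the remaining fields can be written as zeros.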
For example,

with open(new_label, 'a') as j:
    j.write("{} 0.0 0 0.0 {:.5f} {:.5f} {:.5f} {:.5f} 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n".format(class_name, x1, y1, x2, y2))
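Extended to a whole label file, the same idea might look like the sketch below. The helper names and the input layout (a list of (class_name, x1, y1, x2, y2) tuples per image) are assumptions for illustration, so adapt them to however your own annotations are stored. The whitespace check is an addition of mine: KITTI fields are space-separated, so a class name containing a space would shift every column.

```python
def kitti_line(class_name, x1, y1, x2, y2):
    """Format one object as a 15-field KITTI label line; unused fields are zero."""
    if len(class_name.split()) != 1:
        # A space inside the name would break the space-separated layout.
        raise ValueError("class name must be a single token: %r" % class_name)
    return ("{} 0.0 0 0.0 {:.5f} {:.5f} {:.5f} {:.5f} "
            "0.0 0.0 0.0 0.0 0.0 0.0 0.0\n").format(class_name, x1, y1, x2, y2)


def write_kitti_label(path, boxes):
    """Write one label file for one image.

    boxes is a hypothetical input layout: an iterable of
    (class_name, x1, y1, x2, y2) tuples in pixel coordinates.
    """
    # "w" rewrites the file on each call, so re-running the conversion
    # does not append duplicate lines.
    with open(path, "w") as f:
        for class_name, x1, y1, x2, y2 in boxes:
            f.write(kitti_line(class_name, x1, y1, x2, y2))
```

For example, write_kitti_label("000001.txt", [("car", 587.01, 173.33, 614.12, 200.12)]) writes a single line with the same class and box as the KITTI example above.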