Running inference on H200 with block storage

I am planning to run a Gen-AI-based chatbot application on HGX or Supermicro (SMCI) 5U servers (e.g., AS-5126GS-TNRT) directly connected to block storage. The idea: an application receives the incoming chat text, retrieves the relevant data from an RDBMS running on the block storage, passes the whole bundle to LLMs running on the server, gets the formatted reply back, and relays it to the user (rough sketch of the flow below). I would appreciate some expert opinion on this approach. As for the separate question of a small amount of shared drive space, I plan to create a volume on the block storage and expose it over NFS (sketch after the pipeline code).
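
To make the request flow concrete, here is a minimal sketch of what I have in mind. Everything in it is a placeholder, not a committed design: the Flask `/chat` endpoint, the `get_context` query and `kb` table, the DSN, and the model name are all hypothetical, and it assumes the LLM sits behind an OpenAI-compatible HTTP server (e.g., vLLM) on the same box.

```python
import psycopg2          # RDBMS client; assumes PostgreSQL on the block volume
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical connection details -- adjust to the actual stack.
DB_DSN = "dbname=chat user=app host=10.0.0.5"          # RDBMS on block storage
LLM_URL = "http://localhost:8000/v1/chat/completions"  # OpenAI-compatible server (e.g., vLLM)

def get_context(user_text: str) -> str:
    """Fetch supporting rows for the incoming chat text (simplified lookup)."""
    with psycopg2.connect(DB_DSN) as conn, conn.cursor() as cur:
        cur.execute("SELECT snippet FROM kb WHERE body ILIKE %s LIMIT 5",
                    (f"%{user_text}%",))
        return "\n".join(row[0] for row in cur.fetchall())

@app.route("/chat", methods=["POST"])
def chat():
    user_text = request.json["message"]
    context = get_context(user_text)          # 1. retrieve from the RDBMS
    bundle = {                                # 2. bundle context + question
        "model": "llama-3-70b",               #    hypothetical model name
        "messages": [
            {"role": "system", "content": f"Use this context:\n{context}"},
            {"role": "user", "content": user_text},
        ],
    }
    reply = requests.post(LLM_URL, json=bundle, timeout=120).json()
    answer = reply["choices"][0]["message"]["content"]  # 3. formatted reply from the LLM
    return jsonify({"reply": answer})                   # 4. relay back to the user

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```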
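
And for the shared-drive piece, this is roughly what I mean by exposing a block volume over NFS: format and mount the attached device, then export the mount point with standard Linux NFS tooling. The device name, mount point, and client subnet below are placeholders; the systemd unit is `nfs-server` on RHEL-family distros (`nfs-kernel-server` on Debian/Ubuntu).

```bash
# Format and mount the attached block volume (device name is a placeholder)
mkfs.xfs /dev/sdb
mkdir -p /export/shared
mount /dev/sdb /export/shared

# Export it over NFS to the application subnet
echo "/export/shared 10.0.0.0/24(rw,sync,no_subtree_check)" >> /etc/exports
exportfs -ra
systemctl enable --now nfs-server
```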