Innova-2 Flex for AI?

requires more than 64GB of RAM … no FPGA board

The Alveo U200 has 64GB of DDR4, and the DPUCADF8H DPU supports it. AWS F1 instances are another option.

I think I should use the RAM and CPU on the host and interact with Gemmini on the FPGA through the RoCC interface.

When you tested the demo project, you measured complete transfer times on the order of 100,000 nanoseconds (about 100 µs):

** Avg time device /dev/xdma0_c2h_0, total time 163964 nsec,
** Avg time device /dev/xdma0_c2h_0, total time 107604 nsec,
** Avg time device /dev/xdma0_h2c_0, total time 118067 nsec,

This is due to PCIe latency plus software and driver overhead. PCIe bandwidth is high, but its latency is poor. A DDR4 access completes in roughly 100 ns, about 1,000 times faster. If Gemmini has to reach host RAM across PCIe, your system will be very slow.
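To make the gap concrete, here is a minimal sketch that pulls the "total time" values out of the XDMA log lines quoted above and compares them with an assumed ~100 ns DDR4 access. The 100 ns figure and the helper names (`transfer_times_ns`, `summarize`) are illustrative assumptions, and the transfer size used by the demo is not stated here, so treat the ratio as an order-of-magnitude estimate only.

```python
import re

# Log lines copied verbatim from the XDMA demo test output.
LOG = """\
** Avg time device /dev/xdma0_c2h_0, total time 163964 nsec,
** Avg time device /dev/xdma0_c2h_0, total time 107604 nsec,
** Avg time device /dev/xdma0_h2c_0, total time 118067 nsec,
"""

# Assumption: a DDR4 access completes in roughly 100 ns.
DDR4_ACCESS_NS = 100


def transfer_times_ns(log: str) -> list[int]:
    """Extract every 'total time <N> nsec' value from the test output."""
    return [int(n) for n in re.findall(r"total time (\d+) nsec", log)]


def summarize(times_ns: list[int], ddr4_ns: int = DDR4_ACCESS_NS) -> dict:
    """Return the fastest and mean PCIe transfer times and the mean's ratio to DDR4."""
    avg = sum(times_ns) / len(times_ns)
    return {
        "min_ns": min(times_ns),
        "avg_ns": avg,
        "slowdown_vs_ddr4": avg / ddr4_ns,
    }


if __name__ == "__main__":
    stats = summarize(transfer_times_ns(LOG))
    print(f"fastest PCIe transfer: {stats['min_ns']:,} ns")
    print(f"mean PCIe transfer:    {stats['avg_ns']:,.0f} ns")
    print(f"mean vs. DDR4 access:  ~{stats['slowdown_vs_ddr4']:,.0f}x slower")
```

Even the fastest transfer in the log is over 1,000 times the assumed DDR4 access time, which is why keeping the model's data in host RAM behind PCIe would dominate run time.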

I have successfully built Gemmini hardware using this command:

cd chipyard/generators/gemmini
./scripts/build-verilator.sh

You built a Gemmini system for the Verilator simulator. That is a much better idea than trying to use the MNV303611A-EDLT (the Innova-2 Flex). Simulate your software running on RISC-V+Gemmini before moving to real hardware. Simulation should be the first step in hardware design.