How to use k8s to build a GPU cluster and set up load balancing (I don't know which forum section to post in)

My use case is to capture RTSP streams, decode them, and run inference.
I have two PCs with two GPUs, and I have used k8s to build a cluster from them. I have already written an app that does the process above (capture RTSP stream, decode, and infer). The app has an HTTP API to add or delete RTSP streams. But it seems that there is no load balancer managing the load across the two GPUs. The load always ends up on machine 1.
How can I make k8s allocate the load based on GPU load? For example, one GPU can process only 30 streams.
Or is there a mistake in my idea? I'm new to this kind of cluster use case.
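To make the behavior I am asking about concrete, here is a minimal Python sketch of a capacity-aware dispatcher. It is only an illustration and does not use any Kubernetes API: `GpuWorker`, `StreamDispatcher`, and the worker names are hypothetical, and the limit of 30 comes from the example above. It sends each new stream to the worker with the fewest streams and refuses new streams once every GPU is full.

```python
from dataclasses import dataclass, field

# Assumption taken from the example above: one GPU can handle 30 streams.
MAX_STREAMS_PER_GPU = 30


@dataclass
class GpuWorker:
    """One app instance bound to one GPU (hypothetical name)."""
    name: str
    streams: set = field(default_factory=set)

    def has_room(self) -> bool:
        return len(self.streams) < MAX_STREAMS_PER_GPU


class StreamDispatcher:
    """Tracks which worker owns which RTSP stream (hypothetical helper)."""

    def __init__(self, workers):
        self.workers = list(workers)

    def add_stream(self, rtsp_url: str) -> str:
        # If the stream is already assigned, return its current worker.
        for worker in self.workers:
            if rtsp_url in worker.streams:
                return worker.name
        # Otherwise pick the least-loaded worker that still has capacity.
        candidates = [w for w in self.workers if w.has_room()]
        if not candidates:
            raise RuntimeError("all GPUs are at capacity")
        target = min(candidates, key=lambda w: len(w.streams))
        target.streams.add(rtsp_url)
        return target.name

    def delete_stream(self, rtsp_url: str) -> bool:
        # Remove the stream from whichever worker holds it.
        for worker in self.workers:
            if rtsp_url in worker.streams:
                worker.streams.remove(rtsp_url)
                return True
        return False
```

In a real setup, `add_stream` would forward the request to the chosen instance's HTTP API instead of only updating a set, and the stream counts would have to survive restarts of the dispatcher.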
I don't know which forum section this belongs in, so please move the post to the correct section if needed.