AR Rendering in Cloud - Realtime multiuser server solution

Can we render virtual objects into live video and manipulate those objects so that each participant receives a differently manipulated copy of the virtual object?
The idea: send video frames plus accelerometer, gyro, and GPS data to a server, render the virtual objects in the cloud, and share the experience with the other participants. In short, a shared realtime multi-user server solution.
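The upstream message described above (video frame plus timestamped sensor readings) could be sketched like this. All field names and the structure itself are illustrative assumptions, not part of any existing SDK; in practice the video payload would go over a dedicated media transport while metadata like this travels alongside it:

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical per-frame upload message; every name here is an
# illustrative assumption, not an existing protocol.
@dataclass
class FrameUpload:
    timestamp_ns: int   # capture time of the video frame
    frame_id: int       # monotonically increasing frame counter
    accel: tuple        # (ax, ay, az) in m/s^2
    gyro: tuple         # (gx, gy, gz) in rad/s
    gps: tuple          # (latitude, longitude, altitude)
    payload_len: int    # size of the video frame sent separately

    def to_json(self) -> str:
        return json.dumps(asdict(self))

msg = FrameUpload(
    timestamp_ns=1_000_000_000, frame_id=42,
    accel=(0.0, 9.81, 0.0), gyro=(0.0, 0.0, 0.01),
    gps=(31.52, 74.35, 217.0), payload_len=48213,
)
decoded = json.loads(msg.to_json())
```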

You can do that, but the latency might be unpleasant if the video and the virtual objects are combined on the server. Even with local compositing (on the user's device) latency is a challenge. In general it is possible, though, if you build the whole infrastructure with latency in mind.
Do you have a specific use case in mind? Specific devices you are targeting?


Thank you for your time and consideration. I couldn't reply earlier due to other engagements, sorry for that.
I did have the latency issue in mind, but I am not certain about the numbers: what would the latency be in practice? An approximate figure from someone who has built or researched such a system would help. So far our requirement is not realtime interaction from the consumers (receivers); we will act as the transmitter (a TV channel).
Can you guide me regarding the infrastructure? Right now my plan is to send video buffers to the server along with timestamped accelerometer, gyro, and GPS data, and then reconstruct the AR session(s) in the cloud. I believe I would have to write my own AR rendering framework for this; I couldn't find any SDK or platform already built for it. I want to explore whether CloudXR supports this kind of cloud rendering, as I believe it does something similar for VR. I am open to any kind of constructive comments, suggestions, and opinions.
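Reconstructing a session server-side means matching each incoming video frame with the sensor samples taken closest to it in time. A minimal sketch of that alignment step, assuming sensor timestamps arrive in ascending order (a simple nearest-neighbour match; a real pipeline would interpolate and do proper sensor fusion):

```python
import bisect

def nearest_sample(sample_times, frame_time):
    """Index of the sensor sample closest in time to a video frame.

    sample_times must be sorted ascending. This is nearest-neighbour
    matching only; no interpolation or sensor fusion.
    """
    i = bisect.bisect_left(sample_times, frame_time)
    if i == 0:
        return 0
    if i == len(sample_times):
        return len(sample_times) - 1
    before, after = sample_times[i - 1], sample_times[i]
    return i - 1 if frame_time - before <= after - frame_time else i

imu_times = [0, 10, 20, 30, 40]        # sample timestamps in ms
idx = nearest_sample(imu_times, 23)    # frame captured at t = 23 ms
print(idx)                             # 2 -> the sample taken at 20 ms
```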


Since you mentioned accelerometers and gyros, I assume the users can move the device / virtual camera and look around the virtual objects. That is already a kind of interaction. In VR, motion-to-photon latency is often quoted as ideally below 20 ms. To get there, HMD runtimes (and in your case the software on the users' devices) use some form of reprojection (aka TimeWarp / SpaceWarp). For that reason I would consider combining the video stream with the virtual objects on the device instead of blending them in the cloud. A higher latency can be fine, depending on the application.
I'm not an expert on the full stack (video encoding, quality of service, streaming, sensor fusion, motion prediction, scalable server infrastructure, etc.), so I don't have all the answers. You either need a larger team of experts or have to use available software stacks (and even then it might be a large task). Looking at CloudXR would be one option once it becomes available.
Which devices do you want to target for the users? What AR frameworks exist on those? How many simultaneous users do you need to support? How complex is the rendering?
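To see why the 20 ms target is hard with cloud compositing, it helps to add up a rough latency budget. All the stage numbers below are illustrative assumptions, not measurements:

```python
# Rough motion-to-photon budget check for server-side compositing.
# Every number here is an illustrative assumption, not a measurement.
budget_ms = 20  # commonly quoted VR motion-to-photon target

cloud_pipeline_ms = {
    "capture + encode on device": 10,
    "uplink network": 15,
    "server decode + render + composite + re-encode": 15,
    "downlink network": 15,
    "decode + display on device": 10,
}
total = sum(cloud_pipeline_ms.values())
print(total, "ms total vs", budget_ms, "ms budget")  # 65 ms: far over
```

Even generous per-stage estimates blow past the budget, which is why reprojection on the device (using the freshest local pose) is needed to hide the round trip.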

I have a prototype on device (mobile) where I render the objects on the phone and broadcast the result. The reason I want to render the objects in the cloud is that I want each user to receive a different rendered object: if I place a red ball as the augmented object, each consumer should receive something of their liking instead, such as a yellow box or a blue triangle.
I have achieved this as a proof of concept on the device, but it isn't scalable: I have to broadcast multiple sessions and maintain several scenes, which is too much for mobile devices.
The other approach would be to broadcast the data from one device to the other devices and perform the heavy lifting there, but then every receiving device would have to repeat the same computations. At scale that means a lot of redundant computation and network consumption, and every device would need a fast internet connection.
Which brings me to conclude that server-side rendering is better suited for this kind of work.
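The per-user substitution described above (red ball becomes a yellow box for one consumer, a blue triangle for another) can be expressed as a mapping applied to an abstract scene description. A minimal sketch with hypothetical names:

```python
# Sketch of per-user object substitution: the scene stores abstract
# anchors, and each consumer maps the placed object to the asset of
# their choice. All names here are hypothetical.

scene = [{"anchor_id": "a1", "object": "red_ball", "pose": (0.0, 0.0, -1.0)}]

user_prefs = {
    "alice": {"red_ball": "yellow_box"},
    "bob": {"red_ball": "blue_triangle"},
}

def personalize(scene, prefs):
    """Replace each object with the user's preferred substitute."""
    return [
        {**node, "object": prefs.get(node["object"], node["object"])}
        for node in scene
    ]

print(personalize(scene, user_prefs["bob"])[0]["object"])  # blue_triangle
```

Note that this substitution is cheap; it is the rendering of the substituted object, not the substitution itself, that raises the question of where to do the work.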
I am targeting mobile devices and common web browsers.
Are there AR frameworks for the cloud? I don't know of any.
The number of users could be large, e.g. like YouTube.
The rendering still depends on the user's model, but I don't expect complex 3D models for now.

Any kind of feedback is highly appreciated.

If the 3D part isn't complex, why not send just the scene description to the server and from there to the clients, with 100% local rendering? That would use the local head/camera pose and thus give the lowest latency. The architecture would be more like that of a traditional multiplayer game, so there are already existing solutions and lots of literature if you want to build it yourself.
I don't see the need for cloud rendering yet.
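The multiplayer-game-style flow suggested above boils down to the server broadcasting small scene-state updates while each client renders locally. A minimal sketch, assuming JSON messages and a sequence number to discard stale packets (all field names are illustrative):

```python
import json

# Minimal scene-state sync in the spirit of a multiplayer game: the
# server sends a small scene description, every client renders locally
# with its own camera pose. Field names are illustrative assumptions.

def make_update(seq, anchors):
    return json.dumps({"seq": seq, "anchors": anchors})

def apply_update(local_scene, update_json):
    """Apply a server update, ignoring stale (out-of-order) packets."""
    update = json.loads(update_json)
    if update["seq"] <= local_scene["seq"]:
        return local_scene  # stale or duplicate packet: keep state
    return {"seq": update["seq"], "anchors": update["anchors"]}

client = {"seq": 0, "anchors": []}
client = apply_update(client, make_update(1, [{"id": "a1", "object": "ball"}]))
client = apply_update(client, make_update(1, []))  # duplicate: ignored
print(client["anchors"][0]["object"])  # ball
```

Per-user object substitution fits naturally here: each client swaps the abstract `object` name for its preferred asset before rendering, so the server never has to render anything.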