A new hybrid approach to photorealistic neural avatars

Neural avatars have emerged as a new technology for interactive remote presence. Among other things, they are expected to influence video conferencing, mixed-reality frameworks (e.g. remote appearances at physical meetings), and 2D or 3D gaming and metaverse applications. At the moment, they are limited to either cartoon representations of the speaker (e.g. Mesh avatars for Microsoft Teams) or experimental prototypes of photorealistic neural rendering of speakers, such as NVIDIA Maxine video compression and Meta’s Pixel Codec Avatars.

In both categories, shortcomings in rendering speakers’ intricate expressions, gestures and body movements severely limit the value of remote presence and visual communication.

When faced with poor rendering that does not reflect reality, many users may simply prefer not to use video at all. This is because we have all evolved to be astute observers of human faces, gestures and body movements. We use subtle expressions, hand gestures and body movements to convey trust and meaning, and we read them to gauge a speaker’s emotions, their experience with the topic under discussion, and much else. It is widely accepted that the majority of human communication is non-verbal, so all these details matter a lot.

The iSIZE team has been working on this problem for almost two years and has identified a new way to apply advanced AI tools to such remote presence applications.

