Meta shared a clip showing the new Audio To Expression feature, introduced in v71, in action.
Audio To Expression is an on-device AI model that generates plausible facial muscle movements from only microphone audio input, providing estimated facial expressions without any face tracking hardware.
The earlier Oculus Lipsync SDK from 2015 offered this capability for lip movement only, and Audio To Expression is its official replacement, adding cheek, eyelid, and eyebrow movement. Remarkably, Meta claims Audio To Expression actually uses less CPU than Oculus Lipsync did.
Audio To Expression supports Quest 2, Quest 3 and Quest 3S. It also technically supports Quest Pro, though that headset has face tracking sensors that developers can leverage to represent the true facial expressions of the wearer, not just an estimate.
In the embedded clip, you can see the stark difference between Audio To Expression and the old Oculus Lipsync SDK, given the same input.
As well as improving the realism of avatars driven by non-Pro Quest owners in social VR and multiplayer games, Audio To Expression can be used for NPC faces, which could be useful for smaller studios and independent developers that can't afford facial capture technology.
Meta’s own Meta Avatars don’t yet support Audio To Expression and still use the Oculus Lipsync SDK. They aren’t limited to lip movement, though: they feature simulated eye movement, wherein the developer tags virtual objects in the scene by their level of visual saliency so the avatar’s gaze has plausible targets, as well as occasional blinking for added realism.
Developers can find the Audio To Expression documentation here: Unity / Unreal / Custom Engines.
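For developers working in a custom engine, the data flow the feature implies is straightforward: microphone audio frames go in, a set of estimated per-blendshape weights comes out each frame, and those weights are applied to the avatar's face mesh. The C++ sketch below illustrates that flow only; every name in it (AudioToExpressionModel, BlendshapeWeights, AvatarFace) is a hypothetical stand-in rather than Meta's actual API, and the placeholder "model" simply maps audio loudness to a jaw-open weight instead of running Meta's neural network.

```cpp
// Hypothetical sketch of the audio-in, blendshape-weights-out flow described above.
// None of these names come from Meta's SDK; they are illustrative stand-ins.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Estimated facial expression for one frame: a weight (0..1) per blendshape.
struct BlendshapeWeights {
    float jawOpen = 0.f;
    float lipCornerPull = 0.f;
    float cheekRaise = 0.f;
    float eyelidClose = 0.f;
    float browRaise = 0.f;
};

// Stand-in for the on-device model. The real feature is a neural network; this
// placeholder just maps audio energy to a jaw-open weight so the sketch runs.
class AudioToExpressionModel {
public:
    BlendshapeWeights Infer(const std::vector<int16_t>& micSamples16kMono) const {
        double sumSquares = 0.0;
        for (int16_t s : micSamples16kMono) {
            const double x = s / 32768.0;
            sumSquares += x * x;
        }
        const double rms = micSamples16kMono.empty()
                               ? 0.0
                               : std::sqrt(sumSquares / micSamples16kMono.size());
        BlendshapeWeights w;
        w.jawOpen = static_cast<float>(std::min(1.0, rms * 8.0));
        return w;
    }
};

// Stand-in for whatever drives the avatar's face mesh in the engine.
struct AvatarFace {
    void ApplyWeights(const BlendshapeWeights& w) const {
        std::printf("jawOpen=%.2f cheekRaise=%.2f browRaise=%.2f\n",
                    w.jawOpen, w.cheekRaise, w.browRaise);
    }
};

int main() {
    AudioToExpressionModel model;
    AvatarFace face;

    // Fake 20 ms microphone frame (320 samples at 16 kHz) standing in for real input.
    std::vector<int16_t> frame(320, 6000);

    // Per-frame update: audio in, estimated expression out, applied to the face.
    face.ApplyWeights(model.Infer(frame));
}
```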
This article was originally published on UploadVR.