Laying the Groundwork for a Generalized Psychovisual Preprocessing Framework for Video Encoding

At this year’s SMPTE Annual Technical Conference (ATC), iSIZE CTO Yiannis Andreopoulos delivered a presentation entitled “Toward Generalized Psychovisual Preprocessing For Video Encoding”, looking at how deep perceptual preprocessing has emerged as a new way to enable further bitrate savings across several generations of video encoders without breaking standards or requiring any changes in client devices.

SMPTE ATC is a flagship event for the media and entertainment sector, with the SMPTE standardization committee itself celebrating its 100th anniversary this year at this annual landmark event. This year, speakers and panellists covered a range of topics, including advances in acquisition technologies, IP infrastructures, virtual production, media in the cloud, and sustainability. 

During his session, Andreopoulos spoke about the requirements a technology must meet to become a generalized psychovisual preprocessing framework for video encoding, and described how an existing solution by iSIZE can be practically deployed for video-on-demand, live, gaming and user-generated content.

While traditional approaches, like encoder-specific perceptual tuning, offer some visual quality improvement, they need to be applied separately for each encoding and in most cases deliver improvements on one metric (e.g., VMAF) to the detriment of all other perceptual quality scores. The improvements are also only relevant within certain bitrate/quality regimes, and even there they are not always quantifiable or visible. This makes finding the best rate-perception-distortion-complexity trade-off a grand challenge for the video engineering and AI communities.

Deep perceptual preprocessing has recently emerged as an alternative: it enables further bitrate savings across several generations of video encoders without breaking standards or requiring any changes in client devices, and so lays the foundations for a generalized psychovisual preprocessing framework for video encoding.

iSIZE has shown promising results using state-of-the-art AVC, HEVC and VVC encoders, delivering average bitrate (BD-rate) gains of 11% to 17% with respect to three state-of-the-art reference-based quality metrics (Netflix VMAF, SSIM and Apple AVQT), as well as the recently proposed no-reference ITU-T Rec. P.1204 metric. The runtime complexity of this model on CPU has been shown to be equivalent to a single x264 medium-preset encoding. On GPU hardware, the proposed approach achieves 260 fps for 1080p video (below 4 ms/frame), thereby enabling its use in very-low-latency live video or game streaming applications.
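For readers unfamiliar with the BD-rate figure quoted above, the following is a minimal sketch of how an average bitrate difference between two rate-quality curves can be estimated. It is not iSIZE's evaluation code. The standard Bjøntegaard method fits a cubic polynomial to log-bitrate as a function of quality; this sketch swaps in piecewise-linear interpolation to stay dependency-free, so its output will differ slightly from standard tools. The function names and all sample data points are hypothetical and chosen only for illustration.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation of ys over strictly ascending xs."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the interpolation range")


def bd_rate_percent(ref_points, test_points, samples=200):
    """Approximate BD-rate (%) of a test curve against a reference curve.

    Each argument is a list of (bitrate_kbps, quality_score) pairs with
    distinct quality scores. A negative result means the test curve needs
    less bitrate for the same quality. Simplified variant: piecewise-linear
    interpolation instead of the usual cubic fit (an assumption of this sketch).
    """
    ref = sorted(ref_points, key=lambda p: p[1])
    test = sorted(test_points, key=lambda p: p[1])
    ref_q = [q for _, q in ref]
    test_q = [q for _, q in test]
    ref_log_rate = [math.log(r) for r, _ in ref]
    test_log_rate = [math.log(r) for r, _ in test]

    # Only the quality range covered by both curves can be compared.
    lo = max(ref_q[0], test_q[0])
    hi = min(ref_q[-1], test_q[-1])
    if hi <= lo:
        raise ValueError("curves do not overlap in quality")

    # Average the log-bitrate difference over evenly spaced quality levels.
    total = 0.0
    for k in range(samples + 1):
        q = min(hi, lo + (hi - lo) * k / samples)
        total += _interp(q, test_q, test_log_rate) - _interp(q, ref_q, ref_log_rate)
    mean_log_diff = total / (samples + 1)

    return (math.exp(mean_log_diff) - 1.0) * 100.0


# Hypothetical data: (bitrate in kbps, quality score), not measured results.
reference_curve = [(1000, 80.0), (2000, 88.0), (4000, 93.0), (8000, 96.0)]
preprocessed_curve = [(850, 80.0), (1700, 88.0), (3400, 93.0), (6800, 96.0)]

print(f"BD-rate: {bd_rate_percent(reference_curve, preprocessed_curve):.1f}%")
```

In this made-up example the second curve reaches each quality level at 85% of the reference bitrate, so the sketch reports a BD-rate of about -15%, i.e. a 15% average bitrate saving at equal quality. The same calculation can be repeated per quality metric (for example VMAF or SSIM) by changing which score is stored in each pair.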

Andreopoulos also spoke about how iSIZE’s deep perceptual preprocessing offers compounded gains on top of any encoder-specific perceptual quality optimization, e.g., within AV1 and VVC. The single-pass nature of the iSIZE solution, along with its decoupling from specific encoder standards and vendor implementations, allows for easy deployment on custom hardware or high-performance CPU/GPU clusters. Current implementation complexity already allows for real-time operation in GPU or multi-CPU environments, and iSIZE plans to highlight further optimizations in the months ahead. These promising results make iSIZE’s approach more practical to deploy for video-on-demand, live, gaming and user-generated content than traditional solutions.

To download the presentation and watch the virtual session, head to the SMPTE website.