Validating Our Deep Perceptual Precoder With Independent Viewer Preference Testing On Amazon MTurk

iSize Technical Articles

By Dr Yiannis Andreopoulos and Dr Vasileios Giotsas, iSize Technologies


As discussed in a previous article, a core technical objective of iSize is to find the optimal way to preprocess (or “precode”, in our nomenclature) any input video into a (typically) smaller pixel stream, so that a video encoder can achieve the best video quality at a given bitrate, or save the maximum number of bits at a given perceptual quality level. We are especially interested in this problem because perceptual quality can now be measured with advanced perceptual quality metrics from the literature, e.g., the fusion of multiple quality metrics (as in Netflix's VMAF) or other advanced metrics.

Given our very encouraging results in this domain, some of which were reported in our IBC 2019 paper and are now commercially available on our platform, we are always looking for ways to validate our deep perceptual precoder framework with independent viewing tests “in the wild”, i.e., with viewers watching our precoded and encoded video on their own devices. To this end, we used the Amazon Mechanical Turk (MTurk) service to ask independent MTurk workers from around the world to evaluate full HD (FHD) video encoded with a state-of-the-art HEVC encoder, with and without our patent-pending deep perceptual precoding. In the discussion that follows, we present our setup and the associated results.

Experimental Setup

To evaluate whether our approach brings a perceptual quality improvement at the same bitrate, or comparable quality at a 40% lower bitrate than the HEVC encoder alone, we asked users to watch two HEVC encodings of the same video in split screen and tell us which one (if any) they preferred. One is the HEVC encoding obtained after processing the original video with the iSize precoder, while the other is the original video encoded with the same HEVC encoder (no iSize precoding). Each test corresponded to one of the following bitrate combinations:

  • iSize+HEVC at 5 Mbps, HEVC at 5 Mbps → can we offer a noticeable quality improvement for FHD video encoded at typical low/medium bitrates?
  • iSize+HEVC at 5 Mbps, HEVC at 8.5 Mbps → can we achieve comparable or superior quality while offering a 40% saving for medium-bitrate encoding?
  • iSize+HEVC at 8.5 Mbps, HEVC at 14 Mbps → can we achieve comparable or superior quality while offering a 40% saving for high-bitrate encoding?

The HEVC settings corresponded to the “slow” preset of the x265 encoder and a video buffering verifier (VBV) encoding recipe allowing for content-adaptive variable bitrate (VBR) encoding, with the maximum rate set to 5, 8.5 or 14 Mbps (according to each case) and the CRF parameter set to 19.
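As a rough illustration of such a recipe, the sketch below builds an ffmpeg/libx265 command line with the “slow” preset, CRF 19 and a VBV maximum rate. It is not the exact command used for these tests: the use of ffmpeg as a front end, the function name, the file names and, in particular, the VBV buffer size (taken here as twice the maximum rate) are assumptions, since the article only specifies the preset, the CRF value and the maximum rates.

```python
def x265_vbv_command(src, dst, max_rate_kbps, bufsize_factor=2.0):
    """Build an ffmpeg/libx265 command for CRF encoding capped by a VBV max rate.

    bufsize_factor is a hypothetical choice: the article does not state
    the VBV buffer size that was used.
    """
    bufsize_kbps = int(max_rate_kbps * bufsize_factor)
    # x265 expects vbv-maxrate and vbv-bufsize in kbit/s and kbit respectively
    x265_params = f"vbv-maxrate={max_rate_kbps}:vbv-bufsize={bufsize_kbps}"
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx265",
        "-preset", "slow",   # preset named in the article
        "-crf", "19",        # CRF value named in the article
        "-x265-params", x265_params,
        dst,
    ]


# One command per maximum rate used in the tests (5, 8.5 and 14 Mbps)
for kbps in (5000, 8500, 14000):
    print(" ".join(x265_vbv_command("input.mp4", f"out_{kbps}k.mp4", kbps)))
```

With CRF and a VBV cap combined, the encoder targets constant quality but is prevented from exceeding the maximum rate, which is what makes the resulting bitrate content-adaptive.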

The image at the top of the page shows the user interface of the test. After watching the two videos playing in parallel, users were asked whether they preferred the visual quality of the left side or the right side, or had no preference, using the buttons below the video player. Video playback started automatically in full screen, and users were able to pause and seek if they wanted to inspect individual video frames more carefully. Since both video encodings were played back concurrently, buffering time due to the bitrate difference was not a factor in user preferences.

To ensure our visual quality setup is robust, we took the following measures:

  • Users were able to make a selection only after watching the entire video. The buttons were activated only when the playback had ended and the playback time was equal to or greater than the video duration.
  • To prevent the use of devices with small screens, on which significant visual detail may be missed, we detected and blocked mobile devices, such as smartphones, based on user agent strings and screen resolutions.
  • Users were not told which video encodings were being streamed, so they were not biased by knowledge of the encoding options.
  • Each user watched only one bitrate combination of a given video, so that familiarity with the content and memory of a previous preference would not affect their selection.
  • To avoid potential biases from users preferring a specific side, for each bitrate combination we created two videos, with iSize+HEVC on either the left or the right, and then collected the same number of ratings for both cases. For example, for the video title “touchdown pass” and the iSize+HEVC 5m/HEVC 8.5m combination, we collected 40 ratings with the iSize+HEVC 5m version on the left side of the video and another 40 ratings with the iSize+HEVC 5m version on the right side. This was done by splitting the videos into two halves (since we want to show two concurrent videos in 1080p resolution, we can only show one half of each video at any given viewing) and swapping which half showed the iSize+HEVC encoding and which showed the native HEVC encoding.
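The last two measures amount to a counterbalanced allocation: each worker sees at most one bitrate combination per title, and each (combination, side) cell is filled to the same quota. A minimal sketch of such an allocator is given below. The class and its names are hypothetical and only illustrate the idea; the article does not describe how the assignment was actually implemented, and a real deployment would also need to count only valid ratings towards the quota.

```python
import random
from collections import Counter

COMBINATIONS = ["5/5", "5/8.5", "8.5/14"]  # iSize+HEVC / HEVC max rates in Mbps
SIDES = ["left", "right"]                  # side on which iSize+HEVC is shown
QUOTA = 40                                 # ratings per (combination, side), as in the text


class Allocator:
    """Hypothetical allocator for a counterbalanced split-screen test."""

    def __init__(self, seed=None):
        self.counts = Counter()  # (title, combination, side) -> assignments so far
        self.seen = set()        # (worker, title) pairs that were already served
        self.rng = random.Random(seed)

    def assign(self, worker, title):
        """Return a (combination, side) cell, or None if nothing can be assigned."""
        if (worker, title) in self.seen:
            return None  # one bitrate combination per title and worker
        open_cells = [
            (c, s) for c in COMBINATIONS for s in SIDES
            if self.counts[(title, c, s)] < QUOTA
        ]
        if not open_cells:
            return None  # all six cells of this title are full
        cell = self.rng.choice(open_cells)
        self.counts[(title, *cell)] += 1
        self.seen.add((worker, title))
        return cell


alloc = Allocator(seed=0)
print(alloc.assign("worker-1", "touchdown pass"))
print(alloc.assign("worker-1", "touchdown pass"))  # None: same title already seen
```

Choosing randomly among the cells that are still open keeps the presentation unpredictable for any single worker while guaranteeing the 40/40 left-right balance once all quotas are met.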

In addition to the above measures, we created two qualification tests to decide which users we considered “trustworthy”. The first test displayed the same video encoding on both sides, and we required the user to indicate no preference, since the visual content of both sides was identical. The second test compared a lossless encoding against a low-quality encoding with pronounced artifacts, and we required the user to indicate a preference for the lossless encoding. Users who did not pass these tests (which were shown to them without being identified as qualification tests) were not taken into account in our results.
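The two qualification rules can be expressed as a simple filter over worker answers. The sketch below assumes a hypothetical data layout in which each answer is one of "left", "right" or "none"; it is meant only to make the pass/fail logic explicit, not to reproduce the actual screening code.

```python
def passes_qualification(identical_answer, lossless_answer, lossless_side):
    """Apply both qualification rules described in the text.

    identical_answer: answer to the test showing the same encoding on both sides
    lossless_answer:  answer to the lossless-vs-low-quality test
    lossless_side:    side ("left" or "right") on which the lossless encoding was shown
    """
    saw_no_difference = identical_answer == "none"
    picked_lossless = lossless_answer == lossless_side
    return saw_no_difference and picked_lossless


def trustworthy_workers(qualification):
    """qualification maps worker id -> (identical_answer, lossless_answer, lossless_side)."""
    return {
        worker for worker, answers in qualification.items()
        if passes_qualification(*answers)
    }


# Hypothetical answers, for illustration only
answers = {
    "worker-1": ("none", "left", "left"),    # passes both tests
    "worker-2": ("right", "left", "left"),   # claimed a preference between identical videos
    "worker-3": ("none", "none", "right"),   # did not pick the lossless encoding
}
print(sorted(trustworthy_workers(answers)))
```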


We deployed the above experiment on Amazon Mechanical Turk and, for each title, we solicited 240 valid measurements, i.e., 80 measurements for each bitrate combination. Figure 1 illustrates the results for each video, while Figure 2 shows the corresponding VMAF-bitrate plots. Table 1 lists the exact scores, including the VMAF difference for each bitrate combination.
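Because iSize+HEVC appeared on the left for half of the ratings and on the right for the other half, each raw left/right answer has to be mapped back to “iSize”, “Codec” or “no preference” before percentages can be computed. A minimal sketch of that tally follows; the record layout and function names are assumptions made for illustration.

```python
from collections import Counter

LABELS = ("iSize", "Codec", "no preference")


def to_label(choice, isize_side):
    """Map a raw answer ("left", "right" or "none") to the encoding it refers to."""
    if choice == "none":
        return "no preference"
    return "iSize" if choice == isize_side else "Codec"


def preference_percentages(ratings):
    """ratings: iterable of (choice, isize_side) pairs for one title and combination."""
    tally = Counter(to_label(choice, side) for choice, side in ratings)
    total = sum(tally.values())
    if total == 0:
        raise ValueError("no ratings to aggregate")
    return {label: 100.0 * tally[label] / total for label in LABELS}


# Hypothetical ratings, for illustration only
sample = [("left", "left")] * 3 + [("right", "left")] + [("none", "right")]
print(preference_percentages(sample))
```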

Figure 1: User ratings per video title. “iSize” refers to iSize+HEVC encoding, while “Codec” refers to HEVC encoding, both done under the same encoder and encoding recipe





Figure 2: VMAF-bitrate plots per video title.


Table 1: User preference (%) per video title and bitrate combination, and VMAF difference (ΔVMAF = VMAF(iSize+HEVC) – VMAF(HEVC))

Discussion and Conclusions

The results show that for every video and for every bitrate combination except one (“crowd run” at 5m/8.5m), viewers had a strong preference for the iSize+HEVC result, despite the fact that in the 5m/8.5m and 8.5m/14m cases it was encoded at a 40% lower bitrate than the HEVC encoding.
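As a quick check of the “40%” figure, the saving can be computed directly from the nominal maximum rates of the two test pairs. The sketch below assumes these nominal rates; the actual bitrates of content-adaptive VBR encodes may differ from the caps. It also shows the same gap expressed from the HEVC side, which is a noticeably larger percentage.

```python
def saving(precoded_mbps, reference_mbps):
    """Fraction of bitrate saved by the precoded stream relative to the reference."""
    return 1.0 - precoded_mbps / reference_mbps


def overhead(precoded_mbps, reference_mbps):
    """Fraction by which the reference bitrate exceeds the precoded one."""
    return reference_mbps / precoded_mbps - 1.0


for precoded, reference in [(5.0, 8.5), (8.5, 14.0)]:
    print(
        f"{precoded} vs {reference} Mbps: "
        f"saving {saving(precoded, reference):.1%}, "
        f"reference overhead {overhead(precoded, reference):.1%}"
    )
# 5.0 vs 8.5 Mbps: saving 41.2%, reference overhead 70.0%
# 8.5 vs 14.0 Mbps: saving 39.3%, reference overhead 64.7%
```

Both pairs are therefore within about one percentage point of a 40% saving for iSize+HEVC.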

We observe good correspondence between ΔVMAF and the crowdsourced preferences: when ΔVMAF reaches 4.5 points or more, iSize+HEVC becomes the clear majority choice among viewers. This is consistent with independent reports that ΔVMAF ≥ 6 is above the Just Noticeable Difference (JND) threshold, where the vast majority of viewers are expected to notice a quality difference between two videos. Interestingly, in the single case where viewers did not prefer iSize+HEVC, ΔVMAF is also negative (“crowd run” at 5m/8.5m, ΔVMAF = -3.3). iSize+HEVC was not the clear majority in only two other cases (“tractor” at 5m/8.5m and “touchdown pass” at 5m/5m), both of which had ΔVMAF below the JND value. However, HEVC was not the preferred choice in either of these two cases; instead, about 30% of the viewers selected the “no preference” option. Moreover, for both the “tractor” and “touchdown pass” sequences, Figure 2 shows that VMAF values were above 96, which arguably leaves too little room for viewers to tell the two encodings apart, even where the HEVC encoding was at the higher bitrate.

We appreciate that the number of crowdsourced scores and the number of videos used for this first assessment cannot provide strong statistical significance. However, given we ensured that:

  • visual scoring was protected against random scorers;
  • all raters had to watch the entire video before scoring, and the presentation of sequences and the left-right placement were varied to avoid conditioning;
  • viewing devices were controlled for screen resolution and avoidance of mobile screens,

we find the averages of Table 1 quite informative. Specifically,

  • For the “same-bitrate” 5 Mbps vs. 5 Mbps test, on average, 70.2% of the raters preferred the iSize+HEVC result and only 10.2% preferred the HEVC result. The VMAF difference of 6.8 points is consistent with this, since it places our precoding approach above the JND threshold.
  • For the “40% bitrate saving” cases (iSize+HEVC at 5 Mbps vs. HEVC at 8.5 Mbps, and the equivalent at 8.5 Mbps vs. 14 Mbps), on average, 52.3% to 64.4% of the raters preferred the iSize+HEVC result vs. only 10.6% to 20.4% for the HEVC result. The VMAF difference of 2.8-2.9 points also shows that our result is perceptually somewhat superior, but not above the JND. Taken together, this indicates that, on average, the iSize+HEVC approach will be perceptually comparable to or somewhat better than HEVC, while offering a 40% rate reduction under the exact same encoding conditions.
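To give a feel for the statistical uncertainty acknowledged above, the sketch below computes a Wilson score interval for a preference proportion. The count used in the example (56 of 80 ratings) is hypothetical and chosen only because each bitrate combination received 80 ratings per title; it is not a value taken from Table 1, and the averages quoted above pool several titles, so their intervals would be narrower.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical example: 56 of 80 raters prefer one encoding
low, high = wilson_interval(56, 80)
print(f"preference share 70.0%, approx. 95% interval: {low:.1%} to {high:.1%}")
```

With only 80 ratings, an interval of this kind spans roughly ten percentage points on either side of the observed share, which is why per-title figures should be read as indicative rather than conclusive.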

These observations motivate us to explore even higher bitrate reductions in our future work, as well as other codecs such as VP9, AV1, VVC, and beyond. This is especially interesting when considering that these benefits are delivered today, on existing devices and players, with no need for client-side customization. We are also confident that we are experiencing only the initial phase of what is possible with deep perceptual precoding. As our deep precoding models improve over time with more data and better neural network designs, and as encoders become more and more “intelligent” SNR-bitrate machines that can handle complex texture and motion patterns, deep perceptual precoding can offer increased gains.

To allow for further independent assessment, we have made the sequences and the left-right viewing available on our portfolio page, and we encourage you to have a look. We also encourage interested readers to try it for themselves and to send us their thoughts and comments. We definitely believe we can improve further, and would be very interested to hear your thoughts and observations!