Intelligent Live Video & Image Upscaling 

iSIZE technology is a fast, high-quality upscaling solution for videos and images. It uses the latest machine learning algorithms, proprietary learning methods and a library of trained models to upscale videos and images to a degree not seen before: fast, bandwidth-efficient and live on the client device.

Our proprietary technology is backed by artificial intelligence and produces results superior to those of any existing traditional non-AI upscaling method. It bridges the technology gap between the rapid development of viewing devices and the lack of higher-quality content, a lack that is mainly due to bandwidth constraints.

Our core innovation lies in achieving the sophistication of advanced deep neural networks at the speed of simple linear filters. Our technology is deployable today in existing solutions with minor or no upgrade in hardware. We achieved all this with a compact yet experienced team that combines very senior research and development staff with international experience and junior developers committed to fast delivery.

Since 2015, there has been significant activity in intelligent image and video upscaling from several teams internationally. This report quantifies the performance of the iSize video frame upscaling technology against the state-of-the-art in the field.

How We Do It

Our main areas of technical expertise are machine learning and an in-depth understanding of video codecs. In real-time image and video processing, speed is of the essence to avoid video lag; however, traditional machine learning approaches tend to be very slow.

We solved the performance problem with two innovations:

1. Reducing the complexity of our machine learning models for more efficient implementation at both the CPU and GPU level.
2. Integrating the codec with intelligent upscaling to create speed and efficiency gains, allowing our technology to run in real-time without lag.

Overall Flow & Memory Breakdown 

iSize Upscaling Technology can be used as a software solution for multi-core platforms or as cloud-based upscaling as a service.

Technical Comparison: iSIZE vs State-of-the-Art (March 2018)

In recent commercial-facing demos (e.g., NVIDIA’s GTC conference in Munich in Fall 2017), iSize’s machine-learning based video upscaling solutions have achieved substantial quality improvement in SD to HD and HD to 4K video with minimal execution time overhead versus all conventional upscaling solutions in smart TVs today.

To showcase our performance against the state of the art in the most comprehensive manner, Figure 1 shows the PSNR-vs-speed characteristics of our solution relative to the best-performing methods of Track 1 of the NTIRE2017 competition [1], organized within the 2017 Computer Vision and Pattern Recognition (CVPR) conference (http://cvpr2017.thecvf.com/). The figure also includes the high-performing and highly-cited FSRCNN framework of Dong et al. [3] from CVPR 2016. The solid line in Figure 1 is the Pareto line: it joins all solutions that are closest to the (unachievable) optimal point for the NTIRE2017 dataset, 35dB at 10ms/frame (top left of Figure 1). The dotted lines indicate the distance of each solution from this optimal point. The detailed performance characteristics of all solutions are summarized in Table 1, which also includes the GPU hardware used for each approach.
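As a minimal sketch of how a Pareto line and the distances to the optimal point can be derived from a set of (time, PSNR) measurements, consider the Python snippet below. The method names and numbers are hypothetical placeholders, not the values of Table 1. The distance metric (Euclidean distance with time on a log10 axis) is also an assumption, since the exact metric behind the dotted lines of Figure 1 is not stated here.

```python
import math

# Hypothetical (name, ms per frame, PSNR in dB) points; NOT the values of Table 1.
solutions = [
    ("method_a", 20.0, 33.0),
    ("method_b", 100.0, 33.5),
    ("method_c", 150.0, 33.2),   # slower and less accurate than method_b
    ("method_d", 5000.0, 34.9),
    ("method_e", 16.0, 31.0),
]

# The (unachievable) optimal point quoted in the text: 35dB at 10ms/frame.
IDEAL_MS, IDEAL_DB = 10.0, 35.0


def pareto_front(points):
    """Keep every point that no other point beats on both speed and PSNR."""
    front = []
    for name, t, p in points:
        dominated = any(
            (t2 <= t and p2 >= p) and (t2 < t or p2 > p)
            for _, t2, p2 in points
        )
        if not dominated:
            front.append((name, t, p))
    return sorted(front, key=lambda s: s[1])  # order by time, fastest first


def distance_to_ideal(t_ms, psnr_db):
    """Assumed metric: Euclidean distance in the (log10 time, PSNR) plane."""
    return math.hypot(math.log10(t_ms) - math.log10(IDEAL_MS), psnr_db - IDEAL_DB)


front = pareto_front(solutions)
closest = min(front, key=lambda s: distance_to_ideal(s[1], s[2]))
```

With these placeholder numbers, method_c drops off the Pareto line because method_b is both faster and more accurate. A different axis scaling would change which point counts as closest, so the choice of metric matters when reading such a plot.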

The solution offered by iSize:

  • is the method on the Pareto line that is closest to the (unachievable) optimal point;
  • offers a 30-fold to more than 6000-fold increase in speed, with only 0.18dB to 1.04dB loss in average PSNR against the state of the art (HIT-ULSee to SNU_CVLab1);
  • offers a 2.88dB increase in average PSNR with only 4.3ms overhead against bicubic upscaling.


Figure 1. PSNR vs. upscaling time for various solutions of Track 1 (“bicubic downscaling followed by ×2 upscaling” [1]) of the NTIRE2017 competition [1] and iSize’s upscaler. All PSNRs are reported on the Y channel.
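For readers unfamiliar with Y-channel PSNR, the following Python sketch shows one common way to compute it. It assumes 8-bit RGB input and the ITU-R BT.601 limited-range luma conversion. The exact conversion used for Figure 1 is not specified in this report, and the function names are illustrative only.

```python
import math


def luma_bt601(r, g, b):
    """8-bit R, G, B -> luma Y (ITU-R BT.601, limited range 16..235)."""
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


def psnr_y(ref_rgb, test_rgb, peak=255.0):
    """PSNR in dB on the Y channel of two equally long lists of (R, G, B) pixels."""
    if len(ref_rgb) != len(test_rgb) or not ref_rgb:
        raise ValueError("inputs must be non-empty and of equal length")
    sq_err = sum(
        (luma_bt601(*a) - luma_bt601(*b)) ** 2
        for a, b in zip(ref_rgb, test_rgb)
    )
    mse = sq_err / len(ref_rgb)
    # Identical images have zero error, hence infinite PSNR.
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

Measuring on luma only ignores chroma errors, which is the usual convention in the super-resolution literature.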


Table 1. Summary of PSNR, speed (including memory transfers) and GPU specs for all solutions on the Pareto line of Figure 1.

Standalone Test: SD->HD & HD->4K Video Upscaling 

While the NTIRE2017 competition forms a very useful benchmark for the state of the art in image upscaling, the highly-textured image content used in the competition may not fully represent the typical conditions found in real-world video sequences.

Therefore, in a second test, we downscaled sixteen 4K and twenty-two HD uncompressed video sequences from xiph.org (using Matlab’s imresize() function) in order to assess upscaling performance on standard video sequences. The results of Table 2 show the performance of the bicubic and Lanczos filters (in OpenCV’s implementation; similar results were obtained with various Lanczos kernels, edge-adaptive filters and other polyphase interpolators), as well as:

  • the SNU_CVLab2 solution that was the winning solution of the NTIRE2017 competition;
  • the well-known FSRCNN method from CVPR 2016 [3].
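The test protocol described above (downscale each frame, upscale it again, then score the result against the original) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: a 2×2 box filter stands in for Matlab’s bicubic imresize(), pixel replication stands in for the upscaler under test, frames are single-channel lists of rows with even dimensions, and all function names are hypothetical.

```python
import math


def downscale_2x(frame):
    """Average each 2x2 block (a box filter, standing in for bicubic imresize)."""
    h, w = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


def upscale_2x_nearest(frame):
    """Repeat every pixel 2x2 (a placeholder for the upscaler under test)."""
    out = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def psnr(ref, test, peak=255.0):
    """PSNR in dB between two frames of identical size."""
    n = len(ref) * len(ref[0])
    mse = sum(
        (a - b) ** 2 for row_a, row_b in zip(ref, test) for a, b in zip(row_a, row_b)
    ) / n
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def mean_psnr(frames, upscaler):
    """Average PSNR over a sequence: downscale, upscale, compare to the original."""
    scores = [psnr(f, upscaler(downscale_2x(f))) for f in frames]
    return sum(scores) / len(scores)
```

Any upscaler with the same input and output convention can be passed to mean_psnr() in place of upscale_2x_nearest, which is what makes a side-by-side comparison such as Table 2 possible.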

The iSize upscaler offers more than 1.2dB average improvement against linear filters, while achieving a runtime per frame that approaches that of bicubic or Lanczos upscaling. Across the different sequences, the improvement ranges from 0.6dB to 2.2dB, depending on the content of each video sequence. Our solution outperforms the NTIRE2017 winning solution. The closest competitor to our approach is FSRCNN [3]: for the SD->HD upscaling, FSRCNN offers 0.32dB higher PSNR than our proposal, albeit at a more than 130-fold increase in execution time per frame. Importantly, FSRCNN is outperformed by the iSize upscaler in the HD->4K upscaling tests.

Table 2. Summary of PSNR, percentage reduction of mean squared error against bicubic upscaling, and execution time per frame for: linear upscalers, the NTIRE2017 competition winner and iSize. Unless noted otherwise, all methods were executed on GPU. Execution times include both GPU time and data transfers. The system spec was: Intel Core i5-3570K, 16 GB RAM, NVIDIA GTX 1070 [2].
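A PSNR gain and a percentage reduction in mean squared error are two views of the same quantity. Since PSNR = 10·log10(peak²/MSE), a gain of Δ dB over bicubic upscaling corresponds to an MSE ratio of 10^(−Δ/10). The short Python sketch below applies that identity. It holds exactly for a single frame; whether Table 2 derives its percentages this way, or averages per-sequence MSE values instead, is not stated, so treat the result as an approximation.

```python
def mse_reduction_percent(psnr_gain_db):
    """Percentage MSE reduction implied by a PSNR gain (in dB) over a baseline."""
    return 100.0 * (1.0 - 10.0 ** (-psnr_gain_db / 10.0))


# Example: the 1.2dB average gain over linear filters quoted in this section.
reduction_at_1_2_db = mse_reduction_percent(1.2)  # about 24%
```

By the same formula, a gain of about 3dB corresponds to halving the mean squared error.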

Given that the video sequences used for the results of Table 2 can contain many frames with limited texture content, the average PSNR difference shown by iSize and FSRCNN against linear upscalers is not as high as for the case of the highly-textured NTIRE2017 images. However, beyond the standard upscaling that is optimized for PSNR, iSize’s unique IP for upscaler “boosting” leads to significant quality improvement, as shown by the visual examples of the next page. This further visual quality improvement incurs 35% overhead (on average) in comparison to the iSize timing results reported in Table 1 and Table 2. The iSize upscaler boosting can be tuned adaptively (from none to some), thereby allowing for tuning of our solution according to the type of content, upscaled resolution and display settings used by the client device.
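To see what the 35% average boosting overhead means for a real-time budget, the arithmetic can be sketched as follows in Python. The base per-frame times and frame rate used in the example are hypothetical placeholders, not measurements from Table 1 or Table 2.

```python
BOOST_OVERHEAD = 0.35  # average boosting overhead quoted in the text


def boosted_time_ms(base_ms, overhead=BOOST_OVERHEAD):
    """Per-frame time once boosting is enabled."""
    return base_ms * (1.0 + overhead)


def fits_frame_budget(time_ms, fps):
    """True if one frame can be processed within the display interval."""
    return time_ms <= 1000.0 / fps


# Hypothetical example: a 10ms base time becomes 13.5ms with boosting,
# which still fits the roughly 16.7ms budget of a 60 frames/s display.
example_ms = boosted_time_ms(10.0)
example_ok = fits_frame_budget(example_ms, 60)
```

A check of this kind is one way a client could decide whether to enable boosting for a given resolution and display setting.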

 


[1] The “Bicubic” (classic) track is meant to facilitate the easy deployment of recently proposed methods for the task of example-based single-image super-resolution. It assumes that the degradation operators are the same as those commonly used in the recent super-resolution literature. To obtain the low-resolution images, the Matlab function “imresize” is used with default settings (bicubic interpolation) and a downscaling factor of 2. This was tested on a large, newly collected dataset (DIV2K) of images with a large diversity of contents. More information can be found on the competition page [1].

[2] CPU: https://ark.intel.com/products/65520/Intel-Core-i5-3570K-Processor-6M-Cache-up-to-3_80-GHz
RAM: https://www.kingston.com/dataSheets/KHX2133C11D3K4_16GX.pdf
GPU: https://www.asus.com/Graphics-Cards/EX-GTX1070-O8G/

References

1. Timofte et al., “NTIRE 2017 challenge on single image super-resolution: Methods and results,” Proc. Comp. Vis. and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conf. on Comp. Vis. and Pattern Recognition, CVPR, IEEE, 2017, https://goo.gl/TQRT7E.
2. Lim et al., “Enhanced deep residual networks for single image super-resolution,” Proc. Comp. Vis. and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conf. on Comp. Vis. and Pattern Recognition, CVPR, IEEE, 2017, https://goo.gl/PDSTiV.
3. Dong et al., “Accelerating the super-resolution convolutional neural network,” Proc. 2016 IEEE Conf. on Comp. Vis. and Pattern Recognition, CVPR, IEEE, 2016, https://goo.gl/Qa1UmX.