Current Trends in Video Coding
The UK Section's second meeting outside London was held at the BSkyB uplink site at Chilworth, about 75 miles from London and just north of Southampton.
Over forty people, mainly drawn from the surrounding 20 miles but including some members from London, met over refreshments and were then given tours of the BSkyB facilities, where the UK's largest broadcaster uplinks more than 200 channels, plays out 80 linear channels and 35 NVOD linear playouts from its Southern Broadcast Centre, and monitors its entire output, including the Sky Go services delivered over the top (OTT) via the Internet.
Following a brief introduction by Chris Johns, SMPTE UK Chair and BSkyB's Chief Engineer in Broadcast Strategy, groups were hosted by the local staff on a site tour.
Members saw the RF cabins with their microwave routing and power amplifiers, as well as the many dishes, and then moved back up the chain, seeing the encoder farms and the playout server systems before visiting the Broadcast Centre itself, where channel playout (TX) and monitoring (PCC) of all BSkyB output on any platform take place.
During the tour we learnt how this facility works with the BSkyB main site at Osterley, in the west of London, which plays out the live channels and controls some of the facilities at Chilworth, and with the second uplink site at Fair Oak, 10 miles to the east, which is totally de-staffed, and how each site acts as an operational reserve for the others.
After the tours and a brief discussion over refreshments, Phil White, Head of the 25-strong coding algorithm team at Ericsson Television (a SMPTE Platinum Sustaining member), gave a comprehensive account of Developments in Video Coding.
Phil White started by reminding us that the earliest real-time MPEG-2 coder, in 1992, was a 19-inch rack-mounted unit occupying 13 rack units (530 mm, 11 inches) and probably gave acceptable SD pictures at a bit rate of 8 Mbit/s. That sounds a high data rate now, but it gave the possibility of multiple TV channels in the RF spectrum taken by a single analogue channel. There was then a lot of progress: by 1998 the same function fitted in 6U, half the volume.
In the same way, HD bit rate requirements have decreased with the ever-increasing complexity of the coder. MPEG-2 was at 18 Mbit/s in the early 2000s, but the take-up of MPEG-4 AVC/H.264 and its successive encoder generations has taken this down to, say, 6 Mbit/s for a third-generation coder which was in use in 2010. He pointed out that because the algorithm describes how a decoder handles the bitstream, there is great opportunity for the encoder to be improved without requiring any changes to the decoders in people's homes. He also pointed out that what counts as an acceptable bit rate depends on what the source programme is and how it will be viewed, so picture quality is a commercial and artistic decision, not just a technical one.
White then went on to remind us how video coding of moving pictures works: by prediction of the next picture from the previous one (using motion vectors) and then “filling the gaps” with residuals. These can be coded far more compactly if a transform into the frequency domain is made, as this puts most of the information in one corner, towards the lower frequencies. Finally, the transform values can be quantised (which may also reduce the effect of the higher frequencies). But with so many possible options for how each macroblock can be coded, coding efficiently becomes an optimisation problem of choosing the mode decisions and quantisation settings. Thus most modern encoders use Rate Distortion Optimisation, where a number of parallel codec paths (one for each mode decision) are processed. Each option (mode decision) is evaluated for closeness to the input frame (minimum distortion) and for the number of bits which will be required after entropy encoding (to form the output bitstream).
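As an illustration only, the predict, transform, quantise and choose-by-cost chain described above can be sketched in a few lines of Python. This is a toy, not anything shown at the meeting: it works on a one-dimensional block of 8 samples rather than a two-dimensional macroblock, the function names (dct, idct, estimate_bits, rd_cost, choose_mode) are hypothetical, the bit count is a crude stand-in for real entropy coding, and the cost J = D + lambda x R with the chosen lambda and quantiser step is the usual textbook formulation rather than a description of any particular encoder.

```python
import math

N = 8  # samples per block in this 1-D toy (real codecs use 2-D blocks)

def dct(x):
    """Orthonormal DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N)
                               for n in range(N)))
    return out

def idct(c):
    """Inverse of dct() above."""
    out = []
    for n in range(N):
        s = math.sqrt(1 / N) * c[0]
        s += sum(math.sqrt(2 / N) * c[k] * math.cos(math.pi * (n + 0.5) * k / N)
                 for k in range(1, N))
        out.append(s)
    return out

def estimate_bits(levels):
    """Assumed rate model: 1 bit per zero level, more bits for larger levels."""
    return sum(1 if v == 0 else 2 * int(math.log2(abs(v))) + 3 for v in levels)

def rd_cost(block, prediction, qstep, lam):
    """Return (J, distortion, bits) for coding block from one prediction."""
    residual = [b - p for b, p in zip(block, prediction)]
    levels = [round(c / qstep) for c in dct(residual)]      # quantise
    recon_residual = idct([v * qstep for v in levels])      # what a decoder sees
    recon = [p + r for p, r in zip(prediction, recon_residual)]
    distortion = sum((b - r) ** 2 for b, r in zip(block, recon))
    bits = estimate_bits(levels)
    return distortion + lam * bits, distortion, bits

def choose_mode(block, candidates, qstep=4.0, lam=10.0):
    """Pick the candidate prediction with the lowest cost J = D + lam * R."""
    costs = {name: rd_cost(block, pred, qstep, lam)[0]
             for name, pred in candidates.items()}
    return min(costs, key=costs.get)
```

Given a gently rising block and two candidate predictions, one close to the block (standing in for a good motion-compensated match) and one flat (standing in for a DC intra prediction), choose_mode picks the closer one because its residual quantises to fewer, smaller levels. Every extra candidate means another pass through the transform and quantiser, which is why this approach is so hungry for processing power.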
This obviously increases the processing power needed in the encoder: for the approximately 50% reduction in bitrate between MPEG-2 and AVC/H.264, an increase in processing power of about 8-fold is required, and the move to HEVC/H.265 requires another 10-fold increase.
He pointed out that with an increase in processing power, improvements can be made to the legacy coding as well.
To put HEVC/H.265 into context, he covered the key parameters of the three coding systems.
MPEG-2 and AVC both have a 16x16 macroblock size. For inter (between pictures) prediction, MPEG-2 allows subdivision into 16x8 for interlace only, whilst AVC has 3 symmetric subdivisions, 16x8, 8x16 and 8x8 (sub-partitions down to 4x4 are allowed but not widely used for broadcast). HEVC can have Coding Units (the equivalent of a macroblock) from 64x64 down to 8x8 with a hierarchical quadtree structure. In HEVC, inter prediction partitions can also vary from 64x64 down to 8x8.
MPEG-2 has only one intra (prediction within a picture) mode, whilst AVC has 9 modes and HEVC 35 modes. Finally, MPEG-2 has only 8x8 transforms, while AVC has 8x8 and 4x4, and HEVC has 4 transform sizes from 32x32 to 4x4.
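The hierarchical quadtree of Coding Units mentioned above can be pictured with a short Python sketch. It is an illustration under stated assumptions, not how any real encoder decides: the function names (variance, split_cu) are hypothetical, and the split test used here is simply "is the pixel variance above a threshold", whereas a practical encoder would make this choice by comparing rate distortion costs.

```python
def variance(frame, x, y, size):
    """Pixel variance of the size x size square whose top-left corner is (x, y)."""
    vals = [frame[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def split_cu(frame, x=0, y=0, size=64, min_size=8, threshold=25.0):
    """Return the leaf coding units as a list of (x, y, size) tuples.

    A block is kept whole when it is flat enough (assumed criterion) or has
    reached the minimum size; otherwise it is split into four quadrants.
    """
    if size <= min_size or variance(frame, x, y, size) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += split_cu(frame, x + dx, y + dy, half, min_size, threshold)
    return leaves
```

For a flat 64x64 area this returns a single 64x64 unit. If only one 8x8 corner contains fine detail, the tree splits down to 8x8 in that corner alone, leaving three 32x32 and three 16x16 units untouched around it, which is the property that lets large plain areas be coded cheaply.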
White then looked at each of the three main additional features of HEVC/H.265.
Firstly, the macroblock of the older encoders is replaced by the Coding Unit (CU). The larger block sizes mean large areas can be coded efficiently, while the hierarchical tree structure allows smaller block sizes to be used where there is higher-frequency information, without forcing adjacent blocks to be of reduced size. Each CU can be independently coded as inter or any of the intra modes.
Secondly, the Prediction Units (PUs). These are subdivisions within a CU which are used to predict the current CU from the contents of other pictures. This inter prediction is known as motion compensation; each PU has its own motion vector, which describes where, in a given reference picture, the prediction comes from.
Thirdly, there are Transform Units (TUs). Within each CU there is a quadtree structure which describes which transform sizes are used. This structure is independent of the PU structure.
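To make the idea of a motion vector per prediction block concrete, here is a minimal Python sketch of an exhaustive block-matching search. It is an assumption-laden illustration: the function names (sad, find_motion_vector) are hypothetical, the matching criterion is the sum of absolute differences, the search covers whole-pixel offsets only, and real encoders use much faster search strategies and sub-pixel refinement.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in the current picture
    and the block displaced by (dx, dy) in the reference picture."""
    return sum(abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
               for r in range(size) for c in range(size))

def find_motion_vector(cur, ref, bx, by, size=8, search=4):
    """Try every whole-pixel offset within +/- search and return the
    (dx, dy) pointing at the best-matching block in the reference."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference picture.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + size > width or by + dy + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv
```

The returned vector says where in the reference picture the prediction for the block comes from; only that vector and the (hopefully small) residual then need to be coded.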
Moving from how it works to how well it works, White presented the results from various early experiments.
The first experiment compared the HEVC reference model with a broadcast AVC encoder, both optimised to produce the best PSNR performance (i.e. the error measured on a pixel-by-pixel basis), with the comparison made purely in PSNR terms. Here HEVC could offer bitrate savings ranging from about 33% on a 360-pixel-wide picture to 54% for a 4K picture. Phil White pointed out that there were quite wide error bars on these results.
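For readers unfamiliar with the metric, PSNR is derived from the mean squared error between the source and decoded pictures. The sketch below assumes 8-bit samples (peak value 255) held as lists of rows; the function name psnr is simply illustrative.

```python
import math

def psnr(original, decoded, peak=255):
    """PSNR in dB between two equally sized pictures (lists of rows)."""
    squared_error = 0
    count = 0
    for row_o, row_d in zip(original, decoded):
        for o, d in zip(row_o, row_d):
            squared_error += (o - d) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical pictures
    return 10 * math.log10(peak ** 2 / mse)
```

Because it treats every pixel error alike, PSNR says nothing about where in the picture an error falls or how visible it is, which is why the second experiment used a perceptual measure instead.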
He then looked at a comparison of an AVC encoder tuned for psycho-visual quality with a version of the HEVC reference model to which psycho-visual improvements had been added. Here the comparison was done in terms of JND, which is an automated approximation to human perceived quality. The measurements showed about a 36% bitrate saving at both resolutions, but when real human viewers were used the saving was shown to be over 50% for UHDTV. This is probably due to the JND metric not having been trained on HEVC distortions.
Moving on to the uses of HEVC, there are many applications where more efficient coding (which means fewer bits for the same quality) makes the business case more compelling. Clearly, many commercial services with large numbers of existing decoders will not adopt HEVC immediately, but there are uses where this does not apply, such as increasing the reach of DSL-delivered services to new subscribers and delivering more content more easily to mobile phones (where HEVC can be decoded in software on high-end phones). There are also the new markets of tablets such as the iPad, with some broadcasters sending different streams to the second screen. Likewise, if a country has not yet converted to DVB-T2 as its RF modulation, HEVC could be used in that greenfield situation.
In the professional area, on DSNG links where spectrum is expensive and may be difficult to obtain, the bit rate savings of HEVC coding enable more uses.
As one use of HEVC is on mobile phones, Phil White outlined the important difference between broadcast and unicast operation, and pointed out that 4G does have an LTE Broadcast mode which could be switched on for an event where many people in one location, such as a stadium, could be viewing simultaneously, including when a few cells are linked as an SFN.
Turning from delivery to the small screen, he then looked at the use of HEVC in the immersive experience of UHDTV.
Phil White posed the question “But to what standards?” He showed that there was very noticeable colour banding with an 8-bit system, and thus 10 bits will need to be used in emission (which HEVC permits). To make the moving pictures seem realistic, a wider colour gamut will also be needed, for instance BT.2020 rather than the BT.709 used in HD, and then higher frame rates. But at least all pictures will be progressively scanned!
All of these need to be realised and standardised before the public get a system, which includes everything from the camera to the coder, and they will require even more processing power in the coder: probably about 10 times more than the existing HD coders (which require 8 times more than AVC).
On that challenging note the meeting moved to questions and discussion, covering in the first instance the issues around dynamic range and emitted bit depth, where acquisition may use 16 bits, and how gamma may or should become linear. Phil White confirmed that even at low bit rates a sequence coded at 10 bits always looked better than or the same as, and never worse than, the same sequence coded at 8 bits.
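The banding argument for 10-bit emission can be illustrated with a little arithmetic. The Python sketch below quantises a shallow brightness ramp across a picture and counts how many distinct code values it lands on. The function names and the example ramp (from 40% to 45% of full scale across 3840 pixels) are assumptions chosen purely for illustration; they are not figures from the talk, and the sketch ignores gamma and any dithering.

```python
def quantise_ramp(width, lo, hi, bits):
    """Quantise a linear ramp from lo to hi (on a 0..1 scale) across
    width pixels to integer code values of the given bit depth."""
    max_code = (1 << bits) - 1
    return [round((lo + (hi - lo) * x / (width - 1)) * max_code)
            for x in range(width)]

def band_count(codes):
    """Number of distinct code values, i.e. flat steps across the ramp."""
    return len(set(codes))
```

With these example values the 8-bit ramp falls on 14 code values, so each flat band is some 270 pixels wide and easily seen, whereas the 10-bit ramp falls on 52 code values with correspondingly narrower and smaller steps.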
There was discussion of the bitrate penalty of higher frame rates, but these often have different GOP structures: for instance, 4 frames at the low frame rate becomes 7 frames at double the frame rate, which still keeps the I frames the same time apart. Tim Borer mentioned some work done by the BBC which showed about a 20% increase in bitrate for a doubling of the frame rate. But with more complex coding the bitrate variability is always going to be greater: an I frame in any system is about the same data volume, but the B and P frames are smaller, with their relative sizes being very sensitive to coder implementation (and configuration) and to input material.
This brought up the topic of pre-processing and noise reduction giving better results, and the amount by which the transform and quantisation in the coder can reduce high-frequency noise. Roderick Snell said that optimising the motion estimation to ignore noise gave a good improvement.
The matter of test material, particularly at higher spatial and temporal resolutions, was discussed, and the advantages of good links between the broadcasters and the manufacturers were emphasised.
The discussion also covered the professional use of JPEG 2000 and AVC-Intra, the fact that decoders were also increasing (slightly) in complexity, coder latencies, and the MPEG LA patent pool.
In drawing the meeting to a close, Chris Johns, Chairman of the UK Section, posed the question: “Will there still be the 50% improvements in codec efficiency in the iteration after HEVC?”
Phil White said that there is much within HEVC which will still need to be optimised, and that he is not looking at the next generation quite yet!
Members continued in informal discussion for many minutes after the close of formal business.
The UK Section thanks BSkyB for providing the very secure venue and a glorious spread of refreshments, and thanks their staff for their work both in leading the tours of the facility and in supporting the practicalities of the visit.