I recently read a few rather extraordinary marketing claims from OnLive about their new server-side gaming technology. Since they don’t really make much technical information available, one is left to speculate what they mean by “1 ms” latency, especially when it is directly compared to “500 ms to 750 ms lag” in video conferencing. I’m sure that statement makes sense from someone’s perspective, just not from a video compression perspective.
I sat around over a beer last evening and wrote the following to the Schrödinger mailing list because someone asked. Rather than answering the question directly, I decided to ramble about how low-latency video encoding works.
One key point about low-latency video encoding is that the output bits that represent a pixel have to appear somewhere in the bitstream between the time the encoder gets that pixel from the camera and N ms later, where N is the latency.
One method of very low-latency compression works on a scanline basis. An example is the low-delay profile of Dirac. A camera reads out a few scan lines (say, 16), the encoder compresses them, and then sends those bits out over Ethernet or ASI or whatever. The latency is on the order of a few scan lines, say 16*2 plus a small number. Why 16*2? Because it takes 16 line periods to read in a 16-line chunk, and the encoder then spends the time it takes to read in the next chunk encoding the first chunk and sending it out over the wire. Simultaneously, the decoder reads in the data and decodes it. Then, during the third set of 16 lines, the decoder scans out the uncompressed lines. So the decoder scans out line 0 as the camera is scanning out line 32. Real encoders need a bit of extra time for synchronization, so 32 lines is the ideal case. Of course, in a real system there is also network latency, but we'll make someone else worry about that. 32 lines works out to about 1 ms for 1080p at 30 frames per second, depending on exactly which system you're using. Compression ratios are purposefully low, since you can't spread worst-case bits across the picture at all, and because this kind of compression is only really useful for studio work.
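To make the 16*2 arithmetic concrete, here's a rough Python sketch of that pipeline with all times measured in line periods. It assumes a 1080p30 raster with 1125 total lines per frame (SMPTE 274M timing, blanking included), that encoding, transmission and decoding of a chunk all fit inside the next chunk period, and no network delay. `chunked_pipeline` is a made-up name, not part of Schroedinger or any Dirac API. It also checks the constraint above: each line's decoded data has to be ready before the decoder needs to scan that line out.

```python
def chunked_pipeline(active_lines=1080, chunk_lines=16, fps=30.0, total_lines=1125):
    """Toy model of a scanline-chunk codec, with all times measured in line periods.

    Returns (latency in lines, latency in ms, minimum slack in lines).
    Assumes total_lines includes blanking (1125 for SMPTE 274M 1080p).
    """
    line_ms = 1000.0 / (fps * total_lines)  # duration of one line, blanking included
    latency_lines = 2 * chunk_lines         # one chunk to read in, one to encode/send/decode
    slacks = []
    for line in range(active_lines):
        chunk = line // chunk_lines
        # The camera finishes reading chunk c after (c + 1) chunk periods; encoding,
        # transmission and decoding overlap with the following chunk period.
        decoded_at = (chunk + 2) * chunk_lines
        # The decoder's output trails the camera by a fixed number of lines.
        scanned_out_at = line + latency_lines
        slacks.append(scanned_out_at - decoded_at)
    if min(slacks) < 0:
        raise ValueError("decoder would need bits that have not arrived yet")
    return latency_lines, latency_lines * line_ms, min(slacks)


lines, ms, slack = chunked_pipeline()
print(f"{lines} lines = {ms:.3f} ms, minimum slack {slack} lines")
# prints: 32 lines = 0.948 ms, minimum slack 0 lines
```

The minimum slack of 0 lines is the synchronization point in a nutshell: the first line of every chunk is needed at exactly the moment it finishes decoding, so any real-world overhead pushes the latency past 32 lines. Counting only the 1080 active lines (`total_lines=1080`) gives about 0.99 ms instead, which is one reason the figure depends on the system.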
Note that cameras with a few-scanline latency start at USD 10,000, and an encoder/decoder pair for DiracPro is about USD 4,000, IIRC. This is not the kind of technology you roll out in a consumer product.
Another method is similar, but uses an entire frame instead of a few scan lines. In this case, you get a theoretical latency of 2 frames, or about 67 ms for 30 fps video. I've seen companies advertising encoder/decoder pairs that claim 70 ms latency (of course, without any network latency), and I can pretty much believe this number. Again, you can't get away with cheap hardware: my DV camera has an internal latency somewhere between 90 and 120 ms, and HDV cameras are much worse.
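The frame-based figure is the same arithmetic with the chunk grown to a whole frame: one frame period to read the picture in, and one to encode, send and decode it. A minimal sketch, assuming no network delay and no extra buffering anywhere (the function name is hypothetical):

```python
def frame_pipeline_latency_ms(fps, frames_of_delay=2):
    """Latency of a frame-based codec: read one frame, then encode/send/decode during the next."""
    return frames_of_delay * 1000.0 / fps


for fps in (25, 30, 60):
    print(f"{fps} fps: {frame_pipeline_latency_ms(fps):.1f} ms")
# 25 fps: 80.0 ms
# 30 fps: 66.7 ms
# 60 fps: 33.3 ms

headroom = 70 - frame_pipeline_latency_ms(30)
print(f"headroom in a 70 ms claim at 30 fps: {headroom:.1f} ms")
# headroom in a 70 ms claim at 30 fps: 3.3 ms
```

At 30 fps a 70 ms claim leaves only about 3 ms on top of the theoretical 2 frames, which is why that advertised number is believable.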
In a frame-based low-latency system, it's much more realistic to use motion compensation, in which the previous one or two frames serve as reference pictures. Since the whole point of motion compensation is to decrease the bit rate, the frame right after a scene change has no useful reference and still has to fit into roughly the same bit budget as any other frame. This causes compression artifacts immediately after scene changes that clear up after a few frames, which is very characteristic of the technique.
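Here is a toy model of why those artifacts clear up over a few frames. It is deliberately crude and entirely hypothetical: all motion vectors are zero, so prediction is just "use the previous decoded frame", and the fixed per-frame bit budget is stood in for by a fixed number of pixels the encoder may correct each frame. `simulate_fixed_budget` does not come from any real codec.

```python
def simulate_fixed_budget(frames, budget):
    """Return the reconstruction error after each frame of a toy predictive coder.

    Hypothetical model: zero-motion prediction, and a per-frame budget of
    `budget` pixel corrections standing in for a fixed bit budget.
    """
    recon = [0] * len(frames[0])  # decoder's reference picture, initially black
    errors = []
    for source in frames:
        # Zero-motion prediction: residual = source minus the previous reconstruction.
        residual = [s - r for s, r in zip(source, recon)]
        # Spend the fixed budget on the largest residuals first.
        order = sorted(range(len(residual)), key=lambda i: abs(residual[i]), reverse=True)
        for i in order[:budget]:
            recon[i] += residual[i]
        errors.append(sum(abs(s - r) for s, r in zip(source, recon)))
    return errors


scene_a = [10] * 8
scene_b = [50] * 8
frames = [scene_a] * 3 + [scene_b] * 4  # scene change at frame 3
print(simulate_fixed_budget(frames, budget=3))
# [50, 20, 0, 200, 80, 0, 0]
```

The first frame starts from a black reference, so it behaves like a scene change too. With 8 pixels and a budget of 3 corrections per frame, each cut takes three frames to converge, which is the "clears up after a few frames" pattern in miniature.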
Due to the way that Dirac puts together pictures, the non-low-delay profiles of Dirac have an approximate latency of 4 pictures for a simple implementation, although you can decrease this to nearly 2 pictures with more complex algorithms. Schroedinger implements the simple algorithm, and with suitable modifications (it does not do this by default) you can get close to 4 frames of latency. Schro's implementation of the Low-Delay Profile also has a latency of 4 frames, since it uses the same code.
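For scale, here are those picture counts converted to milliseconds at 25 and 30 fps, assuming one picture per frame period (progressive video) and ignoring network delay. `pictures_to_ms` is just an illustrative helper.

```python
def pictures_to_ms(pictures, fps):
    """Convert a latency counted in pictures to milliseconds (one picture per frame period)."""
    return pictures * 1000.0 / fps


for label, pictures in (("simple algorithm", 4), ("more complex algorithm", 2)):
    for fps in (25, 30):
        print(f"{label}, {pictures} pictures at {fps} fps: {pictures_to_ms(pictures, fps):.0f} ms")
# simple algorithm, 4 pictures at 25 fps: 160 ms
# simple algorithm, 4 pictures at 30 fps: 133 ms
# more complex algorithm, 2 pictures at 25 fps: 80 ms
# more complex algorithm, 2 pictures at 30 fps: 67 ms
```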
Entropy Wave has implementations of the more complex algorithm for the Simple and Intra profiles, as well as a true low-delay implementation of the Low-Delay profile, with latencies very near the theoretical ones. These are not open source. Unfortunately, since all the code that can currently use these codecs is frame-based, there's very little advantage over Schroedinger unless you write a bunch of custom code.
It should be obvious at this point that the “1 ms” number has very little to do with video compression, and a lot more to do with how game engines work.