<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Passing on the Left &#187; video</title>
	<atom:link href="http://www.schleef.org/blog/category/video/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.schleef.org/blog</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Thu, 12 Nov 2009 04:39:46 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Theora on TI C64x+ DSP and OMAP3</title>
		<link>http://www.schleef.org/blog/2009/11/11/theora-on-ti-c64x-dsp-and-omap3/</link>
		<comments>http://www.schleef.org/blog/2009/11/11/theora-on-ti-c64x-dsp-and-omap3/#comments</comments>
		<pubDate>Thu, 12 Nov 2009 04:24:13 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[entropywave]]></category>
		<category><![CDATA[gstreamer]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://www.schleef.org/blog/?p=43</guid>
		<description><![CDATA[For the last several months, Entropy Wave has been making Theora work on the TI C64x+ DSP as a project for Mozilla Corp.
The goal behind porting to the C64x+ is to run on OMAP3 SoC from TI, which has an ARM Cortex A8 core and also has a C64x+ DSP coprocessor.  This SoC (System [...]]]></description>
			<content:encoded><![CDATA[<p>For the last several months, <a href="http://entropywave.com/">Entropy Wave</a> has been making <a href="http://theora.org/">Theora</a> work on the <a href="http://www.ti.com/">TI</a> C64x+ DSP as a project for <a href="http://www.mozilla.com/">Mozilla Corp</a>.</p>
<div class="wp-caption alignnone" style="width: 610px"><a href="http://schleef.org/misc/bbb-theora-bb.jpg"><img title="Theora playback on Beagle Board" src="http://schleef.org/misc/bbb-theora-bb.jpg" alt="An Ogg/Theora video of Big Buck Bunny being played back on a Beagle Board via the C64x+ DSP coprocessor" width="600" height="400" /></a><p class="wp-caption-text">An Ogg/Theora video of Big Buck Bunny being played back on a Beagle Board via the C64x+ DSP coprocessor</p></div>
<p>The goal behind porting to the C64x+ is to run on <a href="http://focus.ti.com/general/docs/wtbu/wtbuproductcontent.tsp?templateId=6123&amp;navigationId=11989&amp;contentId=4682">OMAP3</a> SoC from TI, which has an ARM Cortex A8 core and also has a C64x+ DSP coprocessor.  This SoC (System on Chip) is best known as the basis for Nokia&#8217;s N series of mobiles (including the N900), the Motorola Droid, Palm Pre, and the <a href="http://beagleboard.org/">Beagle Board</a>.  The DSP coprocessor is commonly used for audio and video processing, including video encoding and decoding, and TI makes codecs available for MPEG-4 video decoding, AAC decoding, etc.  Having Theora decoded on the DSP fits into Mozilla&#8217;s <a href="https://wiki.mozilla.org/Fennec">Fennec</a> project, making Firefox with video useful on a mobile platform.</p>
<p>One of the engineering reasons behind having a separate processor for media handling is that it separates real-time tasks (media decoding) from non-real-time tasks, such as running web browser software.  From the standpoint of software running on the ARM, the video decoder looks and acts just like a hardware video codec.  The DSP on the OMAP3 is even more compelling for video decoding because attached to the DSP are several units that accelerate motion vector copying, VLC decoding, and loop deblocking.  Unfortunately, these pieces are not publicly documented by TI, so the current Theora port (which is open source) is unable to use them.  A future Entropy Wave project will likely add support for these acceleration units, which would allow the performance of the Theora decoder to be similar to TI&#8217;s MPEG-4 codec, which <a href="http://felipec.wordpress.com/2009/10/13/new-project-gst-dsp-with-beagleboard-demo-image/">can do 800&#215;480 playback</a> (possibly more?).  As it looks now, the resulting code would necessarily be closed source until such time as TI wishes to make the specifications public.</p>
<p>As it currently stands, the Theora decoder plays 640&#215;360 24fps at slightly more than 100% speed on average.  This isn&#8217;t quite good enough to call it &#8220;real time&#8221;, since some frames take longer than the allotted time to decode, but it&#8217;s pretty close and the results are good.  Additional speed improvements in libtheora would require internal changes, which would be a project in itself.  One clear area for improvement is that the DSP spends a substantial part of its time idle, because the host code is serialized with the DSP processing.  Fixing this is likely to put the above case firmly into the &#8220;real time&#8221; category.  Given that 640&#215;360 is larger than the iPhone display resolution and almost as large as the N900 resolution, it&#8217;s clearly good enough, even if it is less than TI&#8217;s hardware accelerated MPEG-4.</p>
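<p>To illustrate why an average above 100% speed is not the same as real time, here is a minimal Python sketch.  The per-frame decode times are made-up illustrative numbers, not measurements from the port, and <code>playback_report</code> is a hypothetical helper.  It assumes no frame buffering, so any frame that takes longer than 1/24 s counts as late.</p>

```python
def playback_report(decode_times_ms, fps=24.0):
    """Return (average speed in percent, number of late frames)."""
    budget_ms = 1000.0 / fps  # time allotted to each frame (about 41.7 ms at 24 fps)
    average_ms = sum(decode_times_ms) / len(decode_times_ms)
    speed_percent = 100.0 * budget_ms / average_ms
    # A frame is late when its own decode time exceeds the budget,
    # no matter how fast its neighbours were.
    late_frames = sum(1 for t in decode_times_ms if t > budget_ms)
    return speed_percent, late_frames


# Hypothetical decode times: mostly fast frames plus two slow ones.
times_ms = [35.0, 36.0, 38.0, 55.0, 34.0, 37.0, 52.0, 36.0]
speed, late = playback_report(times_ms)
print(f"average speed: {speed:.0f}%, late frames: {late} of {len(times_ms)}")
```

<p>With these invented numbers the average speed is a little over 100%, yet two frames still miss their deadline, which is the situation described above.</p>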
<p>On the Entropy Wave site is a <a href="http://code.entropywave.com/leonora-beagle-board-demo/">page</a> describing the demo, including where to download images and how to compile source code.</p>
<p>A big thanks to the people that laid the foundations for this work, especially <a href="http://felipec.wordpress.com/2009/10/13/new-project-gst-dsp-with-beagleboard-demo-image/">Felipe Contreras</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.schleef.org/blog/2009/11/11/theora-on-ti-c64x-dsp-and-omap3/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>YCbCr Gamut Checking</title>
		<link>http://www.schleef.org/blog/2009/10/07/ycbcr-gamut-checking/</link>
		<comments>http://www.schleef.org/blog/2009/10/07/ycbcr-gamut-checking/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 07:04:19 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[gstreamer]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://www.schleef.org/blog/?p=36</guid>
		<description><![CDATA[I recently added a pattern to GStreamer&#8217;s videotestsrc that can be used to check that YCbCr to RGB conversion is being done correctly as part of video output.  It is the result of a clever hack &#8212; some YCbCr values, when converted to RGB, are out of range, so as part of the conversion process, they [...]]]></description>
			<content:encoded><![CDATA[<p>I recently added a pattern to GStreamer&#8217;s videotestsrc that can be used to check that YCbCr to RGB conversion is being done correctly as part of video output.  It is the result of a clever hack &#8212; some YCbCr values, when converted to RGB, are out of range, so as part of the conversion process, they are clamped to the nearest RGB value.  The pattern generator creates a checkerboard pattern of a color (say, red) and a YCbCr value that upon correct conversion will result in the same color.  Thus the pattern should be invisible.  Usefully, these out-of-gamut YCbCr values are preserved by video codecs, so I can present to you a Theora video demonstrating this:</p>
<p><video src="http://code.entropywave.com/test-media/gamut/gamut-theora-bt470.ogv">your browser doesn&#8217;t support the video tag.  Download Firefox</video></p>
<p>Firefox does the conversion correctly, so it&#8217;s unlikely you&#8217;ll see the pattern.  However, some video display drivers still get this wrong, so you might see the pattern when playing the video in a standalone program that uses XV.  For those of you with working kit, I created a demonstration video that simulates a bad conversion:</p>
<p><video src="http://code.entropywave.com/test-media/gamut/gamut-theora-simulated-breakage.ogv">your browser doesn&#8217;t support the video tag.  Download Firefox</video></p>
<p>Sometimes it&#8217;s possible to see the pattern very faintly due to rounding in even a correct conversion.  This is unavoidable because the RGB->YCbCr->RGB round trip is lossy.</p>
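<p>The clamping trick described at the top of this post can be sketched in a few lines of Python.  This is only an illustration, not the videotestsrc code: it assumes the commonly quoted 8-bit video-range BT.601 coefficients, uses black instead of red because the numbers are easier to follow, and <code>ycbcr_to_rgb</code> is a hypothetical helper.  The <code>clamp=False</code> path is one guess at how a broken converter might behave (wrapping instead of saturating).</p>

```python
def ycbcr_to_rgb(y, cb, cr, clamp=True):
    """Convert one 8-bit video-range BT.601 YCbCr triple to 8-bit RGB."""
    r = 1.164 * (y - 16) + 1.596 * (cr - 128)
    g = 1.164 * (y - 16) - 0.392 * (cb - 128) - 0.813 * (cr - 128)
    b = 1.164 * (y - 16) + 2.017 * (cb - 128)
    if clamp:
        # Correct behaviour: out-of-range results saturate at 0 or 255.
        return tuple(max(0, min(255, round(v))) for v in (r, g, b))
    # Simulated bug (an assumption): results wrap around modulo 256.
    return tuple(round(v) % 256 for v in (r, g, b))


nominal_black = (16, 128, 128)   # in gamut
super_black = (0, 128, 128)      # below nominal black, out of gamut

# With clamping both squares come out identical, so the pattern is invisible.
print(ycbcr_to_rgb(*nominal_black), ycbcr_to_rgb(*super_black))
# Without clamping the out-of-gamut square differs and the pattern shows.
print(ycbcr_to_rgb(*nominal_black, clamp=False),
      ycbcr_to_rgb(*super_black, clamp=False))
```

<p>The same idea applies to any color whose RGB components sit at 0 or 255, since only those components can absorb an out-of-range value without changing.</p>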
]]></content:encoded>
			<wfw:commentRss>http://www.schleef.org/blog/2009/10/07/ycbcr-gamut-checking/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
<enclosure url="http://code.entropywave.com/test-media/gamut/gamut-theora-bt470.ogv" length="167625" type="video/ogg" />
<enclosure url="http://code.entropywave.com/test-media/gamut/gamut-theora-simulated-breakage.ogv" length="178487" type="video/ogg" />
		</item>
		<item>
		<title>Cog in gst-plugins-bad</title>
		<link>http://www.schleef.org/blog/2009/09/19/cog-in-gst-plugins-bad/</link>
		<comments>http://www.schleef.org/blog/2009/09/19/cog-in-gst-plugins-bad/#comments</comments>
		<pubDate>Sat, 19 Sep 2009 20:05:30 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[orc]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://www.schleef.org/blog/?p=24</guid>
		<description><![CDATA[I finally moved my collection of Orc-based GStreamer plugins (codename &#8220;Cog&#8221;) into gst-plugins-bad, since they&#8217;ve moved on from being an experiment.  Orc is a runtime compiler for a simple cross-platform assembly-like language that specifically targets SIMD instructions for several processors.  Orc is very effective inside its domain, which is small but growing.
One such application that [...]]]></description>
			<content:encoded><![CDATA[<p>I finally moved my collection of Orc-based GStreamer plugins (codename &#8220;Cog&#8221;) into gst-plugins-bad, since they&#8217;ve moved on from being an experiment.  <a href="http://www.schleef.org/blog/2009/05/31/orc-040/">Orc</a> is a runtime compiler for a simple cross-platform assembly-like language that specifically targets SIMD instructions for several processors.  Orc is very effective inside its domain, which is small but growing.</p>
<p>One such application that is covered is chroma subsampling and color matrixing for video, semi-incorrectly referred to as &#8220;colorspace conversion&#8221; in GStreamer.  There has been a colorspace element in Cog (cogcolorspace) for some time, but I never really bothered to do any speed comparisons between it and the default GStreamer colorspace element (ffmpegcolorspace), which is based on code copied from FFMpeg.  However, recently I did, and was somewhat surprised (although I shouldn&#8217;t have been) that cogcolorspace is the same speed as, or much faster than, ffmpegcolorspace for almost all operations.  (Please note that the FFMpeg code was forked a long time ago and heavily modified, so it does not reflect FFMpeg itself, only GStreamer&#8217;s ffmpegcolorspace.)</p>
<p>This is a scatter plot of the run time (in ms) for converting 1000 frames of 320&#215;240 video between a variety of uncompressed video formats:</p>
<p><a href="http://www.schleef.org/blog/wp-content/uploads/2009/09/colorspace-time-scatterplot.png"><img class="alignnone size-full wp-image-25" title="Colorspace element execution time scatter plot" src="http://www.schleef.org/blog/wp-content/uploads/2009/09/colorspace-time-scatterplot.png" alt="Colorspace element execution time scatter plot" width="463" height="288" /></a></p>
<p>The axes are execution time (in ms), with cogcolorspace on the horizontal axis and ffmpegcolorspace on the vertical axis.  The green line represents equal execution time; for points below the line, ffmpegcolorspace was faster, and for those above, cogcolorspace was faster.  Most of the points clustered around the green line are statistically the same as the green line, since my timing method is quite crude.  Things to observe from this graph are that 1) many cases are very similar in speed, indicating that both ffmpegcolorspace and cogcolorspace are using similar code paths, 2) in some cases, cogcolorspace is a <em>lot</em> faster, probably indicating that there isn&#8217;t an assembly fast path in ffmpegcolorspace for that conversion, and 3) in a few cases (which, not coincidentally, are the most heavily used cases) ffmpegcolorspace is slightly faster than cogcolorspace.</p>
<p>The conclusions to draw from this are that 1) by writing very generic code with Orc, you can get very similar results to hand-crafted assembly code, 2) a developer can cover a lot more cases with a small amount of work, and 3) there are a few cases where special-case Orc code would be beneficial.</p>
<p>This is only the low-quality mode that cogcolorspace supports, which is similar or identical in quality to ffmpegcolorspace.  Higher-quality conversion is also implemented in most cases, and is only slightly slower.  This is the real advantage of Orc &#8212; Orc takes care of a huge number of combinations of options, and produces good SIMD code for all of them.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.schleef.org/blog/2009/09/19/cog-in-gst-plugins-bad/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Orc-0.4.0</title>
		<link>http://www.schleef.org/blog/2009/05/31/orc-040/</link>
		<comments>http://www.schleef.org/blog/2009/05/31/orc-040/#comments</comments>
		<pubDate>Sun, 31 May 2009 23:33:42 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[liboil]]></category>
		<category><![CDATA[orc]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://www.schleef.org/blog/2009/05/31/orc-040/</guid>
		<description><![CDATA[Lately, I&#8217;ve been working on a side project called Orc as a replacement for liboil.  Liboil&#8217;s first major problem has always been that it doesn&#8217;t scale well &#8212; every software package that wanted to use liboil typically required several new liboil functions, and then someone would need to actually write assembly code for those functions [...]]]></description>
			<content:encoded><![CDATA[<p>Lately, I&#8217;ve been working on a side project called <a href="http://cgit.freedesktop.org/~ds/orc">Orc</a> as a replacement for <a href="http://liboil.freedesktop.org/wiki/">liboil</a>.  Liboil&#8217;s first major problem has always been that it doesn&#8217;t scale well &#8212; every software package that wanted to use liboil typically required several new liboil functions, and then someone would need to actually write assembly code for those functions on several architectures.  My original plan was to develop a critical mass of functions, and then additions would be &#8220;simple&#8221;.  This never happened.  The second major problem is that liboil&#8217;s compilation is terribly fragile.  Thousands of lines of inline assembly code that depends on specific compilers, compiler versions, libtool internals, and random snippets of code such as &#8220;if $user != msmith&#8221; do not lead to a maintainable project.</p>
<p>Orc is now to the point where it can not only reproduce about 90% of the code that is currently in liboil, but also generate 90% of the code that <em>should</em> be in liboil, but nobody ever wrote.  At runtime.  And the Orc language allows you to describe your own liboil-style functions.  At runtime.  Or, you can also use it like a normal compiler, converting Orc language source into N different assembly source files for every possible vector instruction set combination.</p>
<p>A large part of the decoding path in Schroedinger has been converted to optionally use Orc, where speed is either slightly faster or 20-30% faster than the previous liboil code.  The real benefit is that it takes only a few minutes to convert code that took weeks to develop originally.  A side project of mine, <a href="http://cgit.freedesktop.org/~ds/cog">Cog</a>, has turned into a showcase for Orc, with demonstrations of video processing <a href="http://gstreamer.net/">GStreamer</a> elements, such as format and colorspace conversion and scaling.  I&#8217;ve found that since it is so easy and fast to create vectorized code, it now becomes possible to offer additional features to users, such as quality vs. speed tradeoffs.</p>
<p>Orc can generate code for MMX and SSE on x86 and x86_64, and Altivec on PowerPC, as well as NEON for ARM and c64x+ DSP code.  The NEON and c64x+ backends are not currently open source.</p>
<p><a href="http://www.schleef.org/orc/download/">Download 0.4.0</a>.  <a href="http://www.schleef.org/orc/documentation/">Online documentation</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.schleef.org/blog/2009/05/31/orc-040/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Entropy Wave</title>
		<link>http://www.schleef.org/blog/2009/04/27/entropy-wave/</link>
		<comments>http://www.schleef.org/blog/2009/04/27/entropy-wave/#comments</comments>
		<pubDate>Mon, 27 Apr 2009 19:28:44 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[entropywave]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://www.schleef.org/blog/2009/04/27/entropy-wave/</guid>
		<description><![CDATA[I see Christian outed my new company, Entropy Wave.  The mission of the new company is to create video post-production tools using open media technology for a wide range of users, including high-end studios, professional video editors, and hobbyists.  Most of our products will be based on open-source code, including projects I&#8217;ve been heavily involved [...]]]></description>
			<content:encoded><![CDATA[<p>I see <a href="http://blogs.gnome.org/uraeus/2009/04/27/transmageddon-07-released/">Christian</a> outed my new company, <a href="http://entropywave.com/">Entropy Wave</a>.  The mission of the new company is to create video post-production tools using open media technology for a wide range of users, including high-end studios, professional video editors, and hobbyists.  Most of our products will be based on open-source code, including projects I&#8217;ve been heavily involved with such as <a href="http://gstreamer.freedesktop.org/">GStreamer</a>, <a href="http://diracvideo.org/">Schroedinger</a>, Orc, and various <a href="http://xiph.org/">Xiph</a> projects.</p>
<p>Existing and upcoming products include:</p>
<ul>
<li>A GStreamer-based <a href="http://entropywave.com/products/entropy-wave-media-sdk/">Media SDK</a> that allows developers to rapidly create and deploy applications on major platforms (Windows, Linux, OS X)</li>
<li>QuickTime plugins for DiracPro (SMPTE VC-2)</li>
<li>A <a href="http://entropywave.com/products/entropy-wave-encoder/">video encoder application</a> geared toward content producers putting video on the web</li>
<li>A capture application compatible with <a href="http://www.numediatechnology.com/">Numedia</a>&#8217;s line of DiracPro hardware encoders</li>
</ul>
<p>In addition, Entropy Wave can provide support and custom development services in a variety of areas including open media.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.schleef.org/blog/2009/04/27/entropy-wave/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Notes about low latency video</title>
		<link>http://www.schleef.org/blog/2009/03/27/notes-about-low-latency-video/</link>
		<comments>http://www.schleef.org/blog/2009/03/27/notes-about-low-latency-video/#comments</comments>
		<pubDate>Fri, 27 Mar 2009 21:36:28 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://www.schleef.org/blog/2009/03/27/notes-about-low-latency-video/</guid>
		<description><![CDATA[I recently read a few rather extraordinary marketing claims from OnLive about their new server-side gaming technology.  Since they don&#8217;t really make much technical information available, one is left to speculate what they mean by &#8220;1 ms&#8221; latency, especially when it is directly compared to &#8220;500 ms to 750 ms lag&#8221; in video conferencing.  I&#8217;m [...]]]></description>
			<content:encoded><![CDATA[<p>I recently read a few rather extraordinary marketing claims from OnLive about their new server-side gaming technology.  Since they don&#8217;t really make much technical information available, one is left to speculate what they mean by &#8220;1 ms&#8221; latency, especially when it is directly compared to &#8220;500 ms to 750 ms lag&#8221; in video conferencing.  I&#8217;m sure that statement makes sense from someone&#8217;s perspective, just not from a video compression perspective.</p>
<p>I sat around over a beer last evening and wrote the following to the Schrödinger mailing list, because someone asked. Rather than answering the question, I decided to talk randomly about how low-latency video encoding works.</p>
<p>One key point about low latency video encoding is that the output bits that represent the pixel have to exist somewhere in the bitstream between the time the encoder gets the pixel from the camera, and N ms later, where N is the latency.</p>
<p>One method of very low-latency compression works on a scanline basis. An example is the low-delay profile of Dirac.  A camera reads out a few scan lines (say, 16), the encoder compresses them, and then sends those bits out over ethernet or ASI or whatever.  The latency is on the order of a few scan lines, say 16*2 + a small number.  Why 16*2? Because it takes 16 lines to read in the 16-line chunk, and the encoder then spends the time it takes to read in the next chunk encoding the first chunk and sending it out over the wire.  Simultaneously, the decoder reads in the data and decodes.  Then during the third set of 16 lines, the decoder scans out the uncompressed lines.  So the decoder scans out line 0 as the camera is scanning out line 32.  Real encoders need a bit of extra time for synchronization, so 32 is the ideal case.  Of course, in a real system there is network latency, but we&#8217;ll make someone else worry about that. 32 lines works out to be about 1 ms for 1080p at 30 frames per second, depending on exactly the system you&#8217;re using.  Compression ratios are purposefully low, since you can&#8217;t spread around worst-case bits at all, and because this kind of compression is only really useful for studio work.</p>
<p>Note that cameras with a few-scanline latency start at USD 10,000 and an encoder/decoder pair for DiracPro is about USD 4,000, IIRC. This is not the kind of technology you roll out in a consumer product.</p>
<p>Another method is similar, but using an entire frame instead of a few scan lines.  In this case, you get a theoretical latency of 2 frames, or about 60 ms for 30 fps video.  I&#8217;ve seen companies advertising encoder/decoder pairs that claim 70 ms latency (of course, without any network latency), and I can pretty much believe this number.  Again, you can&#8217;t get away with cheap hardware &#8212; my DV camera has an internal latency somewhere between 90 and 120 ms, and HDV cameras are much worse.</p>
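<p>The latency arithmetic for the scanline-based and frame-based methods above can be checked with a short Python sketch.  It assumes only the 1080 active lines are counted (blanking is ignored) and that the delay is exactly two chunks, as in the ideal case described above; <code>pipeline_latency_ms</code> is a hypothetical helper name.</p>

```python
def pipeline_latency_ms(chunk_lines, lines_per_frame, fps, chunks_in_flight=2):
    """Latency of a chunked encode/decode pipeline, in milliseconds."""
    line_time_ms = 1000.0 / (lines_per_frame * fps)  # time to scan one line
    return chunks_in_flight * chunk_lines * line_time_ms


# Scanline-based: 16-line chunks of 1080p at 30 fps, so 32 lines of delay.
print(f"{pipeline_latency_ms(16, 1080, 30):.2f} ms")
# Frame-based: the chunk is the whole frame, so two frames of delay.
print(f"{pipeline_latency_ms(1080, 1080, 30):.1f} ms")
```

<p>Under these assumptions the results are just under 1 ms for 32 lines and about 67 ms for two whole frames, in the same ballpark as the rough figures quoted above.</p>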
<p>In a frame-based low-latency system, it&#8217;s much more realistic to use motion compensation, in which you use the previous one or two frames as reference pictures.  Since the general point of using motion compensation is to decrease the bit rate, this causes compression artifacts immediately after scene changes that clear up after a few frames, and is very characteristic of the technique.</p>
<p>Due to the way that Dirac puts together pictures, the non-low-delay profiles of Dirac have an approximate latency of 4 pictures for a simple implementation, although you can decrease this to nearly 2 pictures with more complex algorithms.  Schroedinger implements the simple algorithm, and with suitable modifications (it does not do this by default) you can get close to 4 frames latency.  Schro&#8217;s implementation of Low-Delay Profile is also 4 frames, since it uses the same code.</p>
<p>Entropy Wave has implementations of the more complex algorithm for Simple and Intra profiles, as well as an actual low delay implementation of Low-Delay profile, with latencies that are very near the theoretical latencies.  These are not open source.  Unfortunately, since all the code that currently can use these codecs is frame based, there&#8217;s only a very minor advantage over Schroedinger unless you write a bunch of custom code.</p>
<p>It should be obvious at this point that the &#8220;1 ms&#8221; number has very little to do with video compression, and a lot more to do with how game engines work.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.schleef.org/blog/2009/03/27/notes-about-low-latency-video/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
