I finally moved my collection of Orc-based GStreamer plugins (codename “Cog”) into gst-plugins-bad, since they’ve moved on from being an experiment. Orc is a runtime compiler for a simple cross-platform assembly-like language that specifically targets SIMD instructions for several processors. Orc is very effective inside its domain, which is small but growing.
One application that falls inside that domain is chroma subsampling and color matrixing for video, semi-incorrectly referred to as “colorspace conversion” in GStreamer. There has been a colorspace element in Cog (cogcolorspace) for some time, but I never really bothered to do any speed comparisons between it and the default GStreamer colorspace element (ffmpegcolorspace), which is based on code copied from FFMpeg. Recently I finally did, and was somewhat surprised (although I shouldn’t have been) to find that cogcolorspace is either the same speed as, or much faster than, ffmpegcolorspace for almost all operations. (Please note that the FFMpeg code was forked a long time ago and heavily modified, so it does not reflect FFMpeg itself, only GStreamer’s ffmpegcolorspace.)
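For readers unfamiliar with the two operations named above, here is a minimal sketch of what they involve. It is not Cog's or ffmpegcolorspace's code: it is plain scalar Python with hypothetical function names, and it assumes the widely used 8-bit integer approximation of the BT.601 studio-range matrix plus a simple 2×2 box average for 4:2:0 chroma. The real elements do this kind of arithmetic on many pixels at once with SIMD.

```python
def rgb_to_ycbcr_bt601(r, g, b):
    """Color matrixing: 8-bit RGB -> 8-bit studio-range Y'CbCr.

    Assumption: the common fixed-point BT.601 approximation
    (coefficients scaled by 256, +128 for rounding).
    """
    y = 16 + ((66 * r + 129 * g + 25 * b + 128) >> 8)
    cb = 128 + ((-38 * r - 74 * g + 112 * b + 128) >> 8)
    cr = 128 + ((112 * r - 94 * g - 18 * b + 128) >> 8)
    return y, cb, cr


def subsample_420(plane):
    """Chroma subsampling: average each 2x2 block of a full-size plane.

    `plane` is a list of rows; width and height are assumed to be even.
    """
    out = []
    for row in range(0, len(plane), 2):
        top, bottom = plane[row], plane[row + 1]
        out.append([
            (top[c] + top[c + 1] + bottom[c] + bottom[c + 1] + 2) >> 2
            for c in range(0, len(top), 2)
        ])
    return out


def rgb_to_i420(frame):
    """Convert a frame (rows of (r, g, b) tuples) to planar 4:2:0.

    Returns (Y plane, subsampled Cb plane, subsampled Cr plane).
    """
    y_plane, cb_full, cr_full = [], [], []
    for row in frame:
        converted = [rgb_to_ycbcr_bt601(*pixel) for pixel in row]
        y_plane.append([p[0] for p in converted])
        cb_full.append([p[1] for p in converted])
        cr_full.append([p[2] for p in converted])
    return y_plane, subsample_420(cb_full), subsample_420(cr_full)
```

Each output format combines a matrix (or none), a subsampling layout, and a packing order, which is why the number of source/destination pairs grows so quickly.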
This is a scatter plot of the run time (in ms) for converting 1000 frames of 320×240 video between a variety of uncompressed video formats:

The axes are execution time (in ms), with cogcolorspace on the horizontal axis and ffmpegcolorspace on the vertical axis. The green line represents equal execution time: for points below the line, ffmpegcolorspace was faster, and for points above it, cogcolorspace was faster. Most of the points clustered around the green line are statistically indistinguishable from it, since my timing method is quite crude. Things to observe from this graph are that 1) many cases are very similar in speed, indicating that ffmpegcolorspace and cogcolorspace are using similar code paths, 2) in some cases cogcolorspace is a lot faster, probably indicating that there isn’t an assembly fast path in ffmpegcolorspace for that conversion, and 3) in a few cases (which, not coincidentally, are the most heavily used ones) ffmpegcolorspace is slightly faster than cogcolorspace.
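The way to read a point on the plot can be written down as a small helper. This is only a sketch: the function name and the 10% tolerance band around the equal-time line are assumptions made for illustration, not something derived from the measurements.

```python
def classify(cog_ms, ffmpeg_ms, tolerance=0.1):
    """Say which element was faster for one conversion.

    cog_ms is the horizontal coordinate, ffmpeg_ms the vertical one.
    Points within `tolerance` (relative, an assumed value) of the
    equal-time line are treated as a tie.
    """
    if abs(ffmpeg_ms - cog_ms) <= tolerance * max(cog_ms, ffmpeg_ms):
        return "similar"
    # Above the line: ffmpegcolorspace took longer, so Cog was faster.
    return "cog faster" if ffmpeg_ms > cog_ms else "ffmpeg faster"


# Invented timings purely to exercise the function, not real results.
print(classify(100, 105))
print(classify(100, 400))
print(classify(200, 150))
```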
The conclusions to draw from this are that 1) by writing very generic code with Orc, you can get results very similar to hand-crafted assembly code, 2) a developer can cover a lot more cases with a small amount of work, and 3) there are a few cases where special-case Orc code would be beneficial.
These measurements cover only the low-quality mode that cogcolorspace supports, which is similar or identical in quality to ffmpegcolorspace. Higher-quality conversion is also implemented in most cases, and is only slightly slower. This is the real advantage of Orc: it takes care of a huge number of combinations of options, and produces good SIMD code for all of them.
