I finally moved my collection of Orc-based GStreamer plugins (codename “Cog”) into gst-plugins-bad, since they've moved on from being an experiment. Orc is a runtime compiler for a simple, cross-platform, assembly-like language that specifically targets SIMD instructions on several processors. Orc is very effective within its domain, which is small but growing.
One application it covers is chroma subsampling and color matrixing for video, which GStreamer calls “colorspace conversion” (not quite correctly). There has been a colorspace element in Cog (cogcolorspace) for some time, but I never really bothered to compare its speed with the default GStreamer colorspace element (ffmpegcolorspace), which is based on code copied from FFmpeg. Recently I finally did, and was somewhat surprised (although I shouldn't have been) to find that cogcolorspace is as fast as, or much faster than, ffmpegcolorspace for almost all operations. (Please note that the FFmpeg code was forked a long time ago and heavily modified, so these results do not reflect FFmpeg itself, only GStreamer's ffmpegcolorspace.)
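To make the two operations concrete: for each output pixel, a converter finds the chroma sample that covers that pixel (undoing chroma subsampling) and then applies a 3×3 matrix plus offsets (color matrixing). The following is a minimal scalar sketch, assuming BT.601 limited-range coefficients in 8-bit fixed point, I420 planar input with even width and height, and nearest-neighbour chroma. The function names are hypothetical, and this is not the code used by cogcolorspace or ffmpegcolorspace.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a fixed-point value (8 fractional bits) to 0..255.
 * Negative values are handled before shifting to avoid
 * implementation-defined right shifts of negative ints. */
static uint8_t clamp_shift8(int v) {
    if (v < 0) return 0;
    v >>= 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Color matrixing: one BT.601 limited-range Y'CbCr pixel to R'G'B'.
 * Coefficients are the usual 1.164, 1.596, 0.391, 0.813, 2.018 scaled by 256. */
void ycbcr_to_rgb_pixel(uint8_t y, uint8_t cb, uint8_t cr, uint8_t rgb[3]) {
    int c = (int)y - 16;
    int d = (int)cb - 128;
    int e = (int)cr - 128;
    rgb[0] = clamp_shift8(298 * c + 409 * e + 128);
    rgb[1] = clamp_shift8(298 * c - 100 * d - 208 * e + 128);
    rgb[2] = clamp_shift8(298 * c + 516 * d + 128);
}

/* Chroma subsampling: in I420 each Cb/Cr sample covers a 2x2 block of luma,
 * so the chroma index is derived from (row / 2, col / 2). Output is packed RGB. */
void i420_to_rgb(const uint8_t *yp, const uint8_t *up, const uint8_t *vp,
                 uint8_t *rgb, int width, int height) {
    for (int row = 0; row < height; row++) {
        for (int col = 0; col < width; col++) {
            size_t ci = (size_t)(row / 2) * (width / 2) + (size_t)(col / 2);
            size_t li = (size_t)row * width + col;
            ycbcr_to_rgb_pixel(yp[li], up[ci], vp[ci], rgb + li * 3);
        }
    }
}
```

With these coefficients, Y=16 with neutral chroma maps to black and Y=235 maps to white. The inner loop does the same few multiplies and adds for every pixel, which is exactly the kind of work a SIMD code generator like Orc can spread across many pixels per instruction.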
This is a scatter plot of the run time (in ms) for converting 1000 frames of 320×240 video between a variety of uncompressed video formats:
The axes are execution time (in ms), with cogcolorspace on the horizontal axis and ffmpegcolorspace on the vertical axis. The green line marks equal execution time: for points below the line, ffmpegcolorspace was faster, and for points above it, cogcolorspace was faster. Since my timing method is quite crude, most of the points clustered around the green line are statistically indistinguishable from it. Things to observe from this graph: 1) many cases are very similar in speed, indicating that ffmpegcolorspace and cogcolorspace use similar code paths; 2) in some cases, cogcolorspace is a lot faster, probably indicating that ffmpegcolorspace has no assembly fast path for that conversion; and 3) in a few cases (which, not coincidentally, are the most heavily used ones), ffmpegcolorspace is slightly faster than cogcolorspace.
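A crude benchmark of this kind just converts the same frame many times and reads a clock before and after. The sketch below is an illustration, not the script that produced the plot. It assumes a hypothetical I420→YUY2 conversion as the workload and uses C's `clock()`, which measures processor time rather than wall-clock time and is about as crude as the method described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef void (*convert_fn)(const uint8_t *src, uint8_t *dst, int w, int h);

/* Hypothetical workload: I420 (planar 4:2:0) to YUY2 (packed 4:2:2).
 * Each chroma row is reused for two output rows (nearest-neighbour).
 * Assumes even width and height. */
static void i420_to_yuy2(const uint8_t *src, uint8_t *dst, int w, int h) {
    const uint8_t *yp = src;
    const uint8_t *up = yp + (size_t)w * h;
    const uint8_t *vp = up + (size_t)(w / 2) * (h / 2);
    for (int row = 0; row < h; row++) {
        const uint8_t *ur = up + (size_t)(row / 2) * (w / 2);
        const uint8_t *vr = vp + (size_t)(row / 2) * (w / 2);
        uint8_t *out = dst + (size_t)row * w * 2;
        for (int col = 0; col < w; col += 2) {
            out[0] = yp[(size_t)row * w + col];     /* Y0 */
            out[1] = ur[col / 2];                   /* U  */
            out[2] = yp[(size_t)row * w + col + 1]; /* Y1 */
            out[3] = vr[col / 2];                   /* V  */
            out += 4;
        }
    }
}

/* Convert the same frame `frames` times; return processor time in ms. */
static double time_conversion_ms(convert_fn fn, const uint8_t *src,
                                 uint8_t *dst, int w, int h, int frames) {
    clock_t t0 = clock();
    for (int i = 0; i < frames; i++)
        fn(src, dst, w, h);
    clock_t t1 = clock();
    return (double)(t1 - t0) * 1000.0 / CLOCKS_PER_SEC;
}
```

Running `time_conversion_ms(i420_to_yuy2, src, dst, 320, 240, 1000)` for each element's version of a conversion gives one (x, y) point on a plot like the one above. Repeating each measurement several times and taking the minimum would tighten up the cluster around the equal-time line.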
The conclusions to draw from this are that 1) by writing very generic code with Orc, you can get results very similar to hand-crafted assembly code, 2) a developer can cover many more cases with a small amount of work, and 3) there are a few cases where special-case Orc code would be beneficial.
The plot only covers cogcolorspace's low-quality mode, which is similar or identical in quality to ffmpegcolorspace. Higher-quality conversion is also implemented for most cases, and is only slightly slower. This is the real advantage of Orc: it takes care of a huge number of combinations of options and produces good SIMD code for all of them.
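The post doesn't say what distinguishes the two quality modes. One common difference in converters of this kind is how subsampled chroma is reconstructed, so here is a hypothetical sketch of horizontal chroma upsampling (half-width chroma to full width) with a "low quality" mode that repeats each chroma sample and a "higher quality" mode that averages neighbouring samples. The enum and function names are illustrative only, not cogcolorspace options, and the sketch assumes chroma samples are co-sited with even luma pixels.

```c
#include <stdint.h>

/* Hypothetical quality switch (not a real cogcolorspace property). */
enum chroma_quality { CHROMA_NEAREST, CHROMA_LINEAR };

/* Upsample one row of chroma from width/2 samples to width samples.
 * Assumes width is even and chroma is co-sited with even luma pixels. */
static void upsample_chroma_row(const uint8_t *in, uint8_t *out, int width,
                                enum chroma_quality q) {
    int half = width / 2;
    for (int i = 0; i < half; i++) {
        out[2 * i] = in[i]; /* co-sited sample is copied in both modes */
        if (q == CHROMA_NEAREST) {
            out[2 * i + 1] = in[i]; /* low quality: repeat the sample */
        } else {
            /* higher quality: rounded average with the next sample,
             * repeating the last sample at the right edge */
            int next = (i + 1 < half) ? in[i + 1] : in[i];
            out[2 * i + 1] = (uint8_t)((in[i] + next + 1) >> 1);
        }
    }
}
```

In this sketch the higher-quality path costs one extra add and shift per interpolated sample. Multiply small choices like this by every pair of input and output formats and the number of loops to write grows quickly, which is where generating the SIMD code automatically pays off.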


I looked at the graph before reading the text and was thinking “wow – that line of best fit doesn’t really match the data”. Orc sounds very cool.
ffmpegcolorspace has fairly optimised C code, but still no assembler/SIMD stuff at all – it’ll be nice to get the cog bits used more broadly as quickly as we can
Our ffmpegcolorspace element is pure C; there are no hand-crafted assembler parts.
Have you compared to libswscale? It is faster than FFmpeg’s imgresample and imgconvert stuff from days of yore and has deprecated said old APIs for resampling and converting between colourspaces. Though the API isn’t very nice.