Archive for the ‘orc’ Category
Orc moved to code.entropywave.com
Friday, October 2nd, 2009
The git repository for Orc has moved to code.entropywave.com, where it will also likely obtain an actual web page soon. code is a new website for open-source and free software projects sponsored by Entropy Wave.

Cog in gst-plugins-bad
Saturday, September 19th, 2009
I finally moved my collection of Orc-based GStreamer plugins (codename “Cog”) into gst-plugins-bad, since they’ve moved on from being an experiment. Orc is a runtime compiler for a simple cross-platform assembly-like language that specifically targets SIMD instructions for several processors. Orc is very effective inside its domain, which is small but growing.
One application it covers is chroma subsampling and color matrixing for video, semi-incorrectly referred to as “colorspace conversion” in GStreamer. There has been a colorspace element in Cog (cogcolorspace) for some time, but I never really bothered to do any speed comparisons between it and the default GStreamer colorspace element (ffmpegcolorspace), which is based on code copied from FFmpeg. Recently I did, and was somewhat surprised (although I shouldn’t have been) that cogcolorspace is the same speed as, or much faster than, ffmpegcolorspace for almost all operations. (Please note that the FFmpeg code was forked a long time ago and heavily modified, so it does not reflect FFmpeg itself, only GStreamer’s ffmpegcolorspace.)
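To make the two operations concrete, here is a minimal Python sketch of color matrixing and chroma subsampling. It assumes full-range BT.601 (JFIF-style) coefficients and a simple 2×2 averaging filter; the function names are made up for illustration, and neither cogcolorspace nor ffmpegcolorspace necessarily uses this matrix, range, or filter.

```python
def rgb_to_ycbcr(r, g, b):
    """Color matrixing for one pixel (assumed: full-range BT.601 coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round to integers and clamp to the 8-bit range.
    return tuple(min(255, max(0, int(round(v)))) for v in (y, cb, cr))


def subsample_420(plane):
    """Chroma subsampling: replace each 2x2 block of a chroma plane by its average.

    `plane` is a list of rows; width and height are assumed to be even.
    """
    out = []
    for row in range(0, len(plane), 2):
        out_row = []
        for col in range(0, len(plane[row]), 2):
            total = (plane[row][col] + plane[row][col + 1]
                     + plane[row + 1][col] + plane[row + 1][col + 1])
            out_row.append((total + 2) // 4)  # integer average, rounded
        out.append(out_row)
    return out
```

Both steps are simple per-element arithmetic over arrays, which is the kind of loop that maps well onto SIMD instructions.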
This is a scatter plot of the run time (in ms) for converting 1000 frames of 320×240 video between a variety of uncompressed video formats:
The axes are execution time (in ms), with cogcolorspace on the horizontal axis and ffmpegcolorspace on the vertical axis. The green line represents equal execution time: for points below the line, ffmpegcolorspace was faster, and for those above, cogcolorspace was faster. Most of the points clustered around the green line are statistically indistinguishable from it, since my timing method is quite crude. Things to observe from this graph are that 1) many cases are very similar in speed, indicating that both ffmpegcolorspace and cogcolorspace are using similar code paths, 2) in some cases, cogcolorspace is a lot faster, probably indicating that there isn’t an assembly fast path in ffmpegcolorspace for that conversion, and 3) in a few cases (which, not coincidentally, are the most heavily used cases) ffmpegcolorspace is slightly faster than cogcolorspace.
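The way the plot is read can be written down as a small Python sketch. The 10% noise tolerance and the sample timings are made-up placeholders, not measurements from this post; they only illustrate how a point is classified relative to the green line.

```python
def classify(cog_ms, ffmpeg_ms, tolerance=0.10):
    """Classify one point of the scatter plot.

    cog_ms is the horizontal coordinate, ffmpeg_ms the vertical one.
    `tolerance` is an assumed noise margin, as a fraction of the slower time.
    """
    slower = max(cog_ms, ffmpeg_ms)
    if slower == 0 or abs(cog_ms - ffmpeg_ms) <= tolerance * slower:
        return "same"  # on the diagonal, within timing noise
    # Above the diagonal (ffmpeg took longer) means cogcolorspace was faster.
    return "cog faster" if cog_ms < ffmpeg_ms else "ffmpeg faster"


# Hypothetical (cog_ms, ffmpeg_ms) pairs, one per conversion case.
samples = {"case A": (100, 105), "case B": (100, 400), "case C": (120, 100)}
for name, (cog_ms, ffmpeg_ms) in samples.items():
    print(name, classify(cog_ms, ffmpeg_ms))
```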
The conclusions to draw from this are that 1) by writing very generic code with Orc, you can get results very similar to hand-crafted assembly code, 2) a developer can cover a lot more cases with a small amount of work, and 3) there are a few cases where special-case Orc code would be beneficial.
This is only the low-quality mode that cogcolorspace supports, which is similar or identical in quality to ffmpegcolorspace. Higher-quality conversion is also implemented in most cases, and is only slightly slower. This is the real advantage of Orc: it takes care of a huge number of combinations of options, and produces good SIMD code for all of them.

Orc-0.4.0
Sunday, May 31st, 2009
Lately, I’ve been working on a side project called Orc as a replacement for liboil. Liboil’s first major problem has always been that it doesn’t scale well: every software package that wanted to use liboil typically required several new liboil functions, and then someone would need to actually write assembly code for those functions on several architectures. My original plan was to develop a critical mass of functions, after which additions would be “simple”. This never happened. The second major problem is that liboil’s compilation is terribly fragile. Thousands of lines of inline assembly code that depend on specific compilers, compiler versions, libtool internals, and random snippets of code such as “if $user != msmith” do not lead to a maintainable project.
Orc is now at the point where it can not only reproduce about 90% of the code that is currently in liboil, but also generate 90% of the code that should be in liboil, but nobody ever wrote. At runtime. And the Orc language allows you to describe your own liboil-style functions. At runtime. Or, you can also use it like a normal compiler, converting Orc language source into N different assembly source files for every possible vector instruction set combination.
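The idea of describing an array function at runtime can be illustrated with a toy Python sketch. This is not Orc and does not use Orc’s API or syntax: the opcode names and the `build_kernel` helper are hypothetical, and the toy interprets the program element by element, whereas Orc generates machine code.

```python
import operator

# Hypothetical opcode table for illustration; not Orc's actual opcode set.
OPCODES = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def build_kernel(program):
    """Build an element-wise array function from a program given at runtime.

    `program` is a list of (opcode, dest, src1, src2) tuples naming arrays.
    """
    for opcode, _dest, _src1, _src2 in program:
        if opcode not in OPCODES:
            raise ValueError("unknown opcode: " + opcode)

    def kernel(**arrays):
        # Copy the inputs so the caller's lists are never modified.
        regs = {name: list(values) for name, values in arrays.items()}
        for opcode, dest, src1, src2 in program:
            fn = OPCODES[opcode]
            regs[dest] = [fn(x, y) for x, y in zip(regs[src1], regs[src2])]
        return regs

    return kernel


# d = (s1 + s2) * s1, described as data rather than written as a function.
kernel = build_kernel([("add", "t", "s1", "s2"), ("mul", "d", "t", "s1")])
result = kernel(s1=[1, 2, 3], s2=[10, 20, 30])["d"]
```

Because the program is plain data, an application can assemble exactly the function it needs without anyone having written that function in advance, which is the scaling problem described above.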
A large part of the decoding path in Schroedinger has been converted to optionally use Orc, where speed is either slightly faster or 20-30% faster than the previous liboil code. The real benefit is that it takes only a few minutes to convert code that took weeks to develop originally. A side project of mine, Cog, has turned into a showcase for Orc, with demonstrations of video processing GStreamer elements, such as format and colorspace conversion and scaling. I’ve found that since it is so easy and fast to create vectorized code, it now becomes possible to offer additional features to users, such as quality vs. speed tradeoffs.
Orc can generate code for MMX and SSE on x86 and x86_64, and AltiVec on PowerPC, as well as NEON for ARM and c64x+ DSP code. The NEON and c64x+ backends are not currently open source.
