Vash has been in development for just over one and a half years, moving between a personal side project, a low-intensity skunkworks project, and a test platform for new technologies. While most of my time on it has gone into finding the ideal parameters for images that are the right combination of visually pleasing, distinctive, and memorable, I also spent quite a bit of effort searching for Vash's Goldilocks platform.
The problem is twofold. On one hand, Vash's algorithm can be very computationally expensive, so we needed a fast language and platform. At the same time, Vash is a complex beast and the code really wants to make use of several different programming paradigms if it's going to be well organized. Some platforms were just too slow, some were too ugly, and others were unusably verbose. Sadly, none were "just right." We did not end up with an ideal platform: such a thing does not exist, at least not yet. This post is not about the platform we did end up using. Instead, it is about one of our more surprising failures.
Early in our endeavors, when it became clear that we needed more speed, we looked in the natural place to do image processing: the GPU (Graphics Processing Unit). I spent a couple of days integrating PyOpenCL (a wonderful framework that I highly recommend) with our existing codebase and rewriting the node implementations in OpenCL. Normally, we would express the computation as a tree walk, but GPUs don't really do branching, at least not at that depth. Instead, we walked the tree on the CPU to emit OpenCL code to call each function in turn and combine their results into a final image, all on the GPU.
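To make the idea concrete, here is a minimal sketch of that kind of code generation in Python. It is not Vash's actual generator: the `Node` class, the node names (`gradient`, `ripple`, `blend`), and the kernel signature are all hypothetical, and the emitted OpenCL assumes those helper functions are defined elsewhere. The point is only that a recursive walk on the CPU can flatten a tree into branch-free, straight-line GPU code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of the image tree. `op` names an OpenCL helper (hypothetical)."""
    op: str
    children: List["Node"] = field(default_factory=list)


def emit(node: Node, lines: List[str]) -> str:
    """Post-order walk: emit the children first, then one statement for this node.

    Returns the name of the variable that holds this node's result.
    """
    args = [emit(child, lines) for child in node.children]
    name = f"v{len(lines)}"  # each call appends exactly one line, so names are unique
    call_args = ", ".join(["x", "y"] + args)
    lines.append(f"    float {name} = {node.op}({call_args});")
    return name


def generate_kernel(root: Node) -> str:
    """Flatten the tree into the source text of a single straight-line kernel."""
    lines: List[str] = []
    result = emit(root, lines)
    header = [
        "__kernel void render(__global float *out, const int width, const int height) {",
        "    const int px = get_global_id(0);",
        "    const int py = get_global_id(1);",
        "    const float x = (float)px / (float)width;",
        "    const float y = (float)py / (float)height;",
    ]
    footer = [f"    out[py * width + px] = {result};", "}"]
    return "\n".join(header + lines + footer)


if __name__ == "__main__":
    tree = Node("blend", [Node("gradient"), Node("ripple", [Node("gradient")])])
    print(generate_kernel(tree))
```

The resulting string is what would be handed to the OpenCL compiler. Every distinct tree produces a distinct program, which matters a great deal for what follows.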
The GPU programs that this process produced ran fast -- really, really fast. In total, the GPU version generated Vash images about 100-500x faster than the equivalent C program. Sadly though, this isn't the whole story. AMD's and Nvidia's OpenCL compilers are optimizing compilers: they produce astonishingly fast code, but they take half an eternity to do so. Since we don't know what computations we are doing until we get data to hash, we can't precompile the programs. Generating an image with this technique took about 50-100ns of GPU compute time... with about 20-60 seconds of compilation on the CPU leading up to that. Once compilation is counted, this turned out to be our slowest implementation, and an edifying lesson in hidden costs.
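A rough illustration of why caching compiled programs does not rescue this approach, again in Python. Everything here is a stand-in: `source_for` is a hypothetical substitute for the real generator, the op names are invented, and "compiling" is just a dictionary insert. The sketch only shows the shape of the problem: the cache key is the generated source, and the source is a function of the data being hashed.

```python
import hashlib

OPS = ["gradient", "ripple", "blend"]  # hypothetical node names


def source_for(data: bytes) -> str:
    """Stand-in generator: derive a straight-line program from the input's digest."""
    digest = hashlib.sha256(data).digest()
    return "\n".join(
        f"v{i} = {OPS[b % len(OPS)]}(x, y)" for i, b in enumerate(digest[:16])
    )


class CompileCache:
    """Caches 'compiled programs' keyed by a hash of their source text."""

    def __init__(self) -> None:
        self.programs = {}
        self.hits = 0
        self.misses = 0

    def get(self, source: str) -> object:
        key = hashlib.sha256(source.encode()).hexdigest()
        if key in self.programs:
            self.hits += 1
        else:
            self.misses += 1
            self.programs[key] = object()  # placeholder for the expensive compile
        return self.programs[key]


if __name__ == "__main__":
    cache = CompileCache()
    for data in [b"alice", b"bob", b"alice"]:
        cache.get(source_for(data))
    print(f"hits={cache.hits} misses={cache.misses}")
```

Only a repeated input gets a cache hit. A new input almost always means a new program, so under these assumptions nearly every image pays the full compilation cost before any GPU work starts.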
Of course, there are other ways to use the GPU that would not have this frontloaded overhead, even if they are less optimal. We eventually decided to do without a GPU implementation for other reasons. That, however, is a different blog post.
Thursday, July 14, 2011
Wednesday, July 13, 2011
Introducing Vash 1.0
Over the last year and a half Vash has been a side project, a component of a larger product, something rewarding to hack on, and something fun to show friends. While there are other visual hashes and seeded-random image generators out there, we think that Vash is special, and there is nobody in our office who hasn't found themselves sitting and watching Vash images cycle by.
It's distinctive, it's consistent, it's reliable, and it's long past time we got Vash out into the world where other people can play with it too.
It's AGPLv3: grab it on our downloads page or visit Vash on GitHub.
If you want a commercial license, or custom work, both are also available.
What's it good for? We have a few ideas, but I'm confident that they'll look pretty tame compared to whatever it is you're coming up with right now.