The first two weeks of uni have passed and they’ve been full-on, to say the least. After the first session of studio, our task was to optimise a raytracer, and we were given a week to do so. My first thought was to go down the GPU route, but I decided that optimising on the CPU side would probably be more fun because there would be more varied ways to go about it. The first thing I did was convert the renderable list to SOA (structure of arrays), which was fairly easy to do as all renderables the raytracer was capable of drawing were spheres. Next, I converted the sphere-ray intersection tests to use SSE or AVX. Because the important data was already SOA, loading it into SIMD registers was pretty straightforward. It also meant that I could test intersections of up to eight spheres at a time. These optimisations roughly halved the render time.
The next thing I did was spatial partitioning using a kD-tree. I’d never done spatial partitioning before, but I managed to get it up and running pretty quickly. The hardest part was rearranging my SOA structures so that I could continue to check eight intersections at a time. My solution was to allow each leaf of the kD-tree to manage up to eight spheres and have it populate my SOA structures in chunks of eight. Each leaf then held an index into the SOA structures and a count, as it was never guaranteed that a leaf would have exactly eight spheres. From memory, every second leaf had seven instead of eight. As a consequence, this also meant that my SOA structures had to contain padding in order to continue using aligned intrinsics (of about 61kB worth of SOA structures, roughly 4kB was padding). Then, in my sphere-ray intersection function, I added a mask vector that masked out the padding values in the data so that I wouldn’t get sphere intersections where there shouldn’t be any.
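The leaf bookkeeping is simple enough to sketch. The names here are mine, and where the original masked padding with a vector inside the intersection function, the equivalent trick can be shown more compactly as a scalar mask over the hit bits after the fact:

```cpp
#include <cassert>

// Hypothetical leaf record: each leaf owns a contiguous chunk of up to eight
// spheres in the SOA arrays. `first` is kept a multiple of 8 so the chunk's
// loads stay 32-byte aligned; `count` can be less than 8, and the remaining
// lanes in the chunk are padding.
struct KdLeaf {
    int first;  // index of this leaf's first sphere in the SOA arrays
    int count;  // number of real spheres in the chunk (1..8)
};

// Discard hits reported against padding lanes: keep only the low `count`
// bits of the 8-bit mask produced by the SIMD intersection test.
int maskPadding(int hitMask, int count) {
    return hitMask & ((1 << count) - 1);
}
```

Either way, the point is the same: padded lanes hold garbage data, so any “hit” they report has to be thrown away before it can produce phantom spheres in the image.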
Finally, I multithreaded it with an OpenMP parallel for in the main loop. After all that, I managed to get rendering down to about 1% of the original time, which I’m pretty happy with. Though there are plenty of things that I could have done that I just didn’t get around to. For example, most, if not all, of the code relating to reflections and shadows remained untouched. Shadow calculations used the same ‘trace’ function that the reflection code used, so for each pixel in shadow, there were several rays cast that were essentially thrown away. Also, each sphere had its own ‘Material’, but in the provided scene there were only two different kinds of material. This meant that a large chunk of memory was devoted to duplicates. If it were modified so that spheres shared materials, then the entirety of the material data could be kept in cache.
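The OpenMP change really is a one-liner over the outer loop. A minimal sketch, where `renderPixel`, the framebuffer layout, and the dynamic schedule are my assumptions rather than the assignment’s actual code:

```cpp
#include <cstdint>
#include <vector>

// Stand-in for the real per-pixel trace; purely illustrative.
static uint32_t renderPixel(int x, int y) {
    return static_cast<uint32_t>(x ^ y);
}

// Parallelise over image rows. Rays don't vary much in cost here, but a
// dynamic schedule helps when some rows (e.g. with heavy reflections) are
// slower than others. Compiles and runs serially without -fopenmp, since
// unknown pragmas are ignored.
void render(std::vector<uint32_t>& framebuffer, int width, int height) {
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            framebuffer[y * width + x] = renderPixel(x, y);
        }
    }
}
```

Each iteration writes to disjoint pixels, so no synchronisation is needed, which is what makes the per-row split nearly free.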