Performance

Bounding Volume Hierarchy (BVH)

Three different BVH build methods were compared: mid point, binning SAH (Surface Area Heuristic), and full sweep SAH. The constructor allows selecting between these build methods, as well as the number of bins if using binning SAH. The default method used is sweep SAH since this unsurprisingly gives the best performance for large numbers of rays.

These build methods were profiled as part of assignment 3 using a high triange count model of a terracota warrior filling the frame (see the image above). This resulted in the following observations (bin SAH method used 12 bins):

Resolution
Mid-point Sweep SAH Bin SAH
Build (s) Render (s) Build (s) Render (s) Build (s) Render (s)
640 x 480 23 7 97 5 22 4
1280 x 960 25 29 97 21 29 23
2560 x 1920 25 119 96 83 29 89
6400 x 4800 26 733 95 524 32 583
12800 x 9600 26 2967 101 2050 30 2233

This illustrates the benefits of sweep SAH over the other methods when rendering over trivially small resolutions. These measurements are for a single sample per pixel and with higher numbers of samples per pixel, more light sources, and the addition of volumetrics, the benefit of sweep SAH will definitely be worth the higher build cost.

Lighting Hacks

The particulars of this scene demanded some massaging of the lighting system to meet the required performance levels. Two specific hacks are employed that violate physics: light hit short circuiting and volume inscatter short circuiting. Both these hacks avoid firing rays when the amount of light reaching the point would be insignificant or deteriorate the scene.

The first hack involves allowing a light to set LightHit::nolight to true if the light will either not cast light on the specific point or the radiance amount will be insignificant. This violates the physics of light and biases the final image. This bias effect could be avoided by using Monte Carlo methods, but it looks fine and meets requirements without needing this level of detail in the engine. This hack is employed to great success by the BeamAreaLight which is used in the windows for avoiding excess checks against the area lights used to cast soft window shadows (which would not even introduce light into the scene). It does this by short circuiting all light hits falling outside a certain angle from the normal. This was very simple to implement and reduces the number of rays fired by billions (e.g. in a 1920x1440 render with 70 samples per pixel and 30 volume march steps, the number of rays not fired was just over 4.9 billion) for no visual cost.

The second hack involves allowing a light to set LightHit::noinscatter to true if the light should not inscatter off volumes. This violates the physics of light, but is necessary to achieve the desired final look of the scene. Without this hack, the candles placed in the alcoves for lighting the altars would cause highlights on the volume intended only to show the light shafts from the windows. This is used only by HackPointLight which also incorporates the first hack. Personally, I find this hack much more egregious than the last, but my vision demanded it to achieve the desired artistic result.

Profiling

This is a cautionary tale about the importance of profiling code. While writing Instance I had placed inverting the transformation matrix in the intersection routine. I am absolutely certain I meant to move/remove this once I had the Instance system working since it is a complete waste to do every intersection routine, but I forgot until I used valgrind to profile my code:

This was from a 640x480 render with 1 sample per pixel and 1 volume march point (since profiling with valgrind takes forever). And that is 4 constant matrices for the windows being inverted about 3 million times each significantly contributing to 3.6 billion matrix indexing calls.

Computing the inverse prior to using it while intersecting more than halved the rendering time. From 10 hours with 60 samples per pixel to 5 hours with 70 samples per pixel with the same number of march points. That improvement was almost entirely due to this change (some other minor improvements were also included in this change). I am very glad I profiled my code and wish I had done it much sooner. I believe that I should never trust myself to implement something without taking a second look at it removed from a deadline.

Memory

A concerted effort is made to not be a savage. All memory allocated with new is released with delete and automatic variables are preferred where possible. Some destructors may need to be reviewed since they might not be triggered correctly due to missing base class virtual destructors, but care is taken where many allocations are concerned (e.g. ~BVH needs to recursively deconstruct all nodes and its destructor is called).