Paul Engine

Progress Report: Deferred Rendering and Render Passes in Paul Engine

Date published: 27/07/2025

Profiling

You might have noticed the various calls to PE_PROFILE_FUNCTION() scattered around the code. This is a simple scope timer that gives us an idea of how long we spend in each section of the code. I won't lie, I'm not expecting performance to be great here, but let's see how we did. The GPU timings are those reported by RenderDoc when inspecting the frame, and include the time spent in the draw call(s) and other GPU commands contained in the render pass.
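For readers unfamiliar with the idea, a scope timer is usually a small RAII object: it records a start time when constructed and reports the elapsed time when it goes out of scope. The following is a minimal sketch of that technique, not Paul Engine's actual implementation. The names ScopeTimer, ProfileResult, ProfileResults and SKETCH_PROFILE_FUNCTION are all hypothetical and chosen so they cannot be mistaken for the engine's own PE_PROFILE_FUNCTION macro.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one timed scope (not an engine type).
struct ProfileResult {
    std::string name;
    double milliseconds;
};

// Hypothetical result sink. A real profiler would more likely stream
// results to a file or an in-editor panel.
inline std::vector<ProfileResult>& ProfileResults() {
    static std::vector<ProfileResult> results;
    return results;
}

// RAII timer: starts on construction, records elapsed time on destruction.
class ScopeTimer {
public:
    explicit ScopeTimer(std::string name)
        : m_Name(std::move(name)),
          m_Start(std::chrono::steady_clock::now()) {}

    ScopeTimer(const ScopeTimer&) = delete;
    ScopeTimer& operator=(const ScopeTimer&) = delete;

    ~ScopeTimer() {
        const auto end = std::chrono::steady_clock::now();
        const std::chrono::duration<double, std::milli> elapsed = end - m_Start;
        ProfileResults().push_back({m_Name, elapsed.count()});
    }

private:
    std::string m_Name;
    std::chrono::steady_clock::time_point m_Start;
};

// Assumed macro shape: times the enclosing function from this line
// until the end of the current scope.
#define SKETCH_PROFILE_FUNCTION() ScopeTimer sketchScopeTimer(__func__)

// Stand-in for a profiled engine function (hypothetical workload).
inline int SubmitMeshesExample(int meshCount) {
    SKETCH_PROFILE_FUNCTION();
    assert(meshCount >= 0);
    int submitted = 0;
    for (int i = 0; i < meshCount; ++i) {
        ++submitted;
    }
    return submitted;
}
```

Calling SubmitMeshesExample(100) appends one entry named after the function to ProfileResults(). steady_clock is used because it is monotonic, so the measured duration can never be negative even if the system clock is adjusted mid-frame.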

And before you get upset, these performance figures are in no way meant to represent an actual analysis of which rendering technique is more performant. Furthermore, the deferred renderer used here does not take advantage of its ability to render more dynamic lights; it stays within the same light limits as the forward renderer.

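Deferred Pipeline

| Render Pass            | CPU Time Taken (ms) | GPU Time Taken (ms) |
|------------------------|---------------------|---------------------|
| Geometry Pass          | 0.583               | 0.79667             |
| Direct Lighting Pass   | 0.058               | 0.56832             |
| Indirect Lighting Pass | 0.098               | 0.18022             |
| Total                  | 0.739               | 1.54521             |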

Zooming in on the geometry pass, 0.279ms was spent in the SubmitMesh scope and 0.280ms was spent in Renderer::EndScene(), together accounting for 0.559ms of the pass's 0.583ms of CPU time. I already have some optimisations in mind for how meshes are submitted in the Renderer class, and I mentioned earlier how the deferred renderer, as described, could be taking a performance hit from iterating over a mixed list of meshes that may or may not be compatible with it. Since implementation performance isn't my main concern right now (I'm focussing mainly on performance impacted by design), this is fine for now.

Now let's compare to our forward renderer:

Forward Pipeline

| Render Pass   | CPU Time Taken (ms) | GPU Time Taken (ms) |
|---------------|---------------------|---------------------|
| Scene 3D Pass | 0.641               | 1.36192             |
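To make the comparison concrete, here is a small sketch that sums the per-pass timings reported above and expresses the deferred totals relative to the forward pass. The timing values are taken directly from the two tables; the PassTiming struct and the helper functions are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <vector>

// One row of a timing table (hypothetical type for this example).
struct PassTiming {
    const char* name;
    double cpuMs;
    double gpuMs;
};

// Adds up the CPU and GPU time of every pass in a pipeline.
inline PassTiming SumPasses(const char* label,
                            const std::vector<PassTiming>& passes) {
    PassTiming total{label, 0.0, 0.0};
    for (const PassTiming& pass : passes) {
        total.cpuMs += pass.cpuMs;
        total.gpuMs += pass.gpuMs;
    }
    return total;
}

// Relative difference of a against b: 0.10 means a is 10% larger than b.
inline double RelativeDifference(double a, double b) {
    assert(b > 0.0);
    return (a - b) / b;
}

// Figures from the deferred pipeline table.
inline std::vector<PassTiming> DeferredPasses() {
    return {PassTiming{"Geometry Pass", 0.583, 0.79667},
            PassTiming{"Direct Lighting Pass", 0.058, 0.56832},
            PassTiming{"Indirect Lighting Pass", 0.098, 0.18022}};
}

// Figures from the forward pipeline table.
inline std::vector<PassTiming> ForwardPasses() {
    return {PassTiming{"Scene 3D Pass", 0.641, 1.36192}};
}
```

With these numbers, the deferred pipeline spends roughly 15% more CPU time and 13% more GPU time than the forward pass for this frame. As noted earlier, these are figures from a single captured frame and should not be read as a verdict on either technique.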

Definitely some room for improvement in both of the render pipeline implementations. But how did our FrameRenderer hold up?

Frame Renderer

| Function                | CPU Time Taken (ms)    |
|-------------------------|------------------------|
| RenderFrame (deferred)  | 3.973                  |
| RenderFrame (self time) | 0.069                  |
| AddRenderPass           | 0.001 to 0.004 (range) |

In the deferred renderer, the entire RenderFrame call took 3.973ms to complete on the CPU. However, most of that time was spent in the render pass logic that we defined. As the "(self time)" entry shows, only 0.069ms was spent in the RenderFrame function itself, meaning that most of the cost comes from the implementation of the renderer rather than from the FrameRenderer. I did mention before a few optimisations for the RenderFrame function which could help further reduce the 0.069ms spent there.
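Self time here means the inclusive time of a scope minus the time spent in the scopes it calls into. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the only measured inputs are the 3.973ms and 0.069ms figures reported above.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Self time = inclusive time of a scope minus the inclusive times of
// its direct child scopes.
inline double SelfTimeMs(double inclusiveMs,
                         const std::vector<double>& childInclusiveMs) {
    const double childrenMs = std::accumulate(
        childInclusiveMs.begin(), childInclusiveMs.end(), 0.0);
    // Children cannot take longer than the scope that contains them
    // (a small tolerance absorbs floating-point rounding).
    assert(childrenMs <= inclusiveMs + 1e-9);
    return inclusiveMs - childrenMs;
}

// Fraction of a scope's inclusive time that is self time.
inline double SelfTimeFraction(double selfMs, double inclusiveMs) {
    assert(inclusiveMs > 0.0);
    return selfMs / inclusiveMs;
}
```

Applied to the figures above, 3.973ms - 0.069ms = 3.904ms was spent in scopes called from RenderFrame, and SelfTimeFraction(0.069, 3.973) comes out at roughly 0.017, so under 2% of the frame's CPU time is attributable to RenderFrame itself.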

With the somewhat grisly performance laid out, it will be interesting to check in on this in the future and see how it compares with an optimised version.

If you're curious about some of the shaders used, you can see them here:

Outro

With all of that said, that marks the end of the first progress report on the development of Paul Engine. This step forward was very important: it allows me to begin exploring deeper graphics features such as screen space reflections, and it opens up the possibility of implementing new render pipelines in the future, like a clustered forward renderer (also known as Forward Plus).