Frame Pacing in a Very Simple Scene

I have recently integrated the Tracy profiler into my engine and it has been a great help to be able to visualize how the CPU and GPU are interacting. Even though what is being rendered is as embarrassingly simple as possible there were some things I had to fix that weren’t behaving as I had intended. Until I saw the data visualized, however, I wasn’t aware that there were problems! I have also been using PIX for Windows, NSight, RenderDoc, and gpuview, but Tracy has really been useful in terms of presenting the information across multiple frames in a way that I can customize to see the relationships that I have wanted to see. I thought that it might be interesting to post about some of the issues with screenshots from the profiler while things are still simple and relatively easy to understand.

Visualizing Multiple Frames

Below is a screenshot of a capture from Tracy:

I have zoomed in at a level where 5 full frames are visible, with a little bit extra at the left and right. You can look for Frame 395, Frame 396, Frame 397, Frame 398, and Frame 399 to see where the frames are divided. These frame boundaries are explicitly marked by me, and I am doing so in a thread dedicated to waiting for IDXGIOutput::WaitForVBlank() and marking the frame; this means that a “frame” in the screenshot above indicates a specific frame of the display’s refresh cycle.
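For reference, here is a minimal sketch of what such a heartbeat thread could look like (this is an illustration rather than my exact code; FrameMarkNamed is Tracy's macro for marking a named frame set, and the output and shouldQuit variables are placeholders):

#include <atomic>
#include <dxgi.h>
#include <tracy/Tracy.hpp>

// Hypothetical vblank heartbeat thread:
// wait for each vertical blank of the display and mark a frame boundary in Tracy
void VBlankHeartbeat(IDXGIOutput* output, const std::atomic<bool>& shouldQuit)
{
	while (!shouldQuit.load(std::memory_order_relaxed))
	{
		// Blocks until the display's next vertical blank
		output->WaitForVBlank();
		// Tell Tracy that a new "display refresh" frame has started
		FrameMarkNamed("Display Refresh");
	}
}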

There is a second frame visualization at the top of the screenshot where there are many green and yellow rectangles. Each one of those represents the same kind of frame that was discussed in the previous paragraph, and the purple bar shows where in the timeline I am zoomed in (it's hard to tell because it's so small, but there are 7 bars within the purple section, corresponding to the 1 + 5 + 1 frames visible at this level of zoom).

In addition to marking frames Tracy allows the user to mark what it calls “zones”. This is a way to subdivide each frame into separate hierarchical sections in order to visualize what is happening at different points in time during a frame. There are currently three threads shown in the capture:

  • The main thread (which is all that my program currently has for doing actual work)
  • An unnamed thread which is the vblank heartbeat thread
  • GPU execution, which is not a CPU thread but instead shows how GPU work lines up with CPU work

To help make sure that I was understanding things properly I have color coded some zones according to which swap chain texture is relevant. At the moment my swap chain only has two textures (meaning that there is only a single back buffer at any one time, and the two textures just get toggled between being the front buffer or the back buffer any time a swap happens), and they are shown with DarkKhaki and SteelBlue. In the heartbeat thread the DisplayFrontBuffer zone is colored according to which texture is actually being displayed during that frame (this is not strictly true because of the Desktop Window Manager compositor, but for the purposes of this post we will pretend that it is true conceptually).

I have used the same colors in the main CPU thread to show which swap chain texture GPU commands are being recorded and submitted for. In other words, the DarkKhaki and SteelBlue colors identify a specific swap chain texture, the heartbeat thread shows when that texture is the front buffer, and the main thread shows when that texture is the back buffer. At the current level of zoom it is hard to read anything in the relevant zones but the colors at least give an idea of when the CPU is doing work for a given swap chain texture before it is displayed.
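As an illustration of how a zone like this can be marked and colored at runtime, here is a hedged sketch using Tracy's ZoneScopedN and ZoneColor macros (the function name, the back buffer index parameter, and the hard-coded colors are all just for this example, not my exact code):

#include <tracy/Tracy.hpp>

void RenderGraphicsFrameOnCpu(const unsigned int backBufferIndex)
{
	// Create a named zone that covers this function
	ZoneScopedN("RenderGraphicsFrameOnCpu");
	// Tracy allows a zone's color to be chosen at runtime;
	// 0xbdb76b is DarkKhaki and 0x4682b4 is SteelBlue
	ZoneColor((backBufferIndex == 0) ? 0xbdb76b : 0x4682b4);
	// ... record and submit GPU commands that target this swap chain texture ...
}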

Unfortunately for this post I don't think that there is a way to dynamically modify the colors of zones in the GPU timeline (instead it seems that they must be known at compile time), and so I can't make the same visual correspondence. From a visualization standpoint I think it would be nice to show some kind of zone for the present queue (using Windows terminology), but even without that it can be understood implicitly. I will discuss the GPU timeline more later in the post when things are zoomed in further.

With all of that explanation let me show the initial screenshot again:

Hopefully it makes some kind of sense now what you are looking at!

Visualizing a Single Frame

Let us now zoom in further to just look at a single frame:

During this Frame 396 we can see that the DarkKhaki texture is being displayed as the front buffer. That means that the SteelBlue texture is the back buffer, which is to say that it is the texture that must be modified so that it can then be shown during Frame 397.

Look at the GPU timeline. There is a very small OliveDrab zone that shows work being done on the GPU. That is where the GPU actually modifies the SteelBlue back buffer texture.

Now look at the CPU timeline. There is a zone called RenderGraphicsFrameOnCpu which is where the CPU is recording the commands for the GPU to execute and then submitting those commands (zones are hierarchical, and so the zones below RenderGraphicsFrameOnCpu are showing it subdivided even further). The color is SteelBlue, and so these GPU commands will modify the texture that was being displayed in Frame 395 and that will again be displayed in Frame 397. You may notice that this section starts before the start of Frame 396, while the SteelBlue texture is still the front buffer and thus is still being displayed! In order to better understand what is happening we can zoom in even further:

Compare this with the previous screenshot. This is the CPU work being done at the end of Frame 395 and the beginning of Frame 396, and it is the work that will determine what is displayed during Frame 397.

The work that is done can be thought of as the following steps (see the sketch after this list):

  • On the CPU, record some commands for the GPU to execute
  • On the CPU, submit those commands to the GPU so that it can start executing them
  • On the CPU, submit a swap command to change the newly-modified back buffer into the front buffer at the next vblank after all GPU commands are finished executing
  • On the GPU, execute the commands that were submitted
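A rough D3D12-flavored sketch of those four steps is below. The variable names are placeholders, and my actual code records the commands earlier and waits for the swap before submitting, as described next:

#include <d3d12.h>
#include <dxgi1_4.h>

void RecordAndSubmitFrame(ID3D12CommandAllocator* commandAllocator,
	ID3D12GraphicsCommandList* commandList,
	ID3D12CommandQueue* commandQueue,
	IDXGISwapChain3* swapChain)
{
	// 1) On the CPU, record commands for the GPU to execute
	commandAllocator->Reset();
	commandList->Reset(commandAllocator, nullptr);
	// ... clear the back buffer, draw the quads, transition it to the present state ...
	commandList->Close();

	// 2) On the CPU, submit those commands so that the GPU can start executing them
	ID3D12CommandList* const listsToExecute[] = { commandList };
	commandQueue->ExecuteCommandLists(1, listsToExecute);

	// 3) On the CPU, submit a swap command; with a sync interval of 1 the newly-modified
	//    back buffer becomes the front buffer at the next vblank after the GPU commands finish
	swapChain->Present(1, 0);

	// 4) The GPU executes the submitted commands asynchronously from the CPU
}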

It is important that the GPU doesn’t start executing any commands that would modify the SteelBlue swap chain texture until that texture becomes the back buffer (and is no longer being displayed). The WaitForSwap zone shows where the CPU is waiting for the swap to happen before submitting the commands (which triggers the GPU to start executing the commands). There is no reason, however, that the CPU can’t record commands ahead of time, as long as those commands aren’t submitted to the GPU until the SteelBlue texture is ready to be modified. This is why the RenderGraphicsFrameOnCpu zone can start early: It records commands for the GPU (you can see a small OliveDrab section where this happens) but then must wait before submitting the commands (the next OliveDrab section).

How early can the CPU start recording commands? There are two different answers to this, depending on how the application works. The simple answer (well, “simple” if you understand D3D12 command allocators) is that recording can start as soon as the GPU has finished executing the previously-submitted commands that were saved in the memory that the new recording is going to reuse. There is a check for this in my code that is so small that it can only be seen if the profiler is zoomed in even further:

The reason this wait is so short is that the GPU work being done is so simple that the GPU had already reached the submitted swap long before the CPU checked.

Do you see that long line between executing the GPU commands and then recording new ones on the CPU? With the small amount of GPU work that my program is currently doing (clearing the texture and then drawing two quads) there isn’t anything to wait for by the time I am ready to start recording new commands.
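The check itself is essentially just a fence comparison. Here is a minimal sketch of the idea, assuming a fence value was signaled on the command queue when the allocator's commands were last submitted (all of the names are placeholders):

#include <d3d12.h>
#include <windows.h>

void WaitUntilCommandAllocatorIsReusable(ID3D12Fence* fence, const UINT64 fenceValueForAllocator, const HANDLE fenceEvent)
{
	// Only wait if the GPU hasn't already passed the fence value that was
	// signaled after this allocator's commands were submitted
	if (fence->GetCompletedValue() < fenceValueForAllocator)
	{
		fence->SetEventOnCompletion(fenceValueForAllocator, fenceEvent);
		WaitForSingleObject(fenceEvent, INFINITE);
	}
	// It is now safe to Reset() the command allocator and record new commands into its memory
}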

If you’ve been following you might be asking yourself why I don’t start recording GPU commands even sooner. Based on what I’ve explained above the program could be even more efficient and start recording commands as soon as the GPU was finished executing the previous commands, and this would definitely be a valid strategy with the simple program that I have right now:

This is a capture that I made after I modified my program to record new GPU commands as soon as possible. The WaitForPredictedVblank zone is gone, the WaitForGpuToReachSwap zone is now visible at this level of zoom, and the WaitForSwap zone is now bigger. The overlapping of DarkKhaki and SteelBlue is much more pronounced because the CPU is starting to work on rendering a new version of the swap chain texture as soon as that swap chain texture is displayed to the user as a front buffer (although notice that the commands still aren’t submitted to the GPU until after the swap happens and the texture is no longer displayed to the user). Based on my understanding this kind of scheduling probably represents something close to the ideal situation if 1) a program wants to use vsync and 2) knows that it can render everything fast enough within one display refresh and 3) doesn’t have to worry about user input.

The next section explains what the WaitForPredictedVblank zone is for and why user input makes the idealized screenshot above not as good as it might at first seem.

When to Start Recording GPU Commands

Earlier I said that there were two different answers to the question of how early the CPU can start recording commands for the GPU. In my profile screenshots there is a DarkRed zone called WaitForPredictedVblank that I haven’t explained yet, but we did observe that it could be removed and that doing so allowed even more efficient scheduling of work. This WaitForPredictedVblank zone is related to the second alternate answer of when to start recording commands.

My end goal is to make a game, which means that the application is interactive and can be influenced by the player. If my program weren’t interactive but instead just had to render predetermined frames as efficiently as possible (something like a video player, for example) then it would make sense to start recording commands for the GPU as soon as possible (as shown in the previous section). The requirement to be interactive, however, makes things more complicated.

The results of an interactive program are non-deterministic. In the context of the current discussion this can be thought of as an additional constraint on when commands for the GPU can start being recorded, which is so simple that it is kind of funny to write out: Commands for the GPU to execute can’t start being recorded until it is known what the commands for the GPU to execute should be. The amount of time between recording GPU commands and the results of executing those commands being displayed has a direct relationship to the latency between a user providing input and the user seeing the result of that input on a display. The later that the contents of a rendered frame are determined the less latency the user will experience.

All of that is a long way of explaining what the WaitForPredictedVblank zone is: It is a placeholder in my engine for dealing with game logic and simulation updates. I can predict when the next vblank is (see the Syncing without VSync post for more details), and I am using that as a target for when to start recording the next frame. Since I don’t actually have any work to do yet I am doing a Sleep() in Windows, and since the results of sleeping have limited precision I only sleep until relatively close to the predicted vblank and then wait on the more reliable swap chain waitable object (this is the WaitForSwap zone):

(Side note: Being able to visualize this in the instrumented profile gives more evidence that my method of predicting when the vblank will happen is pretty reliable, which is gratifying.)
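A rough sketch of that two-stage wait is below. The prediction itself is discussed in the Syncing without VSync post; the sleep margin and variable names here are placeholders, and the waitable object is assumed to come from IDXGISwapChain2::GetFrameLatencyWaitableObject():

#include <windows.h>

void WaitForPredictedVblankThenSwap(const HANDLE swapChainWaitableObject,
	const double predictedVblankSeconds, const double currentSeconds)
{
	// WaitForPredictedVblank: Sleep() only has coarse precision,
	// so sleep until relatively close to the predicted vblank and no further
	constexpr double sleepMarginSeconds = 0.002;
	const double secondsUntilVblank = predictedVblankSeconds - currentSeconds;
	if (secondsUntilVblank > sleepMarginSeconds)
	{
		Sleep(static_cast<DWORD>((secondsUntilVblank - sleepMarginSeconds) * 1000.0));
	}
	// WaitForSwap: the swap chain waitable object is a more reliable signal
	// that it is time to proceed with the frame
	WaitForSingleObject(swapChainWaitableObject, INFINITE);
}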

The next step will be to implement simulation updates using fixed timesteps and then record GPU commands at the appropriate time, interpolating between the two appropriate simulation updates. That will remove the big WaitForPredictedVblank, and instead there will be some form of individual simulation updates which should be visible.
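To give an idea of the direction this is heading, here is a generic fixed-timestep sketch in the style of the classic "Fix Your Timestep!" pattern; none of these types or functions exist in my engine yet:

// Hypothetical types and functions, purely to illustrate the idea
struct sSimulationState
{
	// ... positions, orientations, etc. ...
};
struct sSimulation
{
	void Update(double fixedTimestepSeconds);
	sSimulationState InterpolateMostRecentStates(double alpha) const;
};
void RecordGraphicsCommandsForState(const sSimulationState& state);

void UpdateSimulationAndRender(sSimulation& simulation, const double elapsedFrameSeconds)
{
	constexpr double fixedTimestepSeconds = 1.0 / 60.0;
	static double accumulatedSeconds = 0.0;
	accumulatedSeconds += elapsedFrameSeconds;
	// Run zero or more fixed updates depending on how much real time has passed
	while (accumulatedSeconds >= fixedTimestepSeconds)
	{
		simulation.Update(fixedTimestepSeconds);
		accumulatedSeconds -= fixedTimestepSeconds;
	}
	// Record GPU commands for a state interpolated between the two most recent updates
	const double alpha = accumulatedSeconds / fixedTimestepSeconds;
	RecordGraphicsCommandsForState(simulation.InterpolateMostRecentStates(alpha));
}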

Conclusion

If you’ve made it this far congratulations! I will show the initial screenshot one more time, showing the current state of my engine’s rendering and how work for the GPU is scheduled, recorded, and submitted:

Devlog 2024-12-16

This is a regularly-occurring status update. More generally-relevant posts can be found under Features (see Creating a Game and Engine from Scratch for context).

This is the beginning of week 9.

What I have done

  • The texture is now uploaded using the copy command queue instead of the graphics command queue
  • Made a platform-independent interface for vertex buffers
  • Made a platform-independent interface for index buffers and am using it now to draw the simple square
  • Made a platform-independent interface for constant buffers and am using it now to draw two squares (where it is being treated as a per-draw constant buffer)
  • Added the Tracy profiler
    • Implemented an interface for CPU profiling that doesn’t require adding anything from Tracy in header files
    • Implemented an interface for GPU profiling that doesn’t require adding anything from Tracy in header files
    • Modified the PIX for Windows integration to work more like Tracy
  • Build system
    • Added a Lua function to resolve paths
      • This makes it easier and nicer to generate header #include files so that resolving environment variables can be done automatically
    • Fixed a tricky bug when adding multiple levels of filters in Visual Studio
    • Fixed a bug when several different C++ source files specified different ways of building that were independent of the general way specified for the target
    • Found a workaround to help Intellisense give proper x64 suggestions (instead of x86)
      • This didn’t affect the actual compiled or linked files, but it was a relief to finally have pointers and struct sizes reported correctly

Next Steps

I am getting stuck (classic “analysis paralysis”) on trying to figure out how I want frame updates to work. This is kind of where I got stuck previously when I was doing the “beam racing” stuff, and even though I finally gave up and moved on I am running into it again with a different context now while I am trying to figure out how to have the application tell the graphics system what to render and when. It would be easy to just do something simple and move on and maybe I should, but one of the personal goals that I’ve wanted to do with this project is to have fixed simulation timesteps that are independent from the fixed display refresh rate, and what I want to accomplish is pretty straightforward even though I’m getting hung up on details.

  • Implementing the Tracy profiler has been a big help, and I am now able to better visualize what my code is actually doing. I need to somehow get something in place, even if it’s just rudimentary, so that I can have simulation updates (that can run both slower and faster than the display refresh rate) and then choose the appropriate time to render an interpolation between two of them.
  • I still need to implement a way in the build system to compile shaders offline and then a way to load them at runtime, even if very simple
    • (Recall that I am avoiding the standard library and general-purpose memory management, which is why some of those seemingly-simple tasks involve more work than might be expected)
  • I want to add a camera and 3D transforms

Devlog 2024-12-09

This is a regularly-occurring status update. More generally-relevant posts can be found under Features (see Creating a Game and Engine from Scratch for context).

This is the beginning of week 8.

What I have done

I got very sick this week and so a lot of time was lost.

  • Fix precompiled header source linking
    • It turns out that on MSVC it’s not enough to use a precompiled header: the object file compiled from the source (i.e. CPP) file that created it must also be linked. This kind of makes sense in retrospect, but things were somehow working without doing that for a while.
  • Created a texture resource in an “upload” heap, mapped and copied texture data to it there, and then copied the resource to the shader-visible default heap
    • This all just followed the Microsoft “hello” sample
  • Create constant buffer in different ways and different places to move triangle mesh around
    • I think I am finally understanding how resource management works in D3D12, after an embarrassingly long time and an embarrassing amount of re-reading different documentation and forum posts
    • It’s still unclear to me what the best strategy to manage it all is, but at least that isn’t embarrassing because it seems intentionally flexible and so everyone is a little unclear because there are many different ways to approach it
    • Additionally, since it was originally released there are new techniques for “bindless” rendering and it seems like I will probably want to use those for ray tracing, and so there is still more to learn. One step at a time, though…

Next Steps

The sickness has thrown me off of my schedule and I also find that I am suffering from the mental symptom of wanting-everything-perfect-and-not-knowing-where-to-start with some of the problems that I have to tackle next. I am at the point where I just need to force myself to do something to make progress, acknowledging that it might not be ideal and will require refactoring later. (I actually have pretty clear ideas of how I would naturally do some of this stuff, but I am also trying to approach things in a purely “data-oriented” way for this project, and that is contributing to some of my hesitancy because I wonder if my natural tendency is the result of old habits.)

I think there are three main things that I want to do next:

  1. Refactor some native D3D12 objects so that I can work with them in a more ergonomic (and potentially platform-independent) way, but more specifically so that I can specify things to render from the application rather than hardcode things in the graphics system
  2. Expand the constant buffer scheme so that I can have a camera and multiple objects with transforms and materials (even if these are very simple initially)
  3. Add the capability to the build system to run an arbitrary command line so that shaders can be compiled offline, both because I want to have shader errors reported at build time but also so that I can start using shader model 6.6 for the new bindless resources in a more sane way

I have also thought of a variation on breakout that seems like a simple way to incorporate some ray tracing, and so, at least for now, I think that might be my initial small proof-of-concept application to try and make, just so that I have some kind of example application rather than the current empty program that does nothing.

Devlog 2024-12-02

This is a regularly-occurring status update. More generally-relevant posts can be found under Features (see Creating a Game and Engine from Scratch for context).

This is the beginning of week 7.

What I have done

  • Finished the intentional tearing demo, which I wrote a post about
    • This was both satisfying and frustrating. I was happy to get something working that I had wanted to do for a long time but also disappointed that I didn’t get better results. I am also frustrated that I still don’t feel like I have as good of an understanding of swap chains and related flipping and timing as I would like.
    • I finally gave up and moved on and will have to revisit some of this when I have more complicated scenes being rendered, but it felt a bit like accepting defeat
  • Improved the waiting code for the case where the wait happens on the Windows message queue thread
    • Right now the entire application is single threaded, but a rudimentary mechanism is in place to detect the thread (what I am currently calling the “display thread”) and do a different kind of wait so that new messages can be processed if appropriate
  • Render a triangle
    • This was mainly just following the Microsoft Hello-Triangle example, and so not particularly impressive
  • Improved the array view class
    • This is my implementation of std::span, and the initial implementation revealed some problems when dealing with COM ISomething* pointers, both because of pointer const complexity and because of inheritance
    • I wrote a bunch of tests of situations that I could think of and the class now works better. This presented a fun (though frustrating) challenge to solve with C++ templates.
  • Add WinPixEventRuntime
    • This is the first external code/library that I have included in the project, which is a little unusual for me. Usually I start with {fmt} and Lua in order to have logging and configuration available, but in the current case I am delaying dealing with any strings and so haven’t followed my usual pattern.
    • This integration allows me to annotate captures in PIX for Windows
  • Add precompiled header support in the build system
    • Adding the ability to (force) include files wasn’t particularly difficult, and that was the primary motivator for doing this so that I could start writing and structuring code the way that I wanted to. I thought that going one more step with precompiled headers wouldn’t be too difficult but it ended up being more challenging than I had expected.
    • MSVC has some interesting constraints that I hadn’t been aware of where any files using a precompiled header have to use the same PDB file. Most notably, this means that a single PDB file is created and modified in many different steps by many different files getting compiled, which didn’t fit in well to my build system model where every task graph node (i.e. file) was the result of a single sub task. I think that I figured out a satisfactory way around this, though.
  • Add ability in the build system to query whether a sub task has a specific input
    • This allows me to decide whether a specific application needs the WinPixEventRuntime DLL
  • Add ability in the build system to query for the primary/obvious output target of a sub task
    • This allows me to decide where to stage the WinPixEventRuntime so that it is next to an application executable
  • Add ability in the build system to create files
    • This allows me to programmatically create a header file that #includes the WinPixEventRuntime headers using the current version number, sharing code with the LIB and DLL files

Next Steps

  • An obvious need is to add something to the build system to deal with shaders
    • Right now I just have a manually-copied shader source file in the application root directory that is compiled at run time, which is obviously not ideal
    • I am a little hesitant to start going down the path of figuring out asset building and loading, however, because I don’t have any string solution (recall that a goal of this project is to not use the standard library and to not use the general purpose new allocator, and so dealing with strings and paths is a big task to tackle)
    • With that being said, I could at least create a new type of build system sub task that allows a command line to be run and inputs and outputs to be specified. This would make it possible to work with the shader close to what I’m doing now but not quite so manually, and seems like an ok first step.
  • The other obvious task is to keep adding graphics features beyond the simple triangle and color-changing clears
    • I might at least continue to implement some of the D3D12 “Hello” examples, to increase my familiarity with the changes in D3D12 compared to D3D11 and to give me more time and experience to start thinking about abstractions and how to structure an actual renderer the way that I would want to

Syncing without VSync

I don’t remember how many years ago it was when I first read about doing this, but it was probably the following page which introduced me to the idea: https://blurbusters.com/blur-busters-lagless-raster-follower-algorithm-for-emulator-developers/. I am pleased to be able to say that I have finally spent the time to try and implement it myself:

How it Works

An application can synchronize with a display’s refresh cycle (instead of using vsync to do it) if two pieces of data are known:

  • The length of a single refresh (i.e. the refresh rate)
    • This is a duration
  • The time when a refresh ends and the next one starts (i.e. the vblank)
    • This is a repeating timestamp (a moment in time)

If both of these are known then the application can predict when the next refresh will start and update the texture that the graphics card is sending to the display at the appropriate time. If the graphics card changes the texture at the wrong time then “tearing” is visible, which is a horizontal line that separates the previous texture above it from the new texture below it. (This Wikipedia article has a simulated example image of tearing.)
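The prediction itself is just arithmetic. Here is a minimal sketch, assuming both values have already been measured against the same CPU clock and converted to seconds:

#include <cmath>

// Predict the time of the next vblank after "now" given the timestamp of any
// previously-observed vblank and the duration of a single refresh
double PredictNextVblankSeconds(const double knownVblankSeconds,
	const double refreshPeriodSeconds, const double nowSeconds)
{
	const double refreshesSinceKnownVblank = std::ceil((nowSeconds - knownVblankSeconds) / refreshPeriodSeconds);
	return knownVblankSeconds + (refreshesSinceKnownVblank * refreshPeriodSeconds);
}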

The texture that the graphics card is actively sending to the display is called the “front buffer”. The texture that isn’t visible, but that the application can generate into before it is activated and sent to the display, is called the “back buffer”. There is different terminology for the act of making a back buffer into a front buffer, but this post will call it “swapping”, and conceptually it can be thought of as treating the former back buffer as the new front buffer while what was the front buffer becomes a back buffer.

(What vsync does is take care of swapping front and back buffers at the appropriate time. If the application finishes generating a new texture in the back buffer and submits it in time then the operating system and graphics card will work together to make sure that it becomes the new front buffer for the display during the vblank. If the application doesn’t submit it in time and misses the vblank then nothing changes visually: The display keeps showing the old front buffer (meaning that a frame is repeated) and there is no tearing).

Why does swapping the front and back buffers at the wrong time cause tearing? Because even though swapping the front and back buffers happens instantaneously in computer memory the change from one frame to another on a display doesn’t. Instead the display updates gradually, starting at the top and ending at the bottom. Although this isn’t visible to a human eye the effect can be observed using a slow motion camera:

With that in mind it is possible to understand what I am doing in my video: I wrote a program that is manually swapping front and back buffers four times every refresh period in order to intentionally cause tearing. I am not actually rendering anything but instead just doing a very simple clearing of the entire back buffer to some single color, but by swapping a single color back buffer at the correct time in the middle of a display’s refresh I can change the color somewhere in the middle of the display.

Doing this isn’t particularly new and my results aren’t particularly impressive, but I was happy to finally find the time to try it.

Here is a bonus image where I wasn’t trying to time anything and instead just swapped between alternating red and green as quickly as I was able to:

Implementation Details

The rest of this post contains some additional technical information that I encountered while implementing this. Unlike the preceding section which was aimed at a more general audience the remainder will be for programmers who are interested in specific implementation details.

How to Calculate Time

Calculating the duration of a refresh period isn’t particularly difficult but it’s not sufficient to simply use the reported refresh rate. Although the nominal refresh rate would be close it wouldn’t exactly match the time reported by your CPU clock, and that’s what matters because that’s what you’ll be using to know when to swap buffers. In order to know what the refresh rate is in terms of the CPU clock an average of observed durations must be made. In order to calculate a duration it is necessary to keep track of how much time has elapsed between consistently repeating samples, but it doesn’t actually matter where in the refresh cycle these samples come from as long as they are consistently taken from the same (arbitrary) point in the refresh cycle. So, for example, timing after IDXGISwapChain::Present() returns (with a sync interval of 1 and a full present queue) would work, and timing after IDXGIOutput::WaitForVBlank() returns would also work.

It is more difficult, however, to calculate when the vblank actually happens.

DXGI_FRAME_STATISTICS

I finally settled on using IDXGISwapChain::GetFrameStatistics(). Using this meant that I was relying on DXGI instead of taking my own time measurements, but the big attraction of doing that is that the timestamps were then tied directly to discrete counters. Additionally, as a side benefit, after a bit of empirical testing it seemed like the sampled time in the DXGI frame statistics was always earlier than the time that I could sample myself, and so it seems like it is probably closer to the actual vblank than anything that I knew how to measure.

(The somewhat similar DwmGetCompositionTimingInfo() did not end up being as useful for me as I had initially thought. Alternatively, D3DKMTGetScanLine() seems like it could, in theory, be used for even more accurate results, but it wasn’t tied to discrete frame counters which made it more daunting. If my end goal had been just this particular demo I might have tried using that, but for my actual game engine renderer it seemed like IDXGISwapChain::GetFrameStatistics() would be easier, simpler and more robust.)

The problem that I ran into, however, is that I couldn’t find satisfactory explanations of what the fields of DXGI_FRAME_STATISTICS actually mean. I had to spend a lot of time doing empirical tests to figure it out myself, and I am going to document my findings here. If you found this post using an internet search for any of these DXGI_FRAME_STATISTICS-related terms then I hope this explanation saves you some time. (Alternatively, if you are reading this and find any mistakes in my understanding then please comment with corrections both for me and other readers.)

My Findings

The results of IDXGISwapChain::GetFrameStatistics() are a snapshot in time.

If you call IDXGISwapChain::GetLastPresentCount() immediately after IDXGISwapChain::Present() you will get the correct identifier for the present call that you just barely made, and this is very important to do in order to be able to correctly associate an individual present function call that you made with the information in the DXGI frame statistics (or, at least, it is conceptually important to do; you can also just keep track yourself of how many successful requests to present have been made).

On the other hand, if you call IDXGISwapChain::GetFrameStatistics() immediately after IDXGISwapChain::Present() there is no guarantee that you will get updated statistics (and, in fact, you most likely won’t). Instead, there is some non-deterministic (for you) moment in time after calling IDXGISwapChain::Present() where you would eventually get statistics for that specific request to present in the results of a call to IDXGISwapChain::GetFrameStatistics().

How do you know if the statistics you get are the ones that you want? You know that they are the ones that you want if the PresentCount field matches the value you got from IDXGISwapChain::GetLastPresentCount() after IDXGISwapChain::Present(). Once you call IDXGISwapChain::GetFrameStatistics() and get a PresentCount that matches the one that you’re looking for then you know two things:

  • The statistics that you now have refer to the known state of things when your submitted request to present (made by your call to IDXGISwapChain::Present()) was actually presented
  • The statistics that you now have will not be updated again for your specific present request. What you now have is the snapshot that was made for your PresentCount, and no more snapshots will be made until another call to IDXGISwapChain::Present() is made (which means that the next time the statistics get updated they will be referring to a different PresentCount from the one that you are currently interested in).

Once you have a DXGI_FRAME_STATISTICS that is a snapshot for your specific PresentCount the important corresponding number is PresentRefreshCount: with vsync enabled, it tells you which refresh of the display your request to present was actually presented during.

Once you have that information you can, incidentally, detect whether your request to present actually happened when you wanted and expected it to. This is described at https://learn.microsoft.com/en-us/windows/win32/direct3ddxgi/dxgi-flip-model#avoiding-detecting-and-recovering-from-glitches, in the “to detect a glitch” section. Although the description of what PresentCount and PresentRefreshCount mean is confusing to me in that document (and in other official documentation), the description of how to detect a glitch is consistent with how I have described these fields above, which helps to give me confidence that my understanding is probably correct.

Once you know the information above you can now potentially get timing information. The SyncRefreshCount refers to the same thing as PresentRefreshCount (it is a counter of display refresh cycles), and so it may be confusing why two different fields exist and what the distinction is between the two. PresentRefreshCount is, as described above, a mapping between PresentCount and a display refresh. SyncRefreshCount, on the other hand, is a mapping between the value in SyncQPCTime and a display refresh. The value in SyncQPCTime is a timestamp corresponding to the refresh in SyncRefreshCount. If SyncRefreshCount is the same as PresentRefreshCount then you know (approximately) the time of the vblank when your PresentCount request was actually displayed. It is possible, however, for SyncRefreshCount to be different from PresentRefreshCount, and that is why both fields are in the statistics struct.

To repeat: Information #1 is which display refresh your request was actually displayed in (comparing PresentCount to PresentRefreshCount) and information #2 is what the (approximate) time of a vblank for a specific refresh was (comparing SyncQPCTime to SyncRefreshCount). Derived information #3 is what the (approximate) time of a vblank was for the refresh that your request was actually displayed in.
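Putting that together, a hedged sketch of the flow looks something like the following (error handling is minimal, remembering the present count is shown in the comment rather than in the function, and the function name is a placeholder):

#include <dxgi.h>
#include <windows.h>

// Immediately after a successful Present(), remember which request it was:
//	swapChain->Present(1, 0);
//	UINT lastPresentCount = 0;
//	swapChain->GetLastPresentCount(&lastPresentCount);

// Later, poll until the statistics snapshot refers to that specific present request
bool TryGetVblankTimeForPresent(IDXGISwapChain* swapChain, const UINT lastPresentCount, LARGE_INTEGER& o_vblankTime)
{
	DXGI_FRAME_STATISTICS statistics = {};
	if (FAILED(swapChain->GetFrameStatistics(&statistics)))
	{
		return false;	// No statistics yet, or DXGI_ERROR_FRAME_STATISTICS_DISJOINT
	}
	if (statistics.PresentCount != lastPresentCount)
	{
		return false;	// The snapshot doesn't refer to our present request (yet, or anymore)
	}
	// Information #1: PresentRefreshCount is the display refresh our request was shown in.
	// Information #2: SyncQPCTime is the (approximate) vblank time for the refresh in SyncRefreshCount.
	if (statistics.SyncRefreshCount != statistics.PresentRefreshCount)
	{
		return false;	// The sampled time belongs to a different refresh than ours
	}
	// Derived information #3: the (approximate) vblank time of the refresh our request was shown in
	o_vblankTime = statistics.SyncQPCTime;
	return true;
}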

(Side note: The official documentation here and here is very intentionally vague about when SyncQPCTime is actually measured. The driver documentation here, however, says “CPU time that the vertical retrace started”. I’m not sure if the more accessible user-facing documentation is intentionally vague to not be held accountable for how accurate the timing information is, or if the driver documentation is out-of-date. This post chooses to believe that the time is supposed to refer to the beginning of a refresh, with the caveat that I may be wrong and that even if I’m not wrong the sampled time is clearly not guaranteed to be highly accurate.)

One final thing to mention: A call to IDXGISwapChain::GetFrameStatistics() may return DXGI_ERROR_FRAME_STATISTICS_DISJOINT. One thing to note is that the values in PresentRefreshCount and SyncRefreshCount are monotonically-increasing and, specifically, they don’t reset even when the refresh rate changes. The consequence of this is that the DXGI_ERROR_FRAME_STATISTICS_DISJOINT result is very important for determining timing (like this post is concerned about). If you record the first PresentRefreshCount reported in the first successful call after DXGI_ERROR_FRAME_STATISTICS_DISJOINT was returned then you have a reference point for any future SyncRefreshCounts reported (until the next DXGI_ERROR_FRAME_STATISTICS_DISJOINT). Specifically, you know how many refresh cycles have happened with the current refresh rate.

How to Calculate Refresh Period

Calculating the refresh period using SyncRefreshCount and SyncQPCTime is not difficult: Average the elapsed time between the sampled timestamps of refreshes. I am using the incremental method described here: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford’s_online_algorithm. This is easy to calculate and doesn’t require any storage beyond the current mean and the sample count. It can have problems with outliers or if the duration changes, though, and although I don’t anticipate either of those being issues it remains to be seen.
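A minimal sketch of that running mean is below (the struct and field names are placeholders; each new sample would be the elapsed SyncQPCTime between two statistics snapshots divided by the number of refreshes between their SyncRefreshCounts):

#include <cstdint>

struct sRunningMean
{
	double mean = 0.0;
	uint64_t sampleCount = 0;

	void AddSample(const double sample)
	{
		// Incremental update: no storage is needed beyond the current mean and the count
		++sampleCount;
		mean += (sample - mean) / static_cast<double>(sampleCount);
	}
};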

How to Predict VBlanks

I did some thinking about how to do this, and after some internet searching (I am not a numerical methods expert and so it took me a while to even figure out the correct search terms for what I was thinking) I found a really nice post about how to do exactly what I wanted, an incrementally-updating method for calculating the line that is the least-squares best fit for a bunch of sample points: https://blog.demofox.org/2016/12/22/incremental-least-squares-curve-fitting/. I liked this because it was a match for the incremental average function I was using, and since refresh cycles should happen regularly I figured that I could use SyncRefreshCount as the independent variable and SyncQPCTime as the dependent variable and then have a really computationally cheap way of predicting the time for any given refresh (in the past or in the future).

The good news is that this worked really well after some tweaking of my initial naive approach! The bad news is that the changes that I had to make in order to make it work well made me nervous about whether it would continue to perform well over time.

The big problem was losing precision. The SyncRefreshCount values are inherently big numbers, but I already had to do some normalizing anyway so that they started at zero (see the discussion above about DXGI_ERROR_FRAME_STATISTICS_DISJOINT), and so that didn’t seem so bad. The SyncQPCTime values, however, are also big numbers. The same trick of starting at zero can be used, and I also represented them as seconds (instead of Windows high performance counter ticks), and this helped me to get good results. I was worried about the long-term viability of this, however: Unlike the incremental method for the average, this method required squaring numbers and multiplying big numbers together, and these numbers would constantly increase over time.

Even though I was quite happy with finding an algorithm that did what I had thought of, once I had implemented it there was still something that bothered me: I was trying to come up with a line equation, where the coefficients are the slope and the y-intercept. I already knew the slope, though, because I had a very good estimate of the duration of a refresh. In other words, I was solving for two unknowns using a bunch of sample points, but I already knew one of those unknowns! What I really wanted was to start with the slope and then come up with an estimate of the y-intercept only, and so it felt like the method I was using should be constrainable even more.

With that in mind I eventually came up with what I think, in hindsight, is a better solution even aside from precision issues. I know the “exact” duration between every vblank (we will conceptually consider that to be known, even though it’s an estimate), and for each reported sample I know the exact number of refreshes since the initial starting point (which is a nice discrete integer value), and then I know the approximate sampled time, which is the noisy repeated sample data I am getting that I want to improve in order to make predictions. What I can do, then, is to calculate what the initial starting time (for refresh count 0) would be, and incrementally calculate an average of that. This gives me the same cheap way of calculating the prediction (just a slope (refresh period) and a y-intercept (this initial timestamp)), but also a cheap way of updating this estimate (using the same incrementally-updating average method that I discussed above). And, even better, I can update it every frame without worrying about numerical problems. (Eventually with enough sample counts there will be issues with the updated value being too small, but that won’t impact the accuracy of the current average if we assume that it is very accurate by then.)
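Here is a sketch of this approach, reusing the sRunningMean sketch from earlier (the refresh period is treated as known, and each noisy sample implies a timestamp for refresh 0 that gets averaged; the names are placeholders):

struct sVblankPredictor
{
	double refreshPeriodSeconds = 0.0;	// The slope: estimated separately, as described above
	sRunningMean refreshZeroTimeSeconds;	// The y-intercept: running average of the implied time of refresh 0

	void AddSample(const uint64_t refreshIndex, const double sampledVblankSeconds)
	{
		// Each noisy sample implies what the timestamp of refresh 0 would have been
		const double impliedRefreshZeroTime = sampledVblankSeconds - (static_cast<double>(refreshIndex) * refreshPeriodSeconds);
		refreshZeroTimeSeconds.AddSample(impliedRefreshZeroTime);
	}

	double PredictVblankSeconds(const uint64_t refreshIndex) const
	{
		// The prediction is just a line: intercept + (slope * x)
		return refreshZeroTimeSeconds.mean + (static_cast<double>(refreshIndex) * refreshPeriodSeconds);
	}
};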

This means that I don’t have to spend time initially waiting for the averages to converge; instead I can just start with a single sample that is already a reasonably good estimate and then proceed with normal rendering, knowing that my two running averages will keep getting more accurate over time with every new DXGI frame statistic that I get with new SyncRefresh information.

Devlog 2024-11-25

This is a regularly-occurring status update. More generally-relevant posts can be found under Features (see Creating a Game and Engine from Scratch for context).

This is the beginning of week 6.

What I have done

  • Initialized Direct3D 12 and cleared the window
  • Calculate timing of display refresh so that vblanks can be predicted
    • I am able to display horizontal color bars on the display without vsync enabled and with only clearing the entire window by properly timing and predicting when to swap the back buffer
    • This is something that I’ve wanted to do for years after reading about it and so it was fun to finally make it happen. I will make a dedicated post about it once I have cleaned up some remaining problems.
  • Added the ability to specify C++ compile optimizations and update the project configurations accordingly

Next Steps

  • Improve display timing
    • There are still problems that I don’t understand. Rendering without vsync isn’t important because the real engine won’t work that way and so it’s fine if the fun demo of color bars doesn’t work perfectly. What is important, however, is that the underlying timing information is correct for what I want to do with the real engine and I’m concerned with some of the behavior that I occasionally see that I don’t understand because it could lead to hard-to-diagnose problems later.
    • Even though I did the color bars as a fun project they are actually a really good way to visualize whether the timing is correct and so I think it is good to keep working on it before I undo the fun temporary hacky code and make things work the way that they are supposed to.
  • Create a separate thread for the display loop and the game logic
    • I was going to delay doing this until later, but as I’ve been working on the swap chain code I think it would be good to get this more formalized early
  • Draw a triangle
    • I had thought about adding better input next but for now I can just detect key presses using GetAsyncKeyState() in a hacky way and that means I can quickly experiment without having to go through an application layer. So I think I have changed my mind and that it is a better idea to keep working on the graphics system for the moment.

Devlog 2024-11-18

This is a regularly-occurring status update. More generally-relevant posts can be found under Features (see Creating a Game and Engine from Scratch for context).

This is the beginning of week 5.

What I have done

  • Build system changes:
    • Include directory search paths are used in the dependency hash so that if they change a file is recompiled
    • The use of standard libraries or platform libraries in the compiled files that a target depends on now influences its own compiled files’ include directory search paths
      • This is so that if some downstream target #includes a file that #includes <windows.h>, for example, that it will work as expected (meaning that specifying the static library dependency is enough)
    • Added features to help IntelliSense
      • Include directories and preprocessor defines are now added to VCXPROJ files
      • Compile arguments are now added to VCXPROJ files
        • (It turns out that this is important for IntelliSense for things like detecting what version of C++ is used; without it there are lots of false positive red squiggles when using newer features)
      • Files are now added to VCXPROJ files
        • Currently this is done by specifying a file path, along with an optional type (e.g. for C++ compiling or C++ including) and an optional logical display folder (so that files can be organized in a logical file structure in e.g. Solution Explorer).
        • I mostly like this, but one small annoyance is that for C++ files that get compiled it means that each source CPP file has to be listed twice. For now I have decided that the way it is makes sense (rather than e.g. automatically adding these files to be displayed) but it’s something I will have to keep thinking about as I get more experience.
      • Individual files can now be compiled in Visual Studio
        • This involved both adding a pattern for command lines and adding a new capability to jpmake. Previously I had a command line where a file path could be specified to generate just that file (i.e. it was an output file path), but in order to support the Visual Studio functionality the file path must be an input path.
      • Individual C++ files to compile can now have individual C++ build info, rather than always using the C++ build info of the target
        • This was already supported under the hood in jpmake, but I didn’t have a Lua interface for it
  • A new EmptyWindow application exists
    • All this does is create and display a main window, so it is the most basic application imaginable, but it was still exciting to be working on real code rather than the build system
    • I made a post about it
  • I have added many type traits
    • So far these are just clones of standard library ones that I have needed, but it has been fun to learn more about how some of this template “magic” works
    • I made a post about it
  • I added scope guard classes
  • I added custom implementations of move and forward
    • MakeMovable() and ForwardReference() are macros based on https://www.foonathan.net/2020/09/move-forward/
      • I discovered accidentally (there was some reddit or stack overflow comment about that link I think) that the macro version enforces actual moves with no copying fallback, which is something that I have wanted but wasn’t sure if or how it was implementable. After figuring out why the macro had this difference (related to const vs. non-const rvalue references) I have also added a MakeMovableIfPossible() in case I ever want it, which just does the same thing that std::move() does.
  • I added volatile load and store functions, with relaxed, acquire, or release memory ordering
  • I added the ability to allocate and free memory from the operating system
    • (It is also possible to reserve and commit memory in steps if the operating system supports it. This is something I use in jpmake, but don’t currently anticipate using in the game engine itself.)
  • I made array view and memory view classes
    • These don’t have all of the functionality that I would like, but they are a good start of a fundamental building block
    • The cMemoryView class can have a view of both const and mutable memory, and allows slices to be made for partitioning memory
  • I have added memory arena and allocator classes
    • This is just a starting point, but I do have a monotonic allocator and am able to partition and then logically allocate memory from the operating system
    • I have New()/Delete() functions for working with typed memory and Allocate()/Free() functions for working with untyped memory, all of which require an arena
      • The goal is to allocate a single big block of operating system memory when the application starts and then only work with that for the lifetime of the application by partitioning that into smaller sections and using hierarchical arenas to allocate in a controlled way (“controlled” both in terms of lifetimes and cache friendliness).
  • I have set up the machinery for creating a graphics “engine” and created a D3D12 device
    • The application lets the graphics library create a graphics engine, passing in itself and a block of memory from the operating system that is budgeted for graphics. The graphics engine stores a reference to the application who created it and returns a pointer to itself that the application can refer to. This is my attempt at dependency injection while still allowing easy access to different systems.
    • The graphics engine will have a platform-specific context, and so far I am just creating a D3D12 device to store in it as a proof of concept to show that I am able to access the platform-specific D3D functionality and allocate and store things with the new memory system.

Next Steps

  • Use D3D12 to clear the main window to some arbitrary color
  • Implement the top part of the window as one clear color and the bottom part of the window as a different clear color using ideas discussed at https://blurbusters.com/blur-busters-lagless-raster-follower-algorithm-for-emulator-developers/
    • This isn’t how I intend my engine to run (I will use VSync), but ever since reading about it I have wanted to try and do it to make sure I understood the timing involved, and I think getting this to work would show that I have the fundamentals in place before implementing how I really want my renderer to work.

Type Trait Coding Style

One of my goals while creating a custom game engine is to avoid using the C++ standard library. This post won’t discuss why, but one consequence of that goal that I have run into during this early phase of the project is that I have had to recreate some fundamental machinery that the standard library provides. By “recreate” I don’t mean that I have figured things out myself starting from nothing but first principles, but instead that I have had to look at existing documentation and implementations to understand how something works and then reimplement it in my own style. It has been fun to learn a bit more about how this corner of C++ metaprogramming works; although I have used these features (some of them frequently) I have only vaguely understood how they might have actually been implemented.

Doing this has also had the interesting consequence of presenting me with challenges to my existing coding style and I have had to expand and adapt. This post gives some examples of changes to my coding style that I have been experimenting with in order to accommodate type traits.

Existing Style

My current style uses a lowercase single letter prefix to indicate type names, meaning that if an identifier is spelled e.g. jSomething or pSomething the reader can immediately know that those names identify types, even without knowing what the j or p might mean (and, to be clear, those are just examples, and neither the j nor the p prefix exists (yet)).

Class names start with a c prefix:

class cMyClass;
class cSomeOtherClass;

Struct names start with an s prefix, base class names start with an i prefix (for “interface”, where my goal is that leaf classes are the only ones that are ever instantiated), and enumeration names start with an e prefix:

// When I use a struct instead of a class it means:
//	* The member variables are public instead of private
//	* The member variables use a different naming scheme
struct sMyStruct;
// The reader can tell that the following is a base class
// (and conceptually an abstract base class)
// just from the type name alone:
class iMyBaseClass;
// Using this enum convention makes it
// less annoying to use scope enumerations
// because the prefix makes it instantly identifiable
// and so the identifiers can be chosen accordingly:
enum class eMyState
{
	On,
	Off,
};

Type names start with a t prefix, e.g.:

// Creating a type alias:
using tMySize = int;
// In templates:
template<typename tKey, typename tValue>
class cMyMap;

(I don’t like the common convention of using single letters (e.g. T and U) in templates, and find that it makes code much harder to read for me personally, similarly to how I feel about single letter variable names. This strongly-held conviction has been challenged in some cases by working with type traits, which I discuss below.)

Type Names

Some of the type trait information that I have needed are expressions that are types. The standard uses a _t suffix for this, which is a helper alias for a struct’s nested type, but I have never loved this convention. In my code so far I have used a t prefix for cases like these, and hidden the associated struct in a detail namespace:

// tNoConst -> Type without a const qualifier
namespace Types::detail
{
	template<typename T> struct sNoConst { using type = T; };
	template<typename T> struct sNoConst<const T> { using type = T; };
}
template<typename T>
using tNoConst = typename Types::detail::sNoConst<T>::type;

// An example of this being used:
const int someConstVariable = 0;
tNoConst<decltype(someConstVariable)> someMutableVariable;
someMutableVariable = 0;

This use of the t prefix is not really any different from how I had already named types, and I find that it fits in naturally when used in code. Eagle-eyed readers may notice, however, that I am using a single T as a type, which I had mentioned is something that I strongly dislike and claimed that I don’t do!

In all of my previous template programming I have always been able to come up with some name that made sense in context, even if it was sometimes generic like tValue. While working on these type traits, however, I realized that there are cases (like shown above) where the type really could be any type. I considered tType, but that seemed silly. I considered tAny, and I might still end up changing my mind and refactoring the code to that (or something similar). For now, though, I have capitulated and am using just the T for fully generic fundamental type trait building blocks like the ones discussed in this post (in other code, though, I still intend to strictly adhere to my give-the-type-a-name rule).

Value Names

Some of the type trait information that I have needed are expressions that are values. The standard uses a _v suffix for this, but I have never loved this convention. In fact, I’m not sure that I really understand this convention; unlike with _t where there needs to be some underlying struct for the metaprogramming implementation to work it doesn’t seem like values need this (at least, the ones that I have recreated so far haven’t needed an underlying struct).

I did struggle a bit with how to name these, however. My existing coding convention would prefix global variables names with g_ (that will have to be discussed in a separate post), but these type trait variables feel different from traditional global variables to me. In my mind they conceptually feel more like functions than variables, but functions that I call with <> instead of (). I wanted some alternate convention to make them visually distinct from standard variables.

After some experimentation I eventually settled on keeping the v from the standard but making it a prefix instead of a suffix, and I have been pretty happy so far with the result:

// vIsConst -> Whether a type is const-qualified
template<typename T>
constexpr bool vIsConst = false;
template<typename T>
constexpr bool vIsConst<const T> = true;

// An example of this being used:
if constexpr (vIsConst<decltype(someVariable)>)
{
	// Stuff
}

This convention has added a new member to my pantheon of prefixes, but it has felt natural and like a worthy addition so far. As an additional unexpected bonus it has also given me a new convention for naming template non-type parameters:

// My new convention:
template<typename tSomeType, bool vSomeCondition>
class cMyClass1;

// My previous convention, which I never loved:
template<typename tSomeType, bool tSomeCondition>
class cMyClass2;

Having a new way of unambiguously specifying compile-time values has improved the readability of my code for me.

Concept Names

I have encountered one case where I wanted to make a named constraint and I had to think about what to name it. I don’t have enough experience yet to know whether my initial attempt is something that I will end up liking, but this is what I have come up with:

// rIsBaseOf -> Enforces vIsBaseOf
template<typename tDerived, typename tBase>
concept rIsBaseOf = vIsBaseOf<tBase, tDerived>;

// An example of this being used:
template<rIsBaseOf<iBaseClass> tSomeClass>
class cMyConstrainedClass;

I couldn’t use c for “constraint” or “concept” because that was already taken for classes. I finally settled on r for “restraint” (kind of like a mix of “constraint” and “restrict”, with a suggestion of requires) and I don’t hate it so far but I also don’t love it. It feels like it is good enough to do the job for me in my own code, but it also feels like maybe there’s a better convention that I haven’t thought of yet.
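
For reference, the vIsBaseOf that this concept builds on isn't shown in this post. A plausible sketch (my assumption, not the actual implementation) is to lean on the __is_base_of compiler intrinsic, which MSVC, Clang, and GCC all provide and which avoids needing the standard library:

// A hedged sketch of the underlying value trait (not the post's actual code);
// the argument order matches how rIsBaseOf uses it above
template<typename tBase, typename tDerived>
constexpr bool vIsBaseOf = __is_base_of(tBase, tDerived);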

Application with an Empty Window

Behold, the main window of an application using my nascent engine:

It does as little as one might expect from the screenshot, but some of the architecture might be of interest.

Build System

This section shows parts of the jpmake project file.

Platforms and Configurations

The different possible platforms and configurations are defined as follows:

-- Platforms
------------

-- Define the platforms by name and type

platform_windows = DefinePlatform("windows", "win64")
local platform_windows = platform_windows

-- Get the specified platform to execute tasks for
platform_target = GetTargetPlatform()
local platform_target = platform_target

-- Configurations
-----------------

-- Define the configurations by name

configuration_unoptimized = DefineConfiguration("unoptimized")
local configuration_unoptimized = configuration_unoptimized

configuration_optimized = DefineConfiguration("optimized")
local configuration_optimized = configuration_optimized

configuration_profile = DefineConfiguration("profile")
local configuration_profile = configuration_profile

configuration_release = DefineConfiguration("release")
local configuration_release = configuration_release

-- Get the specified configuration to execute tasks for
local configuration_current = GetCurrentConfiguration()

I am creating global variables for each platform and configuration so that they are accessible by other Lua files and then immediately assigning them to local variables so that they are cheaper to use in this file. (I currently only have the single Lua project file, but soon I will want to change this and have separate files that can focus on different parts of the project.)

At the moment I am specifying any platform-specific information using a strategy like if platform_target == platform_windows and that works fine (there are several examples later in this post), but I am considering defining something like isWindows = platform_target == platform_windows instead. There won’t be many platforms (only one for the foreseeable future!) and it seems like it would be easier to read and write many platform-specific things with a single boolean rather than with a long comparison. I am doing something similar with the configurations where I define booleans that serve as classification descriptions, and so far it feels nice to me (again, there are examples later in this post).

Directory Structure

The directory structure from the perspective of jpmake is currently defined as follows:

-- Source Files
do
	SetEnvironmentVariable("engineDir", "Engine/")
	SetEnvironmentVariable("appsDir", "Apps/")
end
-- Generated Files
do
	-- Anything in the temp directory should be generated by jpmake executing tasks
	-- and the entire folder should be safely deletable.
	-- Additionally, any files that are not part of the Git repository
	-- should be restricted to this folder.
	SetEnvironmentVariable("tempDir", ConcatenateTable{"temp/", platform_target:GetName(), "/", configuration_current, "/"})
	-- The intermediate directory is for files that must be generated while executing tasks
	-- but which aren't required to run the final applications
	SetEnvironmentVariable("intermediateDir", "$(tempDir)intermediate/")
	SetEnvironmentVariable("intermediateDir_engine", "$(intermediateDir)engine/")
	-- The artifact directory is where jpmake saves files
	-- that it uses to execute tasks
	SetArtifactDirectory("$(intermediateDir)jpmake/")
	-- The staging directory contains the final applications that can be run
	-- independently of the source project and intermediate files
	SetEnvironmentVariable("stagingDir", "$(tempDir)staging/")
end

As a general rule I don't like abbreviations in variable or function names, but I decided to keep the "dir" convention from Visual Studio: these environment variable names will be used so frequently in paths that it seems like a reasonable exception to keep things shorter and more readable. (I did, however, decide to change the first letter to lowercase, which fits my variable naming convention better.)

An issue that I have run into in the past is having trouble deciding how to name directory environment variables so that they distinguish between source and generated files. With games, where there can be code and assets both for the engine and the application, the possible choices are even more complex (and, to make matters worse, with this project I intend to support multiple applications using the engine, so there is yet a further distinction that must be made). What I have will likely change as time goes on and I write more code, but it feels like a good start. The root repository folder looks like this:

Any files that are generated by the build process are kept quarantined in a single folder (temp/) so that the distinction between source and deletable files is very clear. This is very important to me (as anyone who has worked with me can attest). The temp directory looks like the following, expanded for one platform and configuration:

With such a simple application the only thing in the staging directory is the executable file, but when I develop more complicated applications there will be other files in staging directories (e.g. the assets that the game loads).

One further consideration that is currently missing is what to do with “tools”, those programs that are used during development (either for authoring content or as part of the build process) but that don’t get released to end users. I can imagine that I might want to update some of this directory structure when I start developing tools.

C++ Configuration

The next section in the jpmake project file configures the default settings for how C++ is built for the current target platform and build configuration:

-- C++
------

-- Initialize C++ for the current platform and configuration
cppInfo_common = CreateCppInfo()
local cppInfo_common = cppInfo_common
do
	-- #define VLSH_PLATFORM_SOMENAME for conditional compilation
	do
		local platform_define_suffix
		if (platform_target == platform_windows) then
			platform_define_suffix = "WINDOWS"
		else
			platform_define_suffix = "NONE"
		end
		cppInfo_common:AddPreprocessorDefine(("VLSH_PLATFORM_" .. platform_define_suffix),
			-- There isn't any anticipated reason to check anything other than whether the platform is #defined,
			-- but the name is used as a value because why not?
			platform_target:GetName())
	end
	-- The project directory is used as an $include directory
	-- so that directives like the following can be done to show scope:
	--	#include <Engine/SomeFeature/SomeHeader.hpp>	
	cppInfo_common:AddIncludeDirectory(".")
	local isOptimized = configuration_current ~= configuration_unoptimized
	cppInfo_common:AddPreprocessorDefine("VLSH_CONFIGURATION_ISOPTIMIZED", isOptimized)
	local isForProfiling = configuration_current == configuration_profile
	cppInfo_common:AddPreprocessorDefine("VLSH_CONFIGURATION_ISFORPROFILING", isForProfiling)
	local isForRelease = configuration_current == configuration_release
	cppInfo_common:AddPreprocessorDefine("VLSH_CONFIGURATION_ISFORRELEASE", isForRelease)
	do
		local areAssertsEnabled = not isForRelease and not isForProfiling
		cppInfo_common:AddPreprocessorDefine("VLSH_ASSERT_ISENABLED", areAssertsEnabled)
	end
	cppInfo_common.shouldStandardLibrariesBeAvailable = false
	cppInfo_common.shouldPlatformLibrariesBeAvailable = false
	cppInfo_common.shouldExceptionsBeEnabled = false
	cppInfo_common.shouldDebugSymbolsBeAvailable =
		-- Debug symbols would also have to be available for release in order to debug crashes
		not isForRelease
	if platform_target == platform_windows then
		cppInfo_common.VisualStudio.shouldCRunTimeBeDebug = not isOptimized
		cppInfo_common.VisualStudio.shouldIncrementalLinkingBeEnabled =
			-- Incremental linking speeds up incremental builds at the expense of bigger executable size
			not isForRelease
		-- Warnings
		do
			cppInfo_common.VisualStudio.shouldAllCompilerWarningsBeErrors = true
			cppInfo_common.VisualStudio.shouldAllLibrarianWarningsBeErrors = true
			cppInfo_common.VisualStudio.compilerWarningLevel = 4
		end
	end
end

This shows the general approach I am taking towards configuring things (both from the perspective of the game engine and also from the perspective of jpmake and my personal ideal way of configuring software builds). The named configurations (e.g. unoptimized, optimized, profile, release) that I defined earlier are just arbitrary names from the perspective of jpmake and don’t have any semantics associated with them. Instead it is up to the user to specify how each configuration behaves. I can imagine that this would be seen as a negative for most people, but I have a personal issue where I generally prefer to have full control over things.

This section should not be understood as being complete (most notably, there aren't actually any optimization-related settings except for which C run-time to use!), but that is because I haven't implemented all of the relevant Visual Studio options in jpmake yet.
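
As a hypothetical illustration of how these defines might be consumed on the C++ side (the usage below is my assumption; only the macro names come from the jpmake file above, and I'm assuming the boolean values expand to something the preprocessor can evaluate):

// Platform checks only test whether the platform is #defined,
// matching the comment in the jpmake file
#if defined(VLSH_PLATFORM_WINDOWS)
	// Windows-specific code path
#endif

// Configuration classifications can gate entire features,
// e.g. only compiling the assert machinery when asserts are enabled
#if VLSH_ASSERT_ISENABLED
	// ...full assert implementation...
#endif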

Engine Static Library

Below is one example of a static library that I have made, which provides base classes for applications (meaning that an actual application can inherit from the provided framework):

do
	local task_application = CreateNamedTask("Application")
	local cppInfo_application = cppInfo_common:CreateCopy()
	do
		if (platform_target == platform_windows) then
			cppInfo_application.shouldPlatformLibrariesBeAvailable = true
		end
	end
	engineLibrary_application = task_application:BuildCpp{
			target = "$(intermediateDir_engine)Application.lib", targetType = "staticLibrary",
			compile = {
				"$(engineDir)Application/iApplication.cpp",
				"$(engineDir)Application/iApplication_windowed.cpp",
				platform_target == platform_windows and "$(engineDir)Application/iApplication_windowed.win64.cpp" or nil,
			},
			link = {
				engineLibrary_assert,
				platform_target == platform_windows and CalculateAbsolutePathOfPlatformCppLibrary("User32.lib", cppInfo_application) or nil,
			},
			info = cppInfo_application,
		}
end
local engineLibrary_application = engineLibrary_application

My current plan is to have the “engine” consist of a collection of static libraries that all get linked into the single application executable.

This named task shows a file that is only compiled for Windows (iApplication_windowed.win64.cpp; my convention is to put as much platform-specific code as possible in separate platform-specific CPP files, with the platform name as a sub-extension). It also shows a Windows library that is only needed for linking on that platform (User32.lib) and another static library that this one depends on (engineLibrary_assert, which was defined earlier but isn't shown in this blog post).

As more files get created that are specific to one platform or another I think my style will have to change to make it less annoying to conditionally specify each one.
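
To give an idea of what this looks like from the application side, here is a purely hypothetical sketch (the header path, class name, and hooks are my own illustration based on the naming conventions in this post, not the engine's actual API):

// Hypothetical application code; only the <Engine/...> include style and the
// i/c class prefixes come from the post, everything else is an assumption
#include <Engine/Application/iApplication_windowed.hpp>

class cEmptyWindowApplication final : public iApplication_windowed
{
	// Override whatever virtual hooks the abstract base class declares
	// (initialization, per-frame update, shutdown, and so on);
	// an empty window needs almost nothing here
};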

Applications

Finally, the two proof-of-concept applications that I have created are defined as follows:

-- Hello World
--============

do
	do
		SetEnvironmentVariable("appDir", "$(appsDir)HelloWorld/")
		SetEnvironmentVariable("stagingDir_app", "$(stagingDir)HelloWorld/")
	end
	local cppInfo_helloWorld = cppInfo_common:CreateCopy()
	do
		-- For std::cout
		cppInfo_helloWorld.shouldStandardLibrariesBeAvailable = true
	end
	do
		local helloWorld_task = CreateNamedTask("HelloWorld")
		local application_subTask = helloWorld_task:BuildCpp{
				target = "$(stagingDir_app)HelloWorld.exe", targetType = "consoleApplication",
				compile = {
					"$(appDir)EntryPoint.cpp",
				},
				info = cppInfo_helloWorld,
			}
		helloWorld_task:SetTargetForIde(application_subTask)
	end
end

-- Empty Window
--=============

do
	do
		SetEnvironmentVariable("appDir", "$(appsDir)EmptyWindow/")
		SetEnvironmentVariable("stagingDir_app", "$(stagingDir)EmptyWindow/")
	end
	local cppInfo_emptyWindow = cppInfo_common:CreateCopy()
	do
		cppInfo_emptyWindow:AddIncludeDirectory("$(appDir)")
	end
	do
		local emptyWindow_task = CreateNamedTask("EmptyWindow")
		local application_subTask = emptyWindow_task:BuildCpp{
				target = "$(stagingDir_app)EmptyWindow.exe", targetType = "windowedApplication",
				compile = {
					"$(appDir)EntryPoint.cpp",
				},
				link = {
					engineLibrary_application,
				},
				info = cppInfo_emptyWindow,
			}
		emptyWindow_task:SetTargetForIde(application_subTask)
	end
end

These show the general approach towards making executable applications that I am envisioning, although both are as simple as possible.

One idiom that I discovered is reusing the same environment variable names but setting them to different values for different applications. This allowed the names to be shorter and thus more readable (before this I had different versions with _helloWorld and _emptyWindow), but I don’t have enough experience to decide if this will work well long term.

The examples also show calls to SetTargetForIde(), which has no effect when executing tasks but is instead used when generating the solution files so that Visual Studio will correctly have its $(TargetPath) set, which makes setting up debugging easier.

Visual Studio Solution

It is now possible for jpmake to generate Visual Studio solution and project files. I did this work to make it easier to write code and debug in Visual Studio. The Solution Explorer currently looks like the following for the jpmake project that I have been showing in this post:

And the properties of the EmptyWindow project have some things filled in:

Before working on the engine code I had to spend more time than I had initially anticipated on generating these files and on other jpmake features, because being able to debug felt like a requirement. With the way it works now, however, I was able to write the empty window application and things worked reasonably well.

I did have one discouraging realization, however, which is that Intellisense doesn't work yet. I was able to complete the empty window application without it, but doing so was more tedious than I would have anticipated. I think I need to take some more time to improve jpmake so that Intellisense works at least somewhat, because not having it has proven to be a real impediment.

Devlog 2024-11-11

This is a regularly-occurring status update. More generally-relevant posts can be found under Features (see Creating a Game and Engine from Scratch for context).

This is the beginning of week 4.

What I have done

  • A jpmake project can now generate Visual Studio IDE files
    • A solution file is created with project files corresponding to every named task
    • The appropriate command line is generated when a project gets built. The project's output target can also be reported to Visual Studio if the Lua jpmake file specifies it, e.g.:
      • cppNamedTask:SetTargetForIde(myApplication)
    • Additionally, there are two special projects that are generated:
      • Do [JPMAKEPROJECTNAME]
        • This does all tasks, and it is the only Visual Studio project that is configured to build when the entire Visual Studio solution is built
      • Regenerate Solution
        • This can be built when there are changes to the jpmake project file
    • With these improvements it is possible to reasonably work with a jpmake project in Visual Studio. There are still several obvious and important missing features (and many more that I could think of that would be nice to have), but I think the status is good enough to start development and to add more features as their absence proves to be a problem.
  • It is now possible to specify include directories
    • This seemed important in order to be able to #include engine header files from application code
  • I have added options for generating debug info/symbols
    • After experimenting with debugging in Visual Studio this clearly was important
  • I have added options for handling warnings
    • Specifically, being able to set warning levels and to treat warnings as errors
  • I have created a new git repository to contain the actual engine and application code
    • I have integrated jpmake into this as a submodule
      • This is not what I had initially intended; I had anticipated just having a jpmake.exe in the repository. It seems, however, that there will probably be frequent jpmake changes in response to me actually working on real projects and realizing that I want or need more features (especially early on), and so a submodule started to make more sense to me than a frequently-changing EXE binary file.
    • I have set up a Windows platform and three different configurations
      • For now I have “unoptimized”, “optimized”, and “release”
    • I have set up some initial directory structure, although I'm sure my mind will change about the best way to organize things and what environment variables to use as I start authoring actual engine and application code
  • I have created a HelloWorld console application
    • This doesn’t do anything except print “Hello, world!” and so it doesn’t represent any interesting progress over the more complicated examples I had last week
    • It does, however, live within an actual project and so represents the first application in this new game engine repository

Next Steps

  • I need to make an application that displays an empty window
    • This should almost entirely use engine code, where the only thing that the application does is derive an application class from an abstract base application class and then pass on command arguments
  • The next step would be to initialize Direct3D 12 and clear the back buffer to some color
    • This might take more time than it otherwise would because I also want to implement some mechanisms for manual memory management