Syncing without VSync

I don’t remember how many years ago it was when I first read about doing this, but it was probably the following page which introduced me to the idea: https://blurbusters.com/blur-busters-lagless-raster-follower-algorithm-for-emulator-developers/. I am pleased to be able to say that I have finally spent the time to try and implement it myself:

How it Works

An application can synchronize with a display’s refresh cycle (instead of using vsync to do it) if two pieces of data are known:

  • The length of a single refresh (i.e. the refresh rate)
    • This is a duration
  • The time when a refresh ends and the next one starts (i.e. the vblank)
    • This is a repeating timestamp (a moment in time)

If both of these are known then the application can predict when the next refresh will start and update the texture that the graphics card is sending to the display at the appropriate time. If the graphics card changes the texture at the wrong time then “tearing” is visible, which is a horizontal line that separates the old texture above it from the new texture below it. (This Wikipedia article has a simulated example image of tearing.)
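As a rough illustration of that prediction, here is a minimal C++ sketch. The function name `PredictNextVblank` and the use of seconds as the unit are assumptions made for this example; they don't come from any real API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper. All times are in seconds on the same CPU clock.
//   refreshPeriod : measured duration of a single refresh
//   knownVblank   : timestamp of any one vblank (it can be far in the past)
//   now           : the current time
// Returns the timestamp of the first vblank strictly after `now`.
double PredictNextVblank(double refreshPeriod, double knownVblank, double now)
{
    assert(refreshPeriod > 0.0);
    // Number of whole refreshes that have elapsed since the known vblank.
    const double elapsedRefreshes = std::floor((now - knownVblank) / refreshPeriod);
    return knownVblank + (elapsedRefreshes + 1.0) * refreshPeriod;
}
```

For example, with a period of 0.25 seconds and a known vblank at 1.0 seconds, a call made at 1.6 seconds predicts the next vblank at 1.75 seconds.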

The texture that the graphics card is currently sending to the display is called the “front buffer”. The texture that isn’t visible, and that the application can draw into before it is activated and sent to the display, is called the “back buffer”. There are different terms for the act of making a back buffer into a front buffer, but this post will call it “swapping”; conceptually, the former back buffer becomes the new front buffer while the former front buffer becomes a back buffer.

(What vsync does is take care of swapping front and back buffers at the appropriate time. If the application finishes generating a new texture in the back buffer and submits it in time then the operating system and graphics card will work together to make sure that it becomes the new front buffer for the display during the vblank. If the application doesn’t submit it in time and misses the vblank then nothing changes visually: the display keeps showing the old front buffer (meaning that a frame is repeated) and there is no tearing.)

Why does swapping the front and back buffers at the wrong time cause tearing? Because even though swapping the front and back buffers happens instantaneously in computer memory the change from one frame to another on a display doesn’t. Instead the display updates gradually, starting at the top and ending at the bottom. Although this isn’t visible to a human eye the effect can be observed using a slow motion camera:

With that in mind it is possible to understand what I am doing in my video: I wrote a program that is manually swapping front and back buffers four times every refresh period in order to intentionally cause tearing. I am not actually rendering anything but instead just doing a very simple clearing of the entire back buffer to some single color, but by swapping a single color back buffer at the correct time in the middle of a display’s refresh I can change the color somewhere in the middle of the display.
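To make the timing concrete, the following sketch computes when each swap would be issued within one refresh. It is only an illustration under simplifying assumptions: the name `SwapTimesForRefresh` is hypothetical, and it pretends that scanout is spread evenly over the whole refresh period, which ignores the vertical blanking interval.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the timestamps (in seconds) at which to swap
// buffers so that `swapsPerRefresh` swaps are spread evenly across a single
// refresh that starts at `vblankTime`.
std::vector<double> SwapTimesForRefresh(double vblankTime, double refreshPeriod, int swapsPerRefresh)
{
    assert(refreshPeriod > 0.0 && swapsPerRefresh > 0);
    std::vector<double> times;
    times.reserve(static_cast<std::size_t>(swapsPerRefresh));
    for (int i = 0; i < swapsPerRefresh; ++i)
    {
        // Swap i happens i/swapsPerRefresh of the way through the refresh.
        times.push_back(vblankTime + refreshPeriod * i / swapsPerRefresh);
    }
    return times;
}
```

With four swaps per refresh, each swap after the first would land roughly a quarter of the way further down the display, under the even-scanout assumption above.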

Doing this isn’t particularly new and my results aren’t particularly impressive, but I was happy to finally find the time to try it.

Here is a bonus image where I wasn’t trying to time anything and instead just swapped between alternating red and green as quickly as I was able to:

Implementation Details

The rest of this post contains some additional technical information that I encountered while implementing this. Unlike the preceding section which was aimed at a more general audience the remainder will be for programmers who are interested in specific implementation details.

How to Calculate Time

Calculating the duration of a refresh period isn’t particularly difficult but it’s not sufficient to simply use the reported refresh rate. Although the nominal refresh rate would be close it wouldn’t exactly match the time reported by your CPU clock, and that’s what matters because that’s what you’ll be using to know when to swap buffers. In order to know what the refresh rate is in terms of the CPU clock an average of observed durations must be made. In order to calculate a duration it is necessary to keep track of how much time has elapsed between consistently repeating samples, but it doesn’t actually matter where in the refresh cycle these samples come from as long as they are consistently taken from the same (arbitrary) point in the refresh cycle. So, for example, timing after IDXGISwapChain::Present() returns (with a sync interval of 1 and a full present queue) would work, and timing after IDXGIOutput::WaitForVBlank() returns would also work.

It is more difficult, however, to calculate when the vblank actually happens.

DXGI_FRAME_STATISTICS

I finally settled on using IDXGISwapChain::GetFrameStatistics(). Using this meant that I was relying on DXGI instead of taking my own time measurements, but the big attraction of doing that is that the timestamps were then tied directly to discrete counters. Additionally, as a side benefit, after a bit of empirical testing it seemed like the sampled time in the DXGI frame statistics was always earlier than the time that I could sample myself, and so it seems like it is probably closer to the actual vblank than anything that I knew how to measure.

(The somewhat similar DwmGetCompositionTimingInfo() did not end up being as useful for me as I had initially thought. Alternatively, D3DKMTGetScanLine() seems like it could, in theory, be used for even more accurate results, but it wasn’t tied to discrete frame counters which made it more daunting. If my end goal had been just this particular demo I might have tried using that, but for my actual game engine renderer it seemed like IDXGISwapChain::GetFrameStatistics() would be easier, simpler and more robust.)

The problem that I ran into, however, is that I couldn’t find satisfactory explanations of what the fields of DXGI_FRAME_STATISTICS actually mean. I had to spend a lot of time doing empirical tests to figure it out myself, and I am going to document my findings here. If you found this post using an internet search for any of these DXGI_FRAME_STATISTICS-related terms then I hope this explanation saves you some time. (Alternatively, if you are reading this and find any mistakes in my understanding then please comment with corrections both for me and other readers.)

My Findings

The results of IDXGISwapChain::GetFrameStatistics() are a snapshot in time.

If you call IDXGISwapChain::GetLastPresentCount() immediately after IDXGISwapChain::Present() you will get the correct identifier for the present call that you just barely made, and this is very important to do in order to be able to correctly associate an individual present function call that you made with the information in the DXGI frame statistics (or, at least, it is conceptually important to do; you can also just keep track yourself of how many successful requests to present have been made).

On the other hand, if you call IDXGISwapChain::GetFrameStatistics() immediately after IDXGISwapChain::Present() there is no guarantee that you will get updated statistics (and, in fact, you most likely won’t). Instead, there is some non-deterministic (for you) moment in time after calling IDXGISwapChain::Present() where you would eventually get statistics for that specific request to present in the results of a call to IDXGISwapChain::GetFrameStatistics().

How do you know if the statistics you get are the ones that you want? You know that they are the ones that you want if the PresentCount field matches the value you got from IDXGISwapChain::GetLastPresentCount() after IDXGISwapChain::Present(). Once you call IDXGISwapChain::GetFrameStatistics() and get a PresentCount that matches the one that you’re looking for then you know two things:

  • The statistics that you now have refer to the known state of things when your submitted request to present (made by your call to IDXGISwapChain::Present() ) was actually presented
  • The statistics that you now have will not be updated again for your specific present request. What you now have is the snapshot that was made for your PresentCount , and no more snapshots will be made until another call to IDXGISwapChain::Present() is made (which means that the next time the statistics get updated they will be referring to a different PresentCount from the one that you are currently interested in).
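The matching logic described above can be sketched in portable C++. Everything here is a stand-in for illustration: `FrameStats` is a hypothetical struct that only mirrors the DXGI_FRAME_STATISTICS fields discussed in this post, and `PresentTracker` assumes that only one present is outstanding at a time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the DXGI_FRAME_STATISTICS fields used in this post.
struct FrameStats
{
    std::uint32_t presentCount;
    std::uint32_t presentRefreshCount;
    std::uint32_t syncRefreshCount;
    std::int64_t syncQpcTime;
};

class PresentTracker
{
public:
    // Call with the value read immediately after presenting.
    void OnPresented(std::uint32_t lastPresentCount)
    {
        assert(!isWaiting_ && "this sketch tracks one outstanding present at a time");
        awaitedPresentCount_ = lastPresentCount;
        isWaiting_ = true;
    }

    // Call with each polled snapshot. Returns true (and fills outRefresh)
    // only when the snapshot belongs to the awaited present.
    bool OnStatsPolled(const FrameStats& stats, std::uint32_t& outRefresh)
    {
        if (!isWaiting_ || stats.presentCount != awaitedPresentCount_)
        {
            return false; // Not the snapshot we want yet: poll again later.
        }
        isWaiting_ = false;
        outRefresh = stats.presentRefreshCount;
        return true;
    }

private:
    std::uint32_t awaitedPresentCount_ = 0;
    bool isWaiting_ = false;
};
```

In a real renderer the two inputs would come from IDXGISwapChain::GetLastPresentCount() and IDXGISwapChain::GetFrameStatistics(); the sketch leaves those calls out so that it stays platform independent.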

Once you have a DXGI_FRAME_STATISTICS that is a snapshot for your specific PresentCount the important corresponding number is PresentRefreshCount. If vsync is enabled, this tells you which refresh of the display your request to present was actually presented during.

Once you have that information you can, incidentally, detect whether your request to present actually happened when you wanted and expected it to. This is described at https://learn.microsoft.com/en-us/windows/win32/direct3ddxgi/dxgi-flip-model#avoiding-detecting-and-recovering-from-glitches, in the “to detect a glitch” section. Although the description of what PresentCount and PresentRefreshCount are is confusing to me in that document (and in other official documentation), the description of how to detect a glitch is consistent in my mind with how I have described these fields above, which helps to give me confidence that my understanding is probably correct.

Once you know the information above you can now potentially get timing information. The SyncRefreshCount refers to the same thing as PresentRefreshCount (it is a counter of display refresh cycles), and so it may be confusing why two different fields exist and what the distinction is between the two. PresentRefreshCount is, as described above, a mapping between PresentCount and a display refresh. SyncRefreshCount, on the other hand, is a mapping between the value in SyncQPCTime and a display refresh. The value in SyncQPCTime is a timestamp corresponding to the refresh in SyncRefreshCount. If SyncRefreshCount is the same as PresentRefreshCount then you know (approximately) the time of the vblank when your PresentCount request was actually displayed. It is possible, however, for SyncRefreshCount to be different from PresentRefreshCount, and that is why both fields are in the statistics struct.

To repeat: Information #1 is which display refresh your request was actually displayed in (comparing PresentCount to PresentRefreshCount) and information #2 is what the (approximate) time of a vblank for a specific refresh was (comparing SyncQPCTime to SyncRefreshCount). Derived information #3 is what the (approximate) time of a vblank was for the refresh that your request was actually displayed in.
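Derived information #3 can be sketched as a small function. Note that this goes one step beyond what is described above: when the two refresh counters differ, the sketch extrapolates using a known refresh period, which is an assumption of this example. `FrameStats` and `VblankSecondsForPresent` are hypothetical names, and `qpcTicksPerSecond` stands for the value that QueryPerformanceFrequency() would report on Windows.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the DXGI_FRAME_STATISTICS fields used in this post.
struct FrameStats
{
    std::uint32_t presentCount;
    std::uint32_t presentRefreshCount;
    std::uint32_t syncRefreshCount;
    std::int64_t syncQpcTime;
};

// Returns the approximate vblank time (in seconds) of the refresh in which
// the present identified by stats.presentCount was displayed.
double VblankSecondsForPresent(const FrameStats& stats, double qpcTicksPerSecond, double refreshPeriodSeconds)
{
    assert(qpcTicksPerSecond > 0.0);
    // Information #2: the sampled time of refresh number syncRefreshCount.
    const double syncSeconds = static_cast<double>(stats.syncQpcTime) / qpcTicksPerSecond;
    // Signed distance (in refreshes) from the timed refresh to the refresh
    // of the present. It is zero when the two counters agree.
    const std::int32_t refreshDelta =
        static_cast<std::int32_t>(stats.presentRefreshCount - stats.syncRefreshCount);
    return syncSeconds + refreshDelta * refreshPeriodSeconds;
}
```

When the counters match, the function simply returns the sampled time converted to seconds; otherwise it steps forwards or backwards by whole refresh periods.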

(Side note: The official documentation here and here is very vague about when SyncQPCTime is actually measured. The driver documentation here, however, says “CPU time that the vertical retrace started”. I’m not sure if the more accessible user-facing documentation is intentionally vague to not be held accountable for how accurate the timing information is, or if the driver documentation is out-of-date. This post chooses to believe that the time is supposed to refer to the beginning of a refresh, with the caveat that I may be wrong and that even if I’m not wrong the sampled time is clearly not guaranteed to be highly accurate.)

One final thing to mention: A call to IDXGISwapChain::GetFrameStatistics() may return DXGI_ERROR_FRAME_STATISTICS_DISJOINT. The values in PresentRefreshCount and SyncRefreshCount are monotonically increasing and, specifically, they don’t reset even when the refresh rate changes. The consequence is that the DXGI_ERROR_FRAME_STATISTICS_DISJOINT result is very important for determining timing (which is what this post is concerned with). If you record the PresentRefreshCount reported in the first successful call after DXGI_ERROR_FRAME_STATISTICS_DISJOINT was returned then you have a reference point for any future SyncRefreshCount values reported (until the next DXGI_ERROR_FRAME_STATISTICS_DISJOINT). Specifically, you know how many refresh cycles have happened with the current refresh rate.
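A minimal sketch of that bookkeeping follows. The class name `RefreshEpoch` and its methods are hypothetical; the caller is assumed to invoke `OnDisjoint()` whenever the statistics call reports DXGI_ERROR_FRAME_STATISTICS_DISJOINT and to pass in the two counters from every successful call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper that re-bases refresh counters so that they start at
// zero after every disjoint event.
class RefreshEpoch
{
public:
    // Call when the statistics query reports a disjoint result.
    void OnDisjoint() { hasBaseline_ = false; }

    // Call with the counters from every successful statistics query.
    // Returns how many refreshes separate syncRefreshCount from the baseline.
    std::int64_t RefreshesSinceBaseline(std::uint32_t presentRefreshCount, std::uint32_t syncRefreshCount)
    {
        if (!hasBaseline_)
        {
            // First successful query after a disjoint event: new reference point.
            baseline_ = presentRefreshCount;
            hasBaseline_ = true;
        }
        // Signed, so a sync sample slightly older than the baseline is negative.
        return static_cast<std::int32_t>(syncRefreshCount - baseline_);
    }

    std::uint32_t Baseline() const
    {
        assert(hasBaseline_);
        return baseline_;
    }

private:
    std::uint32_t baseline_ = 0;
    bool hasBaseline_ = false;
};
```

Any running averages that depend on the refresh rate would presumably also need to be reset in `OnDisjoint()`, since the rate may have changed.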

How to Calculate Refresh Period

Calculating the refresh period using SyncRefreshCount and SyncQPCTime is not difficult: Average the elapsed time between the sampled timestamps of refreshes. I am using the incremental method described here: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford’s_online_algorithm. This is easy to calculate and doesn’t require any storage beyond the current mean and the sample count. It can have problems with outliers or if the duration changes, though, and although I don’t anticipate either of those being issues it remains to be seen.
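The incremental mean can be sketched as follows. This is an illustration and not the actual engine code: `RunningMean` and `RefreshPeriodEstimator` are hypothetical names, timestamps are assumed to have already been converted to seconds, and when two samples are several refreshes apart the sketch counts the per-refresh duration as a single sample (a weighting choice made only for this example).

```cpp
#include <cassert>
#include <cstdint>

// Incrementally-updated mean (the mean part of Welford's online algorithm).
class RunningMean
{
public:
    void Add(double sample)
    {
        ++count_;
        mean_ += (sample - mean_) / static_cast<double>(count_);
    }
    double Mean() const
    {
        assert(count_ > 0);
        return mean_;
    }

private:
    double mean_ = 0.0;
    std::uint64_t count_ = 0;
};

// Hypothetical estimator fed with (refresh count, timestamp) pairs.
class RefreshPeriodEstimator
{
public:
    void AddSample(std::int64_t refreshCount, double seconds)
    {
        if (hasPrevious_ && refreshCount > previousCount_)
        {
            // Elapsed time divided by the number of refreshes it spans.
            const double refreshes = static_cast<double>(refreshCount - previousCount_);
            period_.Add((seconds - previousSeconds_) / refreshes);
        }
        previousCount_ = refreshCount;
        previousSeconds_ = seconds;
        hasPrevious_ = true;
    }
    double PeriodSeconds() const { return period_.Mean(); }

private:
    RunningMean period_;
    std::int64_t previousCount_ = 0;
    double previousSeconds_ = 0.0;
    bool hasPrevious_ = false;
};
```

Only the current mean and the sample count are stored, which matches the storage claim above.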

How to Predict VBlanks

I did some thinking about how to do this, and after some internet searching (I am not a numerical methods expert and so it took me a while to even figure out the correct search terms for what I was thinking) I found a really nice post about how to do exactly what I wanted, an incrementally-updating method for calculating the line that is the least-squares best fit for a bunch of sample points: https://blog.demofox.org/2016/12/22/incremental-least-squares-curve-fitting/. I liked this because it was a match for the incremental average function I was using, and since refresh cycles should happen regularly I figured that I could use SyncRefreshCount as the independent variable and SyncQPCTime as the dependent variable and then have a really computationally cheap way of predicting the time for any given refresh (in the past or in the future).
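For readers who want to see the general shape of such a fit, here is a minimal sketch of an incrementally-updated least-squares line. It uses the textbook running-sums formulation, which is not necessarily identical to the formulation in the linked post, and the name `IncrementalLineFit` is hypothetical.

```cpp
#include <cassert>

// Incremental least-squares fit of y = slope * x + intercept.
// Only five running sums are stored, no matter how many samples are added.
class IncrementalLineFit
{
public:
    void Add(double x, double y)
    {
        n_ += 1.0;
        sumX_ += x;
        sumY_ += y;
        sumXX_ += x * x;
        sumXY_ += x * y;
    }

    // Returns false until there are at least two samples with distinct x.
    bool Solve(double& outSlope, double& outIntercept) const
    {
        const double denominator = n_ * sumXX_ - sumX_ * sumX_;
        if (n_ < 2.0 || denominator == 0.0)
        {
            return false;
        }
        outSlope = (n_ * sumXY_ - sumX_ * sumY_) / denominator;
        outIntercept = (sumY_ - outSlope * sumX_) / n_;
        return true;
    }

    // Predicted y for any x (for example, the time of any refresh count).
    double Predict(double x) const
    {
        double slope = 0.0;
        double intercept = 0.0;
        const bool isSolved = Solve(slope, intercept);
        assert(isSolved);
        (void)isSolved;
        return slope * x + intercept;
    }

private:
    double n_ = 0.0, sumX_ = 0.0, sumY_ = 0.0, sumXX_ = 0.0, sumXY_ = 0.0;
};
```

Here x would be the refresh count and y the sampled time. The sums of squares and products grow without bound as samples accumulate, which is relevant to the precision of the fit over long runs.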

The good news is that this worked really well after some tweaking of my initial naive approach! The bad news is that the changes that I had to make in order to make it work well made me nervous about whether it would continue to perform well over time.

The big problem was losing precision. The SyncRefreshCount values are inherently big numbers, but I already had to do some normalizing anyway so that they started at zero (see the discussion above about DXGI_ERROR_FRAME_STATISTICS_DISJOINT) and so that didn’t seem so bad. The SyncQPCTime values, however, are also big numbers. The same trick of starting at zero can be used, and I also represented them as seconds (instead of the Windows high performance counts) and this helped me to get good results. I was worried about the long-term viability of this, however: Unlike the incremental method for the average this method required squaring numbers and multiplying big numbers together, and these numbers would constantly increase over time.

Even though I was quite happy with finding an algorithm that did what I had thought of, once I had implemented it there was still something that bothered me: I was trying to come up with a line equation, where the coefficients are the slope and the y-intercept. I already knew the slope, though, because I had a very good estimate of the duration of a refresh. In other words, I was solving for two unknowns using a bunch of sample points, but I already knew one of those unknowns! What I really wanted was to start with the slope and then come up with an estimate of the y-intercept only, and so it felt like the method I was using should be constrainable even more.

With that in mind I eventually came up with what I think, in hindsight, is a better solution even aside from precision issues. I know the “exact” duration between every vblank (we will conceptually consider that to be known, even though it’s an estimate), and for each reported sample I know the exact number of refreshes since the initial starting point (which is a nice discrete integer value), and then I know the approximate sampled time, which is the noisy repeated sample data I am getting that I want to improve in order to make predictions. What I can do, then, is to calculate what the initial starting time (for refresh count 0) would be, and incrementally calculate an average of that. This gives me the same cheap way of calculating the prediction (just a slope (refresh period) and a y-intercept (this initial timestamp)), but also a cheap way of updating this estimate (using the same incrementally-updating average method that I discussed above). And, even better, I can update it every frame without worrying about numerical problems. (Eventually with enough sample counts there will be issues with the updated value being too small, but that won’t impact the accuracy of the current average if we assume that it is very accurate by then.)
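The idea can be sketched in a few lines of C++. This is only an illustration of the approach described above, not the actual engine code: the name `VblankPredictor` is hypothetical, times are assumed to be in seconds, and the refresh period is passed in by the caller (who would keep its own running estimate of it).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical predictor: keeps a running average of the implied time of
// refresh 0 (the y-intercept), given an externally estimated refresh period
// (the slope).
class VblankPredictor
{
public:
    // refreshesSinceStart: normalized refresh count of the sample
    // sampledSeconds:      the (noisy) sampled time of that refresh
    void AddSample(std::uint32_t refreshesSinceStart, double sampledSeconds, double refreshPeriod)
    {
        // Walk the sample back to where refresh 0 would have been.
        const double impliedStart = sampledSeconds - refreshesSinceStart * refreshPeriod;
        ++count_;
        startSeconds_ += (impliedStart - startSeconds_) / static_cast<double>(count_);
    }

    // Predicted time of any refresh, in the past or in the future.
    double Predict(std::uint32_t refresh, double refreshPeriod) const
    {
        assert(count_ > 0);
        return startSeconds_ + refresh * refreshPeriod;
    }

private:
    double startSeconds_ = 0.0;
    std::uint64_t count_ = 0;
};
```

Each update only involves one multiplication, one subtraction and a running-mean step, so nothing is squared and the stored value stays near the initial timestamp instead of growing over time.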

This means that I don’t have to spend time initially waiting for the averages to converge; instead I can just start with a single sample that is already a reasonably good estimate and then proceed with normal rendering, knowing that my two running averages will keep getting more accurate over time with every new DXGI frame statistic that I get with new SyncRefresh information.
