Loading times and early optimisation


Recently I decided to investigate the long loading times and the hiccups that happen during loading. I had two reasons to do so.

The first is that the game takes way too long to load. On Quest it took 5 + 37 s: 5 seconds of the initial logo while system stuff is loaded, plus 37 seconds of loading the actual content. Development builds on PC take even longer (lots of checks). I already had some ways to offset the loading process, so it seemed I couldn't get it much lower.

The second reason is partially related to the frame drops I observed in reports from Quest. Before the latest public demo update, the framerate was higher and more consistent. Of course, the drops might be related to other things, including my implementation of OpenXR support. But I wanted to be sure that hiccups during loading were not the main reason. And the hiccups are indeed terrible; they became very apparent with that demo as well.

My initial suspect was shader compilation. I decided to add precompiled shaders to the game to speed up the loading process. And it did speed up: loading now took 32 seconds. Not a game changer. Worse, the stuttering was still there. After further investigation it turned out that shader compilation was not what was taking so long. In fact, that part was quite fast. Yet most of the shaders were still taking around half a second each to be processed. How come, I thought.

I was afraid that, for some unknown reason, reading the state of shader params was taking this long. It wasn't that. It was setting up Name variables. It was my own code that was so slow.

Now, a few words about the Name structure. It is just a lookup for text. The variable itself is just a number, and there's a dictionary that associates numbers with text. The text doesn't change at runtime; it's only there for lookup, comparison, etc. Named script variables use it. The idea is to register all texts during loading, or at least during asynchronous generation, and use only indices during actual runtime.
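To make the idea concrete, here is a minimal single-threaded sketch in C++. The names `NameTable`, `findOrAdd` and `toString` are hypothetical, and `std::unordered_map` is only a convenient stand-in for the dictionary, not the structure described in the following paragraphs. The point is just that a Name carries an index, so comparing two Names at runtime is an integer compare.

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <unordered_map>

// Global registry mapping text <-> index (single-threaded sketch, hypothetical).
class NameTable {
public:
    static NameTable& get() {
        static NameTable table;
        return table;
    }

    // Returns the index for `text`, registering it on first use.
    std::uint32_t findOrAdd(const std::string& text) {
        auto it = indices_.find(text);
        if (it != indices_.end()) return it->second;
        const auto index = static_cast<std::uint32_t>(texts_.size());
        texts_.push_back(text);
        indices_.emplace(text, index);
        return index;
    }

    const std::string& text(std::uint32_t index) const { return texts_[index]; }

private:
    std::deque<std::string> texts_;  // deque: adding texts never moves existing ones
    std::unordered_map<std::string, std::uint32_t> indices_;
};

// A Name is just an index; the text itself lives in NameTable.
class Name {
public:
    explicit Name(const std::string& text) : index_(NameTable::get().findOrAdd(text)) {}

    // Runtime comparison is one integer compare, no string work.
    bool operator==(const Name& other) const { return index_ == other.index_; }
    bool operator!=(const Name& other) const { return index_ != other.index_; }

    const std::string& toString() const { return NameTable::get().text(index_); }
    std::uint32_t index() const { return index_; }

private:
    std::uint32_t index_;
};
```

All the string work happens once, when a Name is constructed; that is why the cost of registration matters so much during loading.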

My initial implementation was pretty naive. Then I changed it to use as few locks as possible, so as not to stall other processes. But because I used a multiple-reader/single-writer mechanism with a simple implementation, memory barriers were still there. To make it even worse, the whole dictionary was a huge array, but with the character strings allocated at random places in memory. Converting a String into a Name was a painful process of going through the array one entry at a time until a match was found. For a new Name you had to go through the whole array. By the end of loading, registering one Name could take as long as 0.700 ms. Which is A LOT.
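A rough sketch of that slow shape, assuming C++17 and `std::shared_mutex` as the multiple-reader/single-writer lock (`NaiveNameTable` and its methods are hypothetical). Every lookup takes the lock, and every new Name is compared against every existing one, each comparison following a pointer to a string allocated somewhere else, so registration cost grows with the size of the table.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Hypothetical sketch of the slow version: texts allocated separately on the
// heap, a linear scan for every lookup, and a multiple-reader/single-writer lock.
class NaiveNameTable {
public:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    // Returns the index of `text`, registering it if it is new.
    std::size_t findOrAdd(const std::string& text) {
        {
            // Readers may run in parallel, but acquiring and releasing the lock
            // still involves atomic operations with memory barriers.
            std::shared_lock<std::shared_mutex> read(lock_);
            if (std::size_t found = scan(text); found != npos) return found;
        }
        std::unique_lock<std::shared_mutex> write(lock_);
        // Another thread may have registered it between the two locks.
        if (std::size_t found = scan(text); found != npos) return found;
        texts_.push_back(std::make_unique<std::string>(text));
        return texts_.size() - 1;
    }

    std::size_t size() const {
        std::shared_lock<std::shared_mutex> read(lock_);
        return texts_.size();
    }

private:
    // O(n): a new Name is compared against every existing one, and each
    // comparison follows a pointer to a string stored elsewhere in memory.
    std::size_t scan(const std::string& text) const {
        for (std::size_t i = 0; i < texts_.size(); ++i) {
            if (*texts_[i] == text) return i;
        }
        return npos;
    }

    mutable std::shared_mutex lock_;
    std::vector<std::unique_ptr<std::string>> texts_;
};
```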

What did I do then?

I inlined the strings. Now they have a length limit (and I had to trim some of the texts because of it). Each Name also has a lookup telling where to look next, depending on the result of a string comparison (which returns 0 if both strings are equal, a negative value if the first differing character's code is lower than its counterpart's, and a positive value otherwise). And the whole array is preallocated, to avoid moving memory. Thanks to all of that, locks happen only when actually registering a new Name. When just looking up an existing one or accessing it as text, no concurrency mechanisms are used.
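One way to read the description above is as a binary search tree threaded through a preallocated array: each entry stores its text inline, plus the index of the next entry to visit when `strcmp` returns a negative or a positive value. The sketch below assumes that reading; `NameTree`, the 31-character limit and the capacity of 1024 entries are made up for illustration. The child links are `std::atomic` with acquire/release ordering so that the lock-free reads are well-defined in C++; on x86 an acquire load compiles to a plain load, which keeps the read path free of locks and barriers. Only registration takes a mutex.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>

// Hypothetical limits for the sketch.
constexpr std::size_t kMaxNameLength = 31;
constexpr std::size_t kCapacity = 1024;
constexpr std::int32_t kInvalid = -1;

class NameTree {
public:
    // Lock-free lookup: returns the index of `text`, or kInvalid if not registered.
    std::int32_t find(const char* text) const {
        std::int32_t node = root_.load(std::memory_order_acquire);
        while (node != kInvalid) {
            const Entry& e = entries_[node];
            const int cmp = std::strcmp(text, e.text);
            if (cmp == 0) return node;
            node = (cmp < 0 ? e.less : e.greater).load(std::memory_order_acquire);
        }
        return kInvalid;
    }

    // Registration is the only operation that takes a lock.
    std::int32_t findOrAdd(const char* text) {
        if (std::int32_t found = find(text); found != kInvalid) return found;
        if (std::strlen(text) > kMaxNameLength) return kInvalid;  // too long: trim it

        std::lock_guard<std::mutex> guard(writeLock_);
        // Walk again under the lock: another writer may have added it meanwhile.
        std::atomic<std::int32_t>* link = &root_;
        std::int32_t node = link->load(std::memory_order_relaxed);
        while (node != kInvalid) {
            Entry& e = entries_[node];
            const int cmp = std::strcmp(text, e.text);
            if (cmp == 0) return node;
            link = cmp < 0 ? &e.less : &e.greater;
            node = link->load(std::memory_order_relaxed);
        }
        if (count_ == kCapacity) return kInvalid;  // preallocated array is full

        const auto index = static_cast<std::int32_t>(count_++);
        std::strcpy(entries_[index].text, text);  // fits: length checked above
        // Publish last, so a reader that follows this link sees the finished entry.
        link->store(index, std::memory_order_release);
        return index;
    }

    const char* text(std::int32_t index) const { return entries_[index].text; }

private:
    struct Entry {
        char text[kMaxNameLength + 1] = {};           // string stored inline
        std::atomic<std::int32_t> less{kInvalid};     // next entry if strcmp < 0
        std::atomic<std::int32_t> greater{kInvalid};  // next entry if strcmp > 0
    };

    std::array<Entry, kCapacity> entries_;  // preallocated, never moves
    std::atomic<std::int32_t> root_{kInvalid};
    std::size_t count_ = 0;                 // only touched under writeLock_
    std::mutex writeLock_;
};
```

A plain binary search tree is not self-balancing: registering names in sorted order degrades it into a list, so a real implementation may want to care about insertion order or pick a different structure.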

This brought the time per Name down from 0.700 ms to 0.003 ms. Over 200x faster!

The loading went down from 37 seconds to 24 seconds. And no stuttering.

But that wasn't the end of it. Other things were also taking lots of time during loading. Sounds (including music) required a significant amount of time to load. I already had a system to mark things to be loaded later (or on demand), but sounds were not part of it. Their content had to be loaded together with the info about them. So I made it possible to load the actual sound content "on demand".
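A minimal sketch of on-demand loading in C++, assuming the sound's metadata (here just a path) is known up front and the decoded samples are produced by a loader callback. `Sound`, `Loader` and the 16-bit sample format are hypothetical placeholders, not the game's actual types. `std::call_once` guarantees the loader runs exactly once, even if several threads ask for the samples at the same time.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sound asset: cheap to create during game loading,
// the expensive sample data is produced only on first use.
class Sound {
public:
    using Loader = std::function<std::vector<std::int16_t>(const std::string&)>;

    Sound(std::string path, Loader loader)
        : path_(std::move(path)), loader_(std::move(loader)) {}

    // The first caller runs the loader; concurrent callers wait for it;
    // later callers get the cached data without loading again.
    const std::vector<std::int16_t>& samples() const {
        std::call_once(loadOnce_, [this] { samples_ = loader_(path_); });
        return samples_;
    }

private:
    std::string path_;
    Loader loader_;
    mutable std::once_flag loadOnce_;
    mutable std::vector<std::int16_t> samples_;
};
```

The trade-off is that the first use of a sound pays the loading cost. Starting the load in the background shortly before a sound is likely to be needed can hide that.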

Lots of other things also went into "load on demand", further reducing the loading time. The last time I checked, it was 16 seconds. And I haven't said the last word yet. It's good enough for now, though.

I could have saved lots of time spent waiting for things to load if I had implemented Name properly in the first place, and had done "load on demand" earlier.

I didn't. Why? Because I just wanted to get on with other things to be done.

There is a saying: "premature optimisation is the root of all evil". Some people take it to mean that you should optimise only at the end of development. That is not the intention behind the phrase. It's about not spending time optimising something you don't even know is an issue. But if you're already sure that sooner or later you will have to optimise a part of the code, it's better to do it early. And sometimes you don't even have to measure things. There are a few things that I know may have to be done when the hunt for a better framerate begins, and implementing them late in development may affect other parts of the game.

One of them is frame skipping: instead of running specific code every frame, you run it every 2, 3 or more frames, or even after a random number of frames. If the code is written with that in mind, nothing bad will happen. If not, it may seriously affect how the game works. It may even break the game, even if the frame skipping only concerns animation. If a character is far away, you may advance its animation every second or third frame and interpolate the results. The animation system may accumulate the events that would happen during those 2 or 3 frames, and everything should work fine. The animation would just run at a somewhat lower framerate, but interpolated, so at a distance it wouldn't be apparent. But if other parts of the code are advanced every frame and provide information to the animation system synchronously, with requests active for only one frame, things are going to break. This was the case in one game I worked on, while in a completely different game the same solution was not an issue.
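The difference between code that tolerates frame skipping and code that doesn't can be sketched as follows. This is a hypothetical `SkippingAnimator`, not engine code: it advances only every `updateEvery` frames, but it accumulates the skipped time, reports every event crossed during it, and latches gameplay requests until the next real update. Code that instead sets a flag valid for a single frame would lose any request made on a skipped frame. Pose interpolation between updates is left out to keep it short.

```cpp
#include <cmath>
#include <string>
#include <utility>
#include <vector>

// An animation event fires when the clip's local time crosses `time`.
struct AnimEvent {
    float time;
    std::string name;
};

// Hypothetical animator that only really updates every `updateEvery` frames.
class SkippingAnimator {
public:
    SkippingAnimator(int updateEvery, float clipLength, std::vector<AnimEvent> events)
        : updateEvery_(updateEvery), clipLength_(clipLength), events_(std::move(events)) {}

    // Safe to call on any frame: the request is kept until the next real update.
    void request(const std::string& action) { pendingRequests_.push_back(action); }

    // Called every frame; returns what happened during this call.
    std::vector<std::string> tick(float dt) {
        accumulated_ += dt;                                   // skipped time is not lost
        if (++framesSinceUpdate_ < updateEvery_) return {};   // skipped frame
        framesSinceUpdate_ = 0;

        std::vector<std::string> fired;
        for (const std::string& r : pendingRequests_) fired.push_back("request:" + r);
        pendingRequests_.clear();

        // Report every event in (from, to]. Assumes the accumulated time is
        // shorter than one loop of the clip, so it wraps at most once.
        const float from = time_;
        const float to = time_ + accumulated_;
        const bool wraps = to >= clipLength_;
        for (const AnimEvent& e : events_) {
            const bool crossed = wraps ? (e.time > from || e.time <= to - clipLength_)
                                       : (e.time > from && e.time <= to);
            if (crossed) fired.push_back(e.name);
        }
        time_ = std::fmod(to, clipLength_);
        accumulated_ = 0.0f;
        return fired;
    }

    float time() const { return time_; }

private:
    int updateEvery_;
    float clipLength_;
    std::vector<AnimEvent> events_;
    std::vector<std::string> pendingRequests_;
    float accumulated_ = 0.0f;
    float time_ = 0.0f;
    int framesSinceUpdate_ = 0;
};
```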

If such an optimisation had been implemented sooner, the bugs could have been solved in a different way or avoided altogether. Late in production, all you can do is disable the optimisation for specific cases.

That's why, while I avoid premature optimisation in general (which is how Name ended up being an issue), I am all for optimising things when they become an issue, or in some cases even when it's just experience telling you that something might become a problem.

I am not a fan of dirty tricks to gain extra speed or memory at the end of production. By dirty tricks I mean things like "early in the project, allocate extra memory just to force people to use less, so when memory becomes an issue, we drop that allocation and have extra memory". Yeah, it works. And people who do that do it because in every project they worked on, there were memory-related problems. But if things were handled properly early on, it wouldn't be an issue.

And this brings me to a third thing, one not mentioned in the title.

The game should be tested on the target device early in development. It's not always possible, as sometimes the devices are not available yet. But you can't imagine how often the real issue is that the game does not work at all on the target device, only on a really powerful PC.

I know that game development is a dirty process: things are half-baked, ugly, etc. But just as you use whiteboxes instead of actual meshed levels to test whether a level or a gameplay mechanic feels good, something similar should be done with the code, to check whether it actually works. And if it doesn't, that should be addressed as soon as possible. In my opinion, it is better to adjust the game design to avoid such problems, as sometimes that may open new doors, rather than cut quality or make random radical changes late in the project.

Early problems with world generation led Tea For God to a somewhat different way of generating the world, and ultimately made an open/procedural world possible. By solving that issue, I learned that I could use the same solution to generate the world on the fly while moving through it.

And now back to loading times. Optimising the code there made me aware of how I could improve other parts of the code, and that I could push more things into "load on demand". Because of that, I could add more content to the game as well.

And while the game's loading time went down, the time to generate a level (or to "load" a checkpoint, which also generates the level, just at a different place) is still high. It went down a bit thanks to the Name changes and a few other things, but nowhere near as much as the game loading time did. I will be speeding that up. In the worst case, I can still put a minigame there. I haven't decided yet whether giving extra experience points for playing that minigame would be a bit over the edge.


Comments


Glad to hear you're working on the frames. Tea for God is my fav VR game. I'm still using my Oculus Quest 1 so I don't know if that is the source of my issues, but I have pretty rough frame rate issues and after playing for a few minutes the game stops recognizing my controller inputs (button presses). If I use the Oculus menu button to pause and go back in my hands will work again for a few seconds but then they get stuck again. Had to end my session early due to this. 

Hope you get things worked out! 👍

Haven't tried Quest in a while. Added to my todo list to have a look at it soon.

Woah, that seems like a lot of work, but reduced load times / better overall performance will be appreciated by everyone

A big part of it was the experience I gained during the last 9 years. Coding is a never-ending learning process :)

And in terms of risk, it was something that could potentially break everything, but if it did, it would be noticeable immediately. Other ways to speed things up might be riskier and require more work. But they should be done, if required :)