Systems Code in Detail
I am going to showcase some code and my thoughts related to it. It is rarely pretty, so be warned. Even if you do not want to read about all of the systems, you probably still want to read the closing thoughts and outlook. Do not expect advertising speech; it is rather honest. If you want the code without explanations, have fun on GitHub.
Progression System
That is a parasite. It is a sin. It was death we needed to die. An open knife, I knew we would jump into… and we jumped. We needed to be able to get data from about everywhere in the game so this was my solution to get connected to that data without coupling the progression system too tightly to game components.
Connecting is done on start-up in our progression driver (and in the tutorial driver, which is essentially a copy until it is not: tutorials and quests were one system until their requirements changed and it was better to split them, yet they work very similarly and I may have copied the progression driver). It is those two simple lambda functions. Honestly, I am not proud of them, but you can read that in the comment above. Oh, and I like offensive programming. I like to shove error messages at myself and stop execution rather than behave as if nothing happened and wonder later why certain systems don’t work as expected. Don’t treat a bug as an exception where operation in a degraded state is needed. I don’t like to operate in degraded states. I do that way too often. I have migraine.
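A minimal sketch of this kind of wiring, under loud assumptions: `GameEvents`, `ProgressionDriver`, the event names, and the counter keys are all hypothetical stand-ins and not the project's code. It only illustrates two lambdas subscribing on start-up and the "fail loudly" attitude.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the game components that raise events.
public class GameEvents
{
    public event Action<string> TilePlaced;
    public event Action<string> EntitySelected;

    public void RaiseTilePlaced(string tileType) => TilePlaced?.Invoke(tileType);
    public void RaiseEntitySelected(string entityId) => EntitySelected?.Invoke(entityId);
}

// Hypothetical driver: it only knows the events, not the components behind them.
public class ProgressionDriver
{
    private readonly Dictionary<string, int> counters = new Dictionary<string, int>();

    public ProgressionDriver(GameEvents events)
    {
        // Offensive programming: a missing dependency is a bug, so stop right here.
        if (events == null) throw new ArgumentNullException(nameof(events));

        // The two lambdas that connect the driver to the rest of the game.
        events.TilePlaced += tileType => Count("tile", tileType);
        events.EntitySelected += entityId => Count("entity", entityId);
    }

    public int GetCount(string key) => counters.TryGetValue(key, out var n) ? n : 0;

    private void Count(string category, string id)
    {
        // Fail loudly instead of silently tracking garbage.
        if (string.IsNullOrEmpty(id))
            throw new InvalidOperationException("Event without id in category " + category);

        var key = category + ":" + id;
        counters[key] = GetCount(key) + 1;
    }
}
```

Raising `TilePlaced` twice with `"Buffer"` leaves the counter `tile:Buffer` at 2, while raising it with an empty id throws immediately instead of continuing in a half-working state.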
Anyway, our quest is the data struct containing everything we need. We needed both references to other quests and references to scene objects. That is unfortunate because in Unity you can’t have both: assets cannot reference scene objects. Hence we have the quest as a ScriptableObject saved as a data object to disk, and as a serialised wrapper object saved inside the scene. That way we reference quests using the saved files but store the completion requirements, rewards, and prerequisites inside the wrapper.
In Unity this monster turns describing a quest into a UX hell. I mean, it was fine without further tools for our limited scope, but I would have invested in a quest and tutorial tool if we had had more than the seventeen quests and tutorials.
Speaking of tutorials, they are kind of similar but they are not. They are all active by default and you can silently finish them with certain actions and never see most of them. The main reason was to make them as unobtrusive as possible. We tried to think of ways to detect when players are stuck but that ended up being a dead end, so instead it is only time-based with only one “active active” tutorial. This tutorial reveals itself after a certain timeout and it will show its helper(s) after another. However, I must stress that while this one “active active” tutorial runs and plays its animation and helpers, all others can still be completed.
And because these are tutorials, we are happy if you meet just one of their completion requirements.
If you choose to take a closer look at the tutorial driver, you will notice the heavy use of LINQed statements (ok, that one fell flat xD), but I don’t consider these drivers to be super performance-critical as they have a lot of early returns, operate on little data, and I also didn’t find the drivers in my profiler. We did have a performance issue, but it was caused by our outline plugin, which did something very stupid causing … sub-optimal run-time behaviour. Something like O(n³)… Changing that was my most impactful one-liner ever. Here again is very familiar code, just changed so it returns true when one requirement is met.
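The difference between the two drivers can be sketched with LINQ like this. `Requirement` and `Completion` are hypothetical names for illustration, and the early return for an empty quest is an assumption of this sketch, not something taken from the project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical requirement: a predicate over some game state.
public class Requirement
{
    private readonly Func<bool> isMet;
    public Requirement(Func<bool> isMet) { this.isMet = isMet; }
    public bool IsMet() => isMet();
}

public static class Completion
{
    // Quests: every requirement has to be met.
    public static bool QuestCompleted(IReadOnlyCollection<Requirement> requirements)
    {
        // All() is true for an empty sequence, so guard against that explicitly.
        if (requirements.Count == 0) return false;
        return requirements.All(r => r.IsMet());
    }

    // Tutorials: a single met requirement is enough.
    public static bool TutorialCompleted(IReadOnlyCollection<Requirement> requirements)
    {
        return requirements.Any(r => r.IsMet());
    }
}
```

Both `All` and `Any` stop at the first element that decides the result, which fits the "lots of early returns, little data" profile described above.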
This code, along with most of both drivers, should have been unified, but these changes happened shortly before our release, so that’s the lame excuse why the copied code is still copied-ish.
The tutorials have the aforementioned helpers. We wanted to provide contextual help and needed to delegate the specifics because our helpers are very different.
True, these are a bit on the nose, but after a fair share of games not working due to information starvation, and playtests indicating sys:logic was not working either because concepts were not understood (and we had already put a lot of effort into the information presentation of our game’s state), we figured it was time for drastic solutions.
This should cover most of our progression system. The interesting bits at least. The related code can be found in the progression system folder.
Tile System (Execution Graph)
More interesting than the progression however is the tile system. It is what essentially drives the game logic. Everything else is related to player-game-communication. But this very system is very distinct and the core. It gets context through the embedding into our world that is required to ease access to this system. However let’s get started with the tile itself. I am going to get into details as we move on.
I should have added a trigger warning… HEAVY USE OF OOP FEATURES. We have a deep inheritance structure with many virtual functions and some abstract classes.
The tile is a MonoBehaviour because we use Unity and these are components that live on GameObjects. Each tile knows its Cell. However, it is important that the value is invalid as long as the tile has not been added to the Grid. The default value of a Vector2Int is (0, 0), which caused quite a mess with our pathfinding and … “interesting” recursions that led to a stack overflow when paths actually crossed (0, 0). Lovely. Also, because this value is serialised, setting a new default value didn’t help at all. But we will come to the second part of the fix later.
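The "invalid until added" idea can be sketched with a sentinel value. This is an illustration under assumptions: `GridCell` is a stand-in for Unity's Vector2Int so the snippet runs outside Unity, and `InvalidCell`, `IsOnGrid`, and the callback signatures are hypothetical. In Unity a serialised field keeps whatever value was saved, so the sentinel would also have to be assigned explicitly rather than relied on as a default.

```csharp
using System;

// Stand-in for UnityEngine.Vector2Int so the sketch runs outside Unity.
public readonly struct GridCell : IEquatable<GridCell>
{
    public readonly int X;
    public readonly int Y;
    public GridCell(int x, int y) { X = x; Y = y; }

    public bool Equals(GridCell other) => X == other.X && Y == other.Y;
    public override bool Equals(object obj) => obj is GridCell other && Equals(other);
    public override int GetHashCode() => (X * 397) ^ Y;
}

public class Tile
{
    // Hypothetical sentinel: a cell that no real grid is expected to contain.
    // default(GridCell) would be (0, 0), which is a perfectly valid cell.
    public static readonly GridCell InvalidCell = new GridCell(int.MinValue, int.MinValue);

    public GridCell Cell { get; private set; } = InvalidCell;
    public bool IsOnGrid => !Cell.Equals(InvalidCell);

    // Called by the grid when the tile was added or is about to be removed.
    public void OnAddTile(GridCell cell) { Cell = cell; }
    public void OnRemoveTile() { Cell = InvalidCell; }
}
```

With this, a tile placed at (0, 0) reports `IsOnGrid` as true, while a tile that was never added does not pretend to live at the origin.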
The Grid is a reference to the owning, well, grid. We operate quite often on the grid itself. Thankfully I chose to do dependency injection here instead of making the grid a singleton. A singleton would have worked because in earlier versions of the game we had only one grid. But since I didn’t, reworking the game from a single grid to multiple grids owned by an entity took only two days plus a few days to stabilise the game again.
The Inactive property (yes, it is not a C# property but it is a property of the tile in terms of “when set it behaves differently”) is used by preplaced tiles to register them but render a question mark instead. The actual behavioural implementation is in Module and I don’t know why I put it here. My bad.
Ok, the next parts are quick. OnRemoveTile() and OnAddTile() are callbacks that are triggered when a tile was added to a grid or is about to be removed from it. GetNeighbours<T>() once had the implementation that the grid now has, and as it was in use I just put a redirect there. We needed the implementation in other parts where we didn’t have a tile object and just a cell instead. Remove() was born out of laziness; however, in my defence: without it, someone would have simply used Destroy and introduced an invalid state in the grid. It is also not straightforward that you need to remove the tile on the grid itself. I thought about using the OnDestroy() message but decided against it as most removals are actually triggered by outside code. It is a rare case that tiles delete themselves. And we cannot have both as that would trigger two removals, which we consider a bug.
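A rough sketch of such a redirect, with hypothetical minimal `Grid` and `Tile` classes (no Unity types, and not the project's implementation): the tile forwards `Remove()` to its grid, and a second removal is treated as a bug.

```csharp
using System;
using System.Collections.Generic;

public class Grid
{
    private readonly Dictionary<(int, int), Tile> tiles = new Dictionary<(int, int), Tile>();
    public int Count => tiles.Count;

    public void Add(Tile tile, int x, int y)
    {
        if (tiles.ContainsKey((x, y))) throw new InvalidOperationException("Cell is occupied");
        tiles[(x, y)] = tile;
        tile.Attach(this, x, y);
    }

    public void Remove(Tile tile)
    {
        var key = (tile.X, tile.Y);
        // Removing a tile that is not on this grid is a bug, not something to ignore.
        if (!tiles.TryGetValue(key, out var found) || !ReferenceEquals(found, tile))
            throw new InvalidOperationException("Tile is not on this grid");
        tiles.Remove(key);
        tile.Detach();
    }
}

public class Tile
{
    public Grid Grid { get; private set; }
    public int X { get; private set; }
    public int Y { get; private set; }

    internal void Attach(Grid grid, int x, int y) { Grid = grid; X = x; Y = y; }
    internal void Detach() { Grid = null; }

    // Convenience redirect, so callers are not tempted to destroy the object
    // directly and leave a stale entry in the grid.
    public void Remove()
    {
        if (Grid == null) throw new InvalidOperationException("Tile is not on a grid");
        Grid.Remove(this);
    }
}
```

Calling `Remove()` a second time throws, which mirrors the "two removals are a bug" rule.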
All our tiles live in a grid structure called TileMap. It maps world space coordinates to grid space coordinates and keeps track of all tiles. It also comes with a good amount of utility functions that we build upon. It started as a data-only structure but progressed to also generate the visible mesh and partake in the progression as well. Still, we try to decouple it from the execution graph and progression systems through the usage of events.
Because we want to use types when possible and need to know which prefabs to spawn when we call something like GridMap.AddTile<Buffer>(new Vector2Int(2, 3)), we have a tile database called TileDB. It consists of a handful of settings for each module we add and is heavily used throughout our game.
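A type-keyed lookup like this could be sketched as follows. The shape of `TileSettings`, the `Register`/`Get` methods, and the `Splitter` class are assumptions for illustration; the real TileDB lives in Unity and references prefabs rather than names.

```csharp
using System.Collections.Generic;

public abstract class Tile { }
public class Buffer : Tile { }
public class Splitter : Tile { }

// Hypothetical settings entry; in Unity the prefab would be a GameObject reference.
public class TileSettings
{
    public string PrefabName;
    public string DescriptionKey;
}

public class TileDB
{
    private readonly Dictionary<System.Type, TileSettings> entries =
        new Dictionary<System.Type, TileSettings>();

    public void Register<T>(TileSettings settings) where T : Tile
    {
        entries[typeof(T)] = settings;
    }

    // Lets callers write Get<Buffer>() instead of passing strings or enum values around.
    public TileSettings Get<T>() where T : Tile
    {
        if (!entries.TryGetValue(typeof(T), out var settings))
            throw new KeyNotFoundException("No TileDB entry for " + typeof(T).Name);
        return settings;
    }
}
```

A generic `AddTile<T>` on the grid can then ask the database for the settings of `T` and spawn the matching prefab, and a module that was never registered fails loudly at the lookup.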
For the descriptions we chose a static naming system to minimise the available options and to keep consistency. In general, the usage of the Unity Localisation package helped us quite a bit, even if we had had only one language, because we were able to treat these strings as assets. The proper structure gave us a speed advantage as we developed the game in both German and English at the same time. This was due to us releasing it on itch.io to a multilingual audience and being a German research project attending German-speaking conferences.
Now that we have the basics of the tile system covered, let’s move up a bit to the next class inheriting from tile: Module.
See, there was a rule, especially in the beginning, to write doc comments as we knew the project would travel through various hands with most knowledge getting lost in the process. And this was even more true for fundamental classes and methods.
Modules are the essential base class of our system. A module is an interactable thing that is meant to partake in the micro-layer graph puzzle gameplay. That split was on purpose because we considered adding obstacles to the grid, and that would have worked perfectly fine with this approach as they could have inherited from Tile, blocked these tiles, and done stuff without interacting with the rest.
Modules also come with ports and are connected to other modules and connections. This got a bit messy, admittedly. The reason being our connections and pathfinding, and because we wanted to exclude some modules from the execution graph. More on that later. Modules also must have UI. The rest is used for feedback and our colour palette system (grids can have a defined colour palette and the modules need to match it, obviously).
Here are some excerpts of functions that are implemented for further usage. Please, please ignore the if (this is Connection)s. I know this is an architectural problem based on the very fact that Module, and not ExecutableModule, contains the ConnectedModules and Connections, among other things. However, as I am just showing this system here in isolation, I cannot show you how widespread our use of Module in the codebase is. It should have been done properly but life happened and we fixed it dirty. Then our time was over :/ It is not like this is a super hard refactoring, and it is one that should have been done. However, these kinds of fixes tend to come up at the least convenient times and, quite frankly, our base classes were not touched that often, so I forgot about it, and every time I saw it, something else was way more important.
Our messengers are back! They notify the tiles around them about their arrival and departure. Also on removal of a module, we destroy all connections to that module as connections cannot live alone.
Awake and Start are both needed in child classes so these are marked as virtual. SetModuleOutline has the third parameter because of a child override that needs it. Just some basic stuff to ensure the UI is set up, we find our modules in the hierarchy, and our ports are spawned.
That is the last part of the interface. Note how you should not believe everything that is in doc comments. Sigh. Sadly, such wrongness in one place, biting you once, leaves you with the feeling that the documentation may not be in good shape overall. That is a huge problem, especially since we wrote these comments to speed up future work. We provide a basic UI showing method that can and has to be customised. It offers dismantling of tiles if they have a button (they should not have the button if you cannot dismantle them). You then override this method to initialise your UI state when you have more, like sliders, buttons, or combo boxes.
As previously mentioned, Modules are only really half of the story as almost all contain logic that we wish to execute. The shared base is ExecutableModule.
Now this Input is quite interesting. The first draft did not use any state outside the execute functions and transported all information as parameters and return values. The documentation is even lying about that to this day:
However, that worked against splitters. They need to have one execution branch evaluated first, then another, and so on. Saving the results of multiple callers in the target module made the most sense. Though the implementation is… a return value may have been better.
Input is protected. My intuition, and laugh about it because it sounds stupid, always was: private stuff is only visible to the class declaring it, public stuff is visible to everyone, and protected is private but also visible inside the inheritance tree. Meaning, if you inherit from ExecutableModule you would be able to access Input like any other private variable, but if you happen to access a foreign ExecutableModule object you can’t directly access Input. Apparently, in C# you can. In other words, I am confused why that is allowed:
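For context, C# checks accessibility per type, not per object instance, and that holds for private members as well. A minimal sketch of what that allows and what it does not; the members `PushTo`, `Peek`, and `CopyFrom` and the `Comparator` subclass are made up for illustration and are not the project's code:

```csharp
public class ExecutableModule
{
    protected float[] Input = new float[2];

    // Same class, different instance: allowed, because the code lives
    // inside the type that declares Input.
    public void PushTo(ExecutableModule target, int port, float value)
    {
        target.Input[port] = value;
    }

    public float Peek(int port) => Input[port];
}

public class Comparator : ExecutableModule
{
    // Allowed: the other object is a Comparator, the type this code lives in.
    public void CopyFrom(Comparator other)
    {
        Input[0] = other.Input[0];
        Input[1] = other.Input[1];
    }

    // Not allowed (compiler error CS1540): reaching into a plain
    // ExecutableModule reference from inside a derived class.
    // public void Steal(ExecutableModule other) { Input[0] = other.Input[0]; }
}
```

So a "foreign" object is only off limits when a derived class tries to reach through a reference typed as the base class; inside the declaring class itself, any instance is fair game.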
But back to “why a return value would have been better”. You may remember our module having up to four outputs. With this implementation, the execution graph cannot assert whether the function actually adheres to this agreement. Our assumption is: if Execute() is called, everything is in working order and we have to expect data to be present. I want the crash then because it is a bug, and a simple one to fix on top. However, this could have been minimised with return values because you get a compiler error if you don’t return in every possible path, which was probably the most common issue. Okay, and in general it was not that much of an issue overall. But we are talking design here and I do not like that aspect.
There are actually broken states that we expect to happen based on user input and that we handle. One of them is recursion (easily doable with any module with more than one input and one module with more than one output, like comparator and splitter); another is half-connected or disconnected modules.
In these cases the execution list is built normally but the broken modules are collected in a set. If they are in this set, their GarbageExecute() is called instead.
Of course the execution graph also needs to know if a module is an execution source. Meaning, it is a starting point that generates data.
And the execution graph actually needs to access the input on one occasion: It has to reset it.
No class is complete without callbacks for events, and ExecutableModule has its own events, too. We need to reset state before a graph rebuild and update state after a rebuild. These callbacks are used quite heavily as we only want to update the visual states on user interaction. These callbacks don’t get called in the regular run.
With so much talk about the ExecutionGraph, let’s take a closer look. It is a super simple graphing algorithm that backtracks links and puts the modules into a list.
First of all, we see that Build(sources) is called by someone that passes all the execution sources to the graph. That’s the nice thing about the graph: it works directly with the data provided by this call and the passed ExecutableModules. It has no clue about our tile maps and tiles.
Also notice the two callbacks I mentioned, here they are. Right before I reset our graph state and after we are done.
The interesting bit then boils down to
and
Sinks are modules that only process data but don’t output anything. In our world these are actors because they do something. However there are also unfinished connections where the last module is not an actor. In that case the endpoint is the last module that can’t push the data any more forward because there is no target. These endpoints are the real starting points for the backtrack search. So from this little excerpt you can see that this algorithm actually first does a forward search and then backtracks to get a correct execution list. I assume this is standard practice? I have no idea honestly.
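The forward-then-backward idea can be sketched roughly like this. It is a simplified illustration and not the project's ExecutionGraph: `Node`, `GraphSketch`, and `Backtrack` are hypothetical names, the graph is assumed to be free of recursion, and the broken-module handling is left out entirely.

```csharp
using System.Collections.Generic;

public class Node
{
    public readonly string Name;
    public readonly List<Node> Inputs = new List<Node>();
    public readonly List<Node> Outputs = new List<Node>();
    public Node(string name) { Name = name; }

    public void ConnectTo(Node target)
    {
        Outputs.Add(target);
        target.Inputs.Add(this);
    }
}

public static class GraphSketch
{
    public static List<Node> Build(IEnumerable<Node> sources)
    {
        // Forward pass: follow outputs until nothing can be pushed any further.
        var sinks = new List<Node>();
        var seen = new HashSet<Node>();
        foreach (var source in sources) FindSinks(source, seen, sinks);

        // Backward pass: a module is appended only after everything feeding it.
        var order = new List<Node>();
        var placed = new HashSet<Node>();
        foreach (var sink in sinks) Backtrack(sink, placed, order);
        return order;
    }

    private static void FindSinks(Node node, HashSet<Node> seen, List<Node> sinks)
    {
        if (!seen.Add(node)) return; // already visited
        if (node.Outputs.Count == 0) { sinks.Add(node); return; }
        foreach (var next in node.Outputs) FindSinks(next, seen, sinks);
    }

    private static void Backtrack(Node node, HashSet<Node> placed, List<Node> order)
    {
        if (!placed.Add(node)) return; // prevents double execution of a branch
        foreach (var previous in node.Inputs) Backtrack(previous, placed, order);
        order.Add(node);
    }
}
```

Appending a node only after all of its inputs is a post-order depth-first traversal along the reversed links, which for a graph without cycles yields a topological order: every module runs after the modules it depends on.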
FindSinks(...) and BuildExecutionOrder(...) are implementation details.
Okay, okay, there would not be #if DEBUG_EXECUTION_GRAPHs scattered around if the implementation details had not caused issues while developing the graph into one that handles sub-branches and linking correctly. And I never removed the debug functions because you never know :D
FindSinks stops if it comes across a recursion (i.e. the module is in addedModules). We don’t need to handle it here and we can’t. But most of the time, we still get to a sink and thus we don’t have an issue. allModules is global in contrast to addedModules. Both provide “have I already been here?” The local one is a remnant and in this specific way not useful for this function. The global one is sufficient.
We only travel along output ports in this method. The next null check is valid, the ExecutableModule check is not. We had some issues with our Connections and our pathfinding that caused these modules to be added to the ConnectedModules, which is the design flaw I discussed in Module. The initial bug has been solved, the design flaw has not, so the check stayed.
If we find a next module, we recurse into that module. If we found one, the current module is not a sink; if we didn’t find one, it is.
Now bool BuildExecutionOrder(module, addedModules, allModules) does need all the data that is passed. addedModules is used to detect local recursions whereas allModules is used to prevent double execution of a branch or module. The return value indicates the health: true is working and false is broken.
Contrary to FindSinks, we travel along the input ports, aka backwards. Also, no need to assert the same thing twice. Though the shrug is a must!
We recurse into the previous module(s) by giving them a copy of our added modules and a reference of all modules. We switch recursed to true and let it stay true if it ever gets there. Finally we increase our input module counter.
We use this counter to check if all input modules are indeed connected and the module is able to perform with all required data. If not, we add it to the broken modules set. Then we check if we recursed and are at the bad module (the second time). We need to add all other modules but skip the recursion-inducing module. The comment is not a to-do but a hint to the !recursed at the bottom, because we want to put the graph back into a working state as soon as we pass the recursion. This means we only want to garbage-execute the recursion-causing modules and execute all the dependent ones normally, unaware of the recursion.
The run is as simple as one would expect. We just traverse the list and call the correct Execute.
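A stripped-down sketch of such a run loop, assuming an execution list and a broken-module set as described. The `Add` method, the empty default `GarbageExecute`, and `RecordingModule` are hypothetical helpers to keep the snippet self-contained; the project builds its list through the graph build instead.

```csharp
using System.Collections.Generic;

public abstract class ExecutableModule
{
    public abstract void Execute();

    // Fallback for broken modules. Doing nothing by default is an assumption of this sketch.
    public virtual void GarbageExecute() { }
}

public class ExecutionGraph
{
    private readonly List<ExecutableModule> executionList = new List<ExecutableModule>();
    private readonly HashSet<ExecutableModule> brokenModules = new HashSet<ExecutableModule>();

    // Stand-in for the real build step: modules arrive already ordered.
    public void Add(ExecutableModule module, bool broken = false)
    {
        executionList.Add(module);
        if (broken) brokenModules.Add(module);
    }

    public void Run()
    {
        foreach (var module in executionList)
        {
            if (brokenModules.Contains(module)) module.GarbageExecute();
            else module.Execute();
        }
    }
}

// Tiny module for trying the sketch out: it records which path was taken.
public class RecordingModule : ExecutableModule
{
    public string Last = "none";
    public override void Execute() { Last = "execute"; }
    public override void GarbageExecute() { Last = "garbage"; }
}
```

The set lookup keeps the decision out of the modules themselves: a module does not need to know that it is broken, the graph simply calls the other entry point.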
Run() ticks at the end of the frame to ensure all game updates have run and all signals are processed. We also control the update frequency of the graph. This is a performance measure, though a premature one and, after profiling, probably not even a needed one. However, there is also a logic implication because this makes it a somewhat fixed update, which we do require for our design. Additionally, if modules interact with the world, they will cache Execute results and implement Update or FixedUpdate.
Let’s have a look at some examples:
This actor moves a rigidbody of an entity forward. Actors are preplaced modules and as such can have inspector-settable values like Speed. The Execute only caches the input value (alongside a visual and progression update). The real logic is performed in FixedUpdate, which moves the rigidbody forward.
In our second example, the RotationMotor, we see a similar structure. However, the rotation motor has two inputs, so it may happen that only one of them is connected and as such the module is broken. Because this is an actor that needs to cache values to rotate the transform in the world, GarbageExecute is specifically implemented here to fill the missing inputs with garbage (hence the name) and then call Execute.
Did you notice something here?
Hint: It has to do with the data type that is used throughout the execution graph to transport data.
Our execution graph transports analogue signals that are sometimes interpreted digitally. Though, not sometimes… Most often digitally.
We called that the signal dualism as the signal is both an analogue and a digital signal depending on how you use it. While a float represents an analogue value well enough, it does not do so well with a binary signal. I wanted something that ensures that whenever we read signals, we interpret them the same way. That is how MicroData was born… And boy, did it feel dirty implementing it.
… I think I just spotted a bug in line 208. I think it should have been a value check, like I do in line 210, though that one isn’t super right either. But I am not sure when these Equals functions for objects are used in C# because we sure do not use them manually. I also overloaded the == operator, which we do use to perform a correct value check.
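A minimal sketch of such a signal type, under assumptions: the 0.5 threshold, the struct layout, and the member names `Analogue`, `Digital`, and `FromDigital` are hypothetical and not the project's MicroData. It keeps one value check in one place so that `==`, `!=`, and `Equals` cannot disagree.

```csharp
using System;

public readonly struct MicroData : IEquatable<MicroData>
{
    // Assumption of this sketch: values at or above 0.5 read as digital "on".
    private const float Threshold = 0.5f;

    public readonly float Analogue;
    public MicroData(float value) { Analogue = value; }

    // The single place where the analogue signal is interpreted digitally.
    public bool Digital => Analogue >= Threshold;

    public static MicroData FromDigital(bool on) => new MicroData(on ? 1f : 0f);

    // One value check shared by Equals, == and !=.
    public bool Equals(MicroData other) => Analogue.Equals(other.Analogue);
    public override bool Equals(object obj) => obj is MicroData other && Equals(other);
    public override int GetHashCode() => Analogue.GetHashCode();

    public static bool operator ==(MicroData a, MicroData b) => a.Equals(b);
    public static bool operator !=(MicroData a, MicroData b) => !a.Equals(b);
}
```

As for when `Equals` gets called without anyone calling it by hand: collections such as `Dictionary`, `HashSet`, `List.Contains`, and LINQ's `Distinct` rely on `Equals` and `GetHashCode`, not on the overloaded `==` operator, which is why the two should always agree.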
Here is another example, this time of a sensor doing sphere casts. The result is a normalised value (it is between zero and one) that can be interpreted as the distance to the object. All of this is actually documentation of our functions that players need in-game to be able to use these modules. Because of that, we have a lot of UI dedicated to the inputs, their meaning, and what one can expect.
Have a look at the modules folder for more examples to see for yourself how easy it is to add new modules. Though be warned, once implemented we rarely changed them. Some of them are now a few years old and I did not want to be super nitpicky in code reviews. The more recent modules are better. Surprise, practice makes a difference.
You may ask why I implemented the system SO super object-oriented. I believe there are two main reasons. First, C# is an OOP language. However, that does not mean a more functional and thus data-driven (I will explain shortly) approach would not work. Second, all these modules live inside Unity and are represented by objects inside Unity, so it was more transparent to add individual scripts to these objects than to track that data differently.
A different approach
It’s just a sketch. But the idea would remain roughly the same. The difference is then how data and functions are implemented. Maybe I would have used function pointers (yes, it is not C, I know, but there has to be some similar concept, right? Funcrefs? Abuse delegates?) or used a giant switch with cases for each tile type. Not super nice and not extendable unlike the current system, but possible. The data would be moved to the execution graph (and this time I mean the script that runs the functions) to collect in- and outputs. Look, there is a reason why I chose the aforementioned OOP path. It is the easiest to maintain and extend. It may not be the best for performance but that system hasn’t caused a bottleneck so far. If I remember correctly, the base cost of the URP, our custom shaders, and the UI proved to be the biggest “issues”. I really only profiled once or twice. As long as it ran on a super weak laptop we decided the game was fine. The situation probably looks different on mobile hardware but I expect rendering performance to be the bottleneck here, too.
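In C#, delegates such as `Func` are the usual stand-in for function pointers. A rough, hypothetical sketch of the data-driven variant, where the tile types, their logic, and the `NodeData` layout are all invented for illustration and nothing here is taken from the project:

```csharp
using System;
using System.Collections.Generic;

public enum TileType { Constant, Invert, Add }

public static class DataDrivenGraph
{
    // One function per tile type instead of one subclass per tile type.
    private static readonly Dictionary<TileType, Func<float[], float>> Logic =
        new Dictionary<TileType, Func<float[], float>>
        {
            { TileType.Constant, inputs => 1f },
            { TileType.Invert,   inputs => 1f - inputs[0] },
            { TileType.Add,      inputs => inputs[0] + inputs[1] },
        };

    // A node is plain data: its type and the indices of the nodes feeding it.
    public readonly struct NodeData
    {
        public readonly TileType Type;
        public readonly int[] InputNodes;
        public NodeData(TileType type, params int[] inputNodes)
        {
            Type = type;
            InputNodes = inputNodes;
        }
    }

    // Nodes must already be in execution order; the graph owns all in- and outputs.
    public static float[] Run(IReadOnlyList<NodeData> nodes)
    {
        var outputs = new float[nodes.Count];
        for (int i = 0; i < nodes.Count; i++)
        {
            var inputs = new float[nodes[i].InputNodes.Length];
            for (int k = 0; k < inputs.Length; k++)
                inputs[k] = outputs[nodes[i].InputNodes[k]];
            outputs[i] = Logic[nodes[i].Type](inputs);
        }
        return outputs;
    }
}
```

The trade-off matches the one described above: adding a tile type means touching the enum and the table in one central place rather than dropping a new script onto a prefab.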
Closing Thoughts
It somehow works. I mean, we worked hard to keep it working. Keep it in a runnable state. Keep the systems in place. Keep stuff as local as reasonably possible within our visible constraints. Before writing this breakdown, I felt at peace with this project. I felt a sense of accomplishment. I mean, look at all these projects and you will see how small the role of game design and programming was. This time was different. There was this big focus on game design. On systems. It was fun. We discussed some directions in the beginning and I kept quiet because I always want to let other people have their way first because their ideas are probably better than mine. I don’t want to force my way onto the team and I can be very convincing when I believe in something. That is a bit dangerous and can easily kill brainstorming. Though after some circles, I put myself back in with this very specific design and the example of a robot vacuum cleaner or gardening robot (two examples). I pitched the idea and we played an analogue prototype. In the end we agreed to use this as our base concept and build the systems as a sandbox at first because we did not know yet how to put this together as a game. We saved these questions as boomerangs. We had quite a few of these in the beginning that we carried around but I actually think we handled them well. They were specifically named that way because they can kill you if you are not aware of them. However, if you are, you can also catch them at the right time. And they stuck on top of our Miro board, always visible, and we talked a lot about them when we did planning, when we were about to finish tasks, and on our way to get food.
Throughout the implementation I naturally got to know the systems the best. I had the most intimate knowledge of the systems, of both expected behaviour and actual behaviour. As I said, I felt at peace with the project… yep, I was proud. There, I said the word. The group was amazing: Emma researched so hard to find something that gets away from the classic tech representation while still being understandable, down to the speed in execution and the concepts. Sina worked super hard on the shaders, which turned out very well in my opinion, even though the more complex tile shader got cut to improve the experience. She also worked on the initial and second pathfinding. I only cleaned it up a little bit later and ensured the behaviour was predictable in all cases. Jules had the hardest start as he had to work with code, too, and against me. (He was also responsible for planning and communication with our research partners.) He implemented most of the UI and feedback effects, and I did my reviews on his pull requests and sometimes did follow-up PRs to clean stuff up. But he did a good job and improved significantly. I then was left with the rest: some systems, connecting stuff, keeping our build pipeline working, and extinguishing fires wherever they appeared.
But seeing the code again after some time, I see so many errors. Not just plain bugs but also structural issues. I can just hope the duct tape applied by our successors will be enough to release a good project. Pride comes before the fall. And so I see the fall. It is always nice if you do not see your mistakes but then you stall. I would like to learn and improve but all I do is feel bad.
Outlook
I visited them on the 2nd and 3rd of February 2024. That was about five months after I had last been actively involved in the project. When we transferred the project, I told them explicitly not to be afraid to change things. It was probably good that I was departing because I had a tight grip on the idea and plan in my mind. I just figured it would be better if they used the first weeks to break stuff while they still had a less biased view. They then created the exhibition mode, something we talked about doing sometime in the future but never got around to doing. It is a dual-screen setup: the world on a big upper screen, a touch screen (they had to redo the controls for touch), and an entity selector which consists of an NFC reader and 3D-printed versions of our entities. In the regular game you click the entities; in the exhibition mode you select them by putting them onto the entity selector. They also polished the UI so much. It has come such a long way even from our 0.8 release. Everything has little animations. The game is just fun to interact with on the most basic level. I am so proud of them. Like, seriously. I played it at the HIVE:FIVE and I just smiled. It put me at peace, again. I may not have the skills to make good stuff but at least I know some people that worked hard to make something good. At least I can be proud of someone else’s work.