Blog: Referencing objects – Names vs GUIDs


<div style="margin: 5px 5% 10px 5%;"><img src="http://www.sickgaming.net/blog/wp-content/uploads/2019/04/blog-referencing-objectsa-names-vs-guids.jpg" width="200" height="200" title="" alt="" /></div><div><p>Here is the situation: You have created some sort of&nbsp;<a href="https://ourmachinery.com/post/the-story-behind-the-truth-designing-a-data-model/">data model</a>&nbsp;for representing objects in memory/on disk. Now you need the ability for objects to refer to other objects. I.e., an object needs to talk&nbsp;<em>about</em>&nbsp;another object. Some examples:</p>
<ul>
<li>A material object may point to a texture object and say “I want to use this as my diffuse map”.</li>
<li>An animation object may point to a model object and say “I want to rotate this model around its z-axis”.</li>
</ul>
<p>How can we accomplish this?</p>
<p>Here are two options:</p>
<ul>
<li>
<p><strong>Names:</strong>&nbsp;Each object is referred to by its&nbsp;<em>name.</em>&nbsp;The name is a string assigned to the object by the user and the user can change this string at will (rename the object).</p>
</li>
<li>
<p><strong>GUIDs:</strong>&nbsp;Each object is referred to by a globally unique identifier (GUID). The GUID is assigned to the object on creation and never changes. It is guaranteed to only represent this particular object and no other.</p>
</li>
</ul>
<p>Names are resolved in some kind of context (typically the children of the current object). Thus, to refer to an object that is “far away” from us we might have to use a sequence of names to navigate the object tree, e.g.,&nbsp;<code>../../player/head/left_eye</code>. Much like a path in a file system, this sequence of names provides a&nbsp;<em>path</em>&nbsp;from one object in our object tree to another. Note that in this post I will sometimes somewhat sloppily talk about the&nbsp;<em>name</em>&nbsp;of an object when I actually mean the full path to an object.</p>
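<p>A minimal sketch of resolving such a path, assuming a simple tree where each node stores its children by name. The <code>Node</code> class, the <code>resolve</code> function and the <code>enemy/turret</code> objects are hypothetical names made up for this illustration:</p>

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical object-tree node: a user-assigned name plus named children."""
    name: str
    parent: Optional["Node"] = None
    children: Dict[str, "Node"] = field(default_factory=dict)

    def add(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children[name] = child
        return child


def resolve(start: Node, path: str) -> Optional[Node]:
    """Walk `path` from `start`, one name at a time; None if any step fails."""
    current: Optional[Node] = start
    for part in path.split("/"):
        if current is None:
            return None
        if part == "..":
            current = current.parent              # step up one level
        else:
            current = current.children.get(part)  # step down by name
    return current


world = Node("world")
head = world.add("player").add("head")
left_eye = head.add("left_eye")
turret = world.add("enemy").add("turret")

# From the turret, go up twice to the world, then down to the player's eye.
target = resolve(turret, "../../player/head/left_eye")
```

<p>Here <code>target</code> ends up being the <code>left_eye</code> node, and a path with a name that does not exist resolves to <code>None</code>.</p>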
<p>You might protest that there are other ways of representing references too. For example, an in-memory representation could just use a pointer. A disk representation could use a file offset. Combinations are possible too — for example (filename + offset) to represent an object inside a file. However, it is easy to become confused when considering the myriad of possibilities, so let’s put all of that aside for the moment. In this post, I’m going to focus on the difference between&nbsp;<em>names</em>&nbsp;and&nbsp;<em>GUIDs</em>&nbsp;and in the end we will see how the discussion applies to the other possibilities.</p>
<blockquote>
<p>Side note: There is another interesting option apart from names and GUIDs and that is to refer to an object by the hash of its content. With this approach, the same content is always referred to by the same unique identifier (its hash) and if you change the content all the references have to be updated. If you start to think about it, most of&nbsp;<a href="https://en.wikipedia.org/wiki/Git">git</a>&nbsp;falls out as the result of this single design decision.</p>
</blockquote>
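<p>The content-hash idea can be sketched in a few lines. This is only an illustration: it uses SHA-256 from Python's <code>hashlib</code>, which is an assumption and not git's exact scheme, and <code>store</code> and <code>put</code> are hypothetical names:</p>

```python
import hashlib

store = {}  # content hash -> content


def put(data: bytes) -> str:
    """Store `data` and return the identifier derived from its content."""
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return key


first = put(b"red brick texture")
second = put(b"red brick texture")    # same content, so same identifier
edited = put(b"blue brick texture")   # changed content, so a new identifier
```

<p>Because <code>edited</code> differs from <code>first</code>, anything that referred to the old content by hash has to be updated to point at the new hash.</p>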
<p>Names and GUIDs both have their pros and cons, making it hard to say that one is strictly better than the other:</p>
<table>
<thead>
<tr>
<th><strong>Names</strong></th>
<th><strong>GUIDs</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Fragile — if objects are renamed, moved or deleted, references will break</td>
<td>Unreadable — references look like random numbers which makes them hard to debug</td>
</tr>
<tr>
<td>Cumbersome — coming up with meaningful names for everything is a chore</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>Expensive — names have to be matched against the object tree to find the objects</td>
<td>&nbsp;</td>
</tr>
</tbody>
</table>
<p>Each of these points can be argued back-and-forth endlessly. Can’t we auto-assign names to make them easier to come up with? But how readable are names really if most of the things are just named&nbsp;<code>box_723</code>? Can’t we make a tool that looks up a readable name from a GUID? Can’t we also make a tool that automatically patches references when an object is renamed? Etc, etc, etc.</p>
<p>Again, it’s easy to get stuck in the nitty-gritty details of this and miss the bigger picture. To make things clearer, let’s take a step back and ask ourselves:</p>
<blockquote>
<p>What is the fundamental difference between names and GUIDs?</p>
</blockquote>
<p>Think about it for a bit. Here’s my answer:</p>
<blockquote>
<p>A GUID specifies an object identity, but a name specifies an object’s role.</p>
</blockquote>
<p>The GUID&nbsp;<code>90e2294e-9daf-45f0-b75b-01fb85bb6dc8</code>&nbsp;always refers to one specific object — the one single object in the universe with that GUID. The path&nbsp;<code>head/left_eye</code>&nbsp;refers to whatever object is currently acting as the character’s left eye. It does not always have to be the same object. Maybe the character loses her eye at some point and it gets replaced with a glass eye. Maybe we can spawn multiple instances of the character in different configurations with different kinds of eyes — flesh eyes, robot eyes, anime eyes, etc. Regardless of the setup,&nbsp;<code>head/left_eye</code>&nbsp;will refer to the character’s left eye.</p>
<p>In contrast, if we used a GUID to refer to the left eye and the eye got replaced, the GUID would still refer to the old eye we lost. And a single GUID couldn’t be used to refer to different eyes in different character setups.</p>
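<p>The difference can be shown with a small sketch, assuming a GUID table that maps identifiers to objects and a plain dictionary standing in for the head's named children. All class and variable names here are hypothetical:</p>

```python
import uuid


class Obj:
    def __init__(self, kind):
        self.guid = uuid.uuid4()  # assigned on creation, never changes
        self.kind = kind


by_guid = {}  # GUID -> the one object with that identity
head = {}     # name -> whatever object currently fills that role

flesh_eye = Obj("flesh eye")
by_guid[flesh_eye.guid] = flesh_eye
head["left_eye"] = flesh_eye

guid_ref = flesh_eye.guid  # reference by identity
name_ref = "left_eye"      # reference by role

# The character loses her eye and gets a glass one in the same socket.
glass_eye = Obj("glass eye")
by_guid[glass_eye.guid] = glass_eye
head["left_eye"] = glass_eye

seen_by_guid = by_guid[guid_ref].kind  # still the old flesh eye
seen_by_name = head[name_ref].kind     # the glass eye now in that role
```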
<table>
<thead>
<tr>
<th>&nbsp;</th>
<th><strong>Name</strong></th>
<th><strong>GUID</strong></th>
<th><strong>Hash</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>References objects by</strong></td>
<td>Their role</td>
<td>Their identity</td>
<td>Their content</td>
</tr>
</tbody>
</table>
<p>The pointers and offsets that I talked about in the beginning of the post are similar to GUIDs, since they reference objects by identity. A pointer always points to the same object. In fact, you could see a pointer as a deserialized version of a GUID — a way of uniquely referencing an object in memory. Offsets too, uniquely identify objects. (But offsets are not permanent, so references must be updated each time a file is saved.)</p>
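<p>As a rough sketch of "a pointer is a deserialized GUID", a loader could create all objects first and then swap each GUID for a direct reference. The record layout, the <code>Texture</code> and <code>Material</code> classes and the file name are assumptions made up for this example:</p>

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)
class Texture:
    guid: str
    file: str


@dataclass(eq=False)
class Material:
    guid: str
    diffuse_guid: str                  # serialized form of the reference
    diffuse: Optional[Texture] = None  # in-memory form, bound at load time


def load(records: List[dict]) -> Dict[str, object]:
    """Two passes: create every object, then turn GUIDs into direct references."""
    objects: Dict[str, object] = {}
    for rec in records:
        if rec["type"] == "texture":
            objects[rec["guid"]] = Texture(rec["guid"], rec["file"])
        elif rec["type"] == "material":
            objects[rec["guid"]] = Material(rec["guid"], rec["diffuse"])
    for obj in objects.values():
        if isinstance(obj, Material):
            # One table lookup at load time; afterwards it is just a pointer.
            obj.diffuse = objects[obj.diffuse_guid]
    return objects


MATERIAL_GUID = "cc1b9a7b-a5bb-4355-8cf1-f78b74fe2774"
TEXTURE_GUID = "90e2294e-9daf-45f0-b75b-01fb85bb6dc8"

# The material appears before the texture it refers to, hence the second pass.
records = [
    {"type": "material", "guid": MATERIAL_GUID, "diffuse": TEXTURE_GUID},
    {"type": "texture", "guid": TEXTURE_GUID, "file": "bricks.png"},
]
objects = load(records)
material = objects[MATERIAL_GUID]
```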
<blockquote>
<p>A name allows for “late binding” of references.</p>
</blockquote>
<p>To get from a name to an actual object, we need to&nbsp;<em>resolve</em>&nbsp;the name at some point. This involves matching the path against the object tree and finding the corresponding object. In contrast to a GUID, which always points to the same object, a name might resolve to different things at different points in time, or in different contexts. The reference isn’t bound to a particular target until the name is resolved.</p>
<p>When does this happen? You can decide that when you design a system based on your performance/flexibility requirements. For example, you can decide to resolve all references once and only once — when the object is spawned. This is faster, because you only need to look references up once, but it also means that in the case where the eye is removed and replaced, the reference won’t be updated to point to the new eye. So it’s less flexible.</p>
<p>The other option is to resolve the reference every single frame. This can handle objects being removed and/or replaced, but it also means having to pay the performance cost of resolving the reference every single frame.</p>
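<p>The two binding times can be compared in a short sketch. It assumes the object tree is a nested dictionary and uses strings as stand-ins for the eye objects; <code>BindAtSpawn</code> and <code>BindEveryFrame</code> are hypothetical names:</p>

```python
def resolve(root, path):
    """Follow a '/'-separated path through nested dicts; None if a step is missing."""
    node = root
    for part in path.split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


class BindAtSpawn:
    """Resolve once, when the reference is created. Cheap, but never re-targets."""

    def __init__(self, root, path):
        self.target = resolve(root, path)

    def get(self):
        return self.target


class BindEveryFrame:
    """Resolve on every access. Follows replacements, but pays for each lookup."""

    def __init__(self, root, path):
        self.root = root
        self.path = path

    def get(self):
        return resolve(self.root, self.path)


character = {"head": {"left_eye": "flesh eye"}}
early = BindAtSpawn(character, "head/left_eye")
late = BindEveryFrame(character, "head/left_eye")

character["head"]["left_eye"] = "glass eye"  # the eye gets replaced

early_sees = early.get()  # still the original eye
late_sees = late.get()    # the replacement
```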
<p>With this new understanding of the fundamental difference between names and GUIDs we can take another look at the pros and cons we listed above and see if we can understand them better.</p>
<p><strong>Names are fragile — they can break if objects are moved, renamed or deleted</strong></p>
<p>Yes, this is the whole point!</p>
<p>The main reason for using names is to allow late binding. Late binding means we don’t know beforehand what the name will resolve to (or if it will resolve to anything at all). We can’t get the benefits of late binding without also getting the drawbacks.</p>
<p>For instance, in the example above, after the eye has been removed, but before it has been replaced with a glass eye,&nbsp;<code>head/left_eye</code>&nbsp;will not resolve to anything — because the character doesn’t have a left eye. Code that expects to find an object at&nbsp;<code>head/left_eye</code>&nbsp;might break.</p>
<p>A name might also resolve to&nbsp;<em>something unexpected</em>. For example, the eye might be removed and replaced by a little man. Code that was written to deal with an eye, or even with no eye, might break when it finds a little man in the eye socket.</p>
<p>In addition to breaking in this correct way — where a resolve rightly fails because the object doesn’t exist — references can also break in incorrect ways. The resolve might fail, not because there is no eye, but because the user made a mistake. For example, maybe the eye was named&nbsp;<code>LeftEye</code>&nbsp;instead of&nbsp;<code>left_eye</code>.</p>
<p>To an extent, problems like this can be mitigated by good tooling. For example, the tools might warn about unresolved references. The tools might also assist with renaming, so that if you rename&nbsp;<code>left_eye</code>&nbsp;→&nbsp;<code>LeftEye</code>&nbsp;all the references are updated to&nbsp;<code>LeftEye</code>&nbsp;too.</p>
<p>But note that there is an inherent conflict here. The whole point of using names is to allow the references to be more lax and flexible. If the tools are too anal with their warnings it kind of defeats that purpose. For example, it might be totally correct that&nbsp;<code>head/halo</code>&nbsp;doesn’t refer to anything, because the character starts out without a halo — she only gets that once she’s completed the Holy Mission. If a tool spews out false positive warnings about things like this, users will soon learn to ignore them and miss the actually valuable warnings about real typos.</p>
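<p>One possible way to keep such warnings useful is to let a reference declare that it is allowed to be unresolved. The sketch below assumes a nested-dictionary object tree and a per-reference <code>optional</code> flag; both the flag and the <code>check_references</code> function are hypothetical and not something the post prescribes:</p>

```python
def resolve(root, path):
    """Follow a '/'-separated path through nested dicts; None if a step is missing."""
    node = root
    for part in path.split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def check_references(root, refs):
    """Return the paths that fail to resolve and were not marked optional."""
    warnings = []
    for path, optional in refs:
        if resolve(root, path) is None and not optional:
            warnings.append(path)
    return warnings


character = {"head": {"left_eye": {}}}
refs = [
    ("head/left_eye", False),  # resolves, no warning
    ("head/LeftEye", False),   # typo, should be reported
    ("head/halo", True),       # may legitimately be missing, stay quiet
]
warnings = check_references(character, refs)
```

<p>With this setup only the typo <code>head/LeftEye</code> is reported, while the missing halo is left alone.</p>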
<p>Similarly, tools can’t be too aggressive about updating references when objects are renamed either. Suppose that you designed a really cool robotic left eye for the character. Then you decide that it would look better as the right eye, so you move it into the right eye socket and rename it from&nbsp;<code>left_eye</code>&nbsp;to&nbsp;<code>right_eye</code>. If the references are auto-patched, all references to the left eye will now be changed to the right eye, which probably isn’t correct. For example, if the&nbsp;<code>left_eyebrow</code>&nbsp;had a reference to its eye, and that reference was auto-patched, the&nbsp;<code>left_eyebrow</code>&nbsp;would now think it sits over the&nbsp;<code>right_eye</code>. On the other hand, some references could have meant “the robotic eye” rather than “the eye in the left socket” when they talked about&nbsp;<code>left_eye</code>&nbsp;and those references&nbsp;<em>should</em>&nbsp;get patched. Pretty messy and hard to make a nice UI for, although&nbsp;<a href="http://bitsquid.blogspot.com/2010/10/dependency-checker.html">I’ve tried before</a>.</p>
<p><strong>Names are cumbersome — coming up with meaningful names for everything is a chore</strong></p>
<p>As discussed above, a&nbsp;<em>name</em>&nbsp;isn’t just a string of characters, it is a description of a role, of a relationship.&nbsp;<code>head/left_eye</code>&nbsp;means the left eye object in the head of the character. If you gave it a nonsensical name like&nbsp;<code>bob</code>&nbsp;or an auto-generated name like&nbsp;<code>Object_13</code>&nbsp;it wouldn’t say anything about the role.</p>
<p>To take advantage of the late binding feature of names you want to use meaningful names that match the concepts that you have in your game. I.e., if your characters can put on different helmets and backpacks you probably need&nbsp;<code>helmet</code>&nbsp;and&nbsp;<code>backpack</code>&nbsp;names. If helmets and backpacks are just visual features of some character models, can’t be removed or swapped out and don’t have any gameplay purpose, they might not need their own names; they might just be part of the&nbsp;<code>head</code>&nbsp;and&nbsp;<code>body</code>.</p>
<p>You can think of this naming as sort of a “logical rigging” of the model.</p>
<p>So yes, if you want to take advantage of late binding, you do have to spend some time coming up with meaningful names and hierarchies. If all your objects are just named&nbsp;<code>entity_2713</code>&nbsp;you are basically just using names as IDs. This has all the drawbacks of names (fragility, costly resolution) as well as all the drawbacks of IDs (unreadability). Don’t do that.</p>
<p><strong>Names are expensive — they have to be resolved</strong></p>
<p>Again, late resolve is the point of using names, and it will always have a cost. You can’t get the benefit of late resolve without paying the cost for it.</p>
<p>Of course, it can be more or less costly, depending on how you implement it. My most important performance tip is: be clear about the scope in which names are resolved.</p>
<p>I like to use fully qualified paths. I.e., referring to a character’s left eye would be&nbsp;<code>head/left_eye</code>. Referring to the left eye from the right eye would be&nbsp;<code>../left_eye</code>. Here,&nbsp;<code>..</code>&nbsp;goes up to the head and then&nbsp;<code>left_eye</code>&nbsp;descends to the left eye.</p>
<p>It can be tempting to fall into the trap of convenience and say that we should be able to just use the name&nbsp;<code>left_eye</code>&nbsp;to refer to the left eye instead of a full path, but it has scary performance implications. Instead of just searching our children for a name match, we now have to search all our descendants recursively. And if we want this to work from the right eye too, we not only have to search all&nbsp;<em>our</em>&nbsp;descendants, we have to search all our&nbsp;<em>parent’s descendants</em>&nbsp;too. Before you know it, you have to search the entire world for this&nbsp;<code>left_eye</code>. And even if we find it, how do we know it is the “right one” — the one the user meant? Maybe our helmet has a little statue on it, and maybe that statue has a&nbsp;<code>left_eye</code>&nbsp;too? How do we make sure we don’t find that one? Messy.</p>
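<p>A small sketch of the problem, assuming a nested-dictionary object tree with a statue on the helmet. The <code>find_all</code> function is a hypothetical bare-name search that visits every descendant:</p>

```python
def find_all(node, name, path=""):
    """Recursively collect the path of every descendant called `name`."""
    matches = []
    for child_name, child in node.items():
        child_path = f"{path}/{child_name}" if path else child_name
        if child_name == name:
            matches.append(child_path)
        matches.extend(find_all(child, name, child_path))
    return matches


character = {
    "head": {"left_eye": {}, "right_eye": {}},
    "helmet": {"statue": {"left_eye": {}}},
}

# The whole tree is visited, and the answer is still ambiguous.
matches = find_all(character, "left_eye")
```

<p>The search returns both <code>head/left_eye</code> and <code>helmet/statue/left_eye</code>, and nothing in the bare name says which one was meant.</p>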
<p>My preferred implementation for resolving paths is to first hash each part of the path (this can be done offline), and then at each step, we match the hash at that step against the hashed names of the current object’s children — either through a lookup table or directly. Maintaining a lookup table is probably only worth it once you start to have hundreds of children to match against.</p>
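<p>A minimal sketch of this approach, with some loud assumptions: it uses CRC32 from Python's <code>zlib</code> as the hash purely for illustration, ignores hash collisions, does not handle <code>..</code>, and matches by a direct scan of the children rather than a lookup table. <code>Node</code>, <code>compile_path</code> and <code>resolve</code> are hypothetical names:</p>

```python
import zlib


def name_hash(name):
    """Hash a single path component (CRC32 is just a stand-in here)."""
    return zlib.crc32(name.encode("utf-8"))


class Node:
    def __init__(self, name):
        self.name_hash = name_hash(name)  # only the hash is kept for matching
        self.children = []                # few children: a direct scan is enough

    def add(self, name):
        child = Node(name)
        self.children.append(child)
        return child


def compile_path(path):
    """Hash each part of the path up front; this step can be done offline."""
    return [name_hash(part) for part in path.split("/")]


def resolve(start, hashed_path):
    """At each step, compare one integer against the hashes of the children."""
    node = start
    for wanted in hashed_path:
        node = next((c for c in node.children if c.name_hash == wanted), None)
        if node is None:
            return None
    return node


character = Node("character")
head = character.add("head")
left_eye = head.add("left_eye")
right_eye = head.add("right_eye")

eye_path = compile_path("head/left_eye")  # precomputed once
found = resolve(character, eye_path)
```

<p>At runtime no strings are compared, only integers, but every path component still costs a scan of the current object's children.</p>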
<p>Even though this avoids really expensive stuff like searching the entire object tree or doing string comparison, it is still a lot more expensive than just following a pointer (which an ID deserializes into).</p>
<h2 id="names-vs-guids-the-smackdown">Names vs GUIDs — The Smackdown</h2>
<p>With this deepened understanding — who wins, names or GUIDs?</p>
<p>As discussed above, names have many disadvantages — they’re fragile, cumbersome and costly. But they have two main advantages:</p>
<ul>
<li>
<p>They express intent. When I refer to&nbsp;<code>head/left_eye</code>&nbsp;it is clear to the reader what I&nbsp;<em>want</em>&nbsp;to refer to. Thus names have a&nbsp;<em>meaning</em>&nbsp;that pointers/GUIDs don’t have. Recording this meaning can be helpful. When we complain that identifiers are&nbsp;<em>unreadable</em>, it is the lack of&nbsp;<em>meaning</em>&nbsp;we talk about — not just the fact that the identifier is a jumble of hex characters. But meaning requires explicit intent. If your object is named&nbsp;<code>entity_23415</code>&nbsp;— there is no meaning in the name, it might just as well be called&nbsp;<code>cc1b9a7b-a5bb-4355-8cf1-f78b74fe2774</code>.</p>
</li>
<li>
<p>They allow for “late binding” of the reference to an actual object. This allows new objects to “take the place”/“fill the role” of the originally referred object. Using this, we can “patch” objects in lots of interesting ways. For example, we can take a character, replace its eye with something else and all references to the eye will still work. To do this with GUIDs we need to patch up all references too, so that they point to the “new eye”.</p>
</li>
</ul>
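<p>For comparison, here is roughly what the GUID-side patching mentioned in the second point could look like, assuming all references live in one list that can be walked. The <code>retarget</code> function and the owner names are hypothetical:</p>

```python
import uuid

old_eye = uuid.uuid4()
new_eye = uuid.uuid4()
hat = uuid.uuid4()

references = [
    {"owner": "left_eyebrow", "target": old_eye},
    {"owner": "tear_duct", "target": old_eye},
    {"owner": "hat_stand", "target": hat},
]


def retarget(references, old_guid, new_guid):
    """Repoint every reference from old_guid to new_guid; return how many changed."""
    patched = 0
    for ref in references:
        if ref["target"] == old_guid:
            ref["target"] = new_guid
            patched += 1
    return patched


# Replacing the eye means finding and rewriting every reference to it.
patched = retarget(references, old_eye, new_eye)
```

<p>With names, none of this is needed: the new object simply takes over the name and existing references resolve to it.</p>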
<p>So which is best? It comes down to a judgment call.</p>
<p>My take is this — we’re trying to create a high-performance user-friendly game engine. Thus, we only want to pay the costs of using names (bad performance &amp; fragility) in the cases where we really take advantage of their strengths (intent &amp; late binding). In my experience — most of the time, we&nbsp;<em>don’t need</em>&nbsp;these features. For example, when you are placing a bunch of trees in a level you don’t really care about naming them and you don’t have any need for something else “assuming” the role of one of those trees.</p>
<p>For this reason,&nbsp;<em>The Machinery</em>&nbsp;uses GUIDs as the default way to represent references. When a model refers to a texture, it does so with a GUID. You can move or rename the texture and the model will still keep using the same texture until you explicitly point it to a different one.</p>
<p>But in addition to this, we also explicitly allow for&nbsp;<em>name</em>&nbsp;references in some systems — systems that we think benefit from the extra flexibility:</p>
<ul>
<li>
<p>Our visual scripting system has a “lookup entity by name” node, allowing entities to be referred to semantically (such as&nbsp;<code>head/left_eye</code>) from within the scripts.</p>
</li>
<li>
<p>Our animation system is still under construction, but we plan to have a similar feature there, allowing animations to target loose and flexible things such as&nbsp;<code>helmet/headlight/color</code>.</p>
</li>
</ul>
<p>Having two different “kinds” of references like this in the engine is by no means ideal. Whenever we are designing a system we have to ask ourselves: does this reference need the flexibility offered by names or is using a GUID OK? We are also possibly missing out on some flexibility — in the cases where we’ve decided to use a GUID, it is much more cumbersome for the user to achieve the kind of dynamic retargeting that names make easy.</p>
<p>Still, it seems like the best compromise to me — we get the performance and stability of GUIDs/pointers in the majority of the code, but can still use the flexibility of names in the situations where we think it’s needed.</p>
<h2 id="see-also">See also</h2>
<ul>
<li>
<p>Pixar’s&nbsp;<a href="https://graphics.pixar.com/usd/docs/index.html#IntroductiontoUSD-Whatcan'tUSDdo?">USD</a>&nbsp;format uses names for everything. From their viewpoint, having to occasionally fix broken references is worth the extra flexibility they get from using names everywhere. Of course, since they’re not primarily targeting real-time rendering, their performance requirements are different.</p>
</li>
<li>
<p>I’ve&nbsp;<a href="http://bitsquid.blogspot.com/2014/06/what-is-in-name.html">written about this topic before</a>, if you want to see how my viewpoint has shifted over the years. Note though that the focus of that article is a little bit different. In that article I’m talking about referencing assets/resources on disk, so when I mention a&nbsp;<em>path</em>&nbsp;in that article I mean a&nbsp;<em>disk path</em>. Whereas, in this post, I’m talking more about references in general and I’m not that concerned with exactly how things get serialized to disk.</p>
</li>
</ul>
</div>