Relationships in RDF graphs have a direction. Frequently, direction is clear from the name of the relationship. For example, John receives Employee of the Year 2020 Award. It is clear from the name of the relationship receives that the direction is from John to the award he received.
Clarity in naming is best achieved when a verb is used as the name of a relationship. Sometimes, however, it is hard to find an appropriate verb and a noun is used instead. For example, John parent Jane. In this case, it is often not clear who is the parent and what the direction of the relationship is. Should you add something to the name to indicate the direction? If so, what should be added?
When a verb is used for a name, should you use the present or the past tense? For example, receives versus received?
Whether you use a verb or another part of speech in the relationship name, there is often a question of the direction of a relationship. Should you use receives, or received by (which goes in the opposite direction), or maybe both? I said “often” and not “always” because some relationships are understood to be bidirectional, or symmetric. How should you identify them?
This blog attempts to answer these questions and to recommend best practices.
Direction of a Relationship
A simple rule for selecting the direction in which a relationship will be stored is to go in the many-to-one direction. If a relationship is many-to-many, then choose many-to-few if such a distinction can be made. Thus, to select the direction of a relationship, you need to understand the data it will be used with.
- A database table can have many columns. A column belongs to only one physical table. It could also belong to a number of virtual tables or views, so, at most, a link from a column to a table will be many-to-few. Choose column of as the relationship to store.
- A person could receive multiple awards. An award can be given to multiple people and/or parties. Still, most people have only a few awards, sometimes none and sometimes just one, while awards always have at least one recipient and most have many, often very many, recipients. Choose receives as the relationship to store.
If a relationship is one-to-one or there is no way to determine the direction that will be many-to-few, then choose the direction that you believe will result in the clearest name.
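The many-to-few rule can be checked against sample data. The following is a minimal Python sketch rather than a definitive method: it assumes the links are available as a non-empty list of (source, target) pairs, and the helper names `average_fan_out` and `suggest_direction` as well as the sample data are hypothetical. It compares the average number of links per node on each side and suggests storing the relationship starting from the side with fewer links per node.

```python
from collections import defaultdict

def average_fan_out(pairs):
    """Average number of distinct targets linked from each source."""
    targets = defaultdict(set)
    for source, target in pairs:
        targets[source].add(target)
    return sum(len(t) for t in targets.values()) / len(targets)

def suggest_direction(pairs):
    """Return 'forward' if sources have no more links than targets, else 'reverse'."""
    forward = average_fan_out(pairs)
    reverse = average_fan_out([(t, s) for s, t in pairs])
    return "forward" if forward <= reverse else "reverse"

# Hypothetical sample data: (person, award) pairs.
person_award = [
    ("John", "EmployeeOfTheYear2020"),
    ("Jane", "EmployeeOfTheYear2020"),
    ("Ann", "EmployeeOfTheYear2020"),
    ("Jane", "SafetyAward"),
    ("Ann", "SafetyAward"),
]

# People average fewer awards than awards have recipients,
# so the person -> award direction (receives) is suggested.
print(suggest_direction(person_award))
```

A real data set would need a larger sample, and the averages are only a hint. As the text says, the final choice can still be a matter of judgement.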
Modeling of Inverse Relationships
The existence of a relationship typically implies a different relationship going in the opposite direction. For example, if John receives an award, conceptually this means there is a received by, given to, or recipient relationship from the award to John. In the section below we talk about how to choose relationship names. In this section, we will talk about how to define an inverse relationship.
First, should you store inverse links? The simple answer is No. You can navigate RDF graphs in both directions. Storing extra triple statements bloats your data. This negatively impacts index size and performance of queries. It also creates a need for so-called truth maintenance. For example, if we discover that John did not receive an award and we stored both receives and received by relationships, we now need to update both of them.
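To illustrate why a single stored direction is enough, here is a small Python sketch. It models a graph as a plain set of (subject, predicate, object) tuples, which is an assumption made for illustration and not how a triple store is implemented; the helper names `objects` and `subjects` are hypothetical.

```python
# Only the ex:receives direction is stored.
triples = {
    ("ex:John", "ex:receives", "ex:EmployeeOfTheYear2020"),
    ("ex:Jane", "ex:receives", "ex:EmployeeOfTheYear2020"),
}

def objects(graph, subject, predicate):
    """Follow the predicate forward: subject -> objects."""
    return {o for s, p, o in graph if s == subject and p == predicate}

def subjects(graph, predicate, obj):
    """Follow the same predicate backward: object -> subjects."""
    return {s for s, p, o in graph if p == predicate and o == obj}

awards_of_john = objects(triples, "ex:John", "ex:receives")
recipients = subjects(triples, "ex:receives", "ex:EmployeeOfTheYear2020")

# Retracting the fact is a single removal; there is no inverse triple to keep in sync.
triples.discard(("ex:John", "ex:receives", "ex:EmployeeOfTheYear2020"))
recipients_after = subjects(triples, "ex:receives", "ex:EmployeeOfTheYear2020")
```

Had a received by triple been stored as well, the retraction would have needed a second removal, which is the truth maintenance burden described above.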
If you are not storing the inverse links, then you do not need a URI for the inverse predicate. In other words, you would have only the ex:receives URI and no ex:receivedBy URI. Note, for example, how in RDFS there is rdfs:subClassOf, but no predicate for the relationship from a parent class to a child class. Some ontology models, however, do define URIs for relationships in both directions. This is not a good practice. If your ontology has URIs for properties going in both directions, then you are leaving it to ontology users to choose which one they will use. It is quite possible that different users will choose different properties, which makes data interoperability harder.
You may, however, in some cases need to explicitly support the inverse direction. You may need to:
- Display the opposite relationship when you show information in the UI and give it a clear name. For example, when displaying information about an award, you may want to list all recipients of an award, making it clear their relationship to the award.
In SHACL, this requirement is supported by declaring a property shape with an inverse path. For example, you would use a simple path to declare the shape for the forward direction:
[
  a sh:PropertyShape ;
  sh:path ex:receives ;
  sh:class ex:Award ;
  sh:description "Relates a party to an award they are a recipient of."@en ;
  sh:name "receives" ;
] .
And a property shape with an inverse path to give the name (and specify constraints) for the opposite direction:
[
  a sh:PropertyShape ;
  sh:path [ sh:inversePath ex:receives ] ;
  sh:class ex:Party ;
  sh:description "Relates an award to a party that receives it."@en ;
  sh:name "received by" ;
] .
Here, sh:name "received by" gives the name to the values that are found when navigating in the opposite direction of the relationship given in the inverse path.
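As a rough illustration of how a UI might use the two names, the Python sketch below mimics a property shape with a small dataclass. This is not a SHACL engine: the `PropertyShape` class, the `inverse` flag, and the `render` helper are hypothetical stand-ins, and the graph is again assumed to be a set of (subject, predicate, object) tuples.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PropertyShape:
    name: str      # display name, playing the role of sh:name
    path: str      # predicate URI
    inverse: bool  # True when the shape uses an inverse path

def shape_values(graph, focus, shape):
    """Collect values for the focus node, walking the path forward or backward."""
    if shape.inverse:
        return sorted(s for s, p, o in graph if p == shape.path and o == focus)
    return sorted(o for s, p, o in graph if p == shape.path and s == focus)

def render(graph, focus, shape):
    """Format one display line: the shape's name followed by its values."""
    return f"{shape.name}: {', '.join(shape_values(graph, focus, shape))}"

graph = {
    ("ex:John", "ex:receives", "ex:EmployeeOfTheYear2020"),
    ("ex:Jane", "ex:receives", "ex:EmployeeOfTheYear2020"),
}
receives = PropertyShape("receives", "ex:receives", inverse=False)
received_by = PropertyShape("received by", "ex:receives", inverse=True)

print(render(graph, "ex:John", receives))
print(render(graph, "ex:EmployeeOfTheYear2020", received_by))
```

Both display lines are produced from the same stored ex:receives triples; only the name and the direction of traversal differ.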
Modeling of Bidirectional Relationships
You can navigate and query an RDF graph in both directions, and some relationships mean the same thing in both directions. For example, there could be a sibling or a neighbor relationship between two people. John sibling James means the same as James sibling John. Thus, you do not need a different name for the inverse and do not need to create a separate property shape for it.
The following property shape, when associated with the class/node shape Person, says that the relationship works in both directions, because the value of the sh:class constraint is the same class as the class the property shape is associated with using sh:property:
ex:Person
  a rdfs:Class, sh:NodeShape ;
  sh:property [
    a sh:PropertyShape ;
    sh:path ex:sibling ;
    sh:class ex:Person ;
    sh:description "Relates a person to their sibling."@en ;
    sh:name "sibling" ;
  ] .
If desired, you can add a SHACL rule to infer the symmetric relationship values in the other direction. You could do this for the inverse as well. The rule can be calculated dynamically (a SHACL property value rule) or materialized as static triples. However, this is typically not necessary for inverses.
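The difference between a dynamic and a materialized rule can be sketched in a few lines of Python. This only mirrors the idea and does not use a SHACL rule engine; the `SYMMETRIC` set and the functions `values` and `materialize` are hypothetical names, and the graph is assumed to be a set of (subject, predicate, object) tuples.

```python
# Predicates treated as symmetric in this sketch.
SYMMETRIC = {"ex:sibling"}

# Only one direction is stored.
stored = {("ex:John", "ex:sibling", "ex:James")}

def values(graph, subject, predicate):
    """Dynamic approach: look in both directions at query time."""
    found = {o for s, p, o in graph if s == subject and p == predicate}
    if predicate in SYMMETRIC:
        found |= {s for s, p, o in graph if o == subject and p == predicate}
    return found

def materialize(graph):
    """Static approach: return the graph plus a reverse triple for each symmetric link."""
    return graph | {(o, p, s) for s, p, o in graph if p in SYMMETRIC}

print(values(stored, "ex:James", "ex:sibling"))
print(len(materialize(stored)))
```

The dynamic version keeps the stored data small at the cost of extra work per query, while the materialized version doubles the symmetric triples and has to be kept in sync when data changes.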
Using a verb is preferable as it typically makes the direction of a graph relationship clear.
If this is not possible and you need to use another part of speech, you can clarify the direction by using a suffix or a prefix. If the direction of the relationship lets you use a suffix, then choose that over a prefix. Again, rdfs:subClassOf is a good example. Here “of” is used as a suffix modifier to clarify the direction of the relationship.
Some people prefer to use “has” as a prefix. We do not favor this convention. Instead of using “has” in the name, we prefer to assume that if the direction of a relationship is not clear, then there is a silent “has”. In other words, parent means has parent, i.e., the object in the triple statement is the parent of the subject. When we say John parent Jane, it means that Jane is John’s parent. If “has” is the only prefix that could be used, adding it is mostly noise. Similarly, if you are using a suffix to clarify the direction of a relationship, do not also add “is” as a prefix. For example, note that the name is rdfs:subClassOf, with no “is” prefix.
Purely from the naming perspective, a better approach would be to use parent of instead of parent. The preference for the use of a suffix is because you could utilize different suffixes to clarify the relationship e.g., of, by, to, etc. Suffixes add something more to the meaning than just the word “has”. However, if we follow the many-to-few rule, then strictly speaking the relationship should be from a child to the parent. A child can only have two parents and a parent can have many children. Thus, we should not use parent of. This is a good example where the directional rule can be somewhat fuzzy – a matter of judgement. Yes, parents can have many children. But many parents have only one child, many have only two children and, increasingly, fewer and fewer parents have a large number of children.
When you use a verb, you should always use the present tense. This is because a relationship name should not embed the timeframe of the relationship. Data stays and time moves on. When you record a fact, it may be happening now. When someone retrieves this data later, for them, it happened in the past. Thus, embedding time in the name of a relationship is misleading. If you need to associate a time period with such statements, use one of the reification approaches.
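As one possible illustration of keeping time out of the predicate name, the Python sketch below builds the triples of a reified statement using the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). The `reify` helper, the statement identifier ex:stmt1, and the ex:year annotation are hypothetical, and other reification approaches would look different.

```python
def reify(statement_id, subject, predicate, obj, annotations):
    """Describe one statement as a resource so that extra facts can be attached to it."""
    triples = {
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    }
    # Annotations such as a year hang off the statement, not off the predicate name.
    triples |= {(statement_id, p, v) for p, v in annotations.items()}
    return triples

award_fact = reify(
    "ex:stmt1",
    "ex:John",
    "ex:receives",
    "ex:EmployeeOfTheYear2020",
    {"ex:year": "2020"},
)
```

The predicate stays ex:receives in the present tense, and the time information lives in a separate annotation that can be queried or ignored as needed.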
Note that when referring to the name of a relationship, I am really talking about its URI or the local name part of the URI, e.g., receives or subClassOf. It is customary for the label to be based on the local name with the camel case notation replaced by spaces. However, it is possible for the label to be somewhat different from the local name. If you decide to make the label different, it should not deviate too much from the local name. The meaning should definitely be the same. Any deviation could lead to confusion. Therefore, you need to weigh the downside of the deviation against the perceived benefits.
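The customary conversion from a camel case local name to a label is simple enough to sketch in Python. The function name `label_from_local_name` is hypothetical, and the lowercasing is an assumption that matches labels such as "received by" used earlier; acronyms inside a local name would need extra handling.

```python
import re

def label_from_local_name(local_name):
    """Put a space before each capital that follows a lowercase letter or digit, then lowercase."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", local_name)
    return spaced.lower()

print(label_from_local_name("receivedBy"))
print(label_from_local_name("subClassOf"))
```

A label generated this way stays close to the local name, which is the property the paragraph above asks you to preserve.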
Some ontologies use meaningless local names e.g., ex:p123. This is atypical for most knowledge graphs with the exception of the knowledge graphs with a very large scope that may be using thousands of predicates – like Wikidata. Our recommendation is to use meaningful names for relationships. This makes it much easier to understand and use the ontology model and graph data based on the ontology.
Making choices about the best name for graph relationships and/or their direction can be challenging. In this blog we have attempted to offer advice, with clear rationale, to help you in making modeling choices. In our practice, we found this advice to work well in many different situations.
In the end, however, best practices are simply guidance. You may have a situation that requires you to use an approach different from what is recommended in this blog. If this is the case, as you make your choice, be sure to consider pluses, minuses and implications of your decisions and to document them well.