By: Irene Polikoff, co-founder of TopQuadrant
The idea for this blog came from a question asked by a customer. They said:
“When do you use OWL and when do you use SHACL? Isn’t OWL about semantics and SHACL about data validation? And what about RDFS?”
My answer was that I no longer used RDFS/OWL. I now use SHACL for everything. The only part of RDFS I use is subclassing. I stopped using RDFS/OWL back in 2017 when SHACL became an official W3C standard.
Why? Because I like to keep things as simple as possible. I prefer to use one language rather than two because it is simpler. The one language I use in my data modeling is SHACL. If using two languages had some advantages, I would consider using SHACL plus OWL provided benefits exceeded costs. However, I see no real benefits in adding OWL to the mix.
SHACL delivers everything I need nicely. I work a lot with different customers of TopQuadrant. SHACL also delivers everything they need nicely. I, and everyone else at TopQuadrant, learned and used OWL before we learned SHACL. If OWL addressed our needs and the needs of our customers well enough, we would have stayed with it and not moved on to SHACL.
Broad adoption of a technology requires:
- Relative simplicity – so that it can be learned and used by many
- Ability to deliver strong value – this means good support for common requirements
We feel that OWL does not meet these criteria.
I will use the rest of this blog to discuss why, after being a strong supporter of using OWL, I completely abandoned it. In the next blog I will describe why I find SHACL a better fit for my needs and the needs of TopQuadrant’s customers.
My History with OWL
First, let me talk about my personal journey in using OWL. I first started to use it back in 2002-2003 when it was still in development. At that time, we were talking about DAML+OIL and how it was becoming OWL.
Since then, I have helped to develop a training curriculum for RDF, RDFS and OWL. I worked closely with several people who developed these standards. At one point, they were part of the TopQuadrant team. I trained and mentored many people from many organizations. I created large ontology models in OWL. As part of designing our products, which are model driven, I came up with different ways of using ontologies. I also worked with many customers as they tried to adopt these standards and to build and use ontologies.
I have a degree in mathematics and feel comfortable thinking about sets and set algebra – important when using OWL.
Why I was interested in OWL
OWL was an exciting language for me primarily because of RDF. RDF provides a flexible graph data model which I believed and continue to believe to be extremely valuable for supporting enterprise data processing.
Before I started to work with the Semantic Web standards, my team at IBM built our own version of a graph database. We felt that graph databases were important to support flexibility in applications and data. As popular as RDBMS were and continue to be, they are not a good fit for everything.
Graphs are powerful and flexible, but to realize their full potential they need a schema language – a language with which one can express the model of the data in a graph. RDF Schema is described as a schema language for RDF. However, it is a very minimalistic language. You can’t even define cardinalities with RDFS. Nor is it a data definition language. It is a data categorization language.
OWL promised to offer a rich language for describing data we work with. You could use it to describe data objects conceptually, without worrying about tables, foreign keys, joins, etc. In other words, we no longer needed to start with a conceptual model that is close to the business understanding of data, then translate it into a logical model and, finally, into a physical model which would no longer resemble our understanding of the data objects. We could now define and directly use, at runtime, models that reflected our understanding of data. This was exciting!
And, indeed, you could express fairly rich models in OWL circa 2004. In 2009, OWL 2 came along and now you could do even more with OWL.
So, why don’t I use OWL anymore?
The Shortcomings of OWL
Neither RDFS nor OWL was created to address the real-world data processing challenges I have faced throughout my career and was passionate about. I wanted a data modeling language to describe enterprise data objects.
RDFS and OWL, on the other hand, were designed to describe what could be inferred about “real world things”. In other words, if you know some facts about something in the world, RDFS/OWL define what additional facts you could infer. The people who designed OWL were academics. Their concerns and notions about what was important were too removed from the practical needs in the field.
As a result, the fundamental design ideas in RDFS/OWL manifested in the following key issues:
- The Open World Assumption
- Confusion Around Reasoning
- Having Too Much (features) and, yet, Not Enough
Let me explain each of these issues.
1. The Open World Assumption
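Both RDFS and OWL use the open world assumption. In simple terms, it boils down to two concepts:
- The absence of a piece of information cannot be used to conclude that it does not exist
- Two things with different identifiers are not necessarily different things (strictly speaking, this is the lack of a unique name assumption, which goes hand in hand with the open world assumption in RDFS and OWL)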
This assumption immediately makes cardinalities and, to some extent, restrictions on class membership in OWL models practically unusable:
- Yes, you can say that a person must have a date of birth. But what does this mean?
  - Let’s say you get a data object representing a person and it does not contain a date of birth.
  - Well, this is not a problem in OWL because the fact that there is no date of birth present in your data does not mean that it does not exist somewhere else in the universe.
- You could also say that a person may have only one mother.
  - Let’s say you get a person with two mothers.
  - Not a problem, maybe both of them are the same individual.
  - Unless it can be proved, based on contradictions in the model and data, that they can’t be the same, we don’t know that they are not the same.
- You could say a product must be made of material that is metal.
  - Let’s say you got information about a product that is made out of a material that is not known to be metal.
  - Not necessarily a problem, maybe we simply do not know that its material is metal.
  - It will just be inferred that the material is metal.
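To make the contrast concrete, here is a minimal Python sketch of the two readings of "a person must have a date of birth" and "a person may have only one mother". It is not an OWL reasoner; the triples, the property names and both functions are hypothetical and exist only to illustrate the idea. The closed-world function reports problems the way a data modeler would expect, while the open-world function draws conclusions instead.

```python
# Toy data as (subject, property, value) triples. All names are made up.
TRIPLES = {
    ("alice", "type", "Person"),
    ("alice", "mother", "mary"),
    ("alice", "mother", "maria"),
    ("bob", "type", "Person"),
    ("bob", "dateOfBirth", "1980-05-17"),
    ("bob", "mother", "jane"),
}


def values(triples, subject, prop):
    """Return the sorted values of one property for one subject."""
    return sorted(o for s, p, o in triples if s == subject and p == prop)


def closed_world_violations(triples, subject):
    """Validation-style reading: what is not in the data counts as missing."""
    problems = []
    if len(values(triples, subject, "dateOfBirth")) < 1:
        problems.append("missing dateOfBirth")
    if len(values(triples, subject, "mother")) > 1:
        problems.append("more than one mother")
    return problems


def open_world_conclusions(triples, subject, known_different=frozenset()):
    """Open-world reading: infer new facts rather than report errors."""
    conclusions = []
    if not values(triples, subject, "dateOfBirth"):
        conclusions.append("has some dateOfBirth that is not yet known")
    mothers = values(triples, subject, "mother")
    for i, a in enumerate(mothers):
        for b in mothers[i + 1:]:
            if frozenset((a, b)) in known_different:
                # Only an explicit statement of difference yields a contradiction.
                conclusions.append(f"inconsistent: {a} and {b} are known to be different")
            else:
                conclusions.append(f"{a} sameAs {b}")
    return conclusions


print(closed_world_violations(TRIPLES, "alice"))
print(open_world_conclusions(TRIPLES, "alice"))
```

In this sketch the open-world reading only produces an inconsistency when the data explicitly says the two mothers are different individuals, which mirrors the "unless it can be proved" point in the list above.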
All of this comes as a surprise to pretty much any data modeler or software developer who tries to learn OWL. The concept is so foreign that the understanding of it and its implications does not stick with people. Even people who have worked with OWL for 5+ years tend to be constantly surprised by what a reasoner comes back with or does not come back with.
What happens if you again and again tell a person who is trying to work with this technology that their assumption or understanding is wrong? After a while, they are likely to decide that this is just not for them. They prefer to use something else that more naturally fits their intuition. OWL and, by extension, RDF quickly got a reputation for being hard to learn and work with.
Yet another reaction is “let’s forget what it is supposed to mean” and pretend that it means what we want it to mean. This approach is contrary to the benefit of having a standard in the first place. And if a standard does not mean what you want it to mean, then, perhaps, you need a different standard.
2. Confusion around Reasoning
Reasoning (or inferencing) is typically listed as one of the strongest advantages of RDFS/OWL. These languages are designed for reasoning. Every statement is defined by what could be inferred from it. This is the semantics or meaning in RDFS/OWL.
What does something mean? It means that the following additional statements can be inferred from it. RDFS/OWL specify precisely what these additional statements are.
Sounds intriguing and promising, right? So, why do I list it as a shortcoming?
- The outcome of the inferencing (and, thus, the meaning of the statements) is often unintuitive and hard to understand
I provided some examples in the previous section and could easily add more. One example is the difference between the meaning of global statements in RDFS about a property’s domain and range and the meaning of local restrictions on a property in OWL.
Further, it is common for modelers to not remember or understand the inferences (semantic meaning). As a result, their models do not necessarily mean what they think they mean. This brings me to the next point.
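The domain and range example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a standards-compliant reasoner: the vocabulary (`author`, `creator`, `Book`, `Painting` and so on) and the `infer_types` function are hypothetical. It shows that a global domain or range statement applies to every use of a property, while a local restriction in the spirit of `owl:allValuesFrom` applies only to members of one class. In both cases the result is an inferred type, not a validation error.

```python
# Hypothetical vocabulary, for illustration only.
DOMAIN = {"author": "Book"}    # global: any subject of "author" is a Book
RANGE = {"author": "Person"}   # global: any value of "author" is a Person
# Local restriction: for members of Painting, every "creator" value is an Artist.
ALL_VALUES_FROM = {("Painting", "creator"): "Artist"}

DATA = [
    ("memo7", "author", "acme_corp"),
    ("mona_lisa", "type", "Painting"),
    ("mona_lisa", "creator", "leonardo"),
    ("logo1", "creator", "design_bot"),
]


def infer_types(triples):
    """Return all asserted and inferred (resource, class) pairs."""
    types = {(s, o) for s, p, o in triples if p == "type"}
    while True:
        new = set()
        for s, p, o in triples:
            if p == "type":
                continue
            if p in DOMAIN:
                new.add((s, DOMAIN[p]))
            if p in RANGE:
                new.add((o, RANGE[p]))
            for (cls, prop), value_cls in ALL_VALUES_FROM.items():
                # The restriction fires only if the subject is in the class.
                if p == prop and (s, cls) in types:
                    new.add((o, value_cls))
        if new <= types:
            return types
        types |= new


INFERRED = infer_types(DATA)
for pair in sorted(INFERRED):
    print(pair)
```

A modeler who expected the domain statement to reject the memo as "not a book" instead finds the memo classified as a book and the company classified as a person. The logo's creator gets no inferred type at all, because the logo is not known to be a painting.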
- Too many flavors of OWL
The initial OWL specification described three profiles of OWL: OWL Lite, OWL DL and OWL Full. In creating OWL Lite, the authors hoped to define a standard that vendors would find easier to implement than OWL DL. This hope was never realized; OWL Lite never got much uptake. OWL Full was there because of the mismatch in semantics of RDFS and OWL.
OWL 2 introduced a number of other profiles. The goal of the new profiles was, again, to target different implementation technologies – to identify subsets that different vendors would be able to implement with relative ease and would want to implement because a profile would deliver value by supporting common needs of their customers. In addition to the original three, we now also got:
- OWL 2 RL – a subset of OWL 2 that could be implemented using rules-based technology. It was designed to accommodate applications that can trade the full expressivity of OWL 2 for efficiency.
- OWL 2 QL – an intersection of RDFS and OWL 2 DL that was believed to be implementable in RDBMS. It is defined not only in terms of the set of supported constructs, but it also restricts the places in which these constructs are allowed to occur.
- OWL 2 EL – a subset of OWL 2 that is focused on reasoning capabilities needed by very large ontologies with tens or hundreds of thousands of classes. While the previous two profiles are about optimizing for working with large data, this one tries to optimize for working with large models and no (or very little) data.
Why all the different profiles? Because OWL is very expensive computationally. It was defined around tableau algorithms which reason by looking at all possible conclusions from each statement, then ruling out ones that contradict each other. It is a mathematically sound approach and it is proven to be decidable – meaning that the algorithm will complete and return one decision. However, this does not mean it will complete in an acceptable amount of time for your application – or even while we are still alive.
If you are developing an OWL ontology, how exactly do you know what profile of OWL you are using in your model? More importantly, how do you know what you’d need to change in your model if you want to stay within a given profile? This is often hard to know. Even more importantly, why should you care? This leads me to the next point.
- Limited and Inconsistent Vendor Support
After you develop your ontology, you will presumably want to put it to use. Commercial support for OWL reasoning is, at best, inconsistent. I emphasize commercial because commercial support is necessary if the models are to be used broadly and for important applications.
- No one supports OWL DL. One exception is Stardog, but they recommend against it and say it may disappear in future versions.
- Most vendors support OWL 2 RL. It is the easiest to support since rules engines are typically available in most deployment environments. These vendors also support rules outside of the OWL 2 RL profile, in their own proprietary languages, and these get used often. The RL profile (and even OWL in its entirety) does not address all the data definition needs of applications. More on this in the next section.
- Many vendors support something they call RDFS+ or RDFS++ profiles. There is no commonly accepted definition of what this means.
- By and large, OWL 2 QL and OWL 2 EL are not supported.
If the meaning of your ontology is defined by what could be inferred, then the environment you deploy it to needs to have reasoning capabilities. The capability of the target environment would then determine what you could and could not put into your ontology and what your ontology actually means.
This, again, is too complex and confusing. As a result, most implementations simply resort to writing their own code that determines what an ontology actually means in the context of their application. They write some SPARQL queries over ontologies or use APIs to read the statements in the ontology. In doing this, they often interpret the meaning liberally, focusing not on the formal specification, but on what they actually need or want. Such an approach means that, practically speaking, there is no consistent and interoperable semantic standard.
3. Having Too Much and, yet, Not Enough
If you develop OWL ontologies the way they are intended to be developed, you will end up with a proliferation of classes, both named and unnamed (anonymous). This is because OWL ontologies are based on set theory and on reasoning about sets. To precisely and correctly specify the intended meaning, you will need to use intersections and other set operations, think about whether your class is equivalent to a restriction or a subclass of a restriction and so on. Few data modelers are accustomed to doing this. Even fewer people who would need to understand these models are accustomed to such an approach to modeling.
OWL reasoning is primarily about classification: given information about a resource, we can infer what class or set it belongs to. And, given information about a class as a set of resources, we can infer which other classes it is a subclass of, equivalent to or disjoint with. This strength does not really align well with the common needs of data processing.
OWL is very expressive, but it is often quite cumbersome to create relatively simple definitions in it. Further, there is still no way in it to make definitions that are very commonly needed. For example, you can’t say that:
- a person’s full name must equal a concatenation of their first and last names
- a duration is the difference between the end and start dates
- a start date can’t be greater than the end date
Anything that involves some operations can’t be stated. This means that key parts of what one would commonly want to express in a model can only be written in either application code or a rules language proprietary to a given RDF tool.
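As an illustration of what ends up in application code, here is a minimal Python sketch of the three rules above. The record layout, the field names (`firstName`, `durationDays` and so on), the single-space separator in the full name and the `check_record` function are all assumptions made for this example; they are not taken from any standard or product.

```python
from datetime import date


def check_record(rec):
    """Return a list of rule violations for one hypothetical record."""
    problems = []
    # Rule 1: full name is the concatenation of first and last names
    # (a single space as separator is an assumption of this sketch).
    if rec["fullName"] != f'{rec["firstName"]} {rec["lastName"]}':
        problems.append("fullName is not firstName + ' ' + lastName")
    # Rule 3: start date can't be greater than the end date.
    if rec["startDate"] > rec["endDate"]:
        problems.append("startDate is after endDate")
    # Rule 2: duration is the difference between end and start dates.
    elif rec["durationDays"] != (rec["endDate"] - rec["startDate"]).days:
        problems.append("durationDays does not match endDate - startDate")
    return problems


GOOD = {
    "firstName": "Ada", "lastName": "Lovelace", "fullName": "Ada Lovelace",
    "startDate": date(2020, 1, 1), "endDate": date(2020, 1, 31),
    "durationDays": 30,
}
BAD = {
    "firstName": "Robert", "lastName": "Smith", "fullName": "Bob Smith",
    "startDate": date(2021, 3, 10), "endDate": date(2021, 3, 1),
    "durationDays": 9,
}

print(check_record(GOOD))
print(check_record(BAD))
```

Each rule is a one-line comparison once string concatenation, subtraction and ordering are available, which is why their absence from the modeling language is felt so often in practice.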
OWL is not extensible; it is a closed language. You can’t declaratively extend its meaning to support your use cases. You can’t declaratively limit it either. If, for example, you built your ontology targeting OWL 2 QL, there is no standard way to say this.
I could continue this blog some more. For example, I believe that defining an XML serialization of OWL was a mistake that, again, created confusion. Confusion and lack of clear messages always impact adoption. Distancing OWL from RDF was not a good decision. And so on. But if I did so, this would not be a blog, but a book or, at minimum, a white paper.
Ultimately, I came to the conclusion that if I had to explain the core basics of OWL to people again and again and constantly figure out how to work around its issues, the problem was not with me nor was it with them; the problem was with OWL.
Too often, OWL proponents are motivated by wanting to keep their guru status. Complexity is a plus if this is your motivation. My own interest in OWL had a different motivation. It always included a vision of having flexible data and a flexible and rich way of defining data that could become broadly adopted. As stated in the beginning of this blog, in my opinion, OWL failed the criteria necessary for broad adoption. In my next blog, I will talk about SHACL and why I believe it is a better modeling language than OWL.