We fielded several questions as part of our recent webinar (recording and slides available here): Knowledge Graphs vs. Property Graphs – A brief Overview and Comparison
The list below includes questions we answered during the webinar as well as questions we did not get to.
Questions and Answers:
Q1: How do knowledge graphs compare to property graphs in supporting graph analytics, which property graphs are known to be good for?
Yes, the majority of implementations based on property graphs use some sort of graph analytics, such as node centrality and shortest paths.
There is nothing inherent in the LPG data model versus the RDF data model that makes these kinds of analytics work better. The main question is whether the necessary functions are available. It is a matter of which functions the system you end up using supports and how well they meet your needs, not of the graph data model itself.
Graph analytics functions are generally available from the property graph vendors and, increasingly, they are also available from RDF graph vendors.
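To illustrate that such analytics do not depend on the data model, here is a minimal sketch, not taken from any product. It runs the same breadth-first shortest-path function over connections loaded from RDF-style triples and from property-graph-style edges. The sample data, the `ex:` identifiers and the function names are all hypothetical.

```python
from collections import deque

# Hypothetical sample data: the same connections expressed in each model.
rdf_triples = [
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Bob", "ex:knows", "ex:Carol"),
    ("ex:Carol", "ex:knows", "ex:Dave"),
]
lpg_edges = [
    {"from": "ex:Alice", "type": "KNOWS", "to": "ex:Bob"},
    {"from": "ex:Bob", "type": "KNOWS", "to": "ex:Carol"},
    {"from": "ex:Carol", "type": "KNOWS", "to": "ex:Dave"},
]


def build_adjacency(pairs):
    """Build an undirected adjacency map from (source, target) pairs."""
    adjacency = {}
    for source, target in pairs:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search: return the nodes on a shortest path, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# The analytics function only sees connections, not the original data model.
from_rdf = build_adjacency((s, o) for s, _, o in rdf_triples)
from_lpg = build_adjacency((e["from"], e["to"]) for e in lpg_edges)
```

Calling `shortest_path(from_rdf, "ex:Alice", "ex:Dave")` and `shortest_path(from_lpg, "ex:Alice", "ex:Dave")` walks the same adjacency map in both cases. What matters in practice is whether the system you use ships such functions.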
From the perspective of a technical implementation and what is actually “under the hood”, this question brings up something called “index-free adjacency”. It is an approach to data storage where each node stores the physical addresses of the nodes it is connected to. This design was pioneered by Neo4J and it is positioned as their unique advantage.
Initially, Neo4J even tried to define a graph database as a database that uses index-free adjacency. This did not work because many graph databases do not use this approach. Nevertheless, Neo4J has a large marketing budget and, as a result, we see various consultants saying that this is the most important thing about graph databases – something that essentially defines them.
In reality, index-free adjacency makes jumping between adjacent data nodes fast, but it does nothing for jumping between data, schema, and schema-as-data, which is important for working with complex data. It also creates limitations in processing so-called “dense nodes”, and it has serious limitations when a graph needs to be distributed.
For example, take a look at a blog from ArangoDB explaining why they do not use index-free adjacency. And while TigerDB does use it, in its most recent release indexes were added because the index-free approach did not address the needs of many customers. Indexes can actually be quite capable and can rival the performance of index-free adjacency for quick graph traversal. Increasingly, there are hybrid strategies as well.
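The two storage strategies can be contrasted in a small conceptual sketch. It is only an illustration under simplifying assumptions: Python object references stand in for physical addresses, a dictionary stands in for an index, and the `Node` class and helper functions are hypothetical. It does not reflect how any particular database is implemented.

```python
class Node:
    """A node that keeps direct references to its neighbors
    (a stand-in for index-free adjacency)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.neighbors = []  # direct references, followed without any lookup


def neighbors_direct(node):
    """Follow the references stored on the node itself."""
    return [neighbor.node_id for neighbor in node.neighbors]


# Index-based alternative: edges live in one place, and a separate
# index maps each source id to its targets.
edges = [("a", "b"), ("a", "c"), ("b", "c")]
edge_index = {}
for source, target in edges:
    edge_index.setdefault(source, []).append(target)


def neighbors_indexed(node_id):
    """Find neighbors through an index lookup instead of stored references."""
    return edge_index.get(node_id, [])


# Build the same small graph in the reference-based form.
a, b, c = Node("a"), Node("b"), Node("c")
a.neighbors = [b, c]
b.neighbors = [c]
```

Both `neighbors_direct(a)` and `neighbors_indexed("a")` return the same neighbors. The difference lies in where the adjacency information is kept, which is what drives the trade-offs around dense nodes and distribution mentioned above.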
In the end, each storage design has pros and cons. The Neo4J design works well for larger graphs with a simple and consistent schema. It does not work well for complex data and for managing the schema itself (semantics) as part of the graph data.
Q2: Since it is said that Property Graphs can store anything, why can’t they store models of the information they contain?
In principle, it is possible. One could design an approach for capturing models as part of a property graph. However, due to the richness of SHACL, for example, this would not be easy. We are talking about complex schemas for supporting complex data.
Making these definitions “native” to the graph data model, so that they are actually treated as the schema for the data in a graph, would be even more challenging. Since property graphs were not designed with this goal in mind, it would probably require fundamental changes in the LPG systems.
Further, anything like this would need to be a standards initiative so that it is not a proprietary approach.
Q3: Is the knowledge graph space mature enough that there are established best practices for knowledge graph engineering and design? e.g. Are there degree programs that cover this stuff? What is the best way to get good at building these things?
Yes, a number of universities offer courses on these subjects as part of their Computer Science, Informatics and Library Science programs. There are also training programs offered by the technology vendors, including TopQuadrant – see https://www.topquadrant.com/training/introduction-to-semantic-web/.
Generally speaking, the three important components to learn are RDF, SPARQL and SHACL. Here are two examples of university-level courses covering RDF and SPARQL: https://www.futurelearn.com/courses/linked-data and https://libraryjuiceacademy.com/shop/course/054-introduction-to-linked-data/. Here is an example of a university seminar on Knowledge Graph topics: https://web.stanford.edu/class/cs520/. These are just examples; we have not attended these classes and are not in a position to recommend them.
With respect to methodologies, general object-oriented and domain modeling techniques are applicable to knowledge graph modeling.
Yes, there are best practices for knowledge graph engineering and design. This is a broad topic for which we can only outline a response here. TopQuadrant offers training classes and workshops that cover knowledge graph engineering technology, methodology and best practices, informed by the experience and expertise that we have developed over many years of working with customers in many domains. Here is a brief outline of just a few of the many areas that require consideration.
Some Guidelines for Knowledge Graph Architecture and Ontology Modeling
- Deliver Value Early and Extend Incrementally.
- Design and Use Consistent URI and Naming Conventions.
- Use “Test-first” Methodology by Defining Typical Queries Early and Using them to Guide Ontology Development.
- Define a Modular Knowledge Graph Architecture to Support Future Extensions.
Q4: Could a knowledge model show the cardinality between entities, in order to represent business rules more explicitly? Does it show the business definition clearly? Please show an example if so.
Yes, it can. We showed a cardinality example on the slide entitled “More Schema …”.
It showed that the maximum number of allowed values for the “released” property was set to 1.
For defining such rules, TopBraid EDG uses SHACL – a highly expressive W3C standard for defining data models. In SHACL, cardinality constraints are expressed using sh:minCount and sh:maxCount.
By default, if nothing is stated, then sh:minCount is zero and sh:maxCount is unlimited.
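The cardinality semantics described above can be sketched in a few lines. This is not a SHACL engine and not how TopBraid EDG validates data. The function `check_cardinality` and the sample record are hypothetical, and only the defaults (minimum of zero, no maximum) follow the description above.

```python
def check_cardinality(values, min_count=0, max_count=None):
    """Return a list of violation messages for one property's values.

    The defaults mirror the text: if nothing is stated, the minimum
    is zero and the maximum is unlimited (None).
    """
    violations = []
    if len(values) < min_count:
        violations.append(
            f"expected at least {min_count} value(s), found {len(values)}"
        )
    if max_count is not None and len(values) > max_count:
        violations.append(
            f"expected at most {max_count} value(s), found {len(values)}"
        )
    return violations


# Hypothetical movie record with two values for "released",
# checked against a maximum count of 1 as in the slide example.
movie = {"released": [1999, 2000]}
released_violations = check_cardinality(movie["released"], max_count=1)
```

Here `released_violations` holds one message because two values were supplied where at most one is allowed. Calling `check_cardinality([])` with the defaults reports nothing.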
Q5: It seems like slide 8 shows some good “sample data” with some instances, but not showing the “comprehensive business rules” for all the instances.
Correct, slide 8 shows an example property graph with data about actors, movies, etc.
As we explained during the webinar, property graphs contain only instance data. They do not contain models or semantics of the data.
Q6: Why would you say Properties are different from Relationships? As I understand, a ‘Property’ is called a ‘Relationship’ when the ‘Object’ is another ‘Subject’.
Property graphs and RDF Knowledge Graphs use different terminologies.
For RDF, indeed, a property could be a relationship (if a value is another resource) or it could be a literal value (e.g., a string or an integer). In both cases, they are called properties.
For property graphs, a property is always a literal, stored as a key-value pair. The values in the key-value pairs are NOT nodes in a property graph. Nodes have identifiers, they are connected to each other via relationships (or edges), and they can have property values “hanging” from them.
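The difference in terminology can be made concrete with a small sketch that holds the same facts in both styles. The data structures, the `Literal` wrapper and all identifiers are hypothetical simplifications chosen for illustration. Real RDF and property graph stores use their own representations.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Marks a plain value (string, integer, ...) as opposed to a resource."""
    value: object


# RDF view: every statement is a triple, and every predicate is a "property",
# whether its value is another resource or a literal.
rdf_triples = [
    ("ex:Actor1", "ex:actedIn", "ex:Movie1"),        # value is a resource
    ("ex:Actor1", "ex:name", Literal("Actor One")),  # value is a literal
    ("ex:Movie1", "ex:released", Literal(1999)),     # value is a literal
]

# Property graph view: literals are key-value pairs on a node, while
# connections between nodes are separate relationships (edges).
lpg_nodes = {
    "Actor1": {"name": "Actor One"},
    "Movie1": {"released": 1999},
}
lpg_relationships = [("Actor1", "ACTED_IN", "Movie1")]


def rdf_properties(subject):
    """All RDF properties used on a subject, regardless of the kind of value."""
    return sorted({p for s, p, _ in rdf_triples if s == subject})


def lpg_properties(node_id):
    """Property graph properties: only the literal key-value pairs of a node."""
    return sorted(lpg_nodes[node_id])
```

In this sketch, `rdf_properties("ex:Actor1")` includes both `ex:actedIn` and `ex:name`, whereas `lpg_properties("Actor1")` contains only `name`, because `ACTED_IN` lives in the separate list of relationships.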
Q7: Can you tell which Ontology is available for Manufacturing?
The domain of manufacturing can be partitioned, top-down, into the following levels:
- Recipe Management, which concerns itself with how the lower domains have to be configured and with the sequences of parameters issued (such as setpoints and alarm settings) and metrics for the manufacture of different materials and products.
- Process Management.
- Supervisory Control and Data Acquisition (SCADA).
- Programmable Logic Controllers (PLC).
- At the lowest level, Devices, Sensors and Actuators.
There are ontologies for many of these sub-domains. For the lowest two domains, TopBraid EDG has an optional package which builds off a “Technical Assets” ontology to represent hardware assets. This could be augmented with the IoT (Internet of Things) ontology and the SSN (Semantic Sensor Network) ontology from W3C. For the upper domains, there are choices, but these depend on what is being manufactured. To state the obvious, car manufacturing is very different from semiconductor manufacturing. TopQuadrant has built, and assisted in building, ontologies for these upper levels on customer projects.
Orthogonal to the partitioning described above are ontologies for measurements and quantifiable data. For the latter, QUDT (http://qudt.org/) is an ontology with instance graphs for units of measure, quantity kinds, dimensions, and types.
Q8: Can you explain why the RDF imported into Neo4J will not meet expectations?
It really depends on your expectations 🙂
This webinar was focused on exploring differences between Labeled Property Graphs and RDF Knowledge Graphs.
As you have probably experienced in other situations, data can be transformed and ported from one place to another. As a result of the port, you may be losing some information. Once ported, you are in a different environment from the original one – you use the capabilities of that new environment and operate within its limitations.
As we discussed, it is possible to connect to a Property Graph and generate an RDF serialization of its data. You can then load it into a product that implements RDF Knowledge Graphs – such as TopBraid EDG. Once you do so, it is no longer a Property Graph – it is an RDF graph. EDG can then auto-generate an ontology model from this data. For details, see https://doc.topquadrant.com/6.4/ontologies/#Creating_Classes_and_Property_Shapes_from_Data.
Similarly, it is possible to load RDF data into a product that implements a Labeled Property Graph data model – such as Neo4J. Neo4J offers a module that lets you configure a mapping and load data from an RDF serialization. With other products, you may need to write a converter. In either case, you will only be able to load data. Ontology definitions will be treated as another set of data, not as a model of the data. Once you port your RDF, you no longer have an RDF Knowledge Graph; you have a Property Graph. This means that you have the limitations associated with that.
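A hand-written converter of the kind mentioned above might look roughly like the following sketch. It is not the Neo4J module and makes simplifying assumptions: literals become node properties, resource-valued statements become edges, and the `Literal` wrapper, the function name and the sample triples are all hypothetical.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Marks a plain value as opposed to a resource identifier."""
    value: object


def rdf_to_lpg(triples):
    """Convert triples into property-graph-style nodes and edges.

    Literal values become key-value properties on the subject node.
    Resource values become edges between two nodes.
    """
    nodes = {}
    edges = []
    for subject, predicate, obj in triples:
        node = nodes.setdefault(subject, {})
        if isinstance(obj, Literal):
            node.setdefault(predicate, []).append(obj.value)
        else:
            nodes.setdefault(obj, {})
            edges.append((subject, predicate, obj))
    return nodes, edges


triples = [
    ("ex:Movie", "rdf:type", "rdfs:Class"),       # schema statement
    ("ex:Movie1", "rdf:type", "ex:Movie"),        # instance statement
    ("ex:Movie1", "ex:released", Literal(1999)),  # literal value
]
nodes, edges = rdf_to_lpg(triples)
```

After the conversion, the class `ex:Movie` is simply one more node and its definition is one more edge. Nothing in the resulting structure marks it as a model of the data, which is the loss described above.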
Q9: Is it possible (using the KGs) to represent the software that makes data?
Yes, anything can be represented in a knowledge graph. In fact, TopBraid EDG ships with pre-built ontology models to support such use cases out of the box. These models describe data sources, software and other relevant assets.
During the second demo we showed an example describing how several business applications contribute to generating a Patient Discharge Form.
Q10: Is it possible to do the things you showed in your demo with TB Maestro? What are the main differences between Maestro and EDG?
TopBraid EDG is a server-based product for working with knowledge graphs. It supports multi-user collaboration and provides services and APIs that can be used by other enterprise applications. TopBraid EDG is deployed as a Java application running on Apache Tomcat.
TopBraid Composer Maestro Edition is a single-user Eclipse plugin. It is a desktop IDE. It is “not networked”, i.e., it can’t be used over HTTP by other systems. However, for application development purposes, it bundles a Jetty web server running on localhost. For evaluation of TopBraid EDG, it includes a single-user, demo-only version of EDG that can be launched from Eclipse.
The demos we showed today were run using TopBraid EDG on localhost from TopBraid Composer.
Q11: Any support for RDF* and SPARQL* to handle edge properties?
Yes, this is what we showed in the demo when adding a role to the “acted in” relationship. TopBraid EDG also supports GraphQL queries for such reifications.
For interoperability with other systems, they are exported as RDF statements. On import, we also convert RDF statements into RDF* style reifications.
See the following in the user guide to learn how to declare what properties should be reified and with what values: https://doc.topquadrant.com/6.4/ontologies/#Enabling_Reification_of_Property_Values
In GraphQL, if your property has a dash:reifiableBy shape then you should see an additional GraphQL field ending with _reif such as in
We also use this approach to build ordered lists of property values: the order is stored as a reification.
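The idea of attaching values to a statement, and of using such values to keep an order, can be sketched as follows. This is a conceptual illustration only and does not show TopBraid EDG's actual storage format. The properties `ex:role`, `ex:castMember` and `ex:index`, as well as the sample data, are hypothetical.

```python
# Plain statements of the graph.
triples = [
    ("ex:Actor1", "ex:actedIn", "ex:Movie1"),
    ("ex:Movie1", "ex:castMember", "ex:Actor2"),
    ("ex:Movie1", "ex:castMember", "ex:Actor1"),
]

# Values attached to individual statements: the whole triple is the key.
reified = {
    ("ex:Actor1", "ex:actedIn", "ex:Movie1"): {"ex:role": "Lead"},
    ("ex:Movie1", "ex:castMember", "ex:Actor1"): {"ex:index": 0},
    ("ex:Movie1", "ex:castMember", "ex:Actor2"): {"ex:index": 1},
}


def statement_value(subject, predicate, obj, key):
    """Look up a value attached to one specific statement, or None."""
    return reified.get((subject, predicate, obj), {}).get(key)


def ordered_values(subject, predicate):
    """Return the values of a property, sorted by the index on each statement."""
    matches = [t for t in triples if t[0] == subject and t[1] == predicate]
    matches.sort(key=lambda t: reified.get(t, {}).get("ex:index", 0))
    return [t[2] for t in matches]
```

With this data, `ordered_values("ex:Movie1", "ex:castMember")` returns the cast members in index order even though the underlying statements have no inherent order.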
Q12: How does the graph staging or partition relationship function within TopBraid? There was a slide with reference to this material with KG1, KG2, and KG3.
In TopBraid EDG, information is organized into “asset collections”. Technically, each asset collection is a named graph. These named graphs can be used individually, and they can be included in each other by reference (using owl:imports), creating a combined knowledge graph.
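How included graphs add up to a combined knowledge graph can be sketched as a transitive walk over the include references. The graph names, their contents and the `combined_graph` function are hypothetical. The sketch only illustrates the principle and is not TopBraid EDG code.

```python
# Hypothetical named graphs, each holding a set of triples.
graphs = {
    "urn:graph:ontology": {("ex:Movie", "rdf:type", "rdfs:Class")},
    "urn:graph:taxonomy": {("ex:Genre1", "rdf:type", "skos:Concept")},
    "urn:graph:data": {("ex:Movie1", "rdf:type", "ex:Movie")},
}

# Which graph includes which others (the role played by owl:imports).
imports = {
    "urn:graph:data": ["urn:graph:ontology", "urn:graph:taxonomy"],
    "urn:graph:taxonomy": ["urn:graph:ontology"],
}


def combined_graph(root):
    """Union of the root graph and everything it includes, transitively."""
    seen = set()
    result = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already visited, which also guards against cycles
        seen.add(name)
        result |= graphs.get(name, set())
        stack.extend(imports.get(name, []))
    return result
```

Here `combined_graph("urn:graph:data")` contains the triples of all three graphs, while `combined_graph("urn:graph:ontology")` contains only its own, since it includes nothing else.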
TopBraid EDG comes pre-built with different collection types for different purposes. For example:
- Ontologies are asset collections that contain classes, properties, shapes and rules.
- Taxonomies are asset collections that use SKOS (or custom extensions of SKOS) to store concepts and their relationships.
- Data Asset collections are collections that use a TopQuadrant-built ontology describing data sources (databases, datasets, etc.). They store information about such data sources, including profiling and sampling of data.
- Crosswalks are asset collections that contain only links between resources in two other asset collections.
There are more collection types available; see https://www.topquadrant.com/products/topbraid-edg-gov-packs/asset-collections/.
Q13: Is Apache TinkerPop framework a standard for LPG graph?
There are currently no official standards for LPG. There are some technologies that are supported by multiple vendors.
Apache TinkerPop is an open source graph computing framework that is integrated with some property graph databases and some RDF graph databases. It offers the Gremlin language, which is more of an API language than a query language.
Q14: What are TQ EDG plans for future development regarding SHACL vs OWL and why?
Currently, TopBraid EDG offers the following:
- Auto-conversion of RDFS/OWL to SHACL
- Powerful and user-friendly editor for creating SHACL shapes and constraints
- Basic features for creating OWL and RDFS classes and properties
- Optional support in the editor for creating OWL axioms using Manchester syntax
- Model-driven UI based on SHACL definitions
- Auto-generation of GraphQL Schemas from SHACL models to provide GraphQL query of RDF data – with introspection, supporting both read and write queries
- Data validation based on the SHACL constraints
- Dynamic inference based on the SHACL Property Value Rules
- “Batch” inference based on the SHACL SPARQL and Triple Rules
- Auto-conversion of SHACL to RDFS/OWL for export to external systems
Our immediate plan for the next release is to extend ADS scripting. Based on the early feedback from users, we believe that it will become the preferred choice for extending TopBraid EDG with custom reports, imports, exports and more.
We are also planning to create additional pre-built templates for creation of SHACL Property Value Rules. We will need user input on what templates they would like to have.
Beyond the next release, we are looking at adding a caching layer for expensive SHACL Rules to deliver the best possible performance while continuing to support maximum expressivity and flexibility.
TopQuadrant has no plans for adding new capabilities for OWL. SHACL is highly expressive and open to extensibility. We believe that it is very well suited for delivering the rich and scalable functionality needed by our users. It is also much better aligned with the traditional approaches to data modeling (e.g., object-oriented modeling) and, thus, is easier for new users to understand and master. We consider OWL to be an “older generation” knowledge representation language for RDF graphs that, for a variety of reasons, did not reach industry adoption. The design of SHACL incorporated lessons learned from industry experience with OWL.
This opinion seems to be shared by the majority of other vendors of RDF-based technology. Today, just three years after SHACL became a W3C standard, all major triple store vendors support SHACL and few support OWL. OWL is traditionally associated with OWL DL, which has proven to be inappropriate for typical enterprise use cases. The software that focuses on OWL is mainly produced by academia (e.g., Protege) and is not suitable for enterprise-scale production use.
Q15: Are there any heuristics (maybe AI-based) that would enable auto conversion of relational models into Knowledge Graph based models?
Yes, we have a capability to create an ontology from a relational database by creating classes from tables and properties from columns where certain columns are turned into relationships – based on the foreign keys. This does not require any AI since relational models are explicit.
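The mapping described above can be sketched as a simple rule-based transformation. This is not the TopBraid EDG implementation. The schema description format, the table and column names, and the function `ontology_from_schema` are hypothetical and serve only to show why explicit foreign keys make heuristics unnecessary.

```python
# Hypothetical description of a relational schema.
schema = {
    "movie": {
        "columns": ["id", "title", "studio_id"],
        "foreign_keys": {"studio_id": "studio"},
    },
    "studio": {
        "columns": ["id", "name"],
        "foreign_keys": {},
    },
}


def ontology_from_schema(schema):
    """Derive classes from tables and properties from columns.

    Columns that are foreign keys become relationships pointing at the
    class of the referenced table. All other columns become attributes.
    """
    classes = {}
    for table, definition in schema.items():
        attributes = []
        relationships = {}
        for column in definition["columns"]:
            target = definition["foreign_keys"].get(column)
            if target is not None:
                relationships[column] = target
            else:
                attributes.append(column)
        classes[table] = {
            "attributes": attributes,
            "relationships": relationships,
        }
    return classes


ontology = ontology_from_schema(schema)
```

In the result, `movie` becomes a class with the attributes `id` and `title` and a relationship `studio_id` that points to the `studio` class.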
It is stored using RDF* style reification, but not exactly. Right now, we are using the approach described here http://datashapes.org/reification.html – see http://datashapes.org/reification.html#uriReification for specifics.