Sometimes it is useful to link your data with entities from external knowledge graphs. For example, Wikidata contains background information about almost any topic in the world, such as the current population of each country. To avoid duplication and manual data entry, your local repository may want to fetch the latest population count from such an external source. Wikidata also defines unique identifiers for each entity, so any vocabulary that talks about Australia could link to the Wikidata entity for Australia as a way of building links into a central knowledge hub.
TopBraid EDG 6.2 introduces a new capability that supports such links to Wikidata, and any other external knowledge graph that provides a SPARQL endpoint, including endpoints managed by your own organization. This article illustrates these new capabilities by establishing a link from a local People database to similar entities in Wikidata, reusing values such as date of birth, height and images from Wikidata.
We distinguish between local entities (aka resources or assets) and remote entities. Local entities are maintained “manually” by dedicated staff, while selected data from remote entities are periodically copied over and thus maintained automatically. Dedicated link properties (such as “Wikidata Person” in the diagram above) are used to point from local entities to remote entities. On-the-fly inferences using SHACL property value rules are employed to transform selected remote values into properties of the local entities, so that they can be queried just like any other local values:
Let’s now walk through the steps to make this happen, with TopBraid EDG 6.2.
Creating a Link Property
In this example we have an Ontology (schema) as an EDG asset collection. It uses a SHACL version of the schema.org namespace as a starting point. The class schema:Person already declares various SHACL property shapes for values such as schema:givenName and schema:height. To store links from schema:Person instances to corresponding entities on Wikidata, we introduce a new property called wikidataPerson:
Next we navigate into the SHACL property declaration by clicking on the grey box below:
Then we pick “Make this property a Wikidata link” from the context menu of the property shape:
This adds a value for the property dash:detailsEndpoint, linking it to the SPARQL endpoint of the Wikidata server: https://query.wikidata.org/sparql
Technical background: Whenever a property shape carries a value for dash:detailsEndpoint, TopBraid understands that the values are URIs and that more RDF statements for these URIs can be queried from the given SPARQL endpoint. If the endpoint is exactly the URL above, additional Wikidata-specific features are activated.
That’s all. The local schema:Person class is now linked to Wikidata.
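For readers who prefer source code, the link property from this section might look roughly as follows in Turtle. This is a sketch rather than the exact output of the wizard: the people_schema namespace IRI, the shape IRI and the sh:nodeKind constraint are assumptions made for illustration, as is writing the endpoint as a plain string literal.

```turtle
@prefix dash:   <http://datashapes.org/dash#> .
@prefix schema: <http://schema.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
# Hypothetical namespace for the example ontology
@prefix people_schema: <http://example.org/people_schema#> .

# Attach the new link property to the existing schema:Person class
schema:Person
    sh:property people_schema:Person-wikidataPerson .

# Property shape for the link property (the shape IRI is illustrative)
people_schema:Person-wikidataPerson
    a sh:PropertyShape ;
    sh:path people_schema:wikidataPerson ;
    sh:name "Wikidata Person" ;
    # Assumption: values are IRIs of remote entities
    sh:nodeKind sh:IRI ;
    # Assumption: the endpoint URL is stored as a string literal
    dash:detailsEndpoint "https://query.wikidata.org/sparql" .
```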
Linking Local Instances to Wikidata Entities
Assume we have an EDG Data Graph with people instances, pre-populated with the usual suspects from the Kennedy family:
Not much is known locally, except for the names of the people and their gender. However, the names of the people are sufficient to establish crosslinks into Wikidata. Assuming that some Wikidata link properties exist, the Transform tab offers a wizard-like feature that suggests suitable Wikidata entities based on (approximate) similarity of the labels:
Clicking on “Suggest Mapping to Wikidata – For all Assets” runs a sequence of queries to a web service kindly provided by Wikidata. This may take a while but can be interrupted at any time.
The resulting page can be used to review the suggestions and accept those that seem plausible:
As an alternative to the batch process, you can use “Suggest matching Wikidata entities…” from the context menu of each individual local entity. This brings up a dialog such as the following:
Selected entities now have outgoing links to remote entities such as Q2685 for Arnie:
You can follow the link to explore whatever Wikidata knows about this individual:
Now that our local instances have references to corresponding Wikidata entities, we can start using the property values of the remote entities.
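At the data level, an accepted suggestion boils down to a single triple from the local instance to the Wikidata entity. The following sketch assumes hypothetical namespaces for the ontology and the data graph, and a hypothetical local IRI for Arnold; only the Wikidata ID Q2685 and the property name wikidataPerson come from the example.

```turtle
@prefix wd: <http://www.wikidata.org/entity/> .
# Hypothetical namespaces for the example ontology and data graph
@prefix people_schema: <http://example.org/people_schema#> .
@prefix people_data:   <http://example.org/people_data#> .

# The local instance (illustrative IRI) points to the remote Wikidata entity
people_data:Arnold
    people_schema:wikidataPerson wd:Q2685 .
```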
Defining the Shape of Remote Entities
Our schema doesn’t know anything about the remote entities yet. We need to tell the system which properties we are interested in, and what format they have. The W3C Shapes Constraint Language (SHACL) is well suited for that job. We define a SHACL node shape that carries property shapes for the relevant properties. This acts like a “view” on the remote data and informs the system what kinds of SPARQL queries it needs to use to fetch the actual values.
Back in our example schema, we define a node shape called “Wikidata Person”. (Alternatively, we could make it a class too, yet here a shape is arguably cleaner.) To get to the following screen in TopBraid EDG 6.2, make the node shapes visible using the small hollow circle button above the class tree and press “New” in the Instance panel:
TopBraid now offers another wizard that greatly simplifies the linkage with Wikidata. From the context menu of the new node shape, select “Add property shapes from Wikidata sample…”:
The resulting dialog asks you for the ID of any example instance that may hold typical values. In our example, we pick Arnold’s Wikidata ID Q2685 and click on “Load”:
This dialog fetches all properties of the sample instance and lets you browse the values. You can then select which properties you are interested in and (optionally) set cardinality and datatype constraints too. Above, we have selected the “height” property with a maximum cardinality (sh:maxCount) of 1, and datatype xsd:decimal.
We can repeat this process for other sample instances, for example to pick “death date”, which wasn’t available for Arnold. Eventually, the system has generated suitable SHACL property shape declarations for all the selected properties and attached them to our Wikidata Person shape:
For experts, here is the Turtle source code of this node shape:
people_schema:Wikidata_Person
  rdf:type sh:NodeShape ;
  rdfs:label "Wikidata Person" ;
  sh:property [
      rdf:type sh:PropertyShape ;
      sh:path schema:description ;
      sh:name "description" ;
    ] ;
  sh:property [
      rdf:type sh:PropertyShape ;
      sh:path rdfs:label ;
      graphql:name "rdfs_label" ;
      sh:name "label" ;
    ] ;
  sh:property [
      rdf:type sh:PropertyShape ;
      sh:path wdt:P18 ;
      sh:name "image" ;
    ] ;
  sh:property [
      rdf:type sh:PropertyShape ;
      sh:path wdt:P2048 ;
      sh:datatype xsd:decimal ;
      sh:maxCount 1 ;
      sh:name "height" ;
    ] ;
  sh:property [
      rdf:type sh:PropertyShape ;
      sh:path wdt:P569 ;
      sh:datatype xsd:dateTime ;
      sh:maxCount 1 ;
      sh:name "date of birth" ;
    ] ;
  sh:property [
      rdf:type sh:PropertyShape ;
      sh:path wdt:P570 ;
      sh:datatype xsd:dateTime ;
      sh:maxCount 1 ;
      sh:name "date of death" ;
    ] .
Now that we have described the shape of the remote entity, we tell our link property about it, using sh:node (or, alternatively, sh:class if the node shape is also a class):
That is enough to instruct the system about which values we want to fetch from the endpoint. However, it does not yet establish the relationship of these remote values with our local schema.
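In Turtle, connecting the link property to the node shape amounts to one extra sh:node statement on the property shape. The sketch below assumes a hypothetical namespace IRI and shape IRI for the link property, and assumes that the endpoint is stored as a string literal; the node shape name matches the Turtle source shown earlier.

```turtle
@prefix dash: <http://datashapes.org/dash#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
# Hypothetical namespace for the example ontology
@prefix people_schema: <http://example.org/people_schema#> .

# Link property shape (illustrative IRI), now pointing at the remote "view"
people_schema:Person-wikidataPerson
    a sh:PropertyShape ;
    sh:path people_schema:wikidataPerson ;
    # Assumption: the endpoint URL is stored as a string literal
    dash:detailsEndpoint "https://query.wikidata.org/sparql" ;
    # Remote values are expected to conform to the Wikidata Person node shape
    sh:node people_schema:Wikidata_Person .
```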
Defining Property Value Rules
Here we want the values of the local property schema:height to hold the same values as the property wdt:P2048 (aka “height”) of the remote entities from Wikidata. SHACL property value rules can be used to instruct the system that certain property values shall be computed on the fly, whenever they are queried. The resulting values are called “inferences” and are not editable in TopBraid EDG. A simple form of property value rule can be employed to walk from the local schema:Person instance to the associated Wikidata person, and from there retrieve the height value. More complex rules can be defined to perform additional transformations when needed.
You can either enter such rules by hand, or use the new wizard in TopBraid EDG. Start by navigating into the property shape that defines the local height property (click the grey box):
Once there, select “Create property value rule from template…” from the context menu:
This wizard offers a growing number of templates, including the one that just copies a value from a linked entity:
Once finished, the property shape of schema:height carries a SHACL property value rule:
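As a sketch, a rule of this kind could be written in Turtle with sh:values and a path expression from the SHACL Advanced Features draft. The shape IRI and the people_schema namespace are hypothetical, and the rule generated by the wizard may differ in detail.

```turtle
@prefix schema: <http://schema.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix wdt:    <http://www.wikidata.org/prop/direct/> .
# Hypothetical namespace for the example ontology
@prefix people_schema: <http://example.org/people_schema#> .

# Property shape of the local height property (illustrative IRI)
schema:Person-height
    a sh:PropertyShape ;
    sh:path schema:height ;
    # Inferred values: start at the focus node, follow wikidataPerson
    # to the remote entity, then take its wdt:P2048 (height) values
    sh:values [
        sh:path wdt:P2048 ;
        sh:nodes [ sh:path people_schema:wikidataPerson ] ;
    ] .
```

Reading the expression inside out mirrors the walk described above: the inner sh:nodes selects the linked Wikidata entity, and the outer sh:path selects the value on that entity.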
To confirm that this is all now working, we can visit the local Arnold instance, and use “Refresh details of remote values” to fetch the remote values from the Wikidata SPARQL endpoint:
Once this has completed, we can see that our local Arnold instance has a schema:height property, which is inferred straight out of the Wikidata knowledge graph:
We can repeat the same steps for the other properties. In some cases, the property value rules may need to be post-processed to include extra transformations. Here, we have modified the rule for schema:deathDate so that the xsd:dateTime value from Wikidata is automatically turned into an xsd:date literal:
If you are not familiar with the syntax, check the SHACL Advanced Features 1.1 draft. The above roughly means “query the values of wikidataPerson, then query the values of P570 of those, and finally convert those to xsd:date using the SPARQL xsd:date(v) function”. Similarly, we can use the function sparql:iri to convert the image URL strings delivered by Wikidata into IRI resources. (To see the sparql: functions, include the “SPARQL vocabulary for SHACL” in your Ontology.)
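Spelled out in Turtle, rules of this form might look like the sketch below. It assumes that the conversion is written as a SHACL function expression whose predicate is the function IRI, that the sparql: prefix resolves to the namespace shown, and that schema:image is the local target property for images; the shape IRIs and the people_schema namespace are hypothetical.

```turtle
@prefix schema: <http://schema.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix wdt:    <http://www.wikidata.org/prop/direct/> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
# Assumption: namespace of the "SPARQL vocabulary for SHACL"
@prefix sparql: <http://datashapes.org/sparql#> .
# Hypothetical namespace for the example ontology
@prefix people_schema: <http://example.org/people_schema#> .

# Death date: fetch wdt:P570 from the linked entity, then cast to xsd:date
schema:Person-deathDate
    a sh:PropertyShape ;
    sh:path schema:deathDate ;
    sh:values [
        xsd:date (
            [
                sh:path wdt:P570 ;
                sh:nodes [ sh:path people_schema:wikidataPerson ] ;
            ]
        ) ;
    ] .

# Image: fetch wdt:P18 from the linked entity, then turn the string into an IRI
schema:Person-image
    a sh:PropertyShape ;
    sh:path schema:image ;
    sh:values [
        sparql:iri (
            [
                sh:path wdt:P18 ;
                sh:nodes [ sh:path people_schema:wikidataPerson ] ;
            ]
        ) ;
    ] .
```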
Refreshing and Querying Remote Values
Now that all shapes have been set up, we can use batch processes to periodically refresh the remote values, e.g. once a night. In TopBraid EDG, this can be automated using scheduled jobs. The batch process can be triggered from the Transform tab:
Alternatively, individual resources can be refreshed as shown earlier.
We can now see that all local person entities that have links to Wikidata entities carry values for height, birth date, death date and image:
You can also query these values, in the same way as locally defined values, using GraphQL:
Since TopBraid’s GraphQL support is based on shape declarations, we can even query the values of the remote entities, as follows. Note that this requires the shape to be marked with graphql:protectedShape in the Ontology.
Oh, and since we have used SHACL node shapes to declare the structure of the Wikidata entities, we can also perform constraint validation on that data. Combined with TopBraid EDG workflows, this means that data can be pulled from the remote service and then validated before it is accepted into production.