FAQ – Semantic Web Technology
What is semantic technology?
Standards-based semantic technology is the use of W3C standards such as RDF and SPARQL to share and integrate data and, optionally, the semantics of that data. Semantic technology is popular for public sharing of data (see “What is the Semantic Web” below) as well as for sharing and integration of data on private intranets.
What is the Semantic Web?
The Semantic Web is the network of machine-readable data that is publicly shared using semantic technology over the same infrastructure as the World Wide Web.
How is Linked Data related to semantic technology?
“Linked Data” is a more recent term than “Semantic Web” with much overlap, describing the use of RDF-related standards to share data. Linked Data has a strong emphasis on dereferenceable URIs and the inclusion of links between available datasets and less emphasis on the use of OWL and the more “semantic” parts of the Semantic Web. TopQuadrant's TopBraid platform makes it easy to build both Semantic Web and Linked Data applications.
What is "semantic" about semantic technology?
Semantic technology standards give us ways to unambiguously identify resources and to describe relationships between them. When you can write, in a machine-readable way, that PO in one database means the same thing as PurchaseOrder in another, or that a Purchase Order must have a single “issue date”, or that “spouse” is a symmetric property, you can store small but useful bits of the meanings of these terms that let applications get more out of your data—which is what metadata is for.
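As a rough illustration of how one small machine-readable statement lets an application get more out of the data, here is a minimal Python sketch. It stores facts as plain tuples and applies the meaning of a symmetric property. The "ex:" names and the helper apply_symmetric are hypothetical; only rdf:type and owl:SymmetricProperty are standard terms, written here as abbreviated strings rather than full URIs.

```python
# Facts are (subject, predicate, object) tuples; all "ex:" names are hypothetical.
facts = {
    ("ex:spouse", "rdf:type", "owl:SymmetricProperty"),
    ("ex:alice", "ex:spouse", "ex:bob"),
}

def apply_symmetric(triples):
    """Return the triples plus the mirrored statement for every symmetric property."""
    symmetric = {s for (s, p, o) in triples
                 if p == "rdf:type" and o == "owl:SymmetricProperty"}
    mirrored = {(o, p, s) for (s, p, o) in triples if p in symmetric}
    return triples | mirrored

inferred = apply_symmetric(facts)
print(("ex:bob", "ex:spouse", "ex:alice") in inferred)  # True
```

Only one direction of the relationship was stored, but an application that reads the property description can answer the question from either side.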
What is RDF?
The Resource Description Framework (RDF) is a data model that has been a W3C standard since 1999 and continues to be the core standard of semantic technology. By giving you a way to describe all facts as simple, three-part statements, RDF makes it easier to mix and match and connect different datasets.
These three-part statements are known as triples. You can think of one as the equivalent of a single cell in a spreadsheet where the column name is the property name, the row name is the resource identifier (the thing described by the property) and the contents of the cell are the property value. The resources, properties, and sometimes values all have Web-wide unique identifiers known as URIs so that they can be linked to from anywhere, can be accessed using normal Web and Internet technology, and can be aggregated with other data without fear of name conflicts.
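The spreadsheet analogy can be sketched in a few lines of Python using only the standard library. The sheet contents, the http://example.org/ namespace, and the function rows_to_triples are assumptions made for illustration; a real application would normally use an RDF library rather than bare tuples.

```python
import csv
import io

# Hypothetical spreadsheet content; the first column identifies each row.
SHEET = """id,name,price
p1,Widget,9.99
p2,Gadget,24.50
"""

BASE = "http://example.org/"  # assumed namespace, for illustration only

def rows_to_triples(text, key="id"):
    """Turn each non-key cell into one (resource, property, value) triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + "product/" + row[key]
        for column, value in row.items():
            if column != key:
                triples.append((subject, BASE + "prop/" + column, value))
    return triples

triples = rows_to_triples(SHEET)
for t in triples:
    print(t)
```

Each cell becomes one statement: the row supplies the resource, the column supplies the property, and the cell supplies the value.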
What is SPARQL?
The SPARQL Protocol and RDF Query Language (SPARQL) is a set of standards for querying and updating RDF data as well as for transmitting queries and results over a network.
The query language is the most commonly used of these specifications and forms the basis of the SPIN rules and constraint language and the SPARQLMotion and SPARQL Web Pages scripting languages developed by TopQuadrant to make application development easier and more efficient. In the words of World Wide Web inventor Tim Berners-Lee, “Trying to use the Semantic Web without SPARQL is like trying to use a relational database without SQL.”
SPARQL is commonly used to perform data validation and, with capabilities such as transitive closure operators and property functions, to perform query-time reasoning. SPARQL can also be used to construct and persist new facts (triples) based on existing facts. TopBraid distributions include a complete OWL RL profile implemented using SPARQL.
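To give a feel for what pattern matching, transitive closure, and constructing new facts involve, here is a small Python stand-in. It is not SPARQL and not how any SPARQL engine is implemented; the data, the "ex:" names, and the functions match and transitive_subjects are hypothetical. In SPARQL itself, the transitive step would be written with a property path such as ex:partOf+.

```python
# Hypothetical part-of hierarchy stored as (subject, predicate, object) triples.
data = {
    ("ex:piston", "ex:partOf", "ex:engine"),
    ("ex:engine", "ex:partOf", "ex:car"),
    ("ex:wheel", "ex:partOf", "ex:car"),
}

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None plays the role of a variable."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

def transitive_subjects(triples, predicate, target):
    """Find every subject that reaches target via one or more predicate steps."""
    found, frontier = set(), {target}
    while frontier:
        step = {s for node in frontier
                for (s, _, _) in match(triples, p=predicate, o=node)}
        frontier = step - found  # only expand nodes not seen before
        found |= step
    return found

parts = transitive_subjects(data, "ex:partOf", "ex:car")

# "Construct" new triples from the result and add them to the dataset.
data |= {(part, "ex:componentOf", "ex:car") for part in parts}
print(sorted(parts))
```

The piston is never stated to be part of the car directly; that fact is reached by following the partOf links, then stored as a new triple.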
What is RDFS?
The RDF Schema (RDFS) language is a vocabulary description language that lets you describe classes and properties using RDF itself. In other words, RDFS descriptions are themselves a series of three-part statements about classes and properties. Sharing such descriptions helps to make RDF-based data and applications more interoperable. Although RDFS stands for RDF Schema, it is not a traditional schema language. For example, it can't support data validation because, with RDFS, data is never invalid. It is designed to support certain very basic types of inferencing.
Use of RDFS (and its more powerful cousins, OWL and SHACL) is optional when working with RDF.
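The basic inferencing that RDFS supports can be illustrated with subclass reasoning. The sketch below applies two RDFS-style rules (subclass transitivity and type propagation) to plain tuples until nothing new appears. The class names, the instance, and the function rdfs_subclass_closure are hypothetical, and a real RDFS reasoner covers more rules than these two.

```python
# Hypothetical class hierarchy and instance data.
triples = {
    ("ex:Novel", "rdfs:subClassOf", "ex:Book"),
    ("ex:Book", "rdfs:subClassOf", "ex:Publication"),
    ("ex:moby_dick", "rdf:type", "ex:Novel"),
}

def rdfs_subclass_closure(graph):
    """Apply two RDFS-style rules until no new triples appear."""
    graph = set(graph)
    while True:
        new = set()
        for (a, p1, b) in graph:
            for (c, p2, d) in graph:
                if b != c or p2 != "rdfs:subClassOf":
                    continue
                if p1 == "rdfs:subClassOf":
                    new.add((a, "rdfs:subClassOf", d))  # subclass transitivity
                elif p1 == "rdf:type":
                    new.add((a, "rdf:type", d))         # type propagation
        if new <= graph:
            return graph
        graph |= new

inferred = rdfs_subclass_closure(triples)
print(("ex:moby_dick", "rdf:type", "ex:Publication") in inferred)  # True
```

Note that nothing here can fail: the rules only ever add statements, which is why RDFS by itself cannot flag data as invalid.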
What is OWL?
The Web Ontology Language (OWL) builds on RDFS to provide a vocabulary for making more sophisticated modeling statements. With the appropriate OWL-enabled tools, these descriptions may be used as the basis to infer new facts from combinations of other facts.
Note that because RDFS and OWL are based on the Open World Assumption, they are not intended to be used for data validation. The Open World Assumption means that one can't draw conclusions from the absence of facts. For example, while OWL allows us to say that a Person must have one and only one place of birth, the absence of place-of-birth information will not be considered inconsistent. Having multiple places of birth may not (depending on what else is in the model) be considered inconsistent either. SHACL, on the other hand, operates under the Closed World Assumption and is a better fit for the typical data validation, data modeling, and reasoning needs of many business applications.
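A small sketch may help show why multiple places of birth need not be an inconsistency under the Open World Assumption. When a property is declared functional, an OWL-style reasoner reads two values as two names for the same thing and infers owl:sameAs instead of reporting an error. A missing value produces no complaint at all. The data and the function functional_inferences are hypothetical, and this covers only one rule, not a full OWL reasoner. If the model also stated that the two places were different, a reasoner would then find a contradiction.

```python
# Hypothetical data: ex:birthPlace is declared functional (at most one value).
graph = {
    ("ex:birthPlace", "rdf:type", "owl:FunctionalProperty"),
    ("ex:ann", "ex:birthPlace", "ex:NYC"),
    ("ex:ann", "ex:birthPlace", "ex:NewYorkCity"),
    ("ex:bob", "rdf:type", "ex:Person"),  # no birthplace recorded
}

def functional_inferences(triples):
    """Open-world reading: two values of a functional property denote one thing."""
    functional = {s for (s, p, o) in triples
                  if p == "rdf:type" and o == "owl:FunctionalProperty"}
    same = set()
    for (s1, p1, o1) in triples:
        for (s2, p2, o2) in triples:
            # o1 < o2 keeps one ordered pair per distinct pair of values
            if p1 in functional and p1 == p2 and s1 == s2 and o1 < o2:
                same.add((o1, "owl:sameAs", o2))
    return same

inferred = functional_inferences(graph)
print(inferred)
```

A validation tool working under the Closed World Assumption would instead report ex:ann as having too many values and ex:bob as having too few.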
Various profiles of the OWL specification have been published to more efficiently deal with certain classes of use cases; the original version of OWL had OWL Full, OWL DL, and OWL Lite, and OWL 2 has OWL RL, OWL EL, and OWL QL.
What is SHACL?
SHACL (Shapes Constraint Language) is an official W3C standard, a modeling language for describing a set of conditions that data, specifically data in knowledge graphs, should meet. These conditions are defined in structures called SHACL shapes. Using SHACL, you can describe which properties are required to have values, the number and type of allowed values, and much more.
SHACL Rules support reasoning applications. Rules specify new facts that can be inferred from the existing facts.
SHACL provides an alternative to using RDFS and OWL for ontology modeling. It can also be used together with RDFS/OWL. In addition to supporting rules for specifying requirements (constraints) your data must meet, SHACL also offers a way to specify rules that infer new facts from the available data.
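The idea of a shape can be sketched in Python as a description of required properties, value counts, and value types, checked against the data that is actually present. The dictionary layout below is hypothetical and is not SHACL syntax; its keys are loosely modeled on the SHACL terms sh:minCount, sh:maxCount, and sh:datatype. The data and the function validate are also assumptions for illustration.

```python
# A shape-like description: hypothetical layout, not real SHACL syntax.
PERSON_SHAPE = {
    "target_class": "ex:Person",
    "properties": {
        "ex:name": {"min_count": 1, "max_count": 1, "datatype": str},
        "ex:age": {"min_count": 0, "max_count": 1, "datatype": int},
    },
}

data = {
    ("ex:ann", "rdf:type", "ex:Person"),
    ("ex:ann", "ex:name", "Ann"),
    ("ex:ann", "ex:age", "forty"),         # wrong type
    ("ex:bob", "rdf:type", "ex:Person"),   # missing name
}

def validate(triples, shape):
    """Return a sorted list of (node, property, problem) violations."""
    targets = {s for (s, p, o) in triples
               if p == "rdf:type" and o == shape["target_class"]}
    violations = []
    for node in targets:
        for prop, rule in shape["properties"].items():
            values = [o for (s, p, o) in triples if s == node and p == prop]
            if len(values) < rule["min_count"]:
                violations.append((node, prop, "too few values"))
            if len(values) > rule["max_count"]:
                violations.append((node, prop, "too many values"))
            for v in values:
                if not isinstance(v, rule["datatype"]):
                    violations.append((node, prop, "wrong datatype"))
    return sorted(violations)

report = validate(data, PERSON_SHAPE)
for line in report:
    print(line)
```

Unlike the inference examples for RDFS and OWL, this check treats the data at hand as complete, so a missing name is reported as a problem.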
If I have a model in RDFS or OWL, can I automatically convert it to SHACL?
Yes, both TopBraid EDG and TopBraid Composer Maestro Edition can auto-convert RDFS/OWL to SHACL.
What new capabilities does semantic technology bring?
Semantic technology makes it possible to get more value out of more of your data:
- The simplicity of the RDF data structure and the optional nature of schemas means that different sets of data can be combined and used together very easily. SPARQL queries against these combinations can reveal patterns that were not apparent when the data was stored separately.
- To really make the whole of such aggregations greater than the sum of their parts, schema information can be added incrementally to bring more value out of the data.
- Changing data structures does not mean expensive changes to the infrastructure for handling the data, making your systems more agile in accommodating new data.
- Because middleware available as part of the TopBraid platform lets you treat a wide variety of formats (relational databases, Excel spreadsheets, and more) as RDF, semantic technology lets you gain all of these benefits from data stored using more traditional formats.
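The first point in the list above, that separately stored data can be combined and then queried as a whole, can be sketched with plain Python sets. The two datasets, the "ex:" names, and the function emails_for_open_invoices are hypothetical; the nested loops stand in for what a SPARQL query would express declaratively.

```python
# Two hypothetical datasets that share the identifier ex:acme.
crm = {
    ("ex:acme", "ex:contact", "ex:jane"),
    ("ex:jane", "ex:email", "jane@example.org"),
}
billing = {
    ("ex:acme", "ex:openInvoice", "ex:inv42"),
    ("ex:inv42", "ex:amount", 1200),
}

# Combining triple-shaped data is a set union; no schema alignment step is needed.
combined = crm | billing

def emails_for_open_invoices(triples):
    """Join across sources: invoice -> customer -> contact -> email."""
    results = []
    for (customer, p1, invoice) in triples:
        if p1 != "ex:openInvoice":
            continue
        for (c2, p2, person) in triples:
            if c2 == customer and p2 == "ex:contact":
                for (s3, p3, email) in triples:
                    if s3 == person and p3 == "ex:email":
                        results.append((invoice, email))
    return results

print(emails_for_open_invoices(combined))
```

Neither source can answer the question alone; the answer only appears once the triples sit in the same set, linked by the shared identifier.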
What kind of industries are using it, and for what?
Just a few examples of industries where semantic technology is gaining popularity:
- Life Sciences companies use semantic technology for flexible aggregation of research and clinical trial data from different sources so that they can develop new products and perform testing more quickly.
- Publishing companies use it to track the different kinds of metadata associated with content assets in different formats so that they can combine these assets into new products for new media.
- Legal publishers use it to track relationships between the highly structured components of laws, court decisions, and related content.
- Companies in a range of domains use it to store taxonomy data and metadata and then use this data to enhance the performance of other applications such as search engines.
Does ontology development require agreement from all stakeholders, from the beginning?
Because data models are themselves expressed using RDF triples, they too can be easily aggregated. Knowing this, system developers often start small and grow a semantic technology system organically, adding pieces as their system scales up. Instead of forcing different stakeholders to agree on a common model that may not be ideal for any of them, the differences can themselves be modeled and made part of an application.
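As a sketch of modeling the differences rather than forcing agreement, the example below keeps two departmental vocabularies as they are and adds one mapping statement between them. The "sales:" and "finance:" names and the function instances_of are hypothetical; owl:equivalentClass is a standard OWL term, handled here by a few lines of Python rather than by an OWL reasoner.

```python
# Two hypothetical departmental vocabularies plus one mapping statement.
graph = {
    ("sales:PO", "owl:equivalentClass", "finance:PurchaseOrder"),
    ("sales:order17", "rdf:type", "sales:PO"),
    ("finance:order88", "rdf:type", "finance:PurchaseOrder"),
}

def instances_of(triples, cls):
    """List instances of cls, honoring equivalentClass links in both directions."""
    classes, frontier = {cls}, {cls}
    while frontier:
        linked = set()
        for (a, p, b) in triples:
            if p == "owl:equivalentClass":
                if a in frontier:
                    linked.add(b)
                if b in frontier:
                    linked.add(a)
        frontier = linked - classes  # only follow classes not seen before
        classes |= linked
    return sorted(s for (s, p, o) in triples
                  if p == "rdf:type" and o in classes)

orders = instances_of(graph, "finance:PurchaseOrder")
print(orders)
```

Each department keeps its own term, and the mapping triple is just more data that can be added, revised, or removed later.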
How can semantic technology address Big Data challenges?
The main challenges of Big Data are usually defined as the ability to handle the greater volume, velocity, and variety of data becoming available. In a recent report titled “Big Data Adoption in 2013 Shows Substance Behind the Hype,” the Gartner technology research and advisory company wrote, “Many organizations find the variety dimension of big data a much bigger challenge than the volume or velocity dimensions… When asked about the dimensions of data organizations struggle with most, 49% answered variety, while 35% answered volume and 16% velocity.”
The ease of combining data from different sources with different structures and the availability of a standardized query language to work with the combination means that semantic technologies are well-suited to deal with data variety. The optional status of schemas means that instead of wrestling to coordinate schemas from different sources—a huge task in most data integration projects—you can start with only as much schema metadata as you want and then build from there to accommodate all of the data sources you want to incorporate into your system.
SEMANTIC TECHNOLOGY AND MORE TRADITIONAL TECHNOLOGY
Do I have to convert my data to RDF to take advantage of semantic technology?
TopQuadrant's TopBraid platform includes middleware that can dynamically treat relational databases, Excel spreadsheets, XML, JSON, and other formats as RDF. This brings the power and flexibility of semantic technology to your legacy data with no need for conversion. (Actual conversion of the data can bring certain advantages; TopQuadrant engineers can work with you to determine the best architecture for you to take the fullest advantage of your data.)
What can be done in RDF that can't be done with relational databases?
RDF offers several benefits over the relational model:
- Data does not need to be stored in tables. This means that you can start accumulating data and using it right away, with no need to plan out all the structure of all the data you might use first; it also means that when a system is in production, taking advantage of additional new data classes and properties does not require splitting and augmenting of tables and the necessary data dumps and reloads and query rewriting that often accompany schema evolution in relational databases.
- Integration of two different RDF datasets into a single one is almost trivial compared with the work of integrating two different relational databases into a single one, especially when the relational databases are stored using RDBMS systems from different vendors.
- SPARQL is a much more standardized query language than SQL. While SQL is a standard, vendors often replace very basic commands with their own extensions, reducing portability.
- RDF has a more explicit schema language with a much smaller reliance on external documentation for developers to understand the schema work of others.
Are there RDF databases?
Because RDF is ultimately a data model, there are a variety of choices in how it can be stored. Specialized databases for storing RDF are known as triplestores. The TopBraid workspace is a named graph triplestore, which means that it organizes RDF triples into sets known as graphs.
When using TopBraid Composer, graphs can be persisted as files in the TopBraid workspace or stored in databases external to the workspace. TopBraid EDG works with RDF databases that are packaged with it. You do not need to purchase a separate database. Depending on the required scale, you can use either an RDF-native (Apache Jena TDB) or an RDBMS-based system for storing triples.
RUMORS AND FACTS
Does this have something to do with artificial intelligence?
Much of the work that led to development of the various RDF-based standards drew on knowledge representation work. SHACL, for example, can be used to model and build applications that take advantage of rules and reasoning. It is also possible to use rules and machine learning together as a powerful combination.
Can semantic technology understand natural language?
Semantic technology standards are about storing, sharing, and modeling data and the semantics of that data. Tools that parse sentences and paragraphs to look for facts and sentiments are a separate category of tools that often have no connection to these standards. Some of them can read and write RDF-based data, making it possible for TopQuadrant products and solutions to work with them as part of an application architecture.
Do I need an ontology? Where do ontologies come from?
Ontologies provide metadata that adds a lot to applications. If an ontology that models data required by your application already exists, using that ontology instead of designing your own can make your data and application more interoperable with other systems. For example, the existence of the W3C's SKOS ontology means that there is no need to write a new ontology for vocabulary management. Several tools and communities of practice are available to help you find potentially helpful ontologies, and TopQuadrant can work with you to help you find whether an existing ontology can address your needs.
Because RDF-based ontologies can be combined and extended as easily as any other RDF data, it's very common to combine several existing small, specialized ontologies and to then add a bit more to customize the combination for the application under development. TopBraid EDG is based on a number of pre-built ontologies that describe data, technical and enterprise assets. When you use EDG's features to define new classes and properties, you are actually creating and extending ontologies.
Any metadata you already have can be used to create an ontology. For example, if you have a spreadsheet describing your products, TopBraid products will automatically create properties from the column names and associate them with the class Product. Similar approaches are used to generate ontologies from schemas in relational databases and XML. These initial ontologies can later be refined and enriched as needed. To assist with your ontology development needs, TopQuadrant provides comprehensive modeling services.
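The general idea of deriving a starter ontology from column names can be sketched as follows. This is an assumption-laden illustration, not a description of how TopBraid products do it: the header row, the "ex:" prefix, and the function properties_from_header are hypothetical, while rdfs:Class, rdf:Property, rdfs:domain, and rdfs:label are standard RDF and RDFS terms.

```python
import csv
import io

HEADER_ONLY = "sku,name,unit price\n"  # hypothetical product spreadsheet header

def properties_from_header(text, class_name="ex:Product"):
    """Declare the class plus one property per column, with the class as domain."""
    columns = next(csv.reader(io.StringIO(text)))
    triples = [(class_name, "rdf:type", "rdfs:Class")]
    for column in columns:
        label = column.strip()
        prop = "ex:" + label.replace(" ", "_")
        triples.append((prop, "rdf:type", "rdf:Property"))
        triples.append((prop, "rdfs:domain", class_name))
        triples.append((prop, "rdfs:label", label))
    return triples

schema = properties_from_header(HEADER_ONLY)
for t in schema:
    print(t)
```

The result is itself a set of triples, so it can be refined later by adding further statements such as value types or subclass links.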