FAQ – Semantic Web Technology
[expand title=”What is semantic technology?”]
Standards-based semantic technology is the use of W3C standards such as RDF and SPARQL to share and integrate data and, optionally, the semantics of that data. Semantic technology is popular for public sharing of data (see “What is the Semantic Web” below) as well as for sharing and integration of data on private intranets.
[expand title=”What is the Semantic Web?”]
The Semantic Web is the network of machine-readable data that is publicly shared using semantic technology over the same infrastructure as the World Wide Web.
[expand title=”How is Linked Data related to semantic technology?”]
“Linked Data” is a more recent term than “Semantic Web,” and the two overlap heavily: both describe the use of RDF-related standards to share data. Linked Data places a strong emphasis on dereferenceable URIs and on links between available datasets, and less emphasis on the use of OWL and the more “semantic” parts of the Semantic Web. TopQuadrant’s TopBraid platform makes it easy to build both Semantic Web and Linked Data applications.
[expand title=’What is “semantic” about semantic technology?’]
Semantic technology standards give us ways to unambiguously identify resources and to describe relationships between them. When you can write, in a machine-readable way, that PO in one database means the same thing as PurchaseOrder in another, or that “located in” is a transitive property, or that “spouse” is a symmetric property, you can store small but useful bits of the meanings of these terms that let applications get more out of your data—which is what metadata is for.
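As a minimal sketch of how such declarations let an application get more out of data, the plain-Python snippet below stores facts as three-part tuples and applies a symmetric rule and a transitive rule until no new facts appear. The property names (`locatedIn`, `spouse`) and the sample places and people are illustrative assumptions, not terms from any standard vocabulary, and a real system would use an RDF library with an OWL or SPARQL-based reasoner rather than this loop.

```python
# Facts as (subject, property, value) tuples; all names are hypothetical.
facts = {
    ("Montmartre", "locatedIn", "Paris"),
    ("Paris", "locatedIn", "France"),
    ("Ann", "spouse", "Bob"),
}

# Small pieces of "meaning" recorded about the properties themselves.
transitive = {"locatedIn"}
symmetric = {"spouse"}


def infer(facts, transitive, symmetric):
    """Apply the symmetric and transitive rules until nothing new is derived."""
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            if p in symmetric:
                new.add((o, p, s))
            if p in transitive:
                for s2, p2, o2 in known:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= known:
            return known
        known |= new


inferred = infer(facts, transitive, symmetric)
print(("Montmartre", "locatedIn", "France") in inferred)  # True
print(("Bob", "spouse", "Ann") in inferred)               # True
```

Neither derived fact was stated explicitly; both follow from the two one-word declarations about the properties.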
[expand title=”What is RDF?”]
The Resource Description Framework (RDF) is a data model that has been a W3C standard since 1999 and continues to be the core standard of semantic technology. By giving you a way to describe all facts as simple, three-part statements, RDF makes it easier to mix and match and connect different datasets.
These three-part statements are known as triples. You can think of one as the equivalent of a single cell in a spreadsheet where the column name is the property name, the row name is the resource identifier (the thing described by the property), and the contents of the cell are the property value. The resources, properties, and sometimes values all have Web-wide unique identifiers known as URIs so that they can be linked to from anywhere, can be accessed using normal Web and Internet technology, and can be aggregated with other data without fear of name conflicts.
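The spreadsheet analogy can be sketched in a few lines of plain Python. This is only an illustration of the data model: the base URI `http://example.com/`, the row identifiers, and the column names are made-up assumptions, and the tuples stand in for what an RDF library would store as real triples.

```python
# A tiny "spreadsheet": one dict per row, keyed by column name.
# The base URI and the data are invented for illustration.
BASE = "http://example.com/"

rows = {
    "emp1": {"name": "Ann", "dept": "Sales"},
    "emp2": {"name": "Bob", "dept": "R&D"},
}


def rows_to_triples(rows, base=BASE):
    """Turn each cell into one (resource, property, value) triple."""
    triples = []
    for row_id, cells in rows.items():
        for column, value in cells.items():
            # Row name -> resource URI, column name -> property URI.
            triples.append((base + row_id, base + column, value))
    return triples


triples = rows_to_triples(rows)
for triple in triples:
    print(triple)
```

Two rows with two columns each yield four triples, one per cell.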
[expand title=”What is SPARQL?”]
The SPARQL Protocol and RDF Query Language (SPARQL) is a set of standards for querying and updating RDF data as well as for transmitting queries and results over a network.
The query language is the most commonly used of these specifications and forms the basis of the SPIN rules and constraint language and the SPARQLMotion and SPARQL Web Pages scripting languages developed by TopQuadrant to make application development easier and more efficient. In the words of World Wide Web inventor Tim Berners-Lee, “Trying to use the Semantic Web without SPARQL is like trying to use a relational database without SQL.”
SPARQL is commonly used to perform data validation and, with capabilities such as transitive closure operators and property functions, to perform query-time reasoning. SPARQL can also be used to construct and persist new facts (triples) based on existing facts. TopBraid distributions include a complete OWL RL profile implemented using SPARQL.
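The idea behind querying with triple patterns, and then constructing new facts from the matches, can be sketched without a SPARQL engine. The snippet below is a simplified stand-in written in plain Python, not SPARQL itself: the `match` helper, the data, and the property names are all hypothetical, and the SPARQL in the comments only indicates roughly what the equivalent query would look like.

```python
# Variables are strings starting with "?", as in SPARQL; the data is made up.
data = {
    ("ann", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("acme", "basedIn", "boston"),
}


def match(pattern, triples):
    """Return one variable-binding dict per triple that fits the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for want, got in zip(pattern, triple):
            if want.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(want, got) != got:
                    break
            elif want != got:
                break
        else:
            results.append(binding)
    return results


# Roughly: SELECT ?who WHERE { ?who :worksFor :acme }
employees = sorted(b["?who"] for b in match(("?who", "worksFor", "acme"), data))
print(employees)  # ['ann', 'bob']

# Roughly: CONSTRUCT { ?who :worksIn ?city }
#          WHERE { ?who :worksFor ?org . ?org :basedIn ?city }
new_facts = {
    (w["?who"], "worksIn", c["?city"])
    for w in match(("?who", "worksFor", "?org"), data)
    for c in match((w["?org"], "basedIn", "?city"), data)
}
data |= new_facts  # keep the constructed triples alongside the originals
```

The second query joins two patterns on the shared `?org` variable, which is the same mechanism a SPARQL engine uses to derive facts that were never stated directly.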
[expand title=”What is RDFS?”]
The RDF Schema (RDFS) language is a vocabulary description language that lets you describe classes and properties using RDF itself—that is, RDFS descriptions themselves are a series of three-part statements about classes and properties. Sharing of such descriptions helps to make RDF-based data and applications more interoperable.
Schemas (and their more powerful cousin, OWL ontologies) are completely optional when working with RDF. You may choose to have a schema that describes every detail of your data model, or you may choose to use no schema at all, or you may choose to use a schema that only describes the parts of your data that need it. This last choice is a particularly popular option, and you can always add more to the schema that you’re using.
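A small sketch can show both points: schema statements are ordinary triples stored next to the data, and a schema may cover only part of that data. The `ex:` names below are hypothetical; `rdf:type` and `rdfs:subClassOf` are the standard RDF and RDFS terms, written here as plain strings rather than full URIs. The helper is an illustrative assumption, not how any particular RDFS engine works.

```python
# Data and schema live in the same set of triples.
graph = {
    ("ex:rex", "rdf:type", "ex:Dog"),
    ("ex:rex", "ex:name", "Rex"),  # no schema statement describes ex:name
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
}


def classes_of(resource, graph):
    """Collect the declared types of a resource plus all their superclasses."""
    found = {o for s, p, o in graph if s == resource and p == "rdf:type"}
    frontier = set(found)
    while frontier:
        parents = {
            o for s, p, o in graph
            if p == "rdfs:subClassOf" and s in frontier
        }
        frontier = parents - found  # only follow classes not yet visited
        found |= frontier
    return found


print(sorted(classes_of("ex:rex", graph)))
# ['ex:Animal', 'ex:Dog', 'ex:Mammal']
```

The `ex:name` triple is perfectly usable even though the schema says nothing about it, while the two `rdfs:subClassOf` statements add value only where they are needed.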
[expand title=”What is OWL?”]
The Web Ontology Language (OWL) builds on RDFS and knowledge representation technology to provide a vocabulary for making more sophisticated modeling statements. With the appropriate OWL-enabled tools, these descriptions may be used as the basis to infer new facts from combinations of other facts.
Note that because RDFS and OWL are based on the Open World Assumption, they are not intended to be used for data validation. The Open World Assumption means that one can’t draw conclusions from the absence of facts. For example, while OWL allows us to say that a Person must have one and only one place of birth, the absence of place-of-birth information will not be considered inconsistent. Having multiple places of birth may not (depending on what else is in the model) be considered inconsistent either. SPARQL-based approaches to constraint checking and reasoning, such as SPIN, operate under the Closed World Assumption and tend to be a better fit for the common data validation, data description, and reasoning needs of many business applications.
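The difference can be made concrete with a closed-world check written in plain Python. This is a sketch of the validation idea only, not SPIN or SPARQL: the `placeOfBirth` property, the people, and the helper function are hypothetical. Under the Open World Assumption, an OWL reasoner given the same data would treat Ann’s birthplace as merely unknown, and might conclude that Bob’s two recorded places are the same place unless the model says they differ.

```python
# Hypothetical data: Ann has no recorded birthplace, Bob has two.
people = {"ann", "bob", "cy"}
facts = {
    ("bob", "placeOfBirth", "Oslo"),
    ("bob", "placeOfBirth", "Bergen"),
    ("cy", "placeOfBirth", "Lima"),
}


def closed_world_violations(people, facts, prop="placeOfBirth"):
    """Flag anyone without exactly one value, treating missing data as an error."""
    problems = {}
    for person in sorted(people):
        values = {o for s, p, o in facts if s == person and p == prop}
        if len(values) != 1:
            problems[person] = len(values)  # how many values were found
    return problems


print(closed_world_violations(people, facts))
# {'ann': 0, 'bob': 2}
```

Treating what is not in the data as a violation is exactly the closed-world behavior that most business validation rules expect.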
Various subsets of the OWL specification have been published to deal more efficiently with certain classes of use cases. The original version of OWL defined the sublanguages OWL Full, OWL DL, and OWL Lite, and OWL 2 defines the profiles OWL 2 RL, OWL 2 EL, and OWL 2 QL.
[expand title=”Where can I learn more about semantic technology?”]
TopQuadrant has extensive training available in semantic technology and the best ways to take advantage of it. You can also download an evaluation copy of TopBraid Composer Maestro Edition and follow along with the tutorials and online help to get started right away.
[expand title=”What new capabilities does semantic technology bring?”]
Semantic technology makes it possible to get more value out of more of your data:
- The simplicity of the RDF data structure and the optional nature of schemas means that different sets of data can be combined and used together very easily. SPARQL queries against these combinations can reveal patterns that were not apparent when the data was stored separately.
- To really make the whole of such aggregations greater than the sum of their parts, schema information can be added incrementally to bring more value out of the data.
- Changing data structures does not mean expensive changes to the infrastructure for handling the data, making your systems more agile in accommodating new data.
- Because middleware available as part of the TopBraid platform lets you treat a wide variety of formats (relational databases, Excel spreadsheets, and more) as RDF, semantic technology lets you gain all of these benefits from data stored using more traditional formats.
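The first point in the list above, that separately produced data can be combined and queried together with little effort, can be sketched as follows. The two datasets, their URIs, and the property names are invented for illustration, and plain Python sets stand in for RDF graphs. The sketch works only because both sources use the same URI for the same customer.

```python
# Two independently produced datasets; the URIs and values are invented.
crm = {
    ("http://example.com/cust/42", "name", "Acme Corp"),
    ("http://example.com/cust/42", "city", "Boston"),
}
billing = {
    ("http://example.com/cust/42", "balance", 1250),
    ("http://example.com/cust/77", "balance", 0),
}

# Combining them is a set union: no table redesign or schema alignment step.
combined = crm | billing


def describe(resource, triples):
    """Gather every property known about one resource, whatever its source.

    Assumes each property has a single value per resource in this sample.
    """
    return {p: o for s, p, o in triples if s == resource}


print(describe("http://example.com/cust/42", combined))
```

The combined description includes the name and city from one source and the balance from the other, a view that neither dataset could provide on its own.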
[expand title=”What kind of industries are using it, and for what?”]
Just a few examples of industries where semantic technology is gaining popularity:
- Life Sciences companies use semantic technology for flexible aggregation of research and clinical trial data from different sources so that they can develop new products and perform testing more quickly.
- Publishing companies use it to track the different kinds of metadata associated with content assets in different formats so that they can combine these assets into new products for new media.
- Legal publishers use it to track relationships between the highly structured components of laws, court decisions, and related content.
- Companies in a range of domains use it to store taxonomy data and metadata and then use this data to enhance the performance of other applications such as search engines.
[expand title=”Does ontology development require agreement from all stakeholders, from the beginning?”]
Because ontologies and schemas are themselves expressed using RDF triples, they too can be easily aggregated. Knowing this, system developers often start small and grow a semantic technology system organically, adding pieces as their system scales up. Instead of forcing different stakeholders to agree on a common model that may not be ideal for any of them, the differences can themselves be modeled and made part of an application.
[expand title=”How can semantic technology address Big Data challenges?”]
The main challenges of Big Data are usually defined as the ability to handle the greater volume, velocity, and variety of data becoming available. In a recent report titled “Big Data Adoption in 2013 Shows Substance Behind the Hype,” the Gartner technology research and advisory company wrote “Many organizations find the variety dimension of big data a much bigger challenge than the volume or velocity dimensions… When asked about the dimensions of data organizations struggle with most, 49% answered variety, while 35% answered volume and 16% velocity.”
The ease of combining data from different sources with different structures and the availability of a standardized query language to work with the combination means that semantic technologies are well-suited to deal with data variety. The optional status of schemas means that instead of wrestling to coordinate schemas from different sources—a huge task in most data integration projects—you can start with only as much schema metadata as you want and then build from there to accommodate all of the data sources you want to incorporate into your system.
SEMANTIC TECHNOLOGY AND MORE TRADITIONAL TECHNOLOGY
[expand title=”Do I have to convert my data to RDF to take advantage of semantic technology?”]
TopQuadrant’s TopBraid platform includes middleware that can dynamically treat relational databases, Excel spreadsheets, XML, JSON, and other formats as RDF. This brings the power and flexibility of semantic technology to your legacy data with no need for conversion. (Actual conversion of the data can bring certain advantages; TopQuadrant engineers can work with you to determine the best architecture for you to take the fullest advantage of your data.)
[expand title=”What can be done in RDF that can’t be done with relational databases?”]
RDF offers several benefits over the relational model:
- Data does not need to be stored in tables. You can start accumulating data and using it right away, with no need to plan out the full structure of all the data you might use first. Once a system is in production, taking advantage of new data classes and properties does not require splitting and augmenting tables, or the data dumps, reloads, and query rewriting that often accompany schema evolution in relational databases.
- Integration of two different RDF datasets into a single one is almost trivial compared with the work of integrating two different relational databases into a single one, especially when the relational databases are stored using RDBMS systems from different vendors.
- SPARQL is implemented much more consistently across products than SQL. While SQL is a standard, vendors often replace even very basic commands with their own extensions, reducing portability.
- RDF has a more explicit schema language with a much smaller reliance on external documentation for developers to understand the schema work of others.
[expand title=”Are there RDF databases?”]
Because RDF is ultimately a data model, there are a variety of choices in how it can be stored. Specialized databases for storing RDF are known as triplestores. The TopBraid workspace is a named graph triplestore, which means that it organizes RDF triples into named sets, or graphs.
Graphs can be persisted as files in the TopBraid workspace or stored in databases external to the workspace. The TopBraid platform includes both an RDF-native and an RDBMS-based database system for storing triples. Additionally, TopBraid includes connectors to select third-party triplestores.
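The named graph idea can be illustrated with a toy in-memory store in plain Python. This is a conceptual sketch only and says nothing about how the TopBraid workspace is implemented: the graph names, triples, and helper functions are all hypothetical.

```python
from collections import defaultdict

# Graph names and triples are hypothetical examples.
store = defaultdict(set)  # graph name -> set of triples

store["http://example.com/graphs/products"].add(("p1", "label", "Widget"))
store["http://example.com/graphs/products"].add(("p2", "label", "Gadget"))
store["http://example.com/graphs/suppliers"].add(("s1", "supplies", "p1"))


def triples_in(store, graph_name):
    """Return only the triples held in one named graph."""
    return set(store.get(graph_name, set()))


def union_graph(store):
    """Return all triples across every named graph."""
    merged = set()
    for triples in store.values():
        merged |= triples
    return merged


print(len(triples_in(store, "http://example.com/graphs/products")))  # 2
print(len(union_graph(store)))                                       # 3
```

Keeping triples grouped by graph name lets an application query one source in isolation or treat everything as a single dataset, depending on the task.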
RUMORS AND FACTS
[expand title=”Does this have something to do with artificial intelligence?”]
Much of the work that led to the OWL standard drew on knowledge representation work, and OWL can be used to model and build applications that take advantage of reasoning. These kinds of applications played a larger role in the earlier days of semantic technology than they do now.
[expand title=”Can semantic technology understand natural language?”]
Semantic technology standards are about storing, sharing, and modeling data and the semantics of that data. Tools that parse sentences and paragraphs to look for facts and sentiments are a separate category of tools that often have no connection to these standards. Some of them can read and write RDF-based data, making it possible for TopQuadrant products and solutions to work with them as part of an application architecture.
[expand title=”Do I need an ontology? Where do ontologies come from?”]
Ontologies, like their simpler cousin schemas, provide metadata that adds a lot to applications. If an ontology that models data required by your application already exists, using that ontology instead of designing your own can make your data and application more interoperable with other systems. For example, the existence of the W3C’s SKOS ontology means that there is no need to write a new ontology for vocabulary management. Several tools and communities of practice are available to help you find potentially helpful ontologies, and TopQuadrant can work with you to help you find whether an existing ontology can address your needs.
Because RDF-based ontologies can be combined and extended as easily as any other RDF data, it’s very common to combine several existing small, specialized ontologies and to then add a bit more to customize the combination for the application under development. TopQuadrant’s TopBraid EVN is based on the SKOS ontology, and when you use EVN’s features to define new classes and properties, in the background, it’s actually creating customizations of SKOS.
Any metadata you already have can be used to create an ontology. For example, if you have a spreadsheet describing your products, TopBraid Composer will automatically create properties from the column names and associate them with the class Product. Similar approaches are used to generate ontologies from schemas in relational databases and XML. These initial ontologies can later be refined and enriched as needed. To assist with your ontology development needs, TopQuadrant provides comprehensive modeling services.
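The general idea of bootstrapping a schema from spreadsheet column names can be sketched in plain Python. This is not TopBraid Composer’s actual algorithm: the column headers, the `ex:` prefix, the camel-case naming rule, and the choice of `rdfs:domain` to associate each property with the class are all assumptions made for illustration, and the sketch expects non-empty column names.

```python
# Column headers from a hypothetical product spreadsheet.
columns = ["SKU", "Unit Price", "Colour"]


def schema_from_columns(columns, class_name="Product", prefix="ex:"):
    """Emit a class declaration plus one property per column, with its domain."""
    cls = prefix + class_name
    triples = [(cls, "rdf:type", "rdfs:Class")]
    for column in columns:
        words = column.split()
        # "Unit Price" -> "unitPrice"
        local = words[0].lower() + "".join(w.capitalize() for w in words[1:])
        prop = prefix + local
        triples.append((prop, "rdf:type", "rdf:Property"))
        triples.append((prop, "rdfs:domain", cls))
        triples.append((prop, "rdfs:label", column))
    return triples


schema = schema_from_columns(columns)
for triple in schema:
    print(triple)
```

The result is itself a list of triples, so it can be stored, merged, and refined like any other RDF data.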