It was another year of productive exchange of experience, insights and ideas at EDW 2017! As it has grown in importance for most enterprises, data governance has become a key focus and a high priority of the conference program. So, not surprisingly, we learned a lot through conversations with the attendees about the challenges they face in implementing a data governance strategy.

It was also great to hear that more folks are becoming familiar with key open, graph-based semantic standards for representing and managing data and metadata, such as RDF and SPARQL. Perhaps due to this emerging awareness, we fielded many questions about why semantic web standards are important to data governance. As many of you may know, semantic standards are the foundation of our solutions at TopQuadrant, so we like to think that everyone understands the enormous benefits that come with leveraging this open standards approach to data governance.

Semantics Are Crucial for Data Governance

By an interesting coincidence, a very important new semantic web standard called SHACL reached Candidate Recommendation* status the week following EDW 2017. (*W3C standards, such as RDF and SPARQL, are called “Recommendations”, so this means that SHACL is expected to become a “standard” very soon.) More on what this means for data governance below. First, though, our discussions made it clear that organizations are still struggling to find data governance solutions that meet their specific needs.

Data governance as a discipline involves people, processes and technology. As modern enterprises depend on technology for all day-to-day operations, their technology and data landscape has grown increasingly rich and complex. Most organizations can’t govern this landscape without sophisticated tools. Simply put, the complexity of today’s information landscape makes the technology part of data governance increasingly critical for essential initiatives such as meeting regulatory compliance requirements, ensuring data quality and getting value from big data implementations. But implementing that technology has become more complex due to the increasing diversity of data sources and stakeholders.

Our conversations at EDW confirmed once again that there are many questions that organizations want well-designed data governance practices and systems to help answer, such as:

  • Who created this data?
  • Where is this data used and shared?
  • What is the business definition of this data element?
  • What are the business rules for this data?
  • Where is this data stored?

A common solution in the enterprise is to create reporting and analytics warehouses that aggregate information from select data sources in order to answer specific queries. The processes for designing and loading warehouses do not accommodate rapid change. Because the number of data sources and the information they contain constantly grow, and the types of queries that business users need answered change quickly, organizations increasingly find themselves with an expanding number of siloed data warehouses, in addition to the siloed transactional sources. A more recent addition to this picture is the data lake, which is often used to store historic, ad hoc and more fluid datasets. With this:

  • identifying related and relevant information and tracking its lineage, for example for compliance reporting, is challenging.
  • the quality, relevance and freshness of information are often unclear. With the increasing number of potentially pertinent data sources, it becomes harder to know what data can be trusted in what context and to screen out outdated, irrelevant or conflicting information from different sources.
  • there is a growing number of contexts in which data has been collected and a correspondingly diverse range of reference data and metadata used to describe it.

The proverbial lightbulb seemed to go off with each explanation of how semantic web standards make it possible for enterprises to overcome these challenges by automating the correlation of disparate information specific to their needs. These open standards allow users to customize their data governance approach based on existing IT investments, methodologies and their specific data requirements.
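To illustrate the idea, the governance questions listed earlier (who created this data, where it is stored, where it is used) can be read as pattern matches over a graph of subject, predicate, object statements, which is the data model that RDF standardizes and SPARQL queries. The following is a minimal sketch in plain Python, not RDF tooling and not TopBraid EDG; every `ex:` name and the helper functions `match`, `objects` and `subjects` are hypothetical and exist only for this example.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical metadata about two data elements, written as
# (subject, predicate, object) statements in the spirit of RDF triples.
TRIPLES: List[Triple] = [
    ("ex:CustomerEmail", "ex:createdBy", "ex:CRMTeam"),
    ("ex:CustomerEmail", "ex:storedIn", "ex:SalesWarehouse"),
    ("ex:CustomerEmail", "ex:usedBy", "ex:MarketingReport"),
    ("ex:CustomerEmail", "ex:definition", "Primary contact email of a customer"),
    ("ex:OrderTotal", "ex:createdBy", "ex:BillingTeam"),
    ("ex:OrderTotal", "ex:storedIn", "ex:SalesWarehouse"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def objects(triples: List[Triple], s: str, p: str) -> List[str]:
    """Answer 'what is the <p> of <s>?', e.g. who created this data."""
    return [obj for _, _, obj in match(triples, s=s, p=p)]


def subjects(triples: List[Triple], p: str, o: str) -> List[str]:
    """Answer 'which things have <p> equal to <o>?', e.g. what is stored here."""
    return [subj for subj, _, _ in match(triples, p=p, o=o)]


if __name__ == "__main__":
    # Who created this data?
    print(objects(TRIPLES, "ex:CustomerEmail", "ex:createdBy"))
    # Which data elements are stored in the sales warehouse?
    print(subjects(TRIPLES, "ex:storedIn", "ex:SalesWarehouse"))
```

Because every fact has the same three-part shape, statements coming from different warehouses, lakes and transactional systems can be placed in one list and queried together, which is the correlation benefit described above.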

With its semantic standards-based foundation, TopQuadrant’s comprehensive solution, TopBraid Enterprise Data Governance (EDG), can manage the entire range of enterprise information assets and the cross-connections between them. We can support these needs because TopBraid EDG is based on modern semantic graph database technology, uses rich and flexible semantic standards, has an extensible platform and, most importantly, focuses on relationships between assets.

This is why the addition of SHACL is such a significant complement to the semantic standards technology stack. As Irene Polikoff explains in “Now You Can Finally Validate your RDF – SHACL Reaches Candidate Recommendation status!”:

“It delivers much needed capabilities for ensuring data quality in enterprise solutions.”
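To make the data quality point more concrete, here is a rough sketch of the kind of check that a shapes language expresses: a declarative description of what a valid record looks like, applied to graph data to produce a list of violations. This is plain Python and not a SHACL engine. Real SHACL shapes are themselves written in RDF and use constraint properties such as `sh:minCount` and `sh:maxCount`; the dictionary format, the `ex:` names and the `validate` function below are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical governance rule, loosely modeled on cardinality constraints:
# each data element needs exactly one definition and at least one creator.
SHAPE: Dict[str, Dict[str, int]] = {
    "ex:definition": {"min_count": 1, "max_count": 1},
    "ex:createdBy": {"min_count": 1},
}

# Hypothetical metadata; ex:OrderTotal deliberately lacks a definition.
DATA: List[Triple] = [
    ("ex:CustomerEmail", "ex:definition", "Primary contact email of a customer"),
    ("ex:CustomerEmail", "ex:createdBy", "ex:CRMTeam"),
    ("ex:OrderTotal", "ex:createdBy", "ex:BillingTeam"),
]


def validate(triples: List[Triple], node: str, shape: Dict[str, Dict[str, int]]) -> List[str]:
    """Return one message per violated constraint; an empty list means valid."""
    violations: List[str] = []
    for predicate, rule in shape.items():
        # Count how many values this node has for the constrained predicate.
        count = sum(1 for s, p, _ in triples if s == node and p == predicate)
        if count < rule.get("min_count", 0):
            violations.append(
                f"{node}: expected at least {rule['min_count']} value(s) "
                f"for {predicate}, found {count}"
            )
        if "max_count" in rule and count > rule["max_count"]:
            violations.append(
                f"{node}: expected at most {rule['max_count']} value(s) "
                f"for {predicate}, found {count}"
            )
    return violations


if __name__ == "__main__":
    for element in ("ex:CustomerEmail", "ex:OrderTotal"):
        problems = validate(DATA, element, SHAPE)
        print(element, "is valid" if not problems else problems)
```

The design point is that the rule lives in data (`SHAPE`) rather than in code, so governance teams can change what "valid" means without rewriting the checker.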

What does this mean for data governance? As Jan Voskuill, an experienced practitioner in information management and data governance solutions, puts it:

“… SHACL is a significant step towards making Linked Data more viable and useable in practical situations. Many people believe that once it is an official standard, SHACL will be a game changer in data governance and Big Data. It will enable a new level of growth in the uptake of Linked Data.”