Knowledge Graphs are Key to Data Fabric

In the first blog post of this series, we have discussed how the data fabric is different from the previous, more traditional data integration architectures. The difference is in its fundamental reliance on metadata. We have identified the three components the data fabric needs to take advantage of the metadata:

Collect metadata
Enrich metadata
Use metadata

In the second post of this series, we will discuss how knowledge graph technology is uniquely suited to support these components of the data fabric architecture. Let’s take a look at each of the components to understand why this is the case.

1. Collect: capture comprehensive metadata

To implement a solution that heavily relies on metadata, we must create and instrument processes for collecting metadata. Each new data source and data operation included in a data fabric creates a new set of metadata. With all the variety of metadata we need to collect, we must have a place to store it that is adequate to the task. It needs to:

Accommodate metadata across all categories including complex metadata with relationships and other descriptive information
Be flexible to support new metadata from existing and new sources, as it gets discovered and collected

When we consider possible storage options, knowledge graphs immediately stand out:

Relational databases are too rigid for these requirements.
Key value stores are very flexible being schema-less, but too simplistic and limited in expressivity. They are not good at storing relationships.
Graphs, on the other hand, are perfect for storing a variety of metadata, due to their ultimate flexibility and expressivity:

Graphs can store and connect any information – expected and unexpected.
Knowledge graphs, in particular, have the extra added benefit of being schema-last:

Like schema-less stores, they do not require schemas and, therefore, are flexible.
Unlike schema-less stores, they support having schemas and reap all the benefits of schemas e.g., better data quality, reliability, understanding, optimization, etc. Rich, highly expressive schemas can be added and evolved as needed. This allows SMEs to model and enrich data with semantics and meaning.

As an example of how knowledge graphs can easily accommodate any metadata its users may decide to capture, let’s consider one of TopQuadrant’s customers – a large government agency. It uses our product to catalog and profile many disparate relational data sources and datasets in various formats.

This automated cataloging process establishes technical metadata. Agency’s codesets and naming conventions plus centralized glossaries that were once hardcoded websites are captured as knowledge graphs of business metadata. A catalog of software applications in use by the agency creates a foundation for the operational metadata. Comments and tasks from the users provide social metadata. Agency’s subject matter experts further add to this metadata by capturing their knowledge that previously was only available in the tacit form.

2. Enrich: connect and enrich metadata enrichment

Today, more and more tools are producing and storing multiple metadata in on-premise, cloud and hybrid data environments. This creates silos of disconnected metadata, severely limiting its usefulness. To quote Gartner:

”Metadata often sits among disparate, uncoordinated taxonomies, hierarchies and datasets, and lacks standardization across tools. With increasing demand to coordinate different data stores, including regulatory and compliance pressures, a holistic view of metadata is essential.

Considerable business and governance value can be derived from this assorted metadata if it were managed and linked in the right way.”

Gartner, “Deploying Effective Metadata Management Solutions”, January, 2022.

Once siloed metadata is collected, it needs to be connected. Knowledge graphs specialize in connectivity and linking. As stated in the above quote from Gartner, coordinated taxonomies and reference data are important enablers of this process. Knowledge graphs can:

Store and connect any and all hierarchies as well as other structures
Capture simple and complex inference rules
Support analytic algorithms that exploit graph relationships
Provide information to machine learning algorithms and store results of their processing

Many of our customers look for ways to align metadata in order to provide integrated search and discovery capability. For example, one financial services firm has many unstructured documents as well as spreadsheet datasets. They come from different stores and repositories. Many were tagged at creation, but their tags used uncoordinated vocabularies. This made search and discovery difficult.

The firm was able to align these vocabularies, transforming metadata into a common, shared form and making it possible for users to search for any data asset using common terminology. Collected and aligned metadata was also used to generate additional tags, making search and discovery even more user friendly for specific stakeholder groups.

The enrichment process makes it possible to recommend data to use or deliver to a specific stakeholder and to dynamically select suitable data integration and data delivery processes.

It is also worth noting that, in practice, organizations may not be able to use a single place for storing all metadata. A knowledge graph can help address this issue. Even if you do not have all the metadata in a knowledge graph, a knowledge graph can be used to connect and align metadata located in other repositories. This is because each resource in a knowledge graph is given a globally unique persistent identifier that is used in linking disparate information about resources. Furthermore, knowledge graphs support layering – you can layer a knowledge graph over other graphs (one or more) and you can also layer a knowledge graph over data in other formats. As shown in the following graphic, knowledge graphs are:

‍

3. Use: provide access to metadata in the context of data use

Finally, when enough metadata is collected, connected, aligned and enriched, the data fabric can start to inform, assist and even automate a variety of data integration and management activities executed by systems and stakeholders across an organization.

For people, enriched metadata can provide better search and discovery of data and related assets. Systems can also take advantage of these capabilities. Key to such capabilities are rich query languages and APIs. Just like relational databases have a query language – SQL, knowledge graphs also have a query language – SPARQL. This query language can query both the data and schema of a knowledge graph. Beyond SPARQL, knowledge graph vendors are increasingly offering support for GraphQL.

The vision of data fabric calls for the incorporation of microservices architectures. Microservices are an increasingly popular method for splitting your critical services into smaller services instead of monolithic APIs. These services can be orchestrated by and through the data fabric. Using the power of metadata, it can help to ensure that the clients’ requests are processed and forwarded to the correct microservice. Data fabric can either serve directly as a data layer for microservices and/or assist such data layers. As an open spec for a flexible API layer, GraphQL has emerged as the best fit technology for building a data layer for microservices.

Our customers find GraphQL access instrumental to supporting their requirements. For example, one of TopQuadrant’s customers is a global software company focused on storing, protecting and managing customer information assets. They use TopBraid EDG to store and manage key metadata. The metadata is then used to create different digital products. They use GraphQL APIs to access content in EDG, transforming and repurposing it as needed.

Support for GraphQL does not necessarily require the use of a knowledge graph. GraphQL endpoints can be built over any data source. However, the use of a knowledge graph takes GraphQL to another level. In a recent blog, we described how TopBraid EDG supports GraphQL access to both data and schema for read and write.

In Summary

In this blog, we have discussed what makes the data fabric different from the previous approaches to data integration and data management. The key difference is in its pervasive use of metadata. Data fabric collects metadata of all types, connects and enriches it and makes it available to users and applications alike to drive insights and automate processes. Knowledge graphs offer an optimal technology platform for collection, enrichment and intelligent access to metadata.

‍