Data fabric is a new architecture pattern that is growing in popularity. Interest in the data fabric architecture is high, as evidenced by the number of people who signed up for and attended our recent webinar, “How Metadata Management Must Evolve to Support Data Fabric”.
During the webinar, we:
- Discussed the key components of a data fabric
- Described how metadata management must evolve to support data fabrics
- Ran a poll to understand attendees’ experience with data fabrics
- Showed demos illustrating key concepts within the data fabric architecture
The chart below shows how webinar attendees characterized their data fabric experience.
As we see, 57% did not know what a data fabric was. This includes 21% who were new to the term and 36% who had heard the term before but did not know what it meant. The poll also shows that a sizable share of organizations (31%) have started to discuss data fabric implementation and, finally, 12% are in the process of implementing it.
This blog addresses questions we were asked during the webinar but did not have time to answer. For those who have not yet watched the webinar, we start by identifying the key facts about data fabric in the section below.
What is Data Fabric?
Data fabric is an architectural pattern. Gartner, the analyst firm that coined the term data fabric, stresses that it is not a single product or even a simple collection of tools. It is a design concept that requires multiple existing and emerging data management technologies to play together in a certain way.
We have created a diagram depicting a conceptual architecture for a data fabric environment.
The diagram consists of five layers:
- At the center are data sources. Although shown in the middle of the diagram, in practice, they are distributed and heterogeneous.
- The fundamental concept in the data fabric architecture is cataloging distributed data sources in a knowledge graph which, to quote Gartner, reflects everything that happens to your data. This is not “your father’s data catalog”. It is a dynamic, composable and highly emergent knowledge graph – shown in the diagram as a layer surrounding the data sources.
- Metadata that lives in the knowledge graph gets collected from the data sources and from any other important source of relevant information, e.g., log files. It is further interpreted, reasoned over and enriched by algorithms that take advantage of the information in the knowledge graph. Gartner calls this process activation of the metadata.
- Different data delivery tools and services plug into this architecture. They consult the knowledge graph to understand the scope and context of the available information, access rights and other important factors.
- Data governance and standards are very important to the ability of different products to work together within the data fabric architecture. We have shown this aspect as the outermost, blue layer in the diagram.
Even if you do not have subscriber access to Gartner’s research, you can see an informative overview of the data fabric concept in this free article by Gartner.
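To make the layers above more concrete, here is a minimal sketch in Python of how metadata might be cataloged into a graph, enriched ("activated") by a simple rule, and then consulted by a data delivery tool. Everything in it is an assumption made for illustration: the `KnowledgeGraph` class, the predicate names (`hasTable`, `hasColumn`, `classifiedAs`) and the name-matching rule are hypothetical and do not represent TopBraid EDG or any Gartner specification. A real knowledge graph would typically use standards such as RDF rather than an in-memory set of tuples.

```python
class KnowledgeGraph:
    """Tiny in-memory triple store standing in for a metadata knowledge graph."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        return sorted(o for s, p, o in self.triples if s == subject and p == predicate)

    def subjects(self, predicate, obj):
        return sorted(s for s, p, o in self.triples if p == predicate and o == obj)


def catalog_source(graph, source_name, tables):
    """Collect metadata: record every table and column of one data source."""
    for table, columns in tables.items():
        table_id = f"{source_name}.{table}"
        graph.add(source_name, "hasTable", table_id)
        for column in columns:
            graph.add(table_id, "hasColumn", f"{table_id}.{column}")


def activate(graph, sensitive_names=("email", "ssn")):
    """Enrich metadata: tag columns whose name suggests personal data."""
    for _, predicate, obj in list(graph.triples):  # copy, because we add while looping
        if predicate == "hasColumn" and obj.rsplit(".", 1)[-1] in sensitive_names:
            graph.add(obj, "classifiedAs", "PersonalData")


def readable_columns(graph, table_id, may_read_personal_data):
    """A delivery tool consults the graph before deciding what to serve."""
    columns = graph.objects(table_id, "hasColumn")
    if may_read_personal_data:
        return columns
    personal = set(graph.subjects("classifiedAs", "PersonalData"))
    return [c for c in columns if c not in personal]


graph = KnowledgeGraph()
catalog_source(graph, "crm", {"customers": ["id", "email", "country"]})
activate(graph)
print(readable_columns(graph, "crm.customers", may_read_personal_data=False))
# ['crm.customers.country', 'crm.customers.id']
```

The point of the sketch is the division of labor: the delivery function holds no knowledge of its own about which columns are sensitive. It only asks the graph, so a change in classification takes effect for every tool that consults it.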
Frequently Asked Questions
We will now move on to address questions we received from the webinar participants. Generally speaking, questions fell into two categories:
1. Lifecycle of the Metadata Captured in a Knowledge Graph
This included questions and comments like:
“How do you verify that the metadata is correct?”
“Verification needs to compare the metadata in EDG with the actual data in the data asset”
“The depth of the capabilities for TopBraid EDG to link data at the many levels of abstractions to just about anything else is very powerful. How do you manage the inevitable complexity, especially how to enforce the “metadata quality” as the external sources change, as the organization’s needs change, and how the ontology changes?”
“Once the results are presented can the metadata be maintained / updated to remediate gaps in the metadata?”
Metadata gets collected from the data sources. It is correct in the sense that it reflects what actually exists. Data sources change, however, so collection is not a “one shot” activity but an ongoing process. Data sources can be re-cataloged on a regular schedule and/or on demand to ensure that the catalog contains accurate information.
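As an illustration of what re-cataloging can detect, the sketch below takes a schema snapshot of a data source, lets the source change, and then compares a fresh snapshot with the cataloged one. It is a simplified assumption of how such a check could work, not a description of how TopBraid EDG does it. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a real data source, and the function names `snapshot_schema` and `diff_schema` are made up for this example.

```python
import sqlite3


def snapshot_schema(conn):
    """Return {table: [column names]} as currently defined in the database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return {t: [col[1] for col in conn.execute(f'PRAGMA table_info("{t}")')]
            for t in tables}


def diff_schema(cataloged, current):
    """List differences between a cataloged snapshot and a fresh one."""
    changes = []
    for table in sorted(set(cataloged) | set(current)):
        if table not in current:
            changes.append(("table removed", table))
        elif table not in cataloged:
            changes.append(("table added", table))
        else:
            old, new = set(cataloged[table]), set(current[table])
            changes += [("column added", f"{table}.{c}") for c in sorted(new - old)]
            changes += [("column removed", f"{table}.{c}") for c in sorted(old - new)]
    return changes


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
cataloged = snapshot_schema(conn)  # first cataloging run

# The data source changes after it was cataloged.
conn.execute("ALTER TABLE customers ADD COLUMN country TEXT")
conn.execute("CREATE TABLE orders (id INTEGER)")

changes = diff_schema(cataloged, snapshot_schema(conn))  # re-cataloging run
print(changes)
# [('column added', 'customers.country'), ('table added', 'orders')]
```

Running the comparison on a schedule, or on demand, yields a list of changes that data stewards can review before the catalog is updated.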
TopBraid EDG offers strong support for the data stewardship processes that are indeed necessary and important. We did not touch on these capabilities in this webinar, but they have been covered in a number of other webinars and videos.
2. Connection between Data Fabric and Data Virtualization
This was represented by questions like:
“The outer ring of the Data Fabric Architecture picture was “Data Access”. Does EDG have a data virtualization capability to access the data?”
“Can you use the metadata recorded about data sources to reach into the data source to query the data source rather than the metadata?” “You” in this context is TopBraid EDG.
It is important to stress that the architecture picture above is not an architecture of TopBraid EDG. It is a conceptual architecture for data fabric.
Data fabric is not about having a single tool. In fact, if you come across a vendor claiming that all you need to implement a data fabric is its product, you should be skeptical of the claim. If your organization is in the process of implementing this architectural approach, you should look for products that are compatible with the approach and will help you implement the necessary aspects – as opposed to products claiming to provide an “out of the box” data fabric solution.
Data fabric should be compatible with different data delivery styles, including data virtualization, but also ETL, streaming, replication, messaging and data microservices. Data delivery tools should leverage metadata in the knowledge graph to facilitate delivery of data. On the consuming side, there are different clients. Data fabric needs to support different types of data users and use cases. These range from IT users who leverage data fabric in the context of complex data integration scenarios to business users interested in self-service data preparation and access.
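The following sketch shows, under simplifying assumptions, how a data delivery tool might use cataloged metadata to reach into a data source. The `CATALOG` dictionary stands in for a lookup against the knowledge graph, and the business term, source name and function names are all hypothetical. An in-memory SQLite database plays the role of the data source. This is an example of what a separate delivery tool could do with the metadata, not a TopBraid EDG feature.

```python
import sqlite3

# Hypothetical catalog entry: maps a business term to its physical location.
CATALOG = {
    "Customer Email": {"source": "crm", "table": "customers", "column": "email"},
}


def build_query(entry):
    """Turn a catalog entry into a SQL query against the physical column."""
    return f'SELECT "{entry["column"]}" FROM "{entry["table"]}"'


def fetch(term, connections):
    """Look up where a business term lives, then query that data source."""
    entry = CATALOG[term]
    conn = connections[entry["source"]]
    return [row[0] for row in conn.execute(build_query(entry))]


# Stand-in for a real data source.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
crm.execute("INSERT INTO customers VALUES (1, 'a@example.com')")

print(fetch("Customer Email", {"crm": crm}))
# ['a@example.com']
```

The consumer asks for a business term and never needs to know the table or column name. That indirection is what lets virtualization, ETL or streaming tools share one description of where the data lives.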
The role of TopBraid EDG in the data fabric architecture is to provide the knowledge graph with metadata that can be leveraged by the data delivery tools and, ultimately, data consumers. Further, EDG helps organizations consistently implement governance, security and regulatory compliance across all data.