Along with Bob DuCharme, I had the pleasure of representing TopQuadrant at the recent Data Governance & Information Quality Conference (DGIQ), held June 8-12, 2015, in San Diego, CA. TopQuadrant was a gold sponsor, exhibited, and gave two talks, and we got a great chance to hear about the key issues facing large enterprises trying to efficiently manage their reference data, business glossaries, and other enterprise data.

The conference site, the Catamaran Resort Hotel & Spa, overlooked San Diego’s beautiful Mission Bay and was uniquely relaxing, fun, and comfortable, which made it very conducive to informal discussions and making connections. The conference has become so successful that after nine years at this location they’ll be moving to a larger venue in downtown San Diego next year. This year, to accommodate the nearly 500 attendees, several talks were held on two river boats moored near the hotel’s beach.

Speakers and practitioners at the conference confirmed that most organizations urgently need greater focus and effort to improve data governance. Their reasons echoed many of the motivations and drivers noted in our recent blog post: Reflections on the Enterprise Data World 2015 Conference.

A pervasive theme noted at DGIQ is that the key to effective governance and use of information is to capture and preserve its meaning through the many, often complex (and sometimes unforeseen), lifecycles of its integration and use within the organization. TopQuadrant’s talks focused on the flexibility needed in the technology support and capabilities to enable effective reference data governance.

Malcolm Chisholm of AskGet and I co-presented Data Governance Preparedness for Reference Data Management (slide set) on the larger of the two boats, the William D. Evans. Malcolm began the talk by drawing on his long experience in data governance at a broad range of organizations to highlight topics essential to improving reference data management and governance. (He has addressed many of these same issues in a recent webinar for which a recording and an accompanying whitepaper are available at The Foundations of Successful Reference Data Management.)

Malcolm concluded his section of the talk by listing the fundamental capabilities to look for in a reference data solution. This list included commonly supported capabilities for managing the data within reference datasets such as the ability to import, distribute and track changes. However, his list also included less commonly supported capabilities that Malcolm stressed are urgently needed. These center around the ability to maintain information about reference data, including profiling of each dataset’s source and documentation of the meaning of dataset properties and, when needed, even of individual codes.

I continued our presentation by first echoing the core theme noted above: the key to effective data governance is to capture the meaning of data. To do that, those governing and using reference data need to create and preserve a shared understanding of it through the lifecycle of its use. To support that, I described how the capabilities Malcolm listed could best be delivered with a solution that provides flexibility and extensibility, so that end users can not only execute the day-to-day governance and management of the data within the datasets, but can also add or change fields and establish relationships among datasets. This gives end users more control over their reference datasets and over the information about those datasets and their contents, so that the meaning (semantics) of reference data can be captured and preserved for its correct use and governance. To flexibly manage and use reference data well, we need the ability to semantically integrate management of business models, metadata, and reference data. As both Malcolm and I noted, flexible management of reference data and of the semantic metadata about data, datasets, and data sources is difficult within the strict confines of a relational database and can be much easier with a representation such as RDF, a well-supported NoSQL graph data model from the W3C.
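To make the flexibility point concrete, here is a minimal sketch of the triple-based idea behind RDF, written in plain Python rather than with any RDF library. It is not TopBraid RDM's API or data model; the `ex:` names, the sample codes, and the helper functions `add` and `values` are all hypothetical and exist only for illustration. The sketch shows that adding a new field, a relationship between two reference datasets, or metadata about a dataset is the same operation: adding one more subject-predicate-object statement, with no table to alter.

```python
# Hypothetical illustration only: reference data held as
# (subject, predicate, object) triples, the basic shape of RDF statements.
# Every name below (ex:..., the codes, the properties) is invented.

triples = {
    ("ex:US", "rdf:type", "ex:CountryCode"),
    ("ex:US", "rdfs:label", "United States"),
    ("ex:USD", "rdf:type", "ex:CurrencyCode"),
    ("ex:USD", "rdfs:label", "US Dollar"),
}


def add(subject, predicate, obj):
    """Record one more statement; no schema change is required."""
    triples.add((subject, predicate, obj))


def values(subject, predicate):
    """Return every object recorded for a subject/predicate pair, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


# A new "field" on an individual code is just a new predicate.
add("ex:US", "ex:definition", "Two-letter code for the United States")

# A relationship between two reference datasets is also just a statement.
add("ex:US", "ex:officialCurrency", "ex:USD")

# Metadata about a whole dataset (here, its source) uses the same mechanism.
add("ex:CountryCode", "ex:source", "ISO 3166-1")

print(values("ex:US", "ex:officialCurrency"))
print(values("ex:CountryCode", "ex:source"))
```

In a relational design, each of the three additions above would typically call for a new column or a new join table. In the triple form they are all plain inserts, which is the kind of flexibility the talk argued for.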

At the end of our presentation, the audience had many questions, including this one that highlighted the need for flexibility:

“Are there different types of reference datasets, say 20 or so, and perhaps then multiple reference datasets that belong to one of those types? This would presumably allow for a more pre-specified set of fields and metadata for each type of reference dataset.”

Malcolm responded, “No, when you take into account the fields needed to properly represent the data in a given dataset, the metadata about that dataset to capture its semantics, or meaning, and the need to have relationships between reference datasets, every reference dataset is unique.”

As TopQuadrant’s Bob DuCharme showed in a separate conference presentation Get More Value from Your Reference Data — Make it Meaningful with TopBraid RDM (slide set), TopQuadrant’s TopBraid Reference Data Manager lets data stewards take full advantage of all levels of data semantics with more flexibility but less (if any!) dependence on IT staff to manage evolving data models.

Another hot topic at the DGIQ conference was Business Glossaries: the official definitions of terms used in databases, reports, and elsewhere in the enterprise, whose careful management is often mandated by requirements such as the Basel Committee on Banking Supervision’s rules for risk reporting (BCBS 239). Business Glossaries are another area where the flexibility provided by TopBraid RDM’s underlying semantic standards model makes it easier for users to accomplish their data management goals.

In our booth at the conference’s exhibit hall, we met people from both small, specialized companies and large, brand-name companies who were feeling an ever-stronger need to improve the organization and governance of their reference data so that their enterprise systems can take even better advantage of that data.

We look forward to working with them in the coming months, and while we will miss the tropical setting of this year’s conference, we look forward to attending next year’s DGIQ conference in downtown San Diego as it grows to let even more data practitioners gather together and discuss their challenges and the technologies for meeting them.