Data Governance For Dummies
In general, a data governance tool is software that assists in the creation and maintenance of the policies, procedures, and processes that control how data is stored, used, and managed.

No doubt, many aspects of data governance are complex, particularly in larger organizations. Fortunately, as expected from a competitive marketplace, where there is opportunity, you will find providers and their software solutions only too willing to help.

As data has grown in significance to every organization, particularly during the Cambrian explosion of data over the last few years, many innovative data tools have been introduced. Some of this software comes from the largest technology players, such as Microsoft, Oracle, IBM, CA, Informatica, and SAP, but mid-sized companies and even startups have also entered this lucrative space.

I’m not going to list solutions here. There’s always a risk of implying some bias or leaving out an obvious player, and, probably the bigger reason, the marketplace is changing so fast that any list I provide would quickly become dated.

The quantity and quality of innovative data tools recently introduced have been game-changers. The figure below illustrates many of the areas now addressed with software tools.

[Figure: Chart showing the various areas of data governance. ©John Wiley & Sons, Inc.]
Software tools serve all of these data governance areas.

With the increasing use of technologies such as artificial intelligence in data management, governance, and analytics (and frankly, all aspects of data science), organizations have benefitted from increased automation, better decision-making, improved efficiency and speed, higher data quality, greater compliance, and even contributions to increased revenue.

To achieve these potential benefits, it’s certainly important for your organization to evaluate what tools may make sense.

Selecting data governance tools

Determining what tools you need, like so many things, depends on several factors. Considerations will often include:
  • Business priorities and requirements
  • The suite of data tools already available in the organization
  • The complexity of the data environment
  • The complexity of the IT infrastructure
  • Current maturity level of data governance
  • A narrow or broad focus of data governance objectives
  • Skill sets of the data governance team and data staff across the organization
  • Available budget
  • The data governance team’s appetite for automation and system administration
Tool requirements may emerge out of an existing pain point, like so many solutions do. But deciding on a toolset may also be the product of a requirements-gathering process that considers the items in this list and others.

Some of the common features now found in data governance tools include:

  • Data discovery, collation, and cataloging: A mechanism to identify and collate data sets and to support searching them.
  • Data quality management: Tools that identify and correct flaws and that cleanse, validate, and transform data (a minimal sketch of this idea follows the list).
  • Master data management (MDM): This is covered earlier in the chapter in the “Master data management” section.
  • Data analytics: An application to enable the discovery of insights in data.
  • Reporting platform: A solution to generate all manner of business reports.
  • Data visualization: An application that uses graphical elements as a way to see and understand trends, outliers, and patterns in data.
  • Data glossary and dictionary: A repository that contains terms and definitions used to describe data and its usage context.
  • Compliance tools: Solutions that automate and facilitate the processes and procedures that support industry, legal, security, and regulatory compliance requirements.
  • Policy management: A tool that helps in the creation of policies, supports their review and approval, distributes them to impacted staff, and tracks whether team members have received or viewed the content.
  • Data lineage: A solution that identifies, maps, and explains the source and destination of data, including its origin and stops along the way. Data lineage is also known as data provenance.
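
To make the data quality management item more concrete, here’s a minimal sketch, in Python, of the kind of rule-based check such tools run behind the scenes. The file name, column names, and rules are hypothetical and not taken from any particular product; commercial tools wrap checks like these in catalogs, dashboards, and workflows.

    # A minimal, illustrative data quality check (not any particular product).
    # Assumes a hypothetical customers.csv file with customer_id and email columns.
    import pandas as pd

    def run_quality_checks(path: str) -> dict:
        """Return a simple count of rule violations for one data set."""
        df = pd.read_csv(path)
        return {
            # Completeness: the business key should never be empty.
            "missing_customer_id": int(df["customer_id"].isna().sum()),
            # Uniqueness: the business key should not repeat.
            "duplicate_customer_id": int(df["customer_id"].duplicated().sum()),
            # Validity: a deliberately crude format rule for email addresses.
            "invalid_email": int((~df["email"].astype(str).str.contains("@")).sum()),
        }

    if __name__ == "__main__":
        print(run_quality_checks("customers.csv"))

A real data quality tool would let business users define rules like these without writing code and would track the results over time.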
Keep in mind that some tools are designed to do one or a few of these tasks really well, while others try to provide an entire suite of capabilities. Needs, cost, and complexity are factors when determining whether to buy a single-feature or full-suite solution.

DataOps and DevOps

A defining characteristic of the early years of the 21st century is the need to innovate at speed. In an unforgiving marketplace, organizations that are slow to improve their internal processes or to bring products and services to market are at a disadvantage, which can result in business failure.

In this context, greater emphasis has been placed on finding ways to accelerate innovation and produce more frequent deliverables.

With technology playing such a central role in innovation, it was observed that the teams that created solutions, which were primarily software-based, and those responsible for deploying and supporting the code were not aligned. These two groups, the developers and the IT operations teams, reported to different leaders and had dissimilar performance goals.

Around 2007, a movement started to better integrate development and operations that was aptly named DevOps.

DevOps is a reimagining of how to build and deliver solutions quickly. It incorporates automation, collaboration, communication, feedback, and iterative development cycles.

In a similar fashion, observers noted that organizations were struggling with data volume and velocity, and with the slow pace of deriving insights, and that efficiencies could be gained by rethinking the lifecycle of data within the enterprise.

Building on the concepts and successes of DevOps, a new approach to data analytics called DataOps emerged around 2014. Some called it DevOps for data science. The figure below shows the data management areas, shaded, that can be automated with DataOps.

[Figure: Flow chart showing data management operations that can be automated with DataOps. ©John Wiley & Sons, Inc.]
More than half of data management operations can be automated (shaded areas) with DataOps.

Like DevOps, DataOps uses contemporary work approaches such as collaboration, tools, and automation to find efficiencies and deliver higher quality and quicker insights. You can think of DataOps as a way to kick data analytics into high gear.

Central to DataOps is the emphasis on collaboration between participants in the data value chain. This includes data analysts, data engineers, IT team members, quality control, and data governance.

In addition, like DevOps, DataOps proposes an agile approach to delivering data solutions. Instead of long periods of requirements analysis, design, and then development, work is broken into smaller chunks and priority is given to delivering value quickly and often. Cycle times are compressed, and business users get the data they need sooner.

As an example of the inefficiencies that exist in the absence of DataOps, consider a marketing leader who requests the development of a new monthly report. In an organization with a traditional development lifecycle, it can take weeks or even months to elicit and validate the requirements for the report, design and develop it, receive feedback and make changes, and then deploy it.

The long cycle times lead to disappointment and missed opportunities, and it deters data requestors from even making requests. DataOps changes the game on requests like these through a mix of agile methods, improved collaboration, and automation.
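
To illustrate the automation piece of that monthly report scenario, here’s a small, hypothetical sketch of a DataOps-style quality gate that runs on every data refresh and fails fast when rules are violated. It assumes the run_quality_checks() function from the earlier data quality sketch lives in a module called quality_checks; both names are illustrative, not part of any product.

    # A hypothetical DataOps-style gate: re-validate data on every refresh and
    # stop the pipeline immediately if quality rules are violated.
    from datetime import datetime, timezone

    from quality_checks import run_quality_checks  # hypothetical module from the earlier sketch

    MAX_VIOLATIONS = 0  # tolerance threshold; a real team would tune this per rule

    def refresh_report_data(path: str) -> None:
        """Re-check the source data before the monthly report is refreshed."""
        report = run_quality_checks(path)
        violations = sum(report.values())
        stamp = datetime.now(timezone.utc).isoformat()
        if violations > MAX_VIOLATIONS:
            # Fail fast so problems surface in minutes, not at month end.
            raise RuntimeError(f"{stamp} refresh blocked by quality gate: {report}")
        print(f"{stamp} refresh passed quality gate: {report}")

    if __name__ == "__main__":
        refresh_report_data("customers.csv")

Scheduling a check like this to run automatically with each refresh is one small example of how DataOps compresses feedback loops.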

Recent research revealed that many companies that embraced DataOps and agile practices were experiencing a 60 percent increase in revenues and profit growth.

DataOps can be implemented through team structuring and new processes. But it can also be facilitated through new supporting tools that include artificial intelligence and automation. A dynamic marketplace has emerged that will provide you with many options and new capabilities to accelerate your data analytics cycle times.

DataOps is a type of data governance in that it focuses on improved and faster methods to deliver more data value and quality while also considering risk. In addition, it requires the participation and support of the data governance team to help with policies, standards, quality control, and security considerations.

DataOps tools can also give data governance teams new, actionable visibility to data use, flow, and challenges in the organization.

Some say DataOps is the future of data governance. The evidence is certainly pointing in that direction.

About the book author:

Jonathan Reichental, PhD, is a technologist, author, and professor. Along with his expertise in data governance, he also focuses on areas such as digital transformation, the fourth industrial revolution, the future of cities, and blockchain technologies. He is author of Smart Cities For Dummies and creator of the popular Learning Data Governance course, published by LinkedIn Learning.
