There’s a fitting analogy between the digital world of data and the physical environment we live in. Bad data is like trash: pollutants that infest the environment. Trash makes everyone worse off. Similarly, bad data has a pervasive negative impact on business performance.
One way to deal with trash is periodic cleaning: every once in a while, we assemble a crew to go out and pick up trash as thoroughly as our budget permits. Immediately afterward, the environment starts to degrade again until the next cleaning.
A more effective and less expensive way is to address the trash problem through social policy aimed at keeping trash out of the environment in the first place. In a democracy, we galvanize public support, then write and pass a law prohibiting littering. Next, we implement the law by communicating it broadly to the public and translating it into clear, enforceable rules. Lastly, we enforce the law by monitoring for violations and fining those who litter.
In the data world, the traditional approach to data management is largely about periodic cleaning. A better, policy-centric approach is data governance.
The benefit of data governance compared to traditional approaches is better data at a lower cost. An effective anti-litter policy doesn’t completely eliminate the need for cleaning (there will always be violators), but it drastically reduces the cost of cleaning. In the same way, data governance significantly reduces the cost of data management activities. More importantly, the environment is cleaner. All the time.
The larger goal of environmental policy, in the end, is not collecting fines but changing behavior. Littering is perfectly rational behavior at the individual level. It’s easier to toss trash in the most convenient place, anywhere at all, than to look for a trash bin. The behavior is optimized for the short term and for self-interest. But for society at large, it is very damaging. This is a conflict between local and global optimization, between the short term and the long term. Governance is the most effective tool to resolve this conflict. Gartner’s Debra Logan wrote elegantly about this topic, drawing inspiration from Jared Diamond’s Guns, Germs, and Steel.
The same can be said about data. Data providers naturally do what’s most convenient given their local, short-term objectives. Data governance is about establishing and enforcing policies for the good of the enterprise, so that the many downstream consumers of data benefit, and about holding data providers accountable.
In my previous post, I argued that important data assets have multiple providers and multiple consumers who are often unaware of each other, and that data quality is often not in the immediate interest of data providers. There is neither transparency nor accountability, and this is the root cause of bad data. Data governance addresses the root cause head-on: it formalizes data quality rules as policies, communicates these policies so that data providers and consumers share the same view of them, and enforces them by collecting metrics on policy compliance. Transparency and accountability become institutionalized.
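To make the three ingredients above concrete (a rule formalized as a policy, a published description everyone can read, and a compliance metric tied to an accountable provider), here is a minimal sketch in Python. It assumes records arrive as plain dictionaries; the names `DataPolicy`, `compliance_rate`, `compliance_report`, the owner `crm-team`, and the sample rules are all hypothetical illustrations, not part of any specific governance product or of the framework discussed later in this series.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class DataPolicy:
    """Hypothetical structure: a data quality rule formalized as a policy."""
    name: str
    description: str                # published so providers and consumers see the same rule
    owner: str                      # the data provider accountable for compliance
    rule: Callable[[Record], bool]  # returns True if a record complies


def compliance_rate(policy: DataPolicy, records: List[Record]) -> float:
    """Fraction of records satisfying the policy (an empty batch counts as fully compliant)."""
    if not records:
        return 1.0
    passed = sum(1 for r in records if policy.rule(r))
    return passed / len(records)


def compliance_report(policies: List[DataPolicy],
                      records: List[Record]) -> Dict[str, Dict[str, Any]]:
    """Per-policy compliance metric, reported alongside the accountable owner."""
    return {
        p.name: {"owner": p.owner, "compliance": compliance_rate(p, records)}
        for p in policies
    }


# Hypothetical example policies and records, for illustration only.
policies = [
    DataPolicy(
        name="customer_email_present",
        description="Every customer record must have a non-empty email.",
        owner="crm-team",
        rule=lambda r: bool(r.get("email")),
    ),
    DataPolicy(
        name="country_is_iso2",
        description="Country must be a two-letter uppercase code.",
        owner="crm-team",
        rule=lambda r: isinstance(r.get("country"), str)
        and len(r["country"]) == 2
        and r["country"].isupper(),
    ),
]

records = [
    {"email": "a@example.com", "country": "US"},
    {"email": "", "country": "GB"},
    {"email": "c@example.com", "country": "usa"},
    {"email": "d@example.com", "country": "de"},
]

report = compliance_report(policies, records)
for name, metrics in report.items():
    print(f"{name}: {metrics['compliance']:.0%} (owner: {metrics['owner']})")
# customer_email_present: 75% (owner: crm-team)
# country_is_iso2: 50% (owner: crm-team)
```

The point of the sketch is that the metric is reported per policy and per owner: consumers can see which rules are being violated, and the provider responsible for each rule is named, which is exactly the transparency and accountability that periodic cleaning lacks.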
In my next post, I’ll describe a framework for data governance based on data policies and a set of processes built around those policies.
This post is part 3 of a multi-part series on Enterprise Data Governance.