Data Quality Guideline

This guideline defines requirements for the accuracy, completeness, consistency, timeliness, and validity of Institutional Data. The Data Governance Committee will follow a phased approach to data quality initiatives.

Step 1: Determine the List of Critical Data Elements (CDEs)

The first task is to determine the scope of the data quality program: OU must identify the data elements that a governance program focused on data quality should control. These are the enterprise's critical data elements, items typically known to be vital to the organization's success, and they become the scope of the data quality program.

Step 2: Data Definitions

In this step, you create a glossary for the CDEs. Each critical data element is defined so there are no inconsistencies among users or data stakeholders. Glossaries will be maintained in the OU Data Catalog, Informatica.
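For illustration only, a glossary entry can be thought of as a small structured record. The element name, definition, steward, and valid values below are hypothetical examples and do not reflect the actual Informatica glossary schema:

```python
from dataclasses import dataclass, field

@dataclass
class GlossaryEntry:
    """One critical data element (CDE) definition in the business glossary."""
    name: str          # canonical element name
    definition: str    # agreed plain-language meaning
    steward: str       # accountable data steward
    valid_values: list = field(default_factory=list)  # allowed values, if enumerated

# Hypothetical example entry
entry = GlossaryEntry(
    name="enrollment_status",
    definition="A student's current enrollment state for the active term.",
    steward="Office of the Registrar",
    valid_values=["enrolled", "withdrawn", "graduated"],
)
```

Capturing the steward alongside the definition keeps accountability visible wherever the element is used.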

Step 3: Business Impacts

What is the business impact of critical data elements being trustworthy… or not? In this step, you connect data integrity to business results using the shared definitions from Step 2. Understanding this impact enables business stewards to prioritize data remediation efforts.

Step 4: Data Quality Rules

After profiling the data (examining the actual values of each CDE), you can use a data quality tool to create rules supporting data quality. AI and machine learning can create and enforce such rules automatically, or people can use the results of data profiling to write the rules manually.
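As a rough sketch of what such rules look like in practice, simple validity checks can be expressed as functions applied to each record. The element names and formats below are hypothetical examples, not OU's actual rules:

```python
import re

# Each rule returns True when a record passes. The element names and
# patterns below are hypothetical examples, not OU's actual rules.
RULES = {
    "student_id is 9 digits": lambda r: bool(re.fullmatch(r"\d{9}", str(r.get("student_id", "")))),
    "email is well-formed":   lambda r: bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", str(r.get("email", "")))),
    "term is not missing":    lambda r: bool(r.get("term")),
}

def evaluate(record):
    """Apply every data quality rule to one record."""
    return {name: rule(record) for name, rule in RULES.items()}

results = evaluate({"student_id": "123456789", "email": "sooner@ou.edu", "term": "Fall 2023"})
```

Keeping rules as named, independent checks makes it easy to report which specific rule a record violated.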

Step 5: Data Quality Metrics

With data quality rules in place and firing, you can collect data quality metrics. These metrics flag suspect data for users and alert data stewards to data needing remediation. Metrics can be surfaced directly in the data catalog, so users discovering data know about any issues in real time.
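A minimal sketch of one such metric, completeness, assuming records are plain dictionaries; the element name is a hypothetical example:

```python
def completeness(records, element):
    """Percentage of records in which a CDE is populated (non-null, non-empty)."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(element) not in (None, ""))
    return 100.0 * filled / len(records)

# Hypothetical sample: two of four records have a populated email
sample = [
    {"email": "a@ou.edu"},
    {"email": ""},
    {"email": "b@ou.edu"},
    {"email": None},
]
score = completeness(sample, "email")  # 50.0
```

Analogous percentages can be computed for validity (records passing each rule) and consistency across systems.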

Step 6: Determine Authoritative Sources

Determining authoritative data sources is a key output of a data quality program. In this step, quality metrics are used to gauge whether each data source is of sufficient quality. This allows users to quickly find the most trustworthy data for analysis and to collaborate around that data through shared conversations and queries hosted in the data catalog.
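To illustrate how quality metrics might gauge candidate sources, here is a small sketch; the source names, scores, and the 95-point threshold are all assumed for the example:

```python
# Hypothetical quality scores (0-100) per candidate source for one CDE,
# e.g. averaged completeness/validity metrics collected in Step 5.
source_scores = {
    "student_info_system": 98.5,
    "legacy_warehouse": 91.2,
    "departmental_spreadsheet": 73.4,
}

THRESHOLD = 95.0  # minimum score to qualify as authoritative (assumed policy)

# Sources meeting the threshold, plus the single best-scoring source
authoritative = [s for s, score in source_scores.items() if score >= THRESHOLD]
best = max(source_scores, key=source_scores.get)
```

A threshold-based designation like this makes the "authoritative" label reproducible rather than a matter of opinion.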

Data Categories and Sub-domains provides an overview of the Data Areas, Data Subcategories, Data Owners, Data Stewards, and System(s) of Record known to OU IT at the time of this plan's development.

Step 7: Data Quality Remediation Plans

Finally, for data with systemic issues, it is important to address the root cause and determine how to fix the source data. This can involve data cleansing or training for data entry personnel.

Details

Article ID: 3090
Created: Mon 10/23/23 3:48 PM
Modified: Mon 10/23/23 3:48 PM