How do you go about improving the quality of existing data?


By: Di Joseph

Now, this is an interesting question. There are several ways in which to improve data quality, because there are several ways in which data can be defective. We decide how best to improve the quality based on the problems found when measuring the data: for example, which data is incomplete (missing), which data is not unique (duplicated) and which data does not comply with the Business Rules.
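Measuring these three kinds of defect can be sketched in a few lines of Python. This is a minimal illustration, not the author's method: the record layout, the `profile` function, the key field and the `age_non_negative` rule are all hypothetical. They were chosen only to show completeness, uniqueness and business rule checks side by side.

```python
from collections import Counter

def profile(records, required_fields, key_field, rules):
    """Count missing values, duplicated keys and business rule violations."""
    # Completeness: a required field that is absent, None or blank counts as missing.
    missing = Counter()
    for rec in records:
        for field in required_fields:
            value = rec.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1

    # Uniqueness: any key value that appears more than once is duplicated.
    key_counts = Counter(rec.get(key_field) for rec in records)
    duplicated = {key: n for key, n in key_counts.items() if n > 1}

    # Business rules: each rule is a hypothetical predicate that must hold per record.
    violations = Counter()
    for rec in records:
        for name, check in rules.items():
            if not check(rec):
                violations[name] += 1

    return {"missing": dict(missing), "duplicated": duplicated, "violations": dict(violations)}

# Illustrative data and rule (hypothetical).
customers = [
    {"id": "C1", "email": "ann@example.com", "age": 34},
    {"id": "C2", "email": "", "age": 27},
    {"id": "C2", "email": "bob@example.com", "age": -5},
]
rules = {"age_non_negative": lambda r: r.get("age") is not None and r["age"] >= 0}
print(profile(customers, ["id", "email"], "id", rules))
```

Here one record is missing its email, the key `C2` occurs twice and one age breaks the rule. Each count points to a different kind of improvement.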

Data Correction can be undertaken using manual or automated processes.

Data that is inaccurate or does not comply with business rules needs to be corrected so that it holds the accurate values.

Data that is missing needs to be obtained. It may be possible to derive it from other existing data or it may need to be obtained from external sources.
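As an illustration of deriving missing data from existing data, the Python sketch below fills a missing city from the postal code using a reference table. It returns the records it could not resolve, so that those values can be sourced externally. The field names, the `fill_missing_city` function and the small `POSTAL_CODE_TO_CITY` table are assumptions for illustration. A real process would draw on a trusted reference source.

```python
# Hypothetical, illustrative reference data: postal code -> city.
POSTAL_CODE_TO_CITY = {"8001": "Cape Town", "2001": "Johannesburg"}

def fill_missing_city(records, reference):
    """Derive a missing city from the postal code where possible.

    Returns the records whose city still has to be obtained externally.
    """
    unresolved = []
    for rec in records:
        if rec.get("city"):
            continue                        # value already present, nothing to do
        city = reference.get(rec.get("postal_code"))
        if city is not None:
            rec["city"] = city              # derived from existing data
            rec["city_source"] = "derived"  # keep track of how the value was obtained
        else:
            unresolved.append(rec)          # must be obtained from an external source
    return unresolved

addresses = [
    {"id": 1, "postal_code": "8001", "city": ""},
    {"id": 2, "postal_code": "9999", "city": None},
    {"id": 3, "postal_code": "2001", "city": "Johannesburg"},
]
todo = fill_missing_city(addresses, POSTAL_CODE_TO_CITY)
print(addresses[0]["city"], [r["id"] for r in todo])
```

Recording the source of a derived value, as `city_source` does here, makes it easier to review derived data later.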

Data that is duplicated needs to be matched, then merged and de-duplicated where appropriate. Where the duplicate records must remain, they need to be managed so that they are kept aligned.
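The match, merge and de-duplicate steps might look like the following Python sketch. It assumes a hypothetical matching rule: records match when their normalised email addresses agree. It also assumes a simple survivorship rule that keeps the first non-empty value for each field. Real matching usually needs fuzzier comparisons, and the alternative of keeping duplicates aligned is not shown.

```python
def match_key(rec):
    """Hypothetical matching rule: normalised (trimmed, lower-case) email."""
    return (rec.get("email") or "").strip().lower()

def merge(group):
    """Combine matched records, keeping the first non-empty value for each field."""
    merged = {}
    for rec in group:
        for field, value in rec.items():
            if value not in (None, "") and field not in merged:
                merged[field] = value
    return merged

def deduplicate(records):
    groups = {}
    singles = []
    for rec in records:
        key = match_key(rec)
        if key:
            groups.setdefault(key, []).append(rec)  # match on the key
        else:
            singles.append(rec)  # nothing to match on: leave for manual review
    return [merge(group) for group in groups.values()] + singles

people = [
    {"name": "Ann Smith", "email": "Ann@Example.com", "phone": ""},
    {"name": "A. Smith", "email": " ann@example.com", "phone": "021 555 0100"},
    {"name": "Bob Jones", "email": "bob@example.com", "phone": None},
]
print(deduplicate(people))
```

Records without an email address are deliberately not matched. An empty key would otherwise merge unrelated records.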

Data Improvement is often a long process and uses techniques such as:

  • Parsing
  • Standardisation
  • Data Enrichment
  • Data Augmentation
  • Data Derivation
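Parsing and standardisation often go together. The Python sketch below parses free-text phone numbers into digits and standardises them to a single international format. It assumes South African-style numbering: country code 27, a national trunk prefix of 0 and nine digits after the country code. The function name and the rules are illustrative, not a complete treatment of phone number formats.

```python
import re

def standardise_sa_phone(raw, country_code="27"):
    """Parse a free-text phone number and standardise it to +<country><number>.

    Assumes South African-style numbering (trunk '0' nationally, nine digits
    after the country code). Returns None when the number cannot be standardised.
    """
    text = raw.replace("(0)", "")        # drop the optional "(0)" trunk notation
    digits = re.sub(r"\D", "", text)     # parsing: keep only the digits
    if digits.startswith("00" + country_code):
        digits = digits[2:]              # strip the international '00' access prefix
    if digits.startswith(country_code) and len(digits) == len(country_code) + 9:
        national = digits[len(country_code):]
    elif digits.startswith("0") and len(digits) == 10:
        national = digits[1:]            # drop the national trunk '0'
    else:
        return None                      # route to manual correction instead
    return "+" + country_code + national

for raw in ["+27 (0)21 555 0123", "021 555 0123", "0027 21 555 0123", "555 0123"]:
    print(repr(raw), "->", standardise_sa_phone(raw))
```

Three differently written versions of the same number end up as one standard value. The incomplete number is returned as `None` rather than guessed.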

However, it is also important to understand what went wrong! We do this by determining the root causes of the Data Quality problems, preferably before fixing the data.