
The What, Why, and How of Database Cleansing


First, the “What”. You clean a database to:

* Remove duplicate records

* Ensure your data is consistently formatted

* Correct data that is obviously wrong, e.g. the wrong postcode for a known suburb

* Find records that are likely to be the same even though they do not match exactly (more on this later)

So “Why” would you want to do that?

To explain why, I am going to use the example of a customer database, but the principles apply to other types of data also.

Have you ever received the same marketing message or catalogue in the mail two or more times? I receive multiple copies of such communications regularly, and I don’t always get around to telling the sender about their mistake. Sending duplicates can:

* be interpreted as sloppiness on the part of the organisation

* undo your efforts to target and personalise: any attempt by the organisation to “personalise” and “target” the message is wasted, because the recipient knows immediately that it was a mindless distribution of information from a database.

* waste $$$! Every time you send a communication twice to the same person or household, you have most likely just wasted some of your hard-earned funds.

In addition, cleaning your data will help you to analyse it more accurately. For instance, you will know the real number of contacts, and perhaps how they are geographically distributed, rather than the distorted figures that come from analysing a corrupted database.

It’s not a crime! In fact, it is very easy for your data to get into a state that requires cleaning. For example, when a client changes their address, your staff might update the suburb but forget to enter the new postcode. Or an existing client returns to your organisation several years later without telling new staff that they are already a client; if your database lacks appropriate keys to prevent duplicates, the client could be set up again as another customer with the same or similar details.

Having documented processes that your staff can use as a checklist, and appropriate unique keys on your database fields, will go some way towards keeping your data clean, but incorrect data can never be prevented entirely.
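
To make the unique key idea concrete, here is a minimal sketch in Python using the built-in sqlite3 module. The table, column and index names are made up for illustration, and the choice of name + address + postcode as the "identity" of a customer is only an assumption; you would need to decide what makes a customer unique in your own data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        firstname TEXT, surname TEXT, address1 TEXT, postcode TEXT
    )
""")
# Hypothetical rule: only one customer per name + address + postcode,
# compared without regard to upper or lower case.
conn.execute("""
    CREATE UNIQUE INDEX customer_identity ON customer (
        firstname COLLATE NOCASE, surname COLLATE NOCASE,
        address1 COLLATE NOCASE, postcode
    )
""")

def add_customer(firstname, surname, address1, postcode):
    """Insert a customer; return False if the unique key rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (firstname, surname, address1, postcode) "
                "VALUES (?, ?, ?, ?)",
                (firstname, surname, address1, postcode),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = add_customer("John", "Citizen", "PO Box 33", "3199")
repeat = add_customer("JOHN", "citizen", "po box 33", "3199")  # same person retyped
customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

A key like this only stops records that match exactly once case is ignored. A misspelt surname or a different address for the same person would still get through.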

“How”, then, do you efficiently clean your database?

Fixing incorrect information, such as a postcode that does not match its suburb, is usually done by comparing each record to the correct values in another table. For example, to correct all the postcodes in your data, assuming that the suburb entered is correct, you would write SQL code that compares the postcode of each record against a table of postcode + suburb + state that you may have obtained from Australia Post. Such a process would likely also generate a list of records where the suburb was not found, requiring you to manually investigate and correct the data.
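
The postcode check described above might look something like the following sketch, again using Python's built-in sqlite3 module. The table and column names are hypothetical, the rows are sample data rather than a real Australia Post file, and the code assumes each suburb + state pair has exactly one postcode in the reference table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE postcode_ref (suburb TEXT, state TEXT, postcode TEXT);
    CREATE TABLE customer (id INTEGER PRIMARY KEY, suburb TEXT, state TEXT, postcode TEXT);
    INSERT INTO postcode_ref VALUES ('Frankston', 'VIC', '3199');
    INSERT INTO customer VALUES
        (1, 'Frankston', 'VIC', '3199'),   -- already correct
        (2, 'FRANKSTON', 'VIC', '3000'),   -- suburb known, postcode wrong
        (3, 'Franskton', 'VIC', '3199');   -- suburb misspelt, not in reference
""")

# Records whose suburb is not in the reference table: these need a person to look at them.
not_found = [row[0] for row in conn.execute("""
    SELECT c.id
    FROM customer c
    LEFT JOIN postcode_ref r
           ON lower(r.suburb) = lower(c.suburb) AND r.state = c.state
    WHERE r.suburb IS NULL
    ORDER BY c.id
""")]

# Where the suburb is known but the postcode differs, take the reference postcode.
with conn:
    conn.execute("""
        UPDATE customer
        SET postcode = (SELECT r.postcode FROM postcode_ref r
                        WHERE lower(r.suburb) = lower(customer.suburb)
                          AND r.state = customer.state)
        WHERE EXISTS (SELECT 1 FROM postcode_ref r
                      WHERE lower(r.suburb) = lower(customer.suburb)
                        AND r.state = customer.state
                        AND r.postcode <> customer.postcode)
    """)

postcodes = dict(conn.execute("SELECT id, postcode FROM customer"))
print("Needs manual review:", not_found)
```

Matching on suburb and state together matters because the same suburb name can exist in more than one state.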

Correcting the formatting of your data is usually done using some pretty simple SQL, perhaps combined with a little programming logic. You need to decide the format you wish to apply to your data, for example, whether you would like the suburb in title case or all capitals. While this is much less important than getting the data actually right, it can help to make your communications look more professional.
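
As an illustration of a formatting pass, this sketch puts the suburb in title case and the state in capitals, trimming stray spaces along the way. The table layout and the chosen format are assumptions for the example only.

```python
import sqlite3

def tidy(text):
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(text.split())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, suburb TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(3442, "Frankston", "VIC"), (682, "  FRANKSTON ", "vic")],
)

# Read every row, apply the agreed format in Python, then write the values back.
rows = conn.execute("SELECT id, suburb, state FROM customer").fetchall()
with conn:
    conn.executemany(
        "UPDATE customer SET suburb = ?, state = ? WHERE id = ?",
        [(tidy(suburb).title(), tidy(state).upper(), cid) for cid, suburb, state in rows],
    )

formatted = conn.execute("SELECT id, suburb, state FROM customer ORDER BY id").fetchall()
```

Be aware that simple title casing is not right for every name (Python's str.title() would turn "McKinnon" into "Mckinnon", for example), so a real clean-up usually needs a short list of exceptions.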

Finding exact duplicates is a fairly easy task for someone who knows a little about the SQL database language. It is more difficult to find similar records that really are the same person, but are not listed in exactly the same way in your database. For instance, the following two records may actually be the same person:

ID    Firstname  Surname  Address1       Address2  Suburb     Postcode  State
3442  John       Citizen  PO Box 33                Frankston  3199      VIC
682   Jonathon   Citien   14 Beach Road            FRANKSTON  3199      VIC

Finding records such as the above calls for what is usually called “fuzzy” matching. Software is available to find such records, and more experienced SQL programmers could write their own code to find such possible duplicates.

Because you can’t confidently use logic to determine whether or not two records are the same in the case given above, fuzzy matching usually leaves the data as is, but produces an exception report highlighting likely duplicate records.
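
Here is one very simple way such an exception report could be produced, using the SequenceMatcher class from Python's standard difflib module. This is only a sketch: the scoring rule (average similarity of first name and surname, compared only within the same postcode) and the 0.6 threshold are my assumptions for the example, the third record is made up, and commercial matching software uses far more sophisticated techniques.

```python
from difflib import SequenceMatcher
from itertools import combinations

records = [
    {"id": 3442, "firstname": "John", "surname": "Citizen", "postcode": "3199"},
    {"id": 682, "firstname": "Jonathon", "surname": "Citien", "postcode": "3199"},
    {"id": 9001, "firstname": "Mary", "surname": "Example", "postcode": "3199"},  # made-up record
]

def similarity(a, b):
    """Return a score from 0.0 (nothing in common) to 1.0 (identical), ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def likely_duplicates(rows, threshold=0.6):
    """List pairs of records that look alike; the data itself is left untouched."""
    report = []
    for left, right in combinations(rows, 2):
        if left["postcode"] != right["postcode"]:
            continue  # only compare people in the same postcode
        score = (similarity(left["firstname"], right["firstname"])
                 + similarity(left["surname"], right["surname"])) / 2
        if score >= threshold:
            report.append((left["id"], right["id"], round(score, 2)))
    return report

exception_report = likely_duplicates(records)
for left_id, right_id, score in exception_report:
    print(f"Review: records {left_id} and {right_id} look alike (score {score})")
```

Comparing every record with every other record becomes slow on a large database, which is one reason to restrict comparisons to records that share something reliable, such as the postcode.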

Even when you can determine confidently that two records are the same, you may wish to process the data cleanup manually to ensure that only the correct data is kept, and that all associated pieces of information, e.g. customer payment history, are transferred across to the valid record. It is possible, however, to set up your de-duplication process to remove all the duplicates and clean up all the records automatically.
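
For exact duplicates, an automatic clean-up might be sketched as follows. The customer and payment tables are hypothetical, and the rule of keeping the record with the lowest ID is an arbitrary assumption; in practice you might prefer the most recently updated record. The important step is moving the payment history to the surviving record before the duplicate is deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, firstname TEXT, surname TEXT,
                           address1 TEXT, postcode TEXT);
    CREATE TABLE payment (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES
        (1, 'John', 'Citizen', 'PO Box 33', '3199'),
        (2, 'John', 'Citizen', 'PO Box 33', '3199'),        -- exact duplicate of 1
        (3, 'Mary', 'Example', '2 Sample Street', '3199');  -- made-up record
    INSERT INTO payment VALUES (10, 1, 50.0), (11, 2, 75.0), (12, 3, 20.0);
""")

with conn:
    # For every customer, work out which record survives: the lowest ID
    # among all records with identical details.
    conn.execute("""
        CREATE TEMP TABLE survivor AS
        SELECT c.id AS old_id,
               (SELECT MIN(k.id) FROM customer k
                WHERE k.firstname = c.firstname AND k.surname = c.surname
                  AND k.address1 = c.address1 AND k.postcode = c.postcode) AS keep_id
        FROM customer c
    """)
    # Transfer associated information (here, payments) to the surviving record.
    conn.execute("""
        UPDATE payment
        SET customer_id = (SELECT keep_id FROM survivor
                           WHERE old_id = payment.customer_id)
    """)
    # Only then remove the duplicate customer records.
    conn.execute("""
        DELETE FROM customer
        WHERE id IN (SELECT old_id FROM survivor WHERE old_id <> keep_id)
    """)

remaining = [row[0] for row in conn.execute("SELECT id FROM customer ORDER BY id")]
payments = dict(conn.execute("SELECT id, customer_id FROM payment"))
```

Running the whole thing inside one transaction means that if any step fails, nothing is half-merged.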

Cleaning your database can take some time, and some manual effort on the part of your staff. If you are just starting out with a new database, it is very worthwhile to:

1. Agree and document the data structure, and what information will be stored in what field (which isn’t always obvious despite the names you might give fields)

2. Agree the format of the data entered into each field

3. Agree a process to handle the case where a record needs to be entered that won’t fit into the current structure


If you need help cleaning your database, Contact Point can help you. We provide a quick and efficient service to deal with all the database issues discussed above, and can tailor our service to meet your particular needs. Submit a request now for an obligation free quote.

Heather Maloney is the Managing Director of Contact Point IT Services Pty Ltd (http://www.contactpoint.com.au) – a company providing IT Solutions to small to medium sized businesses, that deliver measurable results. Contact Point is focused on helping businesses to interact better with their clients, customers, suppliers and other 3rd parties.

  • http://www.dataladder.com/ Norm Dutreau

    Great article, right on the money.

    We found we had trouble choosing a vendor to help us with Data Cleansing and wanted to share our experiences.

    There are a wide variety of Data Cleansing solution sellers in the market offering to fix your problems.

    We found many of these vendors were looking for six figure investments to help clean up our current data. They also were very poor at standardizing item information.

    Their implementation times were also out of hand, weeks or months.

    We found a great packaged solution from a company called Data Ladder at http://www.dataladder.com/
    Using their products we were able to standardize our item data and harmonize customer and vendors in a few days. I don’t usually highlight a vendor publicly, but in this case we saved so much time and money I think other CIOs can learn from the experience. Legacy data cleanup can be done QUICKLY AND CHEAPLY. Just avoid being ‘sold’ and find a solution that makes sense.

  • http://www.amarketforce.com jShrimali

    Nice article!

    We at aMarketForce provide prospect database development services, ensuring that clients get a ready-to-use custom business lead list of pre-qualified, validated prospects to give to their sales force. Along with the list of validated contacts, aMarketForce can provide detailed contact information for the target decision makers and go as far as generating B2B leads.

    We have also started CRM database cleansing services; here is the link for your information http://www.amarketforce.com/crm-data-cleansing.php

    Have a great day!

    Regards,
    Jshrimali
