On data quality and the product database disaster


In the enterprise world, the use of databases is nearly as old as computing itself. However, the purposes data are put to have changed over time, forcing businesses to cope with the consequences of their past negligence.

The primary purpose of a database is storage and easy access to data. There's no need to demonstrate why a product database is much more efficient to use than a catalogue with thousands of items, each available in many versions, or why an enterprise directory is more efficient than the old phonebook it replaced.

The end of self-consumed databases

For a long time, data were mainly self-consumed: they were used by the same people who put them in the database. Their quality mainly depended on users knowing their subject well enough to cope with inexact or missing data.

For instance, when technical experts enter product data into a database they are also the main users of, it hardly matters whether they are rigorous or use consistent units of measurement. One product in centimeters, another in millimeters… it does not matter: they know the products and understand the data. Even when some data are out of date, they compensate by themselves.

It's the same with enterprise directories. They can hold a lot of information about an employee but, historically, people only needed to find a name, phone number, and email address. As a result, many fields were left empty, or filled in once and never updated.

Then businesses understood they could deliver more value with their data.

A product database can be used to power a customer-facing configurator or to build a recommendation engine. Enterprise directories have been used to feed people's profiles in the enterprise social network.

Shit in, Shit out

In both cases, businesses often had a heart attack when they lifted the lid on their databases: inconsistent units of measurement, typing errors, missing updates. I also remember enterprise social network projects where, at the moment of connecting the directory to the rich profile, businesses discovered the state of collapse of their directory data, or e-commerce projects where businesses realized their product database was in a terrible state.

Now that data use and exposure are no longer limited to people able to find their way in the fog, we realize to what extent data quality has been neglected. I won't even mention big data projects, where data quality is a basic requirement. The quality of the output will never be better than the quality of the input.

Artificial intelligence and bots to clean up databases

There's nothing really new here. Businesses have always been aware of the issue, but cleaning things up required too much work relative to the expected benefit. In 2016, an Experian study showed that nine enterprises out of ten did not trust their data and that 27% of data were considered reliable. Now that it's vital to expose and process these data, whether for e-commerce, big data, or AI, it's essential to deal with the issue.

The good news is that today we have more or less smart solutions to do what, for a human, would have been like bailing out the ocean with a spoon. They can detect inconsistencies in a database and, in the best cases, even correct them.
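To make this concrete, here is a minimal sketch of what a rule-based audit might look like. The field names (`id`, `name`, `length`, `length_unit`) and the rules are illustrative assumptions, not the API of any specific cleaning product: the idea is simply to flag missing fields per record and mixed units across the dataset.

```python
# Minimal sketch of a rule-based data-quality audit.
# Field names ("length", "length_unit", etc.) are illustrative assumptions.

def audit_products(products):
    """Return a list of (product_id, issue) tuples for suspect records."""
    issues = []
    units_seen = set()
    for p in products:
        pid = p.get("id", "?")
        # Flag missing or empty required fields
        for field in ("id", "name", "length", "length_unit"):
            if not p.get(field):
                issues.append((pid, f"missing field: {field}"))
        if p.get("length_unit"):
            units_seen.add(p["length_unit"])
    # Flag mixed units across the whole dataset (e.g. cm vs mm)
    if len(units_seen) > 1:
        issues.append(("*", f"inconsistent units: {sorted(units_seen)}"))
    return issues

products = [
    {"id": "A1", "name": "Cable", "length": 120, "length_unit": "cm"},
    {"id": "A2", "name": "Cable XL", "length": 2400, "length_unit": "mm"},
    {"id": "A3", "name": "", "length": 90, "length_unit": "cm"},
]
for pid, issue in audit_products(products):
    print(pid, "->", issue)
```

Real tools go much further (fuzzy matching, reference data, machine learning), but even a crude pass like this surfaces the centimeters-versus-millimeters problem described above.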

However, while the cleaning process is in progress, more and more data is being collected and entered every day. Hence the need to:

1) Make employees aware of the importance of data quality.

2) Use, where critical, robots or intelligent assistants that support people as they enter data, or that report issues to those able to fix them.
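The second point, validating at entry time rather than cleaning afterwards, can be sketched as follows. This is a hypothetical example (the field names and allowed units are assumptions, not a real schema): a record is checked before it reaches the database, and the problems are reported back to whoever is entering it.

```python
# Sketch of entry-time validation: flag bad records before they are saved.
# Field names and the allowed-units list are illustrative assumptions.

ALLOWED_UNITS = {"mm", "cm", "m"}

def validate_entry(record):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not record.get("name", "").strip():
        problems.append("name is required")
    unit = record.get("length_unit")
    if unit not in ALLOWED_UNITS:
        problems.append(f"unit {unit!r} not in {sorted(ALLOWED_UNITS)}")
    length = record.get("length")
    if not isinstance(length, (int, float)) or length <= 0:
        problems.append("length must be a positive number")
    return problems

print(validate_entry({"name": "Cable", "length": 120, "length_unit": "cm"}))  # []
print(validate_entry({"name": " ", "length": -5, "length_unit": "inch"}))
```

An "intelligent assistant" would wrap checks like these in a conversational or form-level interface, but the principle is the same: catch the error while the person who can fix it is still looking at the record.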

This matter may look both trivial and obvious, but it's a real concern for many businesses that discover how poor the quality of their data is the day they decide to launch a major data or e-commerce project and have to postpone it until they have cleaned their databases.

Photo Credit: Fotolia.