Data Validation

An Approach to Data Import Validation

Data import validation is not something that I have had to worry about for a while. I recently had to develop some code for an application that I support to import data from a CSV file, match selected fields against data in an Oracle database, and add the information contained in the file to printed output that the application generates.

The tasks was a fairly simple one and I suspected from the start that the biggest problem was likely to be matching data from the CSV file with the correct records in the database. Developing an interface between systems which are loosely coupled in this manner is open to all manner of difficulties when you have no control over the quality and accuracy of the data in the interface file.

The initial implementation appeared to work perfectly well with the limited amount of test data that I had available and was delivered to the customer for them to try. After some lengthy delays whilst the customer waiting for the application that generated the CSV file to be finished they commenced their testing and immediately reported that too few records were being matched. Was I surprised?

Despite of the fact that the code I had developed included what I thought was a fairly comprehensive set of logging options, generating a log file that could show exactly how each record in the CSV file had been handled, the customer needed help to identify why records were not matched. A quick look at the log file revealed where the problems were and some minor changes were made to handle certain conditions, but a new approach was required.

After given this some thought it occurred to me that instead of ignoring the data from the CSV file that didn’t match any information in the database, with a fairly trivial change I could import these records to an put them into a special status of ‘hidden’, where they would be ignored by the rest of the system unless the user set a flag to indicate that they should be shown. Another minor change to the user interface to add this special flag and suddenly the users could see all of the data and inspect the records where no match had occurred. Why didn’t I think of that at the beginning?