31 October 2008

What Data Are You Using for Testing?

Creating representative data sets is difficult and time-consuming. Therefore, developers and testers often ask for live data from your web application to work with, because test data need to be complete, current and representative. Make sure this is controlled - untreated live data should never be used in development or testing. Live data are also sometimes requested for "running reports offline" - this can be just as risky, and potentially illegal.

For example, I've seen a developer, while testing an email broadcast module, send thousands of test messages to their client's real customers because they were working with an exact copy of the live data.

Live data may contain personally identifiable information, authentication credentials or other sensitive information such as contract values or intellectual property. Three methods that can be of help are:

  • Extracting a subset - to limit the number of records
  • Anonymising records - to remove personally identifiable information
  • Masking - to hide sensitive information
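
As a rough illustration of how the three methods might be combined, here is a minimal Python sketch. The field names (name, email, card_number), the 10% sample size and the helper names are all assumptions made for the example, not part of any particular system. Synthetic email addresses use the reserved .invalid top-level domain so that test messages cannot reach a real mailbox.

```python
import random


def extract_subset(records, fraction, seed=0):
    """Return a reproducible random sample of roughly `fraction` of the records."""
    if not records:
        return []
    rng = random.Random(seed)
    k = min(len(records), max(1, round(len(records) * fraction)))
    return rng.sample(records, k)


def anonymise(record, index):
    """Replace directly identifying fields (hypothetical names) with synthetic values."""
    cleaned = dict(record)  # copy, so the source record is left untouched
    cleaned["name"] = f"Customer {index}"
    cleaned["email"] = f"customer{index}@example.invalid"
    return cleaned


def mask(value, visible=4, fill="*"):
    """Hide all but the last `visible` characters of a sensitive value."""
    text = str(value)
    if len(text) <= visible:
        return fill * len(text)
    return fill * (len(text) - visible) + text[-visible:]


def prepare_test_data(records, fraction=0.1, seed=0):
    """Subset first, then anonymise and mask each remaining record."""
    result = []
    for i, rec in enumerate(extract_subset(records, fraction, seed), start=1):
        rec = anonymise(rec, i)
        rec["card_number"] = mask(rec["card_number"])
        result.append(rec)
    return result
```

Taking the subset first means the later steps touch fewer records. Any field not explicitly treated passes through unchanged, so a real script would need to start from a full inventory of the sensitive fields.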

On smaller, less complex systems where the file and database structure and the data content are well understood, this could be undertaken manually or, more likely, with some pre-prepared scripts. If possible, this should be done on the devices hosting the existing data, so that untreated data do not have to be transferred elsewhere first. The extract could be exported, treated and only then moved off the server, over a secure transmission protocol, to a possibly less-secure destination.

In more complex systems such as customer relationship management (CRM) and enterprise resource planning (ERP) applications, extracting subsets, transforming and masking data can be very time-consuming, and it is hard to maintain data integrity and statistical consistency while doing so. One difficulty is the introduction of inconsistencies, for example in an insurer's data where the postcode and house insurance premium are inter-related. In these cases, tools can be purchased to help with the task. However, first understand the purposes for which the data extract will be used, and ensure that the tool can generate suitable data sets.
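
One simple way to avoid breaking such a relationship, sketched below in Python, is to shuffle the inter-related fields across records as a single unit, so each postcode keeps its matching premium but is no longer attached to the original customer. The function name and the field names are assumptions for illustration only, and this is not a description of how any commercial tool works.

```python
import random


def shuffle_linked_fields(records, fields, seed=0):
    """Permute the given fields across records, moving them together as one unit."""
    rng = random.Random(seed)
    # Bundle the linked values of each record, e.g. (postcode, premium).
    bundles = [tuple(rec[f] for f in fields) for rec in records]
    rng.shuffle(bundles)
    shuffled = []
    for rec, bundle in zip(records, bundles):
        new_rec = dict(rec)  # copy, so the source record is left untouched
        new_rec.update(zip(fields, bundle))
        shuffled.append(new_rec)
    return shuffled
```

Shuffling on its own is unlikely to be sufficient: in a small extract, or one with rare values, a record may still be traceable to an individual, so it would normally be combined with the other treatments.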

Ensure the extracted data sets cannot be reverse engineered back into the original data, and that they are tracked and disposed of securely at the end of their use. Don't forget that "data" can exist in formats other than database files and office documents, such as images, multimedia files, caches, logs and backups.
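
To illustrate the reverse-engineering point: replacing a value such as an email address with a plain, unkeyed hash can often be undone simply by hashing a list of likely values and comparing. A keyed hash is one possible alternative, sketched below using Python's standard hmac module. The function name, the 16-character truncation and the key handling are assumptions for the example. The key would have to stay off the test systems and never travel with the extract.

```python
import hashlib
import hmac
import secrets


def pseudonymise(value, key):
    """Return a stable pseudonym for `value` that depends on a secret key."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; an arbitrary choice


if __name__ == "__main__":
    # Hypothetical usage: generate a key once, on the system holding the live data.
    key = secrets.token_bytes(32)
    print(pseudonymise("someone@example.com", key))
```

The same input always gives the same pseudonym under a given key, so records in different tables can still be joined. Destroying the key once the extract has been produced prevents the same mapping being regenerated later.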

Are there legal restrictions? Under the Data Protection Act 1998 (DPA), you need to inform subjects of your intended uses for the data they provide. If they haven't agreed to its use for testing your systems, you mustn't use it in this way. Remember, if the data cannot be used to identify individuals, the DPA doesn't apply.

Do you have any experiences to share?

Update 7th November 2008: The question of whether IP addresses are personally identifiable data often arises. Comments by Peter Hustinx, the European Data Protection Supervisor, at the RSA Conference Europe 2008 are a useful reminder that nameless data, such as IP addresses, could be personal data and are thus protected by data protection legislation.
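
If IP addresses in logs or test extracts are treated as potentially personal data, one common treatment is to truncate them. The Python sketch below uses the standard ipaddress module. The function name and the prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions chosen for illustration. Whether a truncated address still counts as personal data is a legal question this code cannot answer.

```python
import ipaddress


def truncate_ip(address, v4_prefix=24, v6_prefix=48):
    """Zero the host part of an IP address, keeping only the network prefix."""
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False allows an address with host bits set to define the network.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)
```

For instance, truncate_ip("203.0.113.77") returns "203.0.113.0".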

Posted on: 31 October 2008 at 08:03 hrs
