Earlier this month I discussed a seminar being organised by the Information Commissioner's Office (ICO). I was fortunate enough to be able to attend the event on Wednesday at The Wellcome Trust on Euston Road in London.
The event began with a welcome from Christopher Graham (Information Commissioner, ICO). He explained that the seminar was not a theoretical debate about legal definitions, but a discussion of the current and emerging practical risks of re-identification. In particular, he hoped ideas would form on how best to assess and mitigate the privacy risk of a published statistic leading to someone being identified.
Sir Mark Walport (Director, The Wellcome Trust) continued on this theme but focused on the medical research sector. He explained that having good data is inextricably linked to good public health. He outlined the benefits that sharing data brings to individuals and the public, and identified proportionality, choice of terms of service, and confidentiality versus consent as the key issues. He also touched on some of the content of the Data Sharing Review Report, written in conjunction with the previous Information Commissioner, Richard Thomas.
Dr Mark Elliot (University of Manchester) discussed anonymisation as disclosure avoidance and the need for formal disclosure risk assessments. These can include undertaking simulated data intrusions to help rank file riskiness, in a similar way to how an organisation might rank processes or applications by other forms of operational risk. He explained that such assessments need to consider the intruder's motivations and the consequences of disclosure (to individuals, organisations and society), but also take into account the issue of spontaneous recognition.
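No specific method was prescribed at the seminar, but one crude way to rank file riskiness in the spirit of a disclosure risk assessment is to measure how many records are unique on their quasi-identifiers (the attributes an intruder could plausibly know). The sketch below is illustrative only; the records and field names are hypothetical, not from any real release:

```python
from collections import Counter

def uniqueness_risk(records, quasi_identifiers):
    """Crude disclosure-risk score: the fraction of records whose
    combination of quasi-identifier values is unique in the file.
    A higher score suggests the file is riskier to release."""
    combos = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    unique = sum(
        1 for r in records
        if combos[tuple(r[q] for q in quasi_identifiers)] == 1
    )
    return unique / len(records)

# Hypothetical sample file: age band, postcode district, diagnosis.
records = [
    {"age": "30-39", "postcode": "M13", "diagnosis": "asthma"},
    {"age": "30-39", "postcode": "M13", "diagnosis": "diabetes"},
    {"age": "70-79", "postcode": "SW1", "diagnosis": "asthma"},
]
# One of the three records is unique on (age, postcode) alone.
print(uniqueness_risk(records, ["age", "postcode"]))
```

Scores like this could then be compared across files to prioritise which releases need stronger disclosure controls, much as operational risks are triaged.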
Following a short break, Nicola Westmore (Cabinet Office) outlined the government's transparency agenda, which aims to promote efficiency and effectiveness, improve public services and allow citizens to make informed choices. She talked about the privacy risks inherent in data.gov.uk and the drivers for government data disclosure.
Dr Kieron O'Hara (University of Southampton) asked whether transparency will pose a threat to privacy, especially in the areas of crime data and demand-driven transparency, which he believes will be strongest for health, education and court data. He argued that privacy is not just a legal matter: data protection alone is insufficient to retain trust, the law has grey areas, and citizens' perceptions do not follow the content of the Data Protection Act. He felt the law was not the answer and that a discussion was needed between transparency activists, privacy activists, technical experts, domain experts and information entrepreneurs. He would also like to see auditable debate trails from organisations making decisions on whether, and what, data are released.
Dr Marie Cruddas (Office for National Statistics) talked about the balance between data utility and risk. She walked through the confidentiality protection framework, used to determine how data are released by the ONS. This considers the end-user requirements, data quality, sensitivity, age, coverage and other characteristics, a disclosure risk assessment, disclosure controls (legal, ethical and practical), management of disclosure risk and implementation. An interesting idea was the concept of undertaking a penetration test on data sets, to see how they can be re-identified alone, or together with other data sets.
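The "penetration test" idea amounts to attempting a linkage attack: joining a supposedly anonymised file to another data set on shared quasi-identifiers. A minimal sketch of how such a test might look, using entirely hypothetical data and field names:

```python
# Hypothetical "anonymised" release: names removed, but age and
# postcode district retained.
anonymised = [
    {"age": 34, "postcode": "M13", "diagnosis": "asthma"},
    {"age": 67, "postcode": "SW1", "diagnosis": "diabetes"},
]

# Hypothetical public register containing names alongside the same
# quasi-identifiers.
public_register = [
    {"name": "A. Smith", "age": 34, "postcode": "M13"},
    {"name": "B. Jones", "age": 67, "postcode": "SW1"},
]

def link(anon, register, keys=("age", "postcode")):
    """Join the two files on shared quasi-identifiers; a record that
    matches exactly one register entry is treated as re-identified."""
    reidentified = []
    for a in anon:
        matches = [p for p in register
                   if all(p[k] == a[k] for k in keys)]
        if len(matches) == 1:
            reidentified.append({**matches[0], **a})
    return reidentified

for r in link(anonymised, public_register):
    print(r["name"], "->", r["diagnosis"])
```

If a test like this succeeds, the release needs stronger disclosure controls (coarser age bands, suppressed postcodes) before it goes out.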
Once delegates had re-assembled from the lunch break, Paul Ohm (University of Colorado) described how there is a perception that anonymisation is ubiquitous, trusted and rewarded by law in terms of benefits and exemptions. He described how even relatively innocuous data can be used to identify individuals, and discussed how policy makers should respond. He believes lists of personally identifiable information (PII) are unsustainable and that technology will not be a solution, partly due to the accretion problem, whereby successive releases creep ever closer to personal data. He argued for contextual risk assessments, best-effort approaches, consideration of risks, motives and criminal behaviour, accountability measures, and a reduction in unjustifiably risky collection of information. I can see how threat modelling could be extended further into this area.
Barry Ryan (Market Research Society) provided a background to the MRS' principles, from classical research to how this has changed through the use of non-anonymous participation, qualitative groups, online market research communities, and ethnographic and deliberative techniques. Research clients often provide the individual contacts, and they are demanding more, and more detailed, information.
David Smith (Deputy Commissioner and Director of Data Protection, ICO) chaired the panel discussion where the speakers discussed whether access controls are useful, the rights of individuals to compensation and redress, audit trails for data downloads, the usefulness of a register of data controllers, anonymisation as a failed concept, the influence of China on the internet with its focus on traceability, the need for trust, effort needed in the education system and, inevitably, the need for further research.
David Smith thanked all the speakers and provided an engaging summary of the seminar, a task made harder by what he considered the day's outcome: that true anonymisation is not possible. The ICO will develop and issue a report on the day, together with the presenters' slides, and David Smith asked that any further contributions be forwarded to the ICO.
My own conclusions? The situation is complicated, and there isn't yet agreement on the best path forward. Anonymisation is a partial privacy protection method, but data can almost always be re-identified and therefore it cannot be relied upon as a definitive protective measure, or as an excuse/exemption from data protection requirements. It seems there may be a move towards risk assessments rather than specified conditions and controls.
But do read Paul Ohm's paper Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization which I highlighted in a previous post about test data. He also provided the best quotation of the day: "Data can be either useful or perfectly anonymous but never both".
Update 5th August 2011: A report of the proceedings is now available.