Last year, Apple kicked off a massive experiment with new privacy technology aimed at solving an increasingly thorny problem: how to build products that understand users without snooping on their activities.
Its answer is differential privacy, a term virtually unknown outside of academic circles until a year ago. Today, other companies such as Microsoft and Uber Technologies are experimenting with the technology.
The problem differential privacy tries to tackle stems from the fact that modern data-analysis tools are capable of finding links between large databases. Privacy experts worry these tools could be used to identify people in otherwise anonymous data sets.
Two years ago, researchers at the Massachusetts Institute of Technology discovered shoppers could be identified by linking social-media accounts to anonymous credit-card records and bits of secondary information, such as the location or timing of purchases.
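Here is a minimal sketch of that kind of linking, in Python. It is a simplified illustration, not the MIT researchers' method. The card tokens, shops, dates and the user "alice" are all invented. The attacker builds one set of anonymised cards for each place and time a person publicly posted about. Intersecting those sets quickly narrows the candidates down to a single card.

```python
from collections import defaultdict

# Hypothetical "anonymised" card records: the card number is replaced by a
# random token, but each purchase still carries a shop and a day.
card_records = [
    ("token_a", "cafe_1", "2015-01-03"),
    ("token_a", "bookshop", "2015-01-05"),
    ("token_b", "cafe_1", "2015-01-03"),
    ("token_b", "gym", "2015-01-07"),
    ("token_c", "bookshop", "2015-01-05"),
]

# Hypothetical public check-ins taken from one person's social-media account.
checkins = [
    ("alice", "cafe_1", "2015-01-03"),
    ("alice", "bookshop", "2015-01-05"),
]

def link(card_records, checkins):
    """For each public user, return the card tokens consistent with
    every (shop, day) point that user revealed."""
    # Index tokens by the (shop, day) points at which they made purchases.
    tokens_at = defaultdict(set)
    for token, shop, day in card_records:
        tokens_at[(shop, day)].add(token)

    candidates = {}
    for user, shop, day in checkins:
        matches = tokens_at.get((shop, day), set())
        # Keep only tokens that match every point seen so far for this user.
        candidates[user] = candidates.get(user, matches) & matches
    return candidates

print(link(card_records, checkins))  # {'alice': {'token_a'}}
```

Two innocuous-looking details, a coffee and a bookshop visit, are enough here to tie "token_a" to alice. That token's full purchase history is then exposed.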
“I don’t think people are aware of how easy it is getting to de-anonymise data,” said Ishaan Nerurkar, whose start-up LeapYear Technologies sells software for leveraging machine learning while using differential privacy to keep user data anonymous.
Differentially private algorithms blur the data being analysed by adding a measurable amount of statistical noise. One way to do this is to randomly swap a sensitive question (have you ever committed a violent crime?) for a question with a statistically known response rate (were you born in February?). Someone trying to find links in the data would never be sure which question a particular person was asked. Because the response rate of the innocuous question is known, however, researchers can still estimate how the group as a whole answered the sensitive one. That lets them analyse sensitive data such as medical records without being able to tie the data back to specific people.
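The question-swapping scheme above resembles a classic survey technique called randomised response, specifically its "unrelated question" variant. The sketch below is a minimal Python version of it and is not Apple's system. It makes several assumptions:

- The sensitive question is asked with a hypothetical probability `P_SENSITIVE`.
- The innocuous "born in February" answer is simulated as an independent yes with probability `Q_FEBRUARY`.
- `TRUE_RATE` is an invented figure.

No single answer reveals which question was asked. Because the expected share of "yes" answers is `p * pi + (1 - p) * q`, the true rate `pi` can still be solved for in aggregate.

```python
import random

def respond(truth: bool, p: float, q: float, rng: random.Random) -> bool:
    """One respondent's recorded answer under the unrelated-question design.

    With probability p the respondent truthfully answers the sensitive question;
    otherwise they answer an innocuous question whose "yes" rate q is known
    (simulated here as an independent random draw). Only the yes/no answer is
    recorded, never which question was asked.
    """
    if rng.random() < p:
        return truth
    return rng.random() < q

def estimate_rate(answers: list[bool], p: float, q: float) -> float:
    """Estimate the population's true "yes" rate for the sensitive question.

    The expected observed "yes" share is p * pi + (1 - p) * q; solve for pi.
    """
    observed = sum(answers) / len(answers)
    return (observed - (1 - p) * q) / p

rng = random.Random(0)
P_SENSITIVE = 0.7             # hypothetical chance of being asked the sensitive question
Q_FEBRUARY = 28.25 / 365.25   # approximate share of people born in February
TRUE_RATE = 0.05              # hypothetical true "yes" rate, for the simulation only

population = [rng.random() < TRUE_RATE for _ in range(100_000)]
answers = [respond(t, P_SENSITIVE, Q_FEBRUARY, rng) for t in population]
print(round(estimate_rate(answers, P_SENSITIVE, Q_FEBRUARY), 3))
```

The estimate lands close to the hidden `TRUE_RATE`, even though any individual "yes" could have come from either question. Raising `P_SENSITIVE` makes the estimate more precise but weakens each person's deniability. That trade-off between accuracy and privacy is the one differential privacy sets out to measure.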
Differential privacy is key to Apple’s artificial intelligence efforts, said Abhradeep Guha Thakurta, an assistant professor at the University of California, Santa Cruz. Mr Thakurta worked on Apple’s differential-privacy systems until January of this year.
Apple is now expanding its use of differential privacy to cover its collection and analysis of web browsing and health-related data.
Source: theaustralian