MIT researchers say mobility data is a “double-edged sword” – it could be possible for individuals to be identified.
We often hear that data is the key to more integrated mobility services which could help us move around cities more easily. However, researchers at MIT have highlighted privacy risks related to the growing practice of compiling massive, anonymised datasets about people’s movement patterns.
Companies, researchers and other entities are beginning to collect, store and process anonymised data that contains “location stamps” (geographical coordinates and time stamps) of users. Data can be grabbed from mobile phone records, credit card transactions, public transportation smart cards, Twitter accounts and mobile apps.
Merging those datasets could provide rich information which helps to optimise transportation and urban planning, among other things.
A new MIT study has found that with only a few randomly selected points in mobility datasets, someone could identify individuals and learn sensitive information about them. With merged mobility datasets, this becomes even easier: an agent could potentially match users’ trajectories in anonymised data from one dataset with de-anonymised data in another.
In a paper published in IEEE Transactions on Big Data, the MIT researchers show how this can happen in the first-ever analysis of so-called user “matchability” in two large-scale datasets from Singapore – one from a mobile network operator and one from a local transportation system.
The researchers use a statistical model that tracks location stamps of users in both datasets and provides a probability that data points in both sets come from the same person.
In experiments, the researchers found the model could match around 17 per cent of individuals in one week’s worth of data, and more than 55 per cent of individuals after one month of collected data.
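To give a rough sense of how matching on location stamps can work, here is a minimal Python sketch. It is not the researchers' statistical model: it replaces their probability estimate with a simple overlap fraction, and the names `Stamp`, `match_score` and `best_matches`, the place labels, the hour buckets and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# A location stamp here is (place label, hour bucket).
# Both encodings are illustrative assumptions, not the study's format.
Stamp = Tuple[str, int]


def match_score(trace: List[Stamp], other: List[Stamp]) -> float:
    """Fraction of stamps in `trace` that also appear in `other`."""
    if not trace:
        return 0.0
    other_set = set(other)
    return sum(1 for stamp in trace if stamp in other_set) / len(trace)


def best_matches(
    anon: Dict[str, List[Stamp]],
    known: Dict[str, List[Stamp]],
    threshold: float = 0.5,
) -> Dict[str, str]:
    """Map each anonymised ID to the best-scoring known identity.

    An ID is only included if its best score reaches `threshold`.
    """
    result: Dict[str, str] = {}
    for anon_id, trace in anon.items():
        scored = [(match_score(trace, k_trace), k_id)
                  for k_id, k_trace in known.items()]
        if not scored:
            continue
        score, k_id = max(scored)
        if score >= threshold:
            result[anon_id] = k_id
    return result


# Toy data: anonymised traces (e.g. card records) and public traces
# (e.g. geotagged posts). All values are made up.
anon = {
    "u1": [("sentosa", 10), ("dubai_airport", 34), ("jumeirah", 58)],
    "u2": [("orchard", 12), ("changi", 20), ("sentosa", 10)],
    "u3": [("jurong", 5)],
}
known = {
    "alice": [("sentosa", 10), ("dubai_airport", 34),
              ("jumeirah", 58), ("mall", 60)],
    "bob": [("orchard", 12), ("changi", 20)],
}

matches = best_matches(anon, known)
```

In this toy example, `u1` and `u2` are linked to a named identity while `u3` has no overlap with either public trace and stays unmatched. A longer collection period adds more stamps per user, which is one plausible reason the reported match rate rises between one week and one month of data.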
Explaining more about the study, the researchers give the following example: “To understand how matching location stamps and potential deanonymisation works, consider this scenario: I was at Sentosa Island in Singapore two days ago, came to the Dubai airport yesterday, and am on Jumeirah Beach in Dubai today. It’s highly unlikely another person’s trajectory looks exactly the same. In short, if someone has my anonymised credit card information, and perhaps my open location data from Twitter, they could then deanonymise my credit card data.”
Daniel Kondor, a postdoctoral researcher in the Future Urban Mobility Group at the Singapore-MIT Alliance for Research and Technology, said: “As researchers, we believe that working with large-scale datasets can allow discovering unprecedented insights about human society and mobility, allowing us to plan cities better. Nevertheless, it is important to show if identification is possible, so people can be aware of potential risks of sharing mobility data.”
Co-author Carlo Ratti, a professor of the practice in MIT’s Department of Urban Studies and Planning and director of MIT’s Senseable City Lab, commented: “We felt that it was important to warn people about these new possibilities [of data merging] and [to consider] how we might regulate it.”
Ratti, Kondor, and the research team are working on the ethical and moral issues around big data.
In 2013, the Senseable City Lab at MIT launched an initiative called Engaging Data, which involves leaders from government, privacy rights groups, academia, and business, who study how city data can and should be used by today’s data-collecting firms.
“The world today is awash with big data,” Kondor said. “In 2015, mankind produced as much information as was created in all previous years of human civilisation. Although data means a better knowledge of the urban environment, currently much of this wealth of information is held by just a few companies and public institutions that know a lot about us, while we know so little about them. We need to take care to avoid data monopolies and misuse.”