A Social Contract for Data Collection

I just returned from Strata + Hadoop World in San Jose, where I gave a talk entitled “Kosher Collection: Best Practices in Data Handling”. I had an amazing time at Strata this year, and major kudos to the organizers for putting on a great show.

The central premise of my talk is that in today’s world, there is a social contract between data collectors and consumers. Essentially, the agreement is that consumers give their personal data to a data collector in exchange for mutual benefit. The problem is that consumers, in general, do not understand the technology or how data collection works, and as a result are unable to provide informed consent. Furthermore, this issue is likely to get worse as the opportunity to opt out of mass data collection disappears.

Good and Bad Data Collectors

A good example of this social contract working well is Waze, an app that uses mass collection to identify traffic patterns and provide you with the optimal route to your destination. Waze calculates this by collecting your movement, and the movement of all the Waze users around you, to identify areas with bad traffic. This data is incredibly sensitive, especially if it were to fall into the wrong hands, yet I feel comfortable sharing my data with Waze because I derive benefit from it, and I feel that this benefit is worth it.

To illustrate the opposite, consider the flashlight apps that were popular before iOS 7. Several researchers discovered that some of these apps were collecting data that the app did not need to function, presumably to sell it for advertising purposes. (PCMag covered the findings.) It remains unclear what actually happened to the data that these apps collected, but there is no doubt that they had the ability to collect it. In any event, this is a prime example of where the social contract has broken down. A flashlight app provides no benefit that warrants anything more than access to the flash, and yet these apps were acting as data parasites.

The fundamental problem is that data collectors aren’t really required to tell their customers what data they are collecting and how it is being used. But wait… what about the privacy policy/EULA? The reality is that in most cases, these documents are so long and so nearly incomprehensible that most people cannot truly grasp what they are agreeing to.

What to do about it?

As with any contract, there are multiple parties.  Let’s look at the consumer side first. Consumers fundamentally need to do two things:

  1. Educate themselves about data: We as the drivers of the data economy can help with that, but the bottom line is that data collection affects us all, so people who care about how data is being used need to understand it well enough to know where the industry is going.
  2. Be selective about whom to give data to: Consumers need to realize that they do have a choice about which companies and services they give their data to. If enough consumers decide not to give data to a collector, that decision can influence data collectors’ behavior.

On the data collectors’ side, companies need to do a much better job of informing customers about what data is being collected and how their products use it.

  • Create a code of ethics for data collection: I think there needs to be a universally agreed-upon code of ethics for data collection. As a data geek, I fully understand why data collection platforms want to collect as much data as possible. Access to raw data is the fuel for innovation, and cutting it off would stifle that innovation. However, this desire does need to be balanced with consumers’ rights to privacy. Speaking as a customer, I want to know, in plain language: what data is being collected, how it is being used, how and where it is being stored, who has access, how long the data is being kept, and that the data is being secured using industry-standard procedures. I also believe that companies should commit not to share data with any company that does not agree to these practices.
  • Label data collection products so that consumers know what data is being collected BEFORE they purchase the product: In my Strata talk, I compared this with the food industry, which has a series of certifications and labels so that consumers know what is in a product before they purchase it. In an ideal world, devices and apps would also adopt a labeling system that tells customers how much sensitive data they collect, in a way that consumers can understand.
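To make the labeling idea a bit more concrete, here is a minimal sketch of what a machine-readable label might contain, built around the questions listed above (what is collected, how it is used, where it is stored, who has access, how long it is kept, and whether it is secured). Everything in it is hypothetical: the `DataCollectionLabel` class, its field names, and the flashlight example values are assumptions made for illustration, not an existing standard or data from any real app.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataCollectionLabel:
    """Hypothetical plain-language label for an app or device."""
    product: str
    data_collected: List[str]
    purposes: List[str]
    storage_location: str
    who_has_access: List[str]
    retention_days: int
    secured_with_industry_standards: bool
    # Data the product genuinely needs in order to work.
    data_needed_to_function: List[str] = field(default_factory=list)

    def unnecessary_data(self) -> List[str]:
        """Return collected items the product does not need to function."""
        needed = set(self.data_needed_to_function)
        return [item for item in self.data_collected if item not in needed]

    def render(self) -> str:
        """Format the label as short, readable lines."""
        secured = "yes" if self.secured_with_industry_standards else "no"
        lines = [
            f"Product: {self.product}",
            f"Data collected: {', '.join(self.data_collected)}",
            f"Used for: {', '.join(self.purposes)}",
            f"Stored: {self.storage_location}",
            f"Who has access: {', '.join(self.who_has_access)}",
            f"Kept for: {self.retention_days} days",
            f"Industry-standard security: {secured}",
        ]
        extra = self.unnecessary_data()
        if extra:
            lines.append(f"Not needed to function: {', '.join(extra)}")
        return "\n".join(lines)


# Illustrative values only; not taken from any real flashlight app.
example_label = DataCollectionLabel(
    product="Example Flashlight",
    data_collected=["camera flash access", "location"],
    purposes=["turning on the light", "advertising"],
    storage_location="vendor servers",
    who_has_access=["vendor", "advertising partners"],
    retention_days=365,
    secured_with_industry_standards=False,
    data_needed_to_function=["camera flash access"],
)

print(example_label.render())
```

The `unnecessary_data` check is the part that matters most to me as a customer: a label that separates what a product needs from what it merely wants would make a data parasite easy to spot before purchase.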

Voluntary Ethics Enforcement Can Prevent Government Intervention

I believe that it is in the interest of companies that rely on data to create an industry-wide standard of ethics, and to label their products, simply because doing so increases trust in their products. Customers like doing business with companies that they trust, so trust can be a key differentiator in the marketplace. Data collection usually isn’t on most people’s minds until there is a data breach. I would really hate to see heavy-handed legislation come into being as a knee-jerk reaction to a data breach, and I suspect that having solid voluntary measures in place could largely prevent that from happening.
