In the last few days I’ve been pleased to see that the privacy issue I wrote about last week, ISPs being able to sell your browsing history, is getting a lot more coverage on tech sites such as Wired, Gizmodo, Ars Technica, FOSSBytes and many others. I read a lot of different news sources, and I was hard pressed to find ANY writers supporting this government action.
This morning, DJ Patil, former US Chief Data Scientist, posted some comments about how much damage a data scientist could do with everyone’s browsing history. (Complete thread here.) It occurred to me that data science is perhaps one of the most powerful professions of the 21st century, and yet there is no certifying body to determine who is and who is not a data scientist, nor is there any licensing, nor any professional society to establish rules of professional conduct. Virtually every profession that can have an impact on the public has some sort of licensing or code of professional conduct, and yet data science does not. After all, even lawyers have a code of professional ethics.
As more information is collected, and as our ability to store and analyze that ever-expanding amount of information grows exponentially, the impact data science can have, for good or bad, is likewise increasing at exponential rates. Yet nobody is asking the question: How do we make sure this power is used for good and for legal purposes?
I would therefore like to propose that the code of conduct below be used as a starting point for a discussion about the ethics of data science.
A data scientist should act honorably, honestly, reasonably and legally in their professional capacity and use data science for the betterment of society.
I will respect privacy rights of individuals and organizations by:
- seeking permission from the subjects before collecting data, when appropriate
- only using data that has been lawfully collected
- informing the subjects of data collection, what is being collected and what will be done with the data
- taking reasonable steps to protect the data from unauthorized use
I will uphold the highest standards of analytic integrity by:
- being transparent in model design, especially when the models have serious implications or consequences
- understanding the models I deploy
- explaining the assumptions, strengths, weaknesses and limitations of a model, algorithm or analysis to decision makers
- never misrepresenting or exaggerating the capabilities or limitations of a model, algorithm or analysis
- making every effort to ensure that a model, algorithm or analysis does not serve to reinforce discrimination, racism, sexism or any other injustice