Data Practitioners and Their Obligation to the Public

Data scraping is the process of extracting information about an entity from a digital space. In today’s world, I view data scraping as a process for obtaining information about a person or group of people. More often than not, this information is obtained without people’s knowledge or consent, largely because data scraping is a relatively new practice that has only recently come to the public eye.

This is the heart of the problem with data scraping: the majority of people have no idea that this practice exists or what it means for their personal information. Moreover, because the public knows so little about data scraping and the people who practice it, there are not yet sufficient laws or regulations to curb these practices (Vallor, 3).

We thus arrive at the issue of technological advancement outpacing public knowledge, which raises the question of what ethical obligations data practitioners have to the public from whom they scrape data.

The specific problems introduced by data scraping are those of security and privacy, at both a macro and a micro level. Case study 4 in the work of Shannon Vallor, a professor of Philosophy at Santa Clara University, discusses the ethical obligations that data practitioners violated when they gathered and released the profile data of OkCupid users (Vallor, 33).

On a micro level, the users of OkCupid disclosed personal information under the impression that only other users would see it, and that those users would use it with the intention of forming a relationship. The technologists behind the study obtained the personal information of OkCupid users with the intention of publicizing it, in a context that these users never consented to. It is significant to add that the publicized data sets, when paired with other data sets, drastically increase the odds that the identities of these users will be exposed as well. The authors wrote in their paper regarding the study:

“Some may object to the ethics of gathering and releasing this data…However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form”

This statement acknowledges that information was taken and purposefully presented in a different form. However, the users of OkCupid never consented to having this information taken in the first place. Irrespective of the legality of the practice, data practitioners have an obligation to the public not to expose or present information in a context other than the one for which it was intended.

On a macro level, the data practitioners should have acted out of ethical obligation instead of justifying their actions through the lack of laws and regulations governing the technology. The authors of the study did not heed the terms and policies of the site either, and in doing so they set a precedent that future data practitioners could follow to justify similar practices, given that very few regulations are currently in place. When asked about the site’s policies on data scraping, one of the authors, Kirkegaard, tweeted:

“Don’t know, don’t ask. :)”

Ultimately, the data practitioners in this case study showed a complete disregard for ethical reasoning on both a macro and a micro level. They had no regard for the users of OkCupid, for those users’ understanding of the site, or for the site’s terms and conditions. Their public responses also make clear that they understood how little regulation and enforcement exists around data scraping and chose to take advantage of it, which is ethically irresponsible.

The solution to the problem of data scraping is a contextual one: use data only in the context in which it was presented. Of course, this technology can be used to benefit the public. However, the vast majority of people do not know what this practice is or how it can be used. It is imperative that data practitioners act with the understanding that people cannot object to a practice they are unaware of to begin with. Ethical reasoning must be the true guide in the industry of data scraping.

Van Schie, Gerwin, et al. “Get Your Hands Dirty: Emerging Data Practices as Challenge for Research Integrity.” The Datafied Society: Studying Culture through Data, edited by Mirko Tobias Schäfer and Karin Van Es, Amsterdam University Press, Amsterdam, 2017, pp. 183–200. JSTOR, www.jstor.org/stable/j.ctt1v2xsqn.18. Accessed 16 Mar. 2021.

Vallor, Shannon, and William J. Rewak. “An Introduction to Data Ethics.” www.scu.edu/media/ethics-center/technology-ethics/IntroToDataEthics.pdf.

Gramlich, John. “10 Facts about Americans and Facebook.” Pew Research Center, 31 July 2020, www.pewresearch.org/fact-tank/2019/05/16/facts-about-americans-and-facebook/.

Haselton, Todd. “Zuckerberg Says Most Facebook Users Should Assume They Have Had Their Public Info Scraped.” CNBC, 4 Apr. 2018, www.cnbc.com/2018/04/04/facebook-most-people-could-have-had-their-public-profile-scraped.html.

Stella, Shiva. “Public Knowledge Responds to Hyp3r Scraping Instagram User Data Without Consent.” Public Knowledge, 22 Aug. 2019, www.publicknowledge.org/press-release/public-knowledge-responds-to-hyp3r-scraping-instagram-user-data-without-consent/.