Michigan Law Review recently published my Note, Unfair Collection: Reclaiming Control of Publicly Available Personal Information from Data Scrapers.1 I’d like to thank the journal and all of its members for all of the effort that went into editing and publishing my piece. I’m very proud of it, and I encourage anyone reading this post to read that instead. But if you’re interested in diving a bit deeper into data scraping, privacy law, and the First Amendment, feel free to keep reading. This post proceeds in two parts: First, it quickly summarizes my Note. Next, it provides a (potentially) better way of looking at the free speech concerns outlined in my Note with respect to protecting publicly available personal information.
Summary of my Note in Michigan Law Review
My Note is about reforming California’s consumer data protection legislation to guard individuals’ publicly available personal information from internet data scrapers.2 In the piece, I argue that the California Consumer Privacy Act (CCPA), as amended by 2020’s California Privacy Rights and Enforcement Act (CPRA), isn’t sufficiently aligned with the protections afforded by the EU’s General Data Protection Regulation (GDPR).3 And because it’s not, it leaves open a window for scrapers to collect, monetize, and sell individuals’ personal data without providing notice to the individual.4 In other words, it leaves data scrapers the unrestricted ability to collect personal information, whereas this activity would be limited in Europe.
This is possible for two reasons: (1) California’s definition of “personal information” excludes publicly available information;5 and (2) the legislation’s notice requirement does not apply when businesses collect personal information indirectly from a consumer6—for example, from an intermediary source such as a public website. As my Note explains, similar deficiencies exist in the consumer data privacy bills passed in Colorado7 and Virginia,8 rendering individuals’ personal information equally susceptible to extraction by data scrapers.
Data privacy laws in Europe fill these gaps. Personal data in Europe is governed by the General Data Protection Regulation (GDPR).9 The GDPR’s definition of “personal data” contains no exemption for publicly available information,10 and its notice requirement, found in Article 14, applies even when the data is collected from a party other than the data subject11—for instance, by scraping it from a publicly accessible website.
In practice, these regulations work to deter and to remedy the scraping of personal data. In April 2021, Spain’s data protection authority (Agencia Española de Protección de Datos, or “AEPD”) fined Equifax for scraping activity conducted in violation of the GDPR.12 Equifax scraped data about individuals’ outstanding debts from publicly accessible government web sources and then included the data in credit reports.13 The AEPD found that Equifax violated Article 14 of the GDPR because it did not notify the individuals whose data it collected, even though the data were publicly available.14 It fined Equifax $1.1 million and ordered it to delete all the data it had collected.15
This example demonstrates that the GDPR’s protections function to deter companies from scraping individuals’ personal data, even when publicly accessible, and to at least require notice when they do.16 For U.S. persons’ publicly available personal data, though, no such deterrence or remedy exists. Scrapers are free to collect, monetize, and sell our personal information, all without providing notice. As my Note explains, unfettered scraping of individuals’ personal data comes with dangerous and undesirable consequences.17
To deter scraping personal information, my Note suggests California amend its privacy legislation to require companies to notify individuals when they collect their data, even when the data are publicly available.18 My proposal would then exempt from the notice requirement data that have been collected “fairly”—as determined by considering the personal nature of the information collected, the volume of the information collected, and the purpose of the collection (including, among other considerations, whether it was collected in bulk and for commercial purposes).19
Better arguments: First Amendment deep-dive
Although I think my Note’s proposal would help guard our personal data, I didn’t dedicate enough thought to at least one argument. On page 942, my Note addresses the First Amendment concerns associated with regulating the collection of publicly available personal information.20 I stated that, if scraping qualifies as speech, my proposal would be subject to intermediate scrutiny because it regulates scraping personal information for commercial purposes, and commercial speech regulations are subject to intermediate scrutiny.21 I now realize there’s probably a better way of analyzing and framing this aspect of my proposal.
Speech doesn’t necessarily become commercial speech just because it’s done for commercial purposes. Typically, commercial speech is defined as “speech that proposes a commercial transaction.”22 Advertisements, sales promotions, and the like. But distinguishing commercial from noncommercial speech can be tricky—most commercial speech “contains forms of both commercial and noncommercial speech.”23
Conversely, speech we would almost certainly call “noncommercial”—political, even—may have commercial interests behind it. As the Supreme Court has noted: “The interests of the contestants in a labor dispute are primarily economic, but it has long been settled that both the employee and the employer are protected by the First Amendment when they express themselves on the merits of the dispute in order to influence its outcome.”24 No one would characterize stories in The New York Times as “commercial speech,” though each article is written at least in part for a commercial purpose—the authors write to get paid, and the newspaper publishes to sell subscriptions. So my proposed regulation, directed at scraping personal data in bulk for commercial purposes, would not necessarily count as a commercial speech regulation.25
No, if a law that regulates scraping personal data counts as a speech restriction (an important if, which I’ll address in a bit), then it may very well be content-based, triggering strict scrutiny. This means that, to survive a challenge to its constitutionality, the regulation must be narrowly tailored to serve a compelling state interest.26 My proposed law is a content-based restriction for two reasons: First, whether my proposal regulates a scraper’s activity depends on what kind of information the scraper collects—if it’s personal information, it’s regulated; if it’s statistical information about a pandemic, it’s unregulated.27 Second, it also depends on whether the activity is done for commercial purposes, another content-based distinction.28 Because surviving strict scrutiny is a Herculean feat, my proposal would probably fail. That is, of course, if regulating scraping is regulating speech.
What is the government’s interest?
A better justification for my proposal, potentially, is this: when the government regulates scraping, it’s not interested in the suppression of speech. It’s interested only in regulating a particular activity for reasons having nothing to do with speech. As the Supreme Court has explained, it does not abridge freedom of speech to “make a course of conduct illegal merely because the conduct was in part initiated, evidenced, or carried out by means of language, either spoken, written, or printed.”29 In my Note, I implicitly alluded to the idea that scraping might not count as speech,30 but I didn’t elaborate further on the idea. A Note by Geoffrey Xiao addresses that precise question—is scraping speech?—but doesn’t come to a definitive answer.31
As an aside, I’d like to dedicate some space to addressing a fantastic controversy: My Note is quite similar to Geoffrey Xiao’s Note, published in the Spring 2021 issue of the Harvard Journal of Law & Technology. See id. This is a case of two people developing similar ideas independently, each unbeknownst to the other. I suppose, as the saying goes, great minds think alike.
Obviously, I was unaware of Xiao’s piece while writing mine. I began writing my piece in the fall of 2020. Email to Barbara McQuade (Sept. 16, 2020, 11:25 AM) (on file with author) (sending Professor McQuade a draft idea of my Note topic). And I submitted the final draft of my piece to the Michigan Law Review Notes office on March 23, 2021. Email from Andrew Parks to Michigan Law Review Notes Office (Mar. 23, 2021, 5:04 PM) (on file with author). I uploaded a revised draft to SSRN on June 15, 2021. Andrew Parks, Unfair Collection: Reclaiming Control of Publicly Available Personal Information from Internet Data Scrapers (June 15, 2021) (manuscript), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3866297.
Xiao’s piece was published in the Spring 2021 issue of the Harvard Journal of Law & Technology. I’m uncertain of the exact date of its publication. However, a search on the Internet Archive’s Wayback Machine (which, as if only to tease us both, is a website that functions by using scraping technology to archive web pages over time) reveals that the Spring 2021 issue was not yet published online as of August 6, 2021. Compare Volume 34, Harv. J. L. & Tech. (Aug. 6, 2021), https://web.archive.org/web/20210806070135/https://jolt.law.harvard.edu/volumes/volume-34 (showing that only the Fall 2020 issue—and not the Spring 2021 issue—of Volume 34 was published online as of August 6, 2021) with Volume 34, Harv. J. L. & Tech. (Oct. 21, 2021), https://web.archive.org/web/20211021113528/https://jolt.law.harvard.edu/volumes/volume-34 (showing that the Spring 2021 issue had been published by this time in October 2021).
Presumably, this means the Spring 2021 issue was published sometime between August 6th and October 21st, 2021. So even estimating conservatively, my March 23 final draft and submission to the Michigan Law Review Notes office predated the publication of Xiao’s piece by at least four months. Either way, I’d like to congratulate Geoffrey Xiao on his wonderfully written and researched piece and for beating me to publication. Geoffrey, I now must (obviously) consider you my rival and legal arch-nemesis! Kidding—great work! But asking whether scraping is speech is likely the wrong question. Here’s a better one: what is the government’s interest in regulating scraping?
Two tests for incidental burdens on speech
When a regulation incidentally burdens speech, courts will often apply the Supreme Court’s four-part test from United States v. O’Brien to assess its constitutionality.32Michael C. Dorf, Incidental Burdens on Fundamental Rights, 109 Harv. L. Rev. 1175, 1201–02 (1996). In O’Brien, the government prosecuted the defendant for burning his draft card—an act of political protest—in violation of a law prohibiting the knowing destruction or mutilation of draft cards.33United States v. O’Brien, 391 U.S. 367, 369–70 (1968). Although O’Brien’s conduct was a communicative act to protest the Vietnam War, that fact was not dispositive, because otherwise “an apparently limitless variety of conduct can be labeled ‘speech’ whenever the person engaging in the conduct intends thereby to express an idea.”34Id. at 376. Instead, the Supreme Court held that a regulation that incidentally burdens speech does not violate the First Amendment if it (1) is within the constitutional power of the government; (2) furthers an important or substantial government interest; (3) is unrelated to the suppression of free expression; and (4) is no greater than essential to the furtherance of that interest.35Id. at 377. The Supreme Court has held that O’Brien’s “least restrictive means” analysis is essentially the same as the standard applied to content-neutral time, place, and manner restrictions. Ward v. Rock Against Racism, 491 U.S. 781, 797–98 (1989). Although it resembles heightened scrutiny, this test is apparently toothless in practice and is applied “in a manner that is extremely deferential to the government.”36See Dorf, supra note 32, at 1204–08. Scholars have criticized O’Brien for producing an “incidental burden doctrine with a too-broad domain and a too-weak practical effect.”37Id. at 1208. But if you’re a litigant, at least you get a test. (Spoiler alert: O’Brien lost.)
In some cases, however, incidental burdens on speech receive no First Amendment scrutiny at all.38Id. at 1204–05; Ashutosh Bhagwat, Purpose Scrutiny in Constitutional Analysis, 85 Calif. L. Rev. 297, 305 (1997). In Arcara v. Cloud Books, New York shut down an adult bookstore because its premises were being used for prostitution and illicit sexual activity.39Arcara v. Cloud Books, Inc., 478 U.S. 697, 698–99 (1986). Even though the statute authorizing the building’s closure obviously burdened speech—it was used to shut down a bookstore, after all—the Supreme Court didn’t even apply the O’Brien test. The Court explained that, unlike the expressive draft-card burning in O’Brien, the sexual activity in Arcara manifested “absolutely no element of protected expression.”40Id. at 705. The burden on speech “could have been entirely avoided had the bookstore owner refrained from engaging in the unprotected, noncommunicative activity.”41Dorf, supra note 32, at 1206–07. In her concurrence, Justice O’Connor explained that it would be absurd to apply a First Amendment analysis to every government action having “some conceivable speech-inhibiting consequences, such as the arrest of a newscaster for a traffic violation.”42Arcara, 478 U.S. at 708 (O’Connor, J., concurring). In other words, states and the federal government may burden (and completely shut down) speech when their reasons for doing so have nothing to do with suppressing speech or expressive activity.
Incidental burdens on speech facilitated by scraping
Two recent cases brought by the American Civil Liberties Union (ACLU) shed light on how the reasoning in O’Brien and Arcara might apply to scraping regulations. In a multidistrict litigation against Clearview AI, the ACLU alleged Clearview scraped over three billion photographs of facial images from the internet “to harvest the individuals’ unique biometric identifiers and corresponding biometric information,” in violation of the Illinois Biometric Information Privacy Act (BIPA).43In re Clearview AI, Inc., No. 21-cv-0135, 2022 WL 444135, at *1 (N.D. Ill. Feb. 14, 2022). BIPA prohibits an entity from obtaining individuals’ biometric data without notice and consent. Clearview argued that interpreting BIPA to prohibit its scraping activity would violate the First Amendment because the images Clearview scrapes are public, and BIPA inhibits its ability to collect and analyze public information.44Id. at *3.
In response, the ACLU argued in its initial state court briefs that its lawsuit “challenges Clearview’s conduct, not its speech.”45Plaintiff’s Response to Defendant’s Motion to Dismiss at 14, ACLU v. Clearview AI, No. 2020-CH-04353 (Ill. Cir. Ct. Nov. 2, 2020), available at https://www.aclu.org/plaintiffs-response-defendants-motion-dismiss. It explained that “Clearview can gather information from the public internet and it can run a search engine without violating BIPA. What it cannot do is capture the faceprints, or ‘scan[s] of . . . face geometry.’”46Id. To ignore the difference between capturing a public photograph and capturing a faceprint “would be to hold that publishing a photograph of people’s hands should be treated no differently than collecting their fingerprints.”47Id. at 16. The ACLU asserted that Clearview’s business model “is not based on the collection of public photographs from the internet,” but on the “additional conduct of harvesting nonpublic, personal biometric data.”48In re Clearview AI, 2022 WL 444135, at *3.
The ACLU argued that, even if BIPA has an incidental effect on Clearview’s speech, the court should analyze its constitutionality under the highly deferential O’Brien test.49Plaintiff’s Response to Defendant’s Motion to Dismiss at 15-16, ACLU v. Clearview AI, No. 2020-CH-04353 (Ill. Cir. Ct. Nov. 2, 2020), available at https://www.aclu.org/plaintiffs-response-defendants-motion-dismiss; United States v. O’Brien, 391 U.S. 367, 376 (1968). The court handling the multidistrict litigation did just that. After applying O’Brien, the court denied Clearview’s motion to dismiss, holding that BIPA is narrowly tailored to protect Illinois residents’ highly sensitive biometric information from unauthorized collection and disclosure.50In re Clearview AI, 2022 WL 444135, at *3.
Now compare that reasoning to a recent case the ACLU filed on behalf of the South Carolina NAACP.51ACLU of South Carolina, ACLU, and NAACP Represent SC NAACP in Lawsuit Challenging South Carolina Court System Ban on Automated Collection of Public Court Records, ACLU (March 30, 2022), https://www.aclu.org/press-releases/aclu-south-carolina-aclu-and-naacp-represent-sc-naacp-lawsuit-challenging-south. In this case, the ACLU argues that the South Carolina Court Administration’s categorical ban on scraping court records from its public index violates the First Amendment. In a press release, the ACLU states that “[s]craping is a legitimate method of collecting information online that is often necessary to efficiently and systematically gather records that might not otherwise be possible to record.”52Id. Specifically, the lawsuit argues that the prohibition on scraping the public index violates free speech by restricting the right to record public information and the right of access to judicial records.53Id. If you think this sounds a lot like what Clearview was arguing in the previous case, you’re not alone. How can the ACLU adopt two positions that plainly conflict with one another?
Perhaps the distinction between the reasoning in these cases lies in how we view the government’s interest. In the Clearview case, the government is purportedly trying to prevent harvesting biometric data: Clearview is free to speak about individuals in ways that identify them, but it cannot maintain a database of their biometric information to do so. This sounds like the government’s interest isn’t to suppress speech, but merely to regulate conduct, and even an Arcara analysis would suffice. Conversely, in the NAACP case, the conduct at issue appears to be accessing public court records. It’s unclear what the government’s interest is, but in my view, it feels closer to suppressing speech-related activity than it does to regulating conduct. But then again, maybe the South Carolina court system blocks automated bots because they can bog down its servers, crash its pages, or increase the time it takes to load documents. If that were true, it’s the conduct of scraping—because the method affects its servers—that South Carolina cares about, not suppressing speech by limiting access to its records. The NAACP can still access the public files; it just can’t use bots to do so.
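As a purely illustrative aside for non-technical readers, here is a minimal Python sketch of what automated, bulk collection looks like compared with a person clicking through records one at a time. Everything here is hypothetical: the court-index URL, the page format, and the helper names are mine, and the fetcher is a stand-in that touches no real website.

```python
import time

def record_urls(base_url, case_ids):
    """Build the list of public-record URLs an automated bot would request in bulk."""
    return [f"{base_url}/case/{case_id}" for case_id in case_ids]

def throttled_collect(urls, fetch, delay_seconds=1.0):
    """Fetch each URL with a pause between requests, so the bulk collection
    doesn't load the server the way an unthrottled bot would."""
    pages = []
    for url in urls:
        pages.append(fetch(url))
        time.sleep(delay_seconds)
    return pages

# Stand-in fetcher so this sketch runs without any real network traffic.
fake_fetch = lambda url: f"<html>record at {url}</html>"

urls = record_urls("https://courts.example.invalid", [101, 102, 103])
pages = throttled_collect(urls, fake_fetch, delay_seconds=0)
```

The point of the sketch is only that throttling changes how the conduct affects a server, not what information is ultimately accessed, which is one way of framing why a court system might regulate the method rather than the message.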
If there’s a line separating these cases, I’m not sure which side my proposal would fall on. Whether a court analyzed my proposal under Arcara or O’Brien, I would argue the government doesn’t have interests related to the suppression of speech. The government is interested in preventing the conduct of harvesting individuals’ data in bulk, for privacy reasons, because bulk collection introduces risks that aren’t present when companies collect personal information in small bits. And the government also cares about the risks stemming from commercializing this data, which presents unique risks not present when personal data is collected for news reporting or academic purposes. These risks are examined in depth on pages 925–29 of my Note.54Parks, supra note 1, at 925–29. The harms to privacy, liberty, security, and democracy are what the government cares about preventing, not companies’ ability to speak in ways that identify individuals.
Conclusion
In the remaining space, I’d like to express my sincerest gratitude to all of the people who helped me with my Note. First, I am grateful to have joined the Michigan Journal of Law Reform at the start of my 2L year. The journal pushed me to dive deep into researching areas of the law that I was interested in and, of course, to write a Note proposing a legal reform. I am immensely grateful to Emily Grau—my Notes Editor at JLR—who helped me fine-tune my research and writing along the way.
Next, I’d like to thank two professors whose invaluable feedback dramatically improved my thinking about this topic. Thank you to Professor Barb McQuade, who supervised me in writing my Note and who graciously allowed me to bug her and send her drafts throughout my writing process. And thank you to Professor Don Herzog, who helped me think deeper about the First Amendment issues involving scraping. My conversations with Don inspired me to write this follow-up.
Finally, I’d like to thank everyone at Michigan Law Review who worked on my Note. Thank you to all of the members of the Notes Office for seeing the potential in my piece, for selecting it for publication, and for helping shape and improve its structure before it entered the publication process. And thank you to Ahmed Rizk (Editor-in-Chief, Vol. 120) and Dashaya Foreman (Editor-in-Chief, Vol. 121), who invested many hours during the final reads of the piece before it went to print. Thank you, greatly.
This post is for informational, educational, and entertainment purposes only. It is not intended and should not be construed as legal advice.
Footnotes
- 1Andrew M. Parks, Note, Unfair Collection: Reclaiming Control of Publicly Available Personal Information from Data Scrapers, 120 Mich. L. Rev. 913 (2022). If you’d like to read the piece online, it is available at: https://michiganlawreview.org/journal/unfair-collection-reclaiming-control-of-publicly-available-personal-information-from-data-scrapers/. And the PDF can be downloaded at: https://repository.law.umich.edu/mlr/vol120/iss5/5/.
- 2Id.
- 3Id.
- 4Id.
- 5Cal. Civ. Code § 1798.140(v)(2) (effective Jan. 1, 2023) (“ ‘Personal information’ does not include publicly available information or lawfully obtained, truthful information that is a matter of public concern. For purposes of this paragraph, ‘publicly available’ means: information that is lawfully made available from federal, state, or local government records, or information that a business has a reasonable basis to believe is lawfully made available to the general public by the consumer or from widely distributed media, or by the consumer; or information made available by a person to whom the consumer has disclosed the information if the consumer has not restricted the information to a specific audience. ‘Publicly available’ does not mean biometric information collected by a business about a consumer without the consumer’s knowledge.”).
- 6Cal. Code Regs. tit. 11, § 999.305 (West 2021) (“A business that does not collect personal information directly from the consumer does not need to provide a notice to the consumer if it does not sell the consumer’s personal information.”).
- 7Colorado Privacy Act, Colo. S.B. 21-190 § 6-1-1303(17)(b) (2021), https://leg.colorado.gov/sites/default/files/2021a_190_signed.pdf (excluding publicly available information from the scope of its regulations).
- 8Va. Code Ann. § 59.1-575 (excluding publicly available information from its definition of “personal data”).
- 9Regulation 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC, 2016 O.J. (L 119) 1 [hereinafter General Data Protection Regulation].
- 10General Data Protection Regulation, supra note 9, art. 4(1).
- 11Id. art. 14(1) (“Where personal data have not been obtained from the data subject, the controller shall provide the data subject with the following information: (a) the identity and the contact details of the controller and, where applicable, of the controller’s representative; (b) the contact details of the data protection officer, where applicable; (c) the purposes of the processing for which the personal data are intended as well as the legal basis for the processing; (d) the categories of personal data concerned; (e) the recipients or categories of recipients of the personal data, if any; (f) where applicable, that the controller intends to transfer personal data to a recipient in a third country or international organisation . . . .”).
- 12Catherina Stupp, Data Scraping in EU Regulators’ Sights as Spain Orders Equifax to Delete Information, Wall St. J. (May 6, 2021 5:30 AM), https://www.wsj.com/articles/data-scraping-in-eu-regulators-sights-as-spain-orders-equifax-to-delete-information-11620293400.
- 13Id.
- 14Id.
- 15Id.
- 16For more on how Europe’s GDPR functions to deter scraping publicly available personal information, I recommend reading this piece: Fiona Campbell, Data Scraping – Considering the Privacy Issues, Fieldfisher (Aug. 27, 2019), https://www.fieldfisher.com/en/services/privacy-security-and-information/privacy-security-and-information-law-blog/data-scraping-considering-the-privacy-issues.
- 17Parks, supra note 1, at 917–19, 925–29 (providing examples to explain how data scraping can be used for malicious purposes and can lead to dangerous consequences).
- 18Id. at 937–41.
- 19Id. at 939–41.
- 20Id. at 942.
- 21See id. at 943.
- 22Andrew J. Wolf, Detailing Commercial Speech: What Pharmaceutical Marketing Reveals About Bans on Commercial Speech, 21 Wm. & Mary Bll Rights J. 1291, 1294 (2013); Cent. Hudson Gas & Elec. Corp. v. Pub. Serv. Comm’n of N.Y., 447 U.S. 557, 562 (1980) (“[O]ur decisions have recognized ‘the commonsense distinction between speech proposing a commercial transaction , which occurs in an area traditionally subject to government regulation, and other varieties of speech.’”) (citations omitted).
- 23Wolf, supra note 22, at 1294. See generally Nat Stern, In Defense of the Imprecise Definition of Commercial Speech, 58 Md. L. Rev. 55 (1999).
- 24Va. State Bd. of Pharmacy v. Va. Citizens Consumer Council, Inc., 425 U.S. 748, 762 (1976).
- 25Further, because considering the commercial purpose is only one factor (but not the exclusive factor) in my proposed regulation, its analysis as commercial speech likely wouldn’t apply. See City of Austin v. Reagan Nat’l Advert. of Austin, LLC, 596 U.S. __, __ (2022) (slip op., at 5 n.3) (explaining that a regulation that applies to both commercial and noncommercial messages alike does not qualify as a commercial speech regulation subject to intermediate scrutiny).
- 26Reed v. Town of Gilbert, 576 U.S. 155, 163 (2015); R.A.V. v. St. Paul, 505 U.S. 377, 395 (1992).
- 27See Reed, 576 U.S. at 163 (explaining that a regulation is content-based if it applies to particular speech because of its particular subject matter or cannot be justified without reference to the content of the regulated speech).
- 28See City of Austin, 596 U.S. __ (slip op., at 11) (explaining that “overt subject-matter discrimination is facially content based” and using the example of a regulation on “‘Ideological Sign[s],’ defined as those communicating a message or ideas for noncommercial purposes”).
- 29Ohralik v. Ohio State Bar Ass’n, 436 U.S. 447, 456 (1978) (quoting Giboney v. Empire Storage & Ice Co., 336 U.S. 490, 502 (1949)).
- 30Parks, supra note 1, at 942 (including the limiting language, “If scraping qualifies as speech, . . . .”).
- 31See Geoffrey Xiao, Note, Bad Bots: Regulating the Scraping of Public Personal Information, 34 Harv. J.L. & Tech. 701, 728–29 (2021).
As an aside, I’d like to dedicate some space to addressing a fascinating coincidence: My Note is quite similar to Geoffrey Xiao’s Note, published in the Spring 2021 issue of the Harvard Journal of Law & Technology. See id. This is a case of two people developing similar ideas independently, each unaware of the other. I suppose, as the saying goes, great minds think alike.
Obviously, I was unaware of Xiao’s piece while writing mine. I began writing my piece in the fall of 2020. Email to Barbara McQuade (Sept. 16, 2020, 11:25 AM) (on file with author) (sending Professor McQuade a draft idea of my Note topic). And I submitted the final draft of my piece to the Michigan Law Review Notes office on March 23, 2021. Email from Andrew Parks to Michigan Law Review Notes Office (Mar. 23, 2021, 5:04 PM) (on file with author). I uploaded a revised draft to SSRN on June 15, 2021. Andrew Parks, Unfair Collection: Reclaiming Control of Publicly Available Personal Information from Internet Data Scrapers (June 15, 2021) (manuscript), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3866297.
Xiao’s piece was published in the Spring 2021 issue of the Harvard Journal of Law & Technology. I’m uncertain of the exact date of its publication. However, a search on the Internet Archive’s Wayback Machine (which, as if only to tease us both, is a website that functions by using scraping technology to archive web pages over time) reveals that the Spring 2021 issue was not yet published online as of August 6, 2021. Compare Volume 34, Harv. J. L. & Tech. (Aug. 6, 2021), https://web.archive.org/web/20210806070135/https://jolt.law.harvard.edu/volumes/volume-34 (showing that only the Fall 2020 issue—and not the Spring 2021 issue—of Volume 34 was published online as of August 6, 2021), with Volume 34, Harv. J. L. & Tech. (Oct. 21, 2021), https://web.archive.org/web/20211021113528/https://jolt.law.harvard.edu/volumes/volume-34 (showing that the Spring 2021 issue had been published by this time in October 2021).
Presumably, this means the Spring 2021 issue was published sometime between August 6th and October 21st, 2021. So even estimating conservatively, my March 23 final draft and submission to the Michigan Law Review Notes office predated the publication of Xiao’s piece by at least four months. Either way, I’d like to congratulate Geoffrey Xiao on his wonderfully written and researched piece and for beating me to publication. Geoffrey, I now must (obviously) consider you my rival and legal arch-nemesis! Kidding—great work!
- 32Michael C. Dorf, Incidental Burdens on Fundamental Rights, 109 Harv. L. Rev. 1175, 1201–02 (1996).
- 33United States v. O’Brien, 391 U.S. 367, 369–70 (1968).
- 34Id. at 376.
- 35Id. at 377. The Supreme Court has held that O’Brien’s “least restrictive means” analysis is essentially the same as its standard under content-neutral time, place, and manner restrictions. Ward v. Rock Against Racism, 491 U.S. 781, 797–98 (1989).
- 36See Dorf, supra note 32, at 1204–08.
- 37Id. at 1208.
- 38Id. at 1204–05; Ashutosh Bhagwat, Purpose Scrutiny in Constitutional Analysis, 85 Calif. L. Rev. 297, 305 (1997).
- 39Arcara v. Cloud Books, Inc., 478 U.S. 697, 698–99 (1986).
- 40Id. at 705.
- 41Dorf, supra note 32, at 1206–07.
- 42Arcara, 478 U.S. at 708 (O’Connor, J., concurring).
- 43In re Clearview AI, Inc., No. 21-cv-0135, 2022 WL 444135, at *1 (N.D. Ill. Feb. 14, 2022).
- 44Id. at *3.
- 45Plaintiff’s Response to Defendant’s Motion to Dismiss at 14, ACLU v. Clearview AI, No. 2020-CH-04353 (Ill. Cir. Ct. Nov. 2, 2020), available at https://www.aclu.org/plaintiffs-response-defendants-motion-dismiss.
- 46Id.
- 47Id. at 16.
- 48In re Clearview AI, 2022 WL 444135, at *3.
- 49Plaintiff’s Response to Defendant’s Motion to Dismiss at 15–16, ACLU v. Clearview AI, No. 2020-CH-04353 (Ill. Cir. Ct. Nov. 2, 2020), available at https://www.aclu.org/plaintiffs-response-defendants-motion-dismiss; United States v. O’Brien, 391 U.S. 367, 376 (1968).
- 50In re Clearview AI, 2022 WL 444135, at *3.
- 51ACLU of South Carolina, ACLU, and NAACP Represent SC NAACP in Lawsuit Challenging South Carolina Court System Ban on Automated Collection of Public Court Records, ACLU (Mar. 30, 2022), https://www.aclu.org/press-releases/aclu-south-carolina-aclu-and-naacp-represent-sc-naacp-lawsuit-challenging-south.
- 52Id.
- 53Id.
- 54Parks, supra note 1, at 925–29.