There’s at least one library product for sale that can merge your patron and circulation data with Big Data, the exponentially expanding trove of information about our individual daily lives, to offer insights into building neighborhood branch collections, developing relevant programs and services, or improving promotions. But increasingly powerful computers combined with widely available personal information mean that no one’s privacy and anonymity can be taken for granted.
This is a problem for the US Census, which is required by law to ensure that the data it publishes can’t be used to identify individual respondents. But Census researchers, using only partial Census data and a few commercial datasets, were able to accurately match more than a third of the population to the confidential information they shared on Census surveys.
In 2017, the Census Bureau decided to implement differential privacy to protect the anonymity of individual survey responses. Differential privacy adds random noise to the published data. High-level counts (a state’s population, for example) will be accurate, but as you zoom in on smaller groupings, the counts deviate further and further from the truth.
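To see why small groupings suffer more than large ones, here is a minimal sketch of one textbook way to add this kind of noise, the Laplace mechanism. It is an illustration only and is not the Census Bureau's actual method. The population figures, the privacy setting `epsilon`, and the function names are all made up for the example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered on zero."""
    # The difference of two independent exponential draws is Laplace-distributed.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, rng):
    """Return the count with Laplace noise of scale 1/epsilon added."""
    # Adding or removing one person changes a count by at most 1,
    # so the noise scale is 1/epsilon. Smaller epsilon means more noise.
    return true_count + laplace_noise(1 / epsilon, rng)


def relative_error(true_count, epsilon, rng, trials=2000):
    """Average size of the error, as a fraction of the true count."""
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials / true_count


rng = random.Random(2020)  # fixed seed so the run is repeatable
epsilon = 0.5              # hypothetical privacy setting

# Hypothetical populations at three levels of geography.
counts = {"state": 5_000_000, "county": 50_000, "census block": 25}

errors = {name: relative_error(n, epsilon, rng) for name, n in counts.items()}
for name, err in errors.items():
    print(f"{name:>12}: typical error is about {err:.5%} of the true count")
```

The noise is roughly the same size (a few people) at every level. That is a rounding error for a state, but it is a large share of a block with 25 residents, which is why the smallest geographies can show improbable results.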
- For The U.S. Census, Keeping Your Data Anonymous And Useful Is A Tricky Balance [NPR] “The Census Bureau has relied on the promise of confidentiality to get many of the country’s residents to volunteer their information once a decade, especially among people of color, immigrants and other historically undercounted groups who may be unsure about how their responses could be used against them. But it is becoming harder for the bureau to uphold that pledge and continue releasing statistics from the census. Advances in computing and access to voter registration lists and commercial data sets that can be cross-referenced have made it easier to trace purportedly anonymized information back to an individual person.”
- Will New Privacy Changes Protect Census Data or Make Things Worse? [The Markup] “When Washington State officials examined an early demonstration set… it found 401 Census blocks where the entire population was over 85 years old and 3,353 where the entire population was under 14. An Alabama analysis of the same dataset showed 13,000 blocks where there were children but no adults.”
- 16 states back Alabama’s challenge to Census privacy tool [AP] “The 16 states supporting Alabama said that differential privacy’s use in the redistricting numbers will make the figures inaccurate for all states, especially at small geographic levels, and the Census Bureau could use other methods to protect people’s privacy.”
- What Should Librarians Know About Differential Privacy and the 2020 Census? [Federal Depository Library Program] “Librarians tend to work with users; they work with the general public. People come to you and ask questions about what data are available, whether or not that data is comparable to data that we are collecting in 2000 or 2010 or 1990. It is critical for librarians to understand what types of changes are going on to the census data and what that means for data users that come to you for support.”
From the Ohio Web Library:
- Chen, Angela. “Differential Privacy.” MIT Technology Review, vol. 123, no. 2, Mar. 2020, p. 27.
- Weiss, Todd R. “Apple to Use Differential Privacy to Get User Insights Without IDs.” EWeek, June 2016, p. 1.
- Krieger, Nancy, et al. “Impact of Differential Privacy and Census Tract Data Source (Decennial Census Versus American Community Survey) for Monitoring Health Inequities.” American Journal of Public Health, vol. 111, no. 2, Feb. 2021, pp. 265–268.