
Inaccurate age and sex data in the Census PUMS files: Evidence and Implications

Public Opinion Quarterly (Impact Factor: 2.25). 01/2010; DOI: 10.2307/40927730
Source: RePEc

ABSTRACT We examine the physical and mental health effects of providing care to an elderly mother on the adult child caregiver. We address the endogeneity of the selection in and out of caregiving using an instrumental variable approach, and carefully control for baseline health and work status of the adult child using fixed effects and Arellano-Bond estimation techniques. Continued caregiving over time increases depressive symptoms for married women and married men. In addition, the increase in depressive symptoms is persistent for married men. Depressive symptoms for single men and women are not affected by continued caregiving. There is a small protective effect on the likelihood (10%) of having any heart conditions among married women who continue caregiving. Robustness checks confirm that the increase in depressive symptoms and decrease in likelihood of heart conditions can be directly attributable to caregiving behavior, and not due to a direct effect of the death of the mother. The initial onset of caregiving, by contrast, has no immediate effects on physical or mental health for any subgroup of caregivers.

  • Source
    • "In 2009, researchers associated with the MPC identified significant errors in age and sex distributions in many public use data sets released by the Census Bureau (Alexander, Davern, and Stevenson 2010). The ACS data for 2003–8, comprising 14.4 million person records, were recently corrected to amend these and other errors. Recent substantive additions to the integrated ACS data in IPUMS-USA include health insurance variables beginning in 2008 (with logical edits in the IPUMS version to improve accuracy and comparability across time); historically compatible variables on race designed to address multiple-race responses in the ACS and the 2000 census; a set of harmonized subfamily indicators that stretch back to 1880 and are more consistent than the Census Bureau's; new variables on disability and grandparents' responsibility for grandchildren added by the Census Bureau for 2000 forward; and additional variables on place of work, migration, and geography. "
    ABSTRACT: The Minnesota Population Center (MPC) provides aggregate data and microdata that have been integrated and harmonized to maximize cross-temporal and cross-spatial comparability. All MPC data products are distributed free of charge through an interactive Web interface that enables users to limit the data and metadata being analyzed to samples and variables of interest to their research. In this article, the authors describe the integrated databases available from the MPC, report on recent additions and enhancements to these data sets, and summarize new online tools and resources that help users to analyze the data over time. They conclude with a description of the MPC's newest and largest infrastructure project to date: a global population and environment data network.
    Historical Methods A Journal of Quantitative and Interdisciplinary History 04/2011; 44(2):61-68. DOI:10.1080/01615440.2011.564572
  • Source
    • "Adding random noise to protect salary information will most likely result in computing incorrect values of average salaries, thereby destroying utility of this computation. Recent cases with U.S. Census show that applying data privacy leads to incorrect results [5] [8]. Multiple studies demonstrate that even modest privacy gains require almost complete destruction of the data-mining utility [1] [9] [18] [19] [28]. "
    ABSTRACT: Database-centric applications (DCAs) are common in enterprise computing, and they use nontrivial databases. Testing of DCAs is increasingly outsourced to test centers in order to achieve lower cost and higher quality. When proprietary DCAs are released, their databases should also be made available to test engineers. However, different data privacy laws prevent organizations from sharing this data with test centers because databases contain sensitive information. Currently, testing is performed with anonymized data, which often leads to worse test coverage (such as code coverage) and fewer uncovered faults, thereby reducing the quality of DCAs and obliterating benefits of test outsourcing. To address this issue, we offer a novel approach that combines program analysis with a new data privacy framework that we design to address constraints of software testing. With our approach, organizations can balance the level of privacy with needs of testing. We have built a tool for our approach and applied it to nontrivial Java DCAs. Our results show that test coverage can be preserved at a higher level by anonymizing data based on their effect on corresponding DCAs.
    SIGSOFT/FSE'11 19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-19) and ESEC'11: 13th European Software Engineering Conference (ESEC-13), Szeged, Hungary, September 5-9, 2011; 01/2011
  • Source
    • "From this example we can see that applying data privacy is not generally good for software testing. Recent cases with U.S. Census show that applying data privacy leads to incorrect results [8], [9]. Therefore preserving test coverage while achieving desired data anonymity is not an easy task. "
    ABSTRACT: Database-centric applications (DCAs) are common in enterprise computing, and they use nontrivial databases. Testing of DCAs is increasingly outsourced to test centers in order to achieve lower cost and higher quality. When releasing proprietary DCAs, their databases should also be made available to test engineers, so that they can test using real data. Testing with real data is important, since fake data lacks many of the intricate semantic connections among the original data elements. However, different data privacy laws prevent organizations from sharing these data with test centers because databases contain sensitive information. Currently, testing is performed with fake data that often leads to worse code coverage and fewer uncovered bugs, thereby reducing the quality of DCAs and obliterating benefits of test outsourcing. We show that a popular data anonymization algorithm called k-anonymity seriously degrades test coverage of DCAs. We propose an approach that uses program analysis to guide selective application of k-anonymity. This approach helps protect sensitive data in databases while retaining testing efficacy. Our results show that for small values of k = 7, test coverage drops to less than 30% from the original coverage of more than 70%, thus making it difficult to achieve good quality when testing DCAs while applying data privacy.
    IEEE 21st International Symposium on Software Reliability Engineering, ISSRE 2010, San Jose, CA, USA, 1-4 November 2010; 01/2010