The Role of Data in Research and Policy


   Abstract
   INTRODUCTION
   FOUNDATIONAL RESEARCH
   CUTTING-EDGE RESEARCH
   Availability and accuracy of data for new research and re-analysis while protecting human subjects
   Problems with the estimation of indicators based on non-generalizable or flawed data
   Accuracy of past projections, use of data to develop models for projecting the future, and assumptions on which those models are based
   KEY ISSUES FOR FUTURE RESEARCH
   Availability and accuracy of data for new research and re-analysis while protecting human subjects
   Problems with the estimation of indicators based on non-generalizable or flawed data
   Accuracy of past projections, the use of data to develop models for projecting the future, and the assumptions on which those models are based
   References
   Further Readings

A version of this paper is scheduled to appear in Robert Scott and Nancy Pinkerton, Editors, Emerging Trends in the Social and Behavioral Sciences, an on-line resource.

 

Theresa Anderson, Heather King, Emily Marshall, Emily Merchant, John H. Romani, Michelle Steinmetz, and Victoria Velkoff have made helpful comments and suggestions.

 

This research was supported by an NICHD center grant to the Population Studies Center at the University of Michigan (R24 HD041028).

 

Abstract

 

Data are essential for scientific research and policy planning.  However, there needs to be attention to data quality and to the estimates and models based on those data.  In addition, data need to be freely available for researchers to test new ideas and validate the work of others through replication, while respondents who provide data need to be protected.  Three issues concerning data are addressed: 1) Availability and accuracy of data for new research and re-analysis while protecting human subjects, 2) Problems with the estimation of indicators based on flawed or non-generalizable data, and 3) The use of data to develop models for projecting the future, the assumptions on which those models are based, and assessment of the accuracy of past projections.  In each of these areas, increased attention to how data are used, interpreted, and made available to the scholarly and policy community is necessary.

 

INTRODUCTION

Data are essential for scientific research and policy planning.  In light of this, there has been an increasing call for evidence-based policy in numerous areas.  However, there needs to be attention to data quality and to the characteristics of estimates and models based on those data.   

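Three issues of special concern are:

1) Availability and accuracy of data for new research and re-analysis while protecting human subjects

2) Problems with the estimation of indicators based on non-generalizable or flawed data

3) Accuracy of past projections, use of data to develop models for projecting the future, and assumptions on which those models are based

In each of these areas, increased attention to how data are used, interpreted and made available to the scholarly and policy community is necessary.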

 

FOUNDATIONAL RESEARCH

Among the main principles of data collection and analysis are:

journals require that data used in an article be available at an accessible data repository (Anderson et al., 2008).  Data that are collected using U. S. government funds are to be made available to other researchers in some form, usually within two years of the completion of data collection. 

An example of this is the Coale-Demeny regional model life tables (1966).  These mortality models were based on 326 life tables that were thought to have high quality data.  Of these, 324 were from Europe or North America.  The life tables with high mortality were almost all from historical Europe.  There was awareness that the life tables used might not represent all of world experience, but it was viewed as too risky to include life tables from low-income countries where the data quality might be poor.

 

Population projections are important in and of themselves for planning the future but also are part of the input information for many other purposes, such as estimating the adequacy of future food supply in a region.  There have been notable instances in the past when projections were very far off, because the future is unknown and subject to change.

 

CUTTING-EDGE RESEARCH

Research has pointed out shortcomings and considerations about data, models and estimation.  Such studies highlight further work that needs to be done.

 

Availability and accuracy of data for new research and re-analysis while protecting human subjects

Data access for original research and for replicability of findings remains an issue in both low-income and high-income countries.  In both settings, issues of researcher or institutional hoarding and of protection of respondent confidentiality arise.

In many low income countries there remains a lack of high quality demographic data upon which to base population estimates and to look at interrelations between demographic and social variables.  To address this problem, 49 demographic surveillance sites have been established in 20 low income countries, especially in sub-Saharan Africa.  Together these sites form the INDEPTH Network (2014).  In these sites intensive efforts are made to collect information on demographic events at frequent intervals.

There is a concern about access to data from demographic surveillance sites.  Data collection requires a huge effort.  Persons directing and conducting data collection are often reluctant to turn the data over to external researchers who have not devoted the same amount of energy to this effort.  On the other hand, scientific standards require that data be available for independent examination and validation.  If other researchers do not have access to the data, it is not possible for alternative explanations to be investigated, and this can lead to questioning of the value of research results based on the data.  In addition, since data collection is so time-intensive, researchers often need to turn their attention to the next round of data collection as soon as one round is completed.  Thus, much data from such sites is analyzed locally only to a limited extent.  These issues have led to a lively debate about the conditions under which data from a demographic surveillance site should be available to the larger scholarly community (Baiden et al., 2006; Carrel & Rennie, 2008; Chandramohan et al., 2008).

The balance between protecting respondents, giving those who collect data a fair chance to benefit from publication, and allowing replicability remains a source of tension.  In the INDEPTH research sites in many low income countries, there has been a debate about whether, in principle, researchers who were not involved in data collection should have access, whether their research aims should be required to be in line with those of the project, and whether they should be required to collaborate with project researchers.  These potential requirements conflict with principles of independent assessment and replicability in science.

There also remain concerns about sharing of data in the United States.  Even as more and more journals require deposit of data in an archive, this requirement had not, at least as of 2009, reached many open access journals, which are advocates for total openness in research (McCullough, 2009).  Even when data have been deposited in an archive, McCullough, McGeary & Harrison (2008) found it was only possible to replicate the work in 14 out of 62 articles.

As more and more analyses for policy purposes are based on surveys and microdata from censuses, there is a concern with protecting the identity of individuals.  This often leads to masking data in various ways, including perturbation of data by introduction of a random factor, grouping dates of occurrence of events into five-year or ten-year periods, and aggregating geographic location to a fairly large area.  These respondent protection measures can, though, lead to erroneous conclusions in analysis of publicly available data.

A slight adjustment of ages through introduction of a random factor can sometimes lead to anomalous results, such as unreasonable sex ratios (U. S. Census Bureau, 2010; Alexander, Davern & Stevenson, 2010).  Adjustment of the reported data to mask the identity of those over age 65 can also lead to inaccurate estimates of characteristics of the elderly, such as their income (Fisher, 2010).  In event history analysis, the detailed dating and sequencing of events are important, and these are not possible with aggregated times of occurrence (Freedman et al., 1988).  The information lost through these common ways of masking identity, grouping time into five-year or ten-year periods and reporting data for fairly large geographic units, calls for rethinking of how respondent identity should be masked.
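The two masking approaches discussed above can be sketched in a few lines.  This is only an illustration under stated assumptions: the function names, the size of the random shift, and the example respondent are all hypothetical and do not reproduce the procedure of any statistical agency.

```python
import random

def perturb_age(age, max_shift=2, rng=random):
    """Add a small random integer shift to a reported age (never below 0)."""
    return max(0, age + rng.randint(-max_shift, max_shift))

def group_year(year, width=5):
    """Replace an exact year with the first year of its width-year period."""
    return year - (year % width)

# Hypothetical respondent: married in 1986, first birth in 1988.
marriage_year, first_birth_year = 1986, 1988

# Both events fall into the same five-year period after grouping.
masked_marriage = group_year(marriage_year)
masked_birth = group_year(first_birth_year)

# With exact years the order of events is known; after grouping it is not.
order_known_before = marriage_year < first_birth_year
order_known_after = masked_marriage < masked_birth

# Seeded generator so that repeated runs give the same perturbed ages.
rng = random.Random(0)
perturbed_ages = [perturb_age(a, rng=rng) for a in (64, 65, 66)]
```

In this sketch the grouped years are identical for the two events, so an event history analysis that depends on which came first can no longer be carried out, and a perturbed age near 65 can move a person across an age boundary used in analysis.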

Masking geographic detail can help protect respondents (Sherman & Fetters, 2007).  However, researchers have increasingly incorporated detailed information about the characteristics of small geographic areas into their analyses.  Researchers who seek to identify clusters of people with particular diseases, or who are studying attitudes or behavior, need very detailed geographic information to do so (Berg, Stewart, Stewart & Simons, 2013; Cox, 1996; Armstrong, Rushton & Zimmerman, 1999).  A researcher can apply to the body that controls the data and ask for more detailed information.  If the controlling body sees the proposed research as sufficiently valuable, the researcher could obtain the more detailed data, but the approval process can take a long time and is often not successful.

 

Problems with the estimation of indicators based on non-generalizable or flawed data

Estimates of the number of persons with a disease are sometimes based on results of a survey.  In order for the estimates to be accurate, the survey respondents must be representative of the population as a whole, or the way in which the respondents differ from the population as a whole must be well-understood so that estimates for the entire population can be made.  

UNAIDS revised downward its estimate of the number of HIV-positive people in India from 5.7 million for 2006 to 2.5 million for 2007.  This downward revision was not due to an actual enormous decline in HIV, but rather due to a change in the basis for the estimates.  UNAIDS changed from basing their estimates on clinic data for high risk groups (pregnant women, injection drug users, commercial sex workers) to basing their estimates on a more representative population-based survey.  It was clear that the earlier estimates had greatly overestimated the prevalence of HIV in the general Indian population (Steinbrook, 2008; UNAIDS, 2007, 2008).

The new estimates are clearly more accurate than the old estimates.  However, some are not happy about this change because they think it could lead to less attention and less money being allocated to fight HIV.  Also, some interpret this reported change as real and thus exaggerate the extent of real declines in HIV.  This example shows that how data are collected and how survey respondents are chosen for collection of the data can have a large impact on what conclusions are drawn.
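The effect of sample composition on a prevalence estimate can be illustrated with a small weighted-average calculation.  All of the figures below are hypothetical and chosen only for illustration; they are not the UNAIDS figures for India, and the function name is an assumption of this sketch.

```python
def prevalence(groups):
    """Weighted average prevalence; groups is a list of (weight, prevalence)."""
    total_weight = sum(weight for weight, _ in groups)
    return sum(weight * prev for weight, prev in groups) / total_weight

# Hypothetical prevalence within each group.
HIGH_RISK_PREV = 0.10
GENERAL_PREV = 0.002

# Clinic-style data: 80% of those tested belong to high-risk groups.
clinic_estimate = prevalence([(0.80, HIGH_RISK_PREV), (0.20, GENERAL_PREV)])

# Population-based survey: high-risk groups are 5% of the population.
population_estimate = prevalence([(0.05, HIGH_RISK_PREV), (0.95, GENERAL_PREV)])

# How many times larger the clinic-based estimate is.
overstatement = clinic_estimate / population_estimate
```

With these assumed inputs, the clinic-based figure is more than ten times the population-based figure even though prevalence within each group is identical in the two calculations; only the mix of respondents differs.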

The United Nations (1982) developed models of mortality patterns based on data from 22 less developed countries.  These new mortality models were intended to improve upon the earlier Coale-Demeny models that were mainly based on data from Europe or North America.  They were intended to provide models of mortality that were more relevant to the situation in low income countries.  The United Nations developed a General Model based on data from all 22 countries, as well as 4 additional models based on data from subsets of the countries.  Unfortunately, it was later concluded that the male model for the Latin American pattern was substantially a model of data error, due to problems with the data for males at the older ages in the contributing life tables (Dechter & Preston, 1991).  

 

Accuracy of past projections, use of data to develop models for projecting the future, and assumptions on which those models are based

Despite the importance of population projections, there has been fairly little work assessing their accuracy.  Keilman’s (1998) research is especially interesting.  He assessed the accuracy of United Nations population projections 1951-1988.  Sometimes inaccuracy was due to error in the estimate of the population at the starting date of the projection.  After the results of the 1953 Chinese Census were released in 1954, the estimate of the population of the world for 1950 was increased, because it was seen that the population of China was more than 100 million persons larger than had earlier been thought.  At other times, assumptions about the future were inaccurate.  Throughout the world, mortality declined more rapidly than had been projected.  Also, fertility declined more quickly after the 1970s than had been expected, partially due to policies implemented because of alarm about high rates of population growth in many low income countries in the 1960s and 1970s (Keilman, 1998).

Keilman (2008) also showed that in the period 1950-2001, population projections of the expected future demographic situation (total population, mortality, fertility, international migration) by European national statistical offices did not become more accurate.  This was true even as the amount and the quality of the data on which these forecasts were based improved.

The United Nations Population Division is the main producer of authoritative estimates and projections of the total population and of demographic processes, such as mortality and fertility.  The most important part of a population projection is the future fertility assumption.  Between 2004 and 2012, the UN Population Division changed its basic assumptions four times, both about the course of fertility decline to a level that would result in zero population growth and about the fertility trajectory after that (Anderson, 2014; Basten, 2013b).  With low mortality, zero population growth would require a total fertility rate (abbreviated TFR) of 2.07.  A TFR resulting in eventual zero population growth is also called replacement fertility.

These changes were based on observation of the history of some high income, low fertility countries.  Before 2004, the UN Population Division had long projected TFR to asymptotically reach replacement level, TFR=2.07.  This assumed that all countries would eventually have low mortality, low fertility stationary populations.  
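The replacement value of 2.07 can be motivated with a common approximation: each woman must on average bear enough children that one daughter survives to childbearing age.  The sketch below uses that approximation; the sex ratio at birth (1.05 boys per girl) and the survival probabilities are illustrative assumptions, not values taken from the UN documentation.

```python
def replacement_tfr(sex_ratio_at_birth, female_survival):
    """Approximate replacement-level TFR.

    sex_ratio_at_birth: male births per female birth (assumed value).
    female_survival: probability that a girl survives to the mean age
        of childbearing (assumed value).
    """
    return (1 + sex_ratio_at_birth) / female_survival

# Low mortality: nearly all girls survive to childbearing age.
low_mortality = replacement_tfr(1.05, 0.99)

# Higher mortality: more births are needed to replace each generation.
higher_mortality = replacement_tfr(1.05, 0.80)
```

Under these assumptions the low-mortality case rounds to 2.07, and the same formula shows why replacement fertility is higher where mortality before and during the childbearing ages is higher.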

In the 1990s, many countries had sustained below replacement fertility (TFR<2.07), sometimes falling to lowest-low fertility (TFR<=1.3).  After extensive consultation, fertility projection assumptions were changed in 2004, and all countries were then projected to asymptotically approach TFR=1.86, which implies long term population decline.  This was a major departure from the earlier eventual zero population growth assumption.

In the 2000s, TFR increased across at least three five-year periods in twenty-one below replacement fertility countries.  Based on fertility increases in those countries, in 2010 assumptions were again changed, so that in the new model TFR in below replacement fertility countries increased toward replacement, with the pace of increase more rapid the farther TFR was below replacement.  For countries with above replacement fertility in 2005-2010, such as Algeria, TFR was projected to fall below replacement and then increase toward replacement.

This marked a return to the eventual stationary population assumption.
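The qualitative behavior of the 2010 assumption, a faster rise the farther TFR is below replacement, can be mimicked with a simple partial-adjustment rule.  This is a hypothetical sketch, not the UN Population Division's actual projection model; the adjustment speed and the starting values are assumptions chosen for illustration, and the sketch does not cover countries that start above replacement.

```python
REPLACEMENT_TFR = 2.07  # replacement level cited in the text

def project_tfr(start_tfr, periods, speed=0.2, target=REPLACEMENT_TFR):
    """Close a fixed fraction of the gap to the target in each period."""
    path = [start_tfr]
    for _ in range(periods):
        current = path[-1]
        path.append(current + speed * (target - current))
    return path

# Two hypothetical below-replacement countries, projected six periods ahead.
very_low = project_tfr(1.30, periods=6)
moderately_low = project_tfr(1.80, periods=6)

# Size of the first projected increase for each country.
first_step_very_low = very_low[1] - very_low[0]
first_step_moderate = moderately_low[1] - moderately_low[0]
```

In this sketch the country starting at 1.30 has a larger first-period increase than the country starting at 1.80, and both paths approach, but never exceed, the replacement level.  A rule of this kind projects recovery for every low fertility country, including those where no increase has been observed.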

There was a loud outcry about the unreasonableness of the fertility projections for some Asian countries where fertility was very low and where there had been no indication of any increase (Basten, 2013a, 2013b; Basten, Coleman & Gu, 2012).

Partly in response to complaints about the 2010 estimates, in 2012, TFR projection assumptions were again changed.  By 2012, TFR had increased across at least three five-year periods in 25 low fertility countries.  The new low fertility projection model for many individual countries was based both on the experience of these 25 countries and on the TFR record of the individual country.  The 2012 projections resulted in less extreme departures from earlier projected TFRs than occurred between the 2004 and 2010 projections.   

 

KEY ISSUES FOR FUTURE RESEARCH

Progress on the issues highlighted here requires scientific research but also discourse and discussion in philosophy and ethics.  Just as in a trial there is usually some merit on each side of an argument, there are conflicting considerations and values, so that a perfect resolution in many areas is probably not possible.

 

Availability and accuracy of data for new research and re-analysis while protecting human subjects

Despite rules by journals about access to data, Tenopir et al. (2011) document that data hoarding is still common.  There needs to be consideration about what further steps could be taken while avoiding negative unintended consequences, such as discouraging data collection by researchers.  Similarly, although there exists a procedure for external researchers to apply for use of INDEPTH data, the process is complicated and it is yet to be seen how open access will be.

The U. S. Census Bureau has established Research Data Centers (RDCs) at more than 15 universities and research centers.  At these centers researchers with approved projects can receive results of computer runs based on analysis of detailed individual data that are not available in as detailed a form in public use data sets.  The RDCs help resolve the issue of data access and respondent confidentiality, but they are flawed by the requirement that projects using an RDC must “provide benefit to Census Bureau programs” (U. S. Census Bureau, 2012).  This is an impediment to free scientific work and to the range of studies that can be pursued.  More thought needs to go into this program, which seems to be affected by some of the same inclinations that have impeded data sharing among researchers.

The balance between data access and protection of respondents is a value-laden issue of public policy.  More discussion between those concerned with research and those concerned with ethics could be fruitful to clarify what the guiding principles should be.  These discussions seem ever more necessary with increasing emphasis on “big data” to address many scientific and policy questions (Schuurman, 2000; United States. White House. Office of Science and Technology Policy, 2012).

Problems with the estimation of indicators based on non-generalizable or flawed data

The development of indicators and models, such as for mortality, that are relevant to the situation in parts of the world where data are lacking or of poor quality remains a challenge.  There is an understandable urge to include all appropriate data in developing ways to make estimates, but there remains the danger of including data that contain serious error.

The need for some basis for estimates, such as of mortality, is clear.  A country wants to know things such as the average length of life (also called expectation of life at birth) for many purposes.  However, in 2012, the United Nations Population Division reported that 26% of the countries in the world and 60% of the countries in Africa had no reliable data on adult mortality, and two countries had no reliable data on mortality at any age.  In these situations, estimates based on the situation elsewhere are essential (United Nations, 2014: 14).

 

Accuracy of past projections, the use of data to develop models for projecting the future, and the assumptions on which those models are based

How to use the past and thoughts about the future to model the future is a difficult problem.  Additional assessment of the accuracy of past projections of the total population as well as of fertility and mortality could contribute to more informed decisions.  In any case, it probably is not wise to change assumptions frequently in a major way, as users of the results could easily assume real change where there has been none.  For Singapore, in 2008, the UN projected that the total fertility rate in 2040-2045 would be 1.59; in 2010, with new fertility assumptions, it projected that the total fertility rate in 2040-2045 would be 1.80, higher than had been thought only two years earlier; in 2012, it projected that the total fertility rate in 2040-2045 would be 1.38, lower than had been thought two years earlier.  Across the period 2008-2012, the estimated total fertility rate in Singapore declined from 1.33 to 1.25.  These changes in projected total fertility rate had no relation to actual fertility changes in Singapore (Anderson, 2014).  What changed over time was thinking about the future of fertility, based on the trajectory in fewer than half of the low fertility countries rather than on an observed change in many of the countries for which fertility was projected.
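The size of the Singapore revisions relative to the observed change can be checked with simple arithmetic on the figures quoted above.  The variable names and the keying of the estimated values to 2008 and 2012 are assumptions of this sketch.

```python
# Projected TFR for 2040-2045, keyed by the year the projection was made.
projected_tfr_2040_2045 = {2008: 1.59, 2010: 1.80, 2012: 1.38}

# Estimated current TFR at the start and end of the period 2008-2012.
estimated_tfr = {2008: 1.33, 2012: 1.25}

# Change in the projected value between successive projection rounds.
years = sorted(projected_tfr_2040_2045)
revisions = {
    (earlier, later): round(
        projected_tfr_2040_2045[later] - projected_tfr_2040_2045[earlier], 2
    )
    for earlier, later in zip(years, years[1:])
}

# Change in the estimated current TFR over the same period.
observed_change = round(estimated_tfr[2012] - estimated_tfr[2008], 2)
```

The projected value moved up by 0.21 and then down by 0.42, while the estimated current TFR fell by only 0.08 over the whole period, which makes plain that the revisions reflect changed assumptions rather than changed fertility.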

           

References

Alexander, J. T., Davern, M., & Stevenson, B. (2010). Inaccurate age and sex data in the Census PUMS files: Evidence and implications. National Bureau of Economic Research Working Paper No. 15703. Retrieved from http://www.nber.org/papers/w15703.pdf Accessed September 4, 2011.

Anderson, B. A. (2014). Projecting low fertility: Some thoughts about the plausibility and implications of assumptions. University of Michigan Population Studies Center Research Report 14-815 (February). Retrieved from http://www.psc.isr.umich.edu/pubs/pdf/rr14-815.pdf Accessed April 20, 2014.

Anderson, R. G., Greene, W. H., McCullough, B. D. & Vinod, H. D. (2008). The role of data/code archives in the future of economic research. Journal of Economic Methodology, 15: 99-119.

Baiden, F., Hodgson, A., & Binka, F. N. (2006). Demographic surveillance sites and emerging challenges in international health, Editorial. Bulletin of the World Health Organization, 84: 163-164.

Basten, S. (2013a). Re-examining the fertility assumptions for Pacific Asia in the UN's 2010 World Population Prospects. University of Oxford Department of Social Policy and Intervention, Barnett Papers in Social Research: 2013/1 (June 7). Retrieved from http://dx.doi.org/10.2139/ssrn.2275938 Accessed June 15, 2013.

Basten, S. (2013b). Comparing projection assumptions of fertility in six advanced Asian economies; or ‘Thinking beyond the medium variant.’ Asian Population Studies, 9: 322-331.

Basten, S. A., Coleman, D. A., & Gu, B. (2012). Re-examining the fertility assumptions in the UN’s 2010 World Population Prospects: Intentions and fertility recovery in East Asia. Paper presented at the Annual Meeting of the Population Association of America, San Francisco. Retrieved from http://paa2012.princeton.edu/papers/122426 Accessed June 13, 2013.

Berg, M. T., Stewart, E. A., Stewart, E. & Simons, R. L. (2013). A multilevel examination of neighborhood social processes and college enrollment. Social Problems, 60: 513-534.

Caldwell, J. & Caldwell, P. (1988). Is the Asian family planning program model suited to Africa? Studies in Family Planning, 19: 19-28.

Carrel, M. & Rennie, S. (2008). Demographic and health surveillance: Longitudinal ethical considerations. Bulletin of the World Health Organization, 86: 612-616.

Chandramohan, D., Shibuya, K., Setel, P., Cairncross, S., Lopez, A. D., Murray, C. J. L., Zaba, B., Snow, R. W., & Binka, F. (2008). Should data from demographic surveillance systems be made more widely available to researchers? PLoS Medicine, 5: 0169-0170.

Coale, A. J., & Demeny, P. (1966). Regional model life tables and stable populations.  Princeton: Princeton University Press. 

Dechter, A. R., & Preston, S. H. (1991). Age misreporting and its effect on adult mortality estimates in Latin America. Population Bulletin of the United Nations, 31/32: 1-16.

Fairchild, A. L. & Bayer, R. (1999).  Uses and abuses of Tuskegee. Science, 284: 919-921. 

Fisher, T. L. (2010). The income of the elderly: The effect of changes to reported age in the Current Population Survey. Paper presented at the annual conference of the Association for Public Policy Analysis and Management, Boston, October 13. Retrieved from https://www.appam.org/conferences/fall/boston2010/sessions/downloads/4555.1.pdf Accessed July 9, 2011.

Freedman, D., Thornton, A., Camburn, D., Alwin, D. & Young-DeMarco, L. (1988). The life history calendar: a technique for collecting retrospective event-history data. Sociological Methodology, 18: 37-68.

INDEPTH Network. (2014). INDEPTH Network: Better Health Data for Better Health Policy. Retrieved from http://www.indepth-network.org/ Accessed April 20, 2014.

Jones, J. H. (1981). Bad blood: The Tuskegee syphilis experiment, New York: The Free Press.

Keilman, N. (1998). How accurate are the United Nations world population projections? Population and Development Review, 24 Supplement: Frontiers of Population Forecasting: 15-41.

Keilman, N. (2008). European demographic forecasts have not become more accurate over the past 25 years. Population and Development Review, 34: 137-153.

McCullough, B. D. (2009). Open access economics journals and the market for reproducible economic research. Economic Analysis & Policy, 39: 117-126.

McCullough, B. D., McGeary, K. A. & Harrison, T. D. (2008). Do economics journal archives promote replicable research? Canadian Journal of Economics, 41: 1406-1420.

Sherman, J. E. & Fetters, T. L. (2007). Confidentiality concerns with mapping survey data in reproductive health research. Studies in Family Planning, 38: 309-321.

Schuurman, N. (2000). Trouble in the heartland: GIS and its critics in the 1990s. Progress in Human Geography, 24: 569-590.

Steinbrook, R. (2008). HIV in India — A downsized epidemic. New England Journal of Medicine, 358: 107-109.

Tenopir, C., Allard, S., Douglas, K., Aydinoglu, A. U., Read, E., Manoff, M. & Frame, M. (2011). Data sharing by scientists: Practices and perceptions. PLoS One, 6: 1-21.

United Nations. (1982). Model life tables for developing countries. New York: United Nations.

United Nations. (2011). Assumptions underlying the 2010 revision. Retrieved from http://esa.un.org/unpd/wpp/Documentation/WPP2010_ASSUMPTIONS_AND_VARIANTS.pdf Accessed July 29, 2011.

United Nations. (2014). World population prospects: The 2012 revision, Methodology of the United Nations population estimates and projections, New York: United Nations Retrieved from http://esa.un.org/unpd/wpp/Documentation/pdf/WPP2012_Methodology.pdf  Accessed April 19, 2014.

UNAIDS. (2007). 2.5 million people living with HIV in India. Retrieved from http://www.unaids.org/en/KnowledgeCentre/Resources/FeatureStories/archive/2007/20070704_India_new_data.asp Accessed April 19, 2014.

UNAIDS website. (2008). Q + A on India’s revised AIDS estimates. Retrieved from http://data.unaids.org/pub/InformationNote/2007/070701_india%20external_qa_en.pdf Accessed June 8, 2010.

U. S. Census Bureau. (2010). Analysis of perturbed and unperturbed age estimates: 2008. Available at http://www.census.gov/cps/user_note_age_estimates.html Accessed July 2, 2010.

U. S. Census Bureau. (2012). RDC research opportunities. Center for Economic Studies. Retrieved from https://www.census.gov/ces/rdcresearch/ Accessed April 20, 2014.

United States. White House. Office of Science and Technology Policy. (2012). Obama administration unveils “Big Data” initiative: Announces $200 million in new R&D investments. Retrieved from http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf Accessed April 19, 2014.

 

Further Readings

Coale, A. J. & Trussell, T. J. (1996). The development and use of demographic models. Population Studies, 50: 469-484.

National Academy of Sciences. (1995). On being a scientist: Responsible conduct of research, Second Edition. Washington, D. C.: National Academies Press.

Preston, S. H. (1993). The contours of demography: Estimates and projections. Demography, 30: 593-606.

United States, National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. (1979). The Belmont Report: Ethical principles and guidelines for the protection of human subjects of research. Retrieved from http://www.fda.gov/ohrms/dockets/ac/05/briefing/20054178b_09_02_Belmont%20Report.pdf Accessed April 20, 2014.

VanWey, L., Rindfuss, R. R., Gutmann, M. P., Entwisle, B., & Balk, D. L. (2005). Confidentiality and spatially explicit data: Concerns and challenges. Proceedings of the National Academy of Sciences of the United States of America, 102: 15337-15342. Retrieved from http://www.pnas.org/content/102/43/15337.full Accessed April 19, 2014.

 

 

 

