Research
JOURNAL ARTICLES
Kautz, Tim and Wladimir Zanoni. (2024). "Measurement and Development of Noncognitive Skills in Adolescence: Evidence from Chicago Public Schools and the OneGoal Program." Journal of Human Capital, 18(2): 272-303. [Abstract]
Abstract
Using administrative data, we develop measures of noncognitive skills and evaluate OneGoal, an intervention designed to help disadvantaged students complete college by teaching them noncognitive skills. We (1) compare the outcomes of participants and nonparticipants with similar characteristics and (2) use a difference-in-differences approach exploiting that OneGoal was introduced into different schools at different times. We estimate that OneGoal increases college enrollment by 10–20 percentage points for males and females and reduces arrest rates by 5 percentage points for males. Through a mediation analysis, we find that improvements in noncognitive skills account for 13%–32% of these effects.
Feng, Shuaizhang, Yu Gan, Yujie Han, and Tim Kautz. (2024). "Shorter Can Be Better: Balancing Length and Predictive Power when Measuring Noncognitive Skills to Predict Academic Outcomes." Economics Letters, 236: 111598. [Abstract]
Abstract
We develop shorter versions of a Big Five survey designed to measure students' noncognitive skills and predict students' later academic outcomes. We find that measures with fewer items can better predict students' outcomes, suggesting that using shorter versions of a Big Five Inventory may be cost-effective in large-scale social surveys.
Lira, Benjamin, Joseph M. O'Brien, Pablo A. Peña, Brian M. Galla, Sidney D'Mello, David S. Yeager, Amy Defnet, Tim Kautz, Kate Munkacsy, and Angela L. Duckworth. (2022). "Large Studies Reveal How Reference Bias Limits Policy Applications of Self-Report Measures." Scientific Reports, 12: 19189. [Abstract]
Abstract
There is growing policy interest in identifying contexts that cultivate self-regulation. Doing so often entails comparing groups of individuals (e.g., from different schools). We show that self-report questionnaires—the most prevalent modality for assessing self-regulation—are prone to reference bias, defined as systematic error arising from differences in the implicit standards by which individuals evaluate behavior. In three studies, adolescents (N = 229,685) whose peers performed better academically rated themselves lower in self-regulation and held higher standards for self-regulation. This effect was not observed for task measures of self-regulation and led to paradoxical predictions of college persistence 6 years later. These findings suggest that standards for self-regulation vary by social group, limiting the policy applications of self-report questionnaires.
Feng, Shuaizhang, Yujie Han, James J. Heckman, and Tim Kautz. (2022). "Comparing the Reliability and Predictive Power of Child, Teacher, and Guardian Reports of Noncognitive Skills." Proceedings of the National Academy of Sciences, 119(6): e2113992119. [Abstract]
Abstract
Children’s noncognitive or socioemotional skills (e.g., persistence and self-control) are typically measured using surveys in which either children rate their own skills or adults rate the skills of children. For many purposes—including program evaluation and monitoring school systems—ratings are often collected from multiple perspectives about a single child (e.g., from both the child and an adult). Collecting data from multiple perspectives is costly, and there is limited evidence on the benefits of this approach. Using a longitudinal survey, this study compares children’s noncognitive skills as reported by themselves, their guardians, and their teachers. Although reports from all three types of respondents are correlated with each other, teacher reports have the highest internal consistency and are the most predictive of children’s later cognitive outcomes and behavior in school. The teacher reports add predictive power beyond baseline measures of Intelligence Quotient (IQ) for most outcomes in schools. Measures collected from children and guardians add minimal predictive power beyond the teacher reports.
Schochet, Peter Z., Nicole E. Pashley, Luke W. Miratrix, and Tim Kautz. (2022). "Design-Based Ratio Estimators and Central Limit Theorems for Clustered, Blocked RCTs." Journal of the American Statistical Association, 117(540): 2135-2146. [Abstract]
Abstract
This article develops design-based ratio estimators for clustered, blocked randomized controlled trials (RCTs), with an application to a federally funded, school-based RCT testing the effects of behavioral health interventions. We consider finite population weighted least squares estimators for average treatment effects (ATEs), allowing for general weighting schemes and covariates. We consider models with block-by-treatment status interactions as well as restricted models with block indicators only. We prove new finite population central limit theorems for each block specification. We also discuss simple variance estimators that share features with commonly used cluster-robust standard error estimators. Simulations show that the design-based ATE estimator yields nominal rejection rates with standard errors near true ones, even with few clusters.
Milkman, Katherine L., Dena Gromet, Hung Ho, Joseph S. Kay,..., Tim Kautz,..., and Angela L. Duckworth. (2021). "Megastudies Improve the Impact of Applied Behavioural Science." Nature, 600: 478-483. [Abstract]
Abstract
Policy-makers are increasingly turning to behavioural science for insights about how to improve citizens’ decisions and outcomes. Typically, different scientists test different intervention ideas in different samples using different outcomes over different time intervals. The lack of comparability of such individual investigations limits their potential to inform policy. Here, to address this limitation and accelerate the pace of discovery, we introduce the megastudy—a massive field experiment in which the effects of many different interventions are compared in the same population on the same objectively measured outcome for the same duration. In a megastudy targeting physical exercise among 61,293 members of an American fitness chain, 30 scientists from 15 different US universities worked in small independent teams to design a total of 54 different four-week digital programmes (or interventions) encouraging exercise. We show that 45% of these interventions significantly increased weekly gym visits by 9% to 27%; the top-performing intervention offered microrewards for returning to the gym after a missed workout. Only 8% of interventions induced behaviour change that was significant and measurable after the four-week intervention. Conditioning on the 45% of interventions that increased exercise during the intervention, we detected carry-over effects that were proportionally similar to those measured in previous research. Forecasts by impartial judges failed to predict which interventions would be most effective, underscoring the value of testing many ideas at once and, therefore, the potential for megastudies to improve the evidentiary value of behavioural science.
Duckworth, Angela L., Tim Kautz, Amy Defnet, Emma Satlof-Bedrick, Sean Talamas, Benjamin Lira Luttges, and Laurence Steinberg. (2021). "Students Attending School Remotely Suffer Socially, Emotionally, and Academically." Educational Researcher, 50(7): 479-482. [Abstract]
Abstract
What is the social, emotional, and academic impact of attending school remotely rather than in person? We address this urgent policy issue using survey data collected from N = 6,576 high school students in a large, demographically diverse school district that allowed families to choose either format in fall 2020. Controlling for baseline measures of well-being collected one month before the onset of the COVID-19 pandemic, as well as student demographics and other administrative data from official school records, students who attended school remotely reported lower levels of social, emotional, and academic well-being (ES = 0.10, 0.08, and 0.07 standard deviations, respectively) than classmates who attended school in person—differences that were consistent across gender, race and ethnicity, and socioeconomic status subgroups but significantly wider for older compared to younger students.
Hock, Heinrich, Dara Lee Luca, Tim Kautz, and David Stapleton. (2021). "Improving the Outcomes of Youth with Medical Limitations: Evidence from the National Job Corps Study." Journal of Economics & Management Strategy, 32(3): 636-656. [Abstract]
Abstract
Improving work outcomes for youth with disabilities and reducing their reliance on disability benefits are important policy priorities, but existing interventions have shown limited promise. We provide new evidence to inform this discussion by re-analyzing data from the 1990s National Job Corps Study, a randomized field experiment conducted nationwide in the United States. Job Corps, which provides comprehensive training to economically disadvantaged youth, is the nation's largest youth program outside of the school system. We examine youth who had medical limitations when they enrolled in the experiment, a group that has not previously been studied. During the 4 years after random assignment, participation in Job Corps increased the earnings of youth with medical limitations—substantially more so than for youth without medical limitations—and additionally reduced their receipt of disability cash benefits. Interventions designed specifically for such youth have not typically demonstrated reductions in benefit receipt. Hence, our re-analysis of the field experiment suggests that Job Corps could be a promising model for helping some youth with disabilities gain a foothold in the labor market and achieve greater self-sufficiency.
Deke, John, Thomas Wei, and Tim Kautz. (2021). "Asymdystopia: The Threat of Small Biases in Evaluations of Education Interventions that Need to be Powered to Detect Small Impacts." Journal of Research on Educational Effectiveness, 14(1): 207-240. [Abstract]
Abstract
Evaluators of education interventions are increasingly designing studies to detect impacts much smaller than the 0.20 standard deviations that Cohen characterized as “small.” While the need to detect smaller impacts is based on compelling arguments that such impacts are substantively meaningful, the drive to detect smaller impacts may create a new challenge for researchers: the need to guard against smaller biases. The purpose of this article is twofold. First, we examine the potential for small biases to increase the risk of making false inferences as studies are powered to detect smaller impacts, a phenomenon we refer to as asymdystopia. We examine this potential for two of the most rigorous designs commonly used in education research—randomized controlled trials and regression discontinuity designs. Second, we recommend strategies researchers can use to avoid or mitigate these biases.
Chen, Yuanyuan, Shuaizhang Feng, James J. Heckman, and Tim Kautz. (2020). "Sensitivity of Self-Reported Non-Cognitive Skills to Survey Administration Conditions." Proceedings of the National Academy of Sciences, 117(2): 931-935. [Abstract]
Abstract
Noncognitive skills (e.g., persistence and self-control) are typically measured using self-reported questionnaires in which respondents rate their own skills. In many applications—including program evaluation and school accountability systems—such reports are assumed to measure only the skill of interest. However, self-reports might also capture other dimensions aside from the skill, such as aspects of a respondent's situation, which could include incentives and the conditions in which they complete the questionnaire. To explore this possibility, this study conducted 2 experiments to estimate the extent to which survey administration conditions can affect student responses on noncognitive skill questionnaires. The first experiment tested whether providing information about the importance of noncognitive skills to students directly affects their responses, and the second experiment tested whether incentives tied to performance on another task indirectly affect responses. Both experiments suggest that self-reports of noncognitive skills are sensitive to survey conditions. The effects of the conditions are relatively large compared with those found in the program evaluation literature, ranging from 0.05 to 0.11 SDs. These findings suggest that the effects of interventions or other social policies on self-reported noncognitive skills should be interpreted with caution.
Heckman, James J. and Tim Kautz. (2012). "Hard Evidence on Soft Skills." Labour Economics, 19(4): 451-464. [Abstract]
Abstract
This paper summarizes recent evidence on what achievement tests measure; how achievement tests relate to other measures of "cognitive ability" like IQ and grades; the important skills that achievement tests miss or mismeasure, and how much these skills matter in life. Achievement tests miss, or perhaps more accurately, do not adequately capture, soft skills—personality traits, goals, motivations, and preferences—that are valued in the labor market, in school, and in many other domains. The larger message of this paper is that soft skills predict success in life, that they causally produce that success, and that programs that enhance soft skills have an important place in an effective portfolio of public policies.
Kautz, Tim, Eran Bendavid, Jay Bhattacharya, and Grant Miller. (2010). "AIDS and Declining Support for Dependent Elderly People in Africa: Retrospective Analysis Using Demographic and Health Surveys." British Medical Journal, 340: c2841. [Abstract]
Abstract
Objectives: To determine the relation between the HIV/AIDS epidemic and support for dependent elderly people in Africa. Design: Retrospective analysis using data from Demographic and Health Surveys. Setting: 22 African countries between 1991 and 2006. Participants: 123,176 individuals over the age of 60. Main outcome measures: We investigated how three measures of the living arrangements of older people have been affected by the HIV/AIDS epidemic: the number of older individuals living alone (that is, the number of unattended elderly people); the number of older individuals living with only dependent children under the age of 10 (that is, in missing generation households); and the number of adults age 18-59 (that is, prime age adults) per household where an older person lives. Results: An increase in annual AIDS mortality of one death per 1000 people was associated with a 1.5% increase in the proportion of older individuals living alone (95% CI 1.2% to 1.9%) and a 0.4% increase in the number of older individuals living in missing generation households (95% CI 0.3% to 0.6%). Increases in AIDS mortality were also associated with fewer prime age adults in households with at least one older person and at least one prime age adult (P<0.001). These findings suggest that in our study countries, which encompass 70% of the sub-Saharan population, the HIV/AIDS epidemic could be responsible for 582,200-917,000 older individuals living alone without prime age adults and 141,000-323,100 older individuals being the sole caregivers for young children. Conclusions: Africa’s HIV/AIDS epidemic might be responsible for a large number of older people losing their support and having to care for young children. This population has previously been under-recognised. Efforts to reduce HIV/AIDS deaths could have large “spillover” benefits for elderly people in Africa.
BOOK CHAPTERS
Heckman, James J., Tomas Jagelka, and Tim Kautz. (2021). "Some Contributions of Economics to the Study of Personality." In Handbook of Personality, Vol. 4, edited by O.P. John and R.W. Robins. New York, NY: Guilford Press. pp. 853-892. [Abstract]
Abstract
This chapter synthesizes recent research in economics and psychology on the measurement and empirical importance of personality skills and preferences. They predict and cause important life outcomes such as wages, health, and longevity. Skills develop over the life cycle and can be enhanced by education, parenting, and environmental influences to different degrees at different ages. Economic analysis clarifies psychological studies by establishing that personality is measured by performance on tasks which depends on incentives and multiple skills. Identification of any single skill therefore requires isolation of confounding factors, accounting for measurement error using rich data and application of appropriate statistical techniques. Skills can be inferred not only by questionnaires and experiments but also from observed behavior. Economists advance the analysis of human differences by providing anchored measures of economic preferences and studying their links to personality and cognitive skills. Connecting the research from the two disciplines promotes understanding of the number and nature of skills and preferences required to characterize essential differences.
Heckman, James J. and Tim Kautz. (2014). "Achievement Tests and the Role of Character in American Life," In The Myth of Achievement Tests: The GED and the Role of Character in American Life, edited by J.J. Heckman, J.E. Humphries, and T. Kautz. Chicago, IL: University of Chicago Press. pp. 3-56. [Abstract]
Abstract
This chapter discusses the predictive power of achievement tests. It shows that achievement tests do not explain much variation in meaningful later-life outcomes, partly because achievement tests miss character skills, such as persistence, curiosity, and self-control. It reviews the history of achievement tests and the role of character in American education. It discusses the GED, a prominent achievement test that is used to certify high school equivalency. It provides an overview of the characteristics of GED recipients and the returns to the GED.
Heckman, James J., John Eric Humphries, and Tim Kautz. (2014). "Who Are The GEDs," In The Myth of Achievement Tests: The GED and the Role of Character in American Life, edited by J.J. Heckman, J.E. Humphries, and T. Kautz. Chicago, IL: University of Chicago Press. pp. 139-170. [Abstract]
Abstract
This chapter discusses the characteristics of GED recipients. It shows that they are smart (relative to high school graduates who do not go on to college) but lack character skills. They come from more disadvantaged backgrounds than ordinary high school graduates. Their deficits in character emerge as early as age six. It also examines the life events surrounding GED certification.
Heckman, James J., John Eric Humphries, and Tim Kautz. (2014). "The Economic and Social Benefits of GED Certification," In The Myth of Achievement Tests: The GED and the Role of Character in American Life, edited by J.J. Heckman, J.E. Humphries, and T. Kautz. Chicago, IL: University of Chicago Press. pp. 171-267. [Abstract]
Abstract
This chapter surveys the previous literature on the effectiveness of the GED. It also conducts original empirical studies using seven data sets collected in different time periods to evaluate the effectiveness of the GED testing program using a variety of outcome measures. All data show that GED recipients do not perform at the level of high school graduates. After controlling for cognition and background, the vast majority of male GED recipients do no better than uncertified dropouts. For some women, there is evidence of an apparent benefit, but the interpretation to be placed on these estimates is ambiguous. We argue that it is primarily due to uncontrolled selective factors and is not a causal benefit of GED certification. Any gain appears to come from their greater labor force attachment and not because of higher hourly wages compared to those of other dropouts. Their life-cycle wage growth is the same as that of dropouts. For both men and women, skills present before certification receive the same market wages and earnings before and after GED certification, so the GED does not serve a signaling function.
Heckman, James J. and Tim Kautz. (2014). "Fostering and Measuring Skills: Interventions that Improve Character and Cognition," In The Myth of Achievement Tests: The GED and the Role of Character in American Life, edited by J.J. Heckman, J.E. Humphries, and T. Kautz. Chicago, IL: University of Chicago Press. pp. 293-317. [Abstract]
Abstract
This chapter reviews the recent literature on measuring and boosting cognitive and noncognitive skills. The literature establishes that achievement tests do not adequately capture character skills (personality traits, goals, motivations, and preferences) that are valued in the labor market, in school, and in many other domains. Their predictive power rivals that of cognitive skills. Reliable measures of character have been developed. All measures of character and cognition are measures of performance on some task. In order to reliably estimate skills from tasks, it is necessary to standardize for incentives, effort, and other skills when measuring any particular skill. Character is a skill, not a trait. At any age, character skills are stable across different tasks, but skills can change over the life cycle. Character is shaped by families, schools, and social environments. Skill development is a dynamic process, in which the early years lay the foundation for successful investment in later years. High-quality early childhood and elementary school programs improve character skills in a lasting and cost-effective way. Many of them beneficially affect later-life outcomes without improving cognition. There are fewer long-term evaluations of adolescent interventions, but workplace-based programs that teach character skills are promising. The common feature of successful interventions across all stages of the life cycle through adulthood is that they promote attachment and provide a secure base for exploration and learning for the child. Successful interventions emulate the mentoring environments offered by successful families.
Heckman, James J., John Eric Humphries, and Tim Kautz. (2014). "What Should be Done?," In The Myth of Achievement Tests: The GED and the Role of Character in American Life, edited by J.J. Heckman, J.E. Humphries, and T. Kautz. Chicago, IL: University of Chicago Press. pp. 431-436. [Abstract]
Abstract
This chapter presents policy recommendations about the GED program and the more general problem of promoting skills in the American economy. It argues that any successful approach going forward should recognize the power of character skills. They can be measured, and effective interventions are available to shape them.
Almlund, Mathilde, Angela Lee Duckworth, James J. Heckman, and Tim Kautz. (2011). "Personality Psychology and Economics," In Handbook of the Economics of Education, Vol. 4, edited by E. Hanushek, S. Machin, and L. Woessmann. Amsterdam: Elsevier. pp. 1-181. [Abstract]
Abstract
This chapter explores the power of personality traits both as predictors and as causes of academic and economic success, health, and criminal activity. Measured personality is interpreted as a construct derived from an economic model of preferences, constraints, and information. Evidence is reviewed about the “situational specificity” of personality traits and preferences. An extreme version of the situationist view claims that there are no stable personality traits or preference parameters that persons carry across different situations. Those who hold this view claim that personality psychology has little relevance for economics. The biological and evolutionary origins of personality traits are explored. Personality measurement systems and relationships among the measures used by psychologists are examined. The predictive power of personality measures is compared with the predictive power of measures of cognition captured by IQ and achievement tests. For many outcomes, personality measures are just as predictive as cognitive measures, even after controlling for family background and cognition. Moreover, standard measures of cognition are heavily influenced by personality traits and incentives. Measured personality traits are positively correlated over the life cycle. However, they are not fixed and can be altered by experience and investment. Intervention studies, along with studies in biology and neuroscience, establish a causal basis for the observed effect of personality traits on economic and social outcomes. Personality traits are more malleable over the life cycle compared to cognition, which becomes highly rank stable around age 10. Interventions that change personality are promising avenues for addressing poverty and disadvantage.
SELECTED POLICY REPORTS AND BRIEFS
Kautz, Tim, Christina Kent, and Dan Thal. (2024). "Using Bayesian Methods to Conduct Subgroup Analysis in Evaluations of Employment Programs." OPRE Report 2024-027. Washington, DC: Office of Planning, Research, and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services. [Abstract]
Abstract
Many Temporary Assistance for Needy Families (TANF) recipients and other individuals with low incomes seek employment and training programs to help them find jobs or improve their earnings, which could, in turn, allow them to better support their families. However, these programs do not necessarily benefit all participants equally. Program evaluations that include subgroup analysis can inform how employment programs provide their services and help them improve equity by identifying who needs more tailored services. This report details new and promising approaches to subgroup analysis for evaluators of employment programs. It discusses how two Bayesian methods—a Bayesian hierarchical linear model and a Bayesian causal forest—can potentially address limitations of standard subgroup analysis. The report uses data from four experimental evaluations of employment programs in the Evaluation of Employment Coaching for TANF and Related Populations, a project sponsored by the Office of Planning, Research, and Evaluation in the Administration for Children and Families, U.S. Department of Health and Human Services. The results suggest that Bayesian methods can complement traditional methods of conducting subgroup analyses in impact evaluations.
Kautz, Tim and Julius Anastasio. (2024). "The Predictive Power of Measures of Self-Regulation Skills Among Adults with Low Incomes." OPRE Report 2024-008. Washington, DC: Office of Planning, Research, and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services. [Abstract]
Abstract
The purpose of this study is to examine how measures of self-regulation skills relate to future outcomes. The results can inform researchers who work with employment programs on how to promote self-regulation skills or who plan to use measures of self-regulation skills in evaluations of employment programs. The study provides evidence on (1) how individual self-regulation measures relate to longer-term outcomes; (2) how using multiple self-regulation measures predicts longer-term outcomes compared to using individual measures; and (3) the extent to which using self-regulation measures can improve the prediction of longer-term outcomes above and beyond using only sociodemographic characteristics.
Moore, Quinn, Tim Kautz, Sheena McConnell, Owen Schochet, and April Wu. (2023). "Can a Participant-Centered Approach to Setting and Pursuing Goals Help Adults with Low Incomes Become Economically Stable?" OPRE Report 2023-139. Washington, DC: Office of Planning, Research, and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services. [Abstract]
Abstract
Policy makers, practitioners, researchers, and others are interested in the potential of employment coaching to help Temporary Assistance for Needy Families (TANF) recipients and other adults with low incomes to become economically secure. Employment coaching is based on the idea that coaches can help people use and strengthen the skills that enable them to stay organized, finish tasks, and control emotions. Improving these skills, which we refer to as self-regulation skills, can in turn help them improve their economic security. Coaches work collaboratively with participants to help them set individualized goals directly or indirectly related to employment and provide motivation, support, and feedback as participants work toward those goals. Unlike most traditional case managers, coaches work in partnership with participants and do not tell participants what goals to set or what actions to take to work toward them. Despite growing interest in employment coaching programs for adults with low incomes, there is no rigorous evidence of their effectiveness.
This report presents short-term impact findings from an experimental study conducted as part of the Evaluation of Employment Coaching for TANF and Related Populations, which is sponsored by the Administration for Children and Families. This evaluation includes an impact study of four employment coaching programs. It uses an experimental design to assess the impacts of each program on study participants' self-regulation skills, employment, earnings, and other measures of personal and family well-being during the first 9 or 12 months, depending on the program, after study enrollment. In doing so, it offers the first look at program impacts at a time when most participants have received a substantial amount of coaching, but when many continue to engage in coaching. Future reports will document whether and how these impacts change over time as participants receive more coaching services and complete their programs.
Kautz, Tim, Kathleen Feeney, Hanley Chiang, Sarah Lauffer, Maria Bartlett, and Charles Tilley. (2021). "Using a Survey of Social and Emotional Learning and School Climate to Inform Decisionmaking." OPRE Report 2020-138. Washington, DC: Office of Planning, Research, and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services. [Abstract]
Abstract
The District of Columbia Public Schools (DCPS) has prioritized efforts to support students’ social and emotional learning (SEL) competencies, such as perseverance and social awareness. To measure students’ SEL competencies and the school experiences that promote SEL competencies (school climate), DCPS began administering annual surveys to students, teachers, and parents in 2017/18. DCPS partnered with the Mid-Atlantic Regional Educational Laboratory to study how the district could use these surveys to improve students’ outcomes. The study found the following:
Students’ SEL competencies and school experiences are the most favorable in elementary school and the least favorable in middle school and the beginning of high school. This pattern suggests that schools might provide targeted supports before or during grades 6–10 to promote SEL competencies and school experiences when students need the most support.
The trajectories of students’ SEL competencies and school experiences differed in different schools, to a similar degree as trajectories in academic measures like test scores. To understand why changes in SEL competencies and school experiences differ across schools, DCPS could explore differences in practices between schools with better and worse trajectories. In addition, DCPS could provide targeted support to schools with lower levels of positive change.
Of the SEL competencies and school experiences in DCPS’s survey, self-management—how well students control their emotions, thoughts, and behavior—is most related to students’ later academic outcomes. Programs or interventions that target self-management might have the most potential for improving students’ outcomes compared to those that target other SEL competencies or school experiences.
In statistical models designed to predict students’ future academic outcomes, SEL competency and school experience data add little accuracy beyond prior academic outcomes (such as achievement test scores and attendance) and demographic characteristics. Prior academic outcomes and demographic characteristics predict later outcomes with a high degree of accuracy, and they may implicitly incorporate the SEL competencies and school experiences. These findings suggest that DCPS would not need to use SEL competencies and school experiences to identify whether or not students are at risk of poor academic outcomes.
Student, teacher, and parent reports on SEL competencies and school experiences are positively related across schools, but they also exhibit systematic differences, suggesting that some respondent groups may not be aligned in their view of SEL competencies and school experiences. These differences may serve as a tool to help DCPS target efforts to improve communication among students, teachers, and parents.
Kautz, Tim, and Quinn Moore. (2020). "Selecting and Testing Measures of Self-Regulation Skills Among Low-Income Populations." OPRE Report 2020-138. Washington, DC: Office of Planning, Research, and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services. [Abstract]
Abstract
This report discusses issues related to selecting and testing measures of self-regulation skills in evaluations of employment programs for low-income populations. First, it presents an overview of criteria for selecting measures of self-regulation skills. Second, through a presentation of empirical evidence, this report demonstrates a process for developing and testing self-regulation measures in the context of an impact evaluation of employment coaching programs for low-income populations. Third, it discusses how the process could be adapted to other studies.
Goldring, Ellen, Melissa A. Clark, Mollie Rubin, Laura K. Rogers, Jason A. Grissom, Brian Gill, Tim Kautz, Moira McCullough, Michael Neel, and Alyson Burnett. (2020). "Changing the Principal Supervisor Role to Better Support Principals: Evidence from the Principal Supervisor Initiative." Princeton, NJ: Mathematica. [Abstract]
Abstract
In 2014, The Wallace Foundation launched the Principal Supervisor Initiative (PSI), a four-year, $24 million effort to redefine principal supervision in six urban school districts. The PSI aimed to help districts overhaul a position traditionally focused on administration, operations, and compliance to one dedicated to developing and supporting principals to be effective instructional leaders who had the skills to foster high-quality instruction and learning. In this study report, researchers from Mathematica and Vanderbilt University describe the PSI experiences of districts, principal supervisors, and principals; the PSI's effects on teachers’ perceptions of principals' performance; and lessons learned from the initiative.
Kautz, Tim, Charles Tilley, Christine Ross, and Natalie Larkin. (2020). "Development of a School Survey and Index as a School Performance Measure in Maryland: A REL-MSDE Research Partnership." Washington, DC: Regional Educational Laboratory Mid-Atlantic, National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. [Abstract]
Abstract
Fueled by evidence on the strong relationships between school climate and academic achievement, teacher satisfaction, health outcomes, and social-emotional skills, states and districts are increasingly trying to measure school climate (Brand, Felner, Shim, Seitsinger, & Dumas, 2003; Gase et al., 2017; Lacireno-Paquet, Bocala, & Bailey, 2016; Voight & Hanson, 2017). School climate encompasses both tangible and intangible attributes, including relationships among students and staff, school discipline, student engagement, and safety. The Maryland State Department of Education (MSDE) partnered with Regional Educational Laboratory (REL) Mid-Atlantic to co-develop, validate, and benchmark a school climate index based upon the Maryland School Survey. The climate index will serve as a measure of school quality and student success in Maryland's school accountability framework. MSDE administered the survey statewide for accountability purposes beginning in spring 2019 following a field test in fall 2018.
This report details an approach to examining survey reliability and validity as well as converting each individual respondent's answers on the survey to an overall measure of climate for each school in Maryland. After validating the survey against standard criteria of reliability and validity, the study team used a Rasch model to develop benchmarks for each topic in the survey along four categories of school climate favorability. Based on these benchmarks, each respondent's survey responses were converted to a 1-to-10-point scale score for each topic in the survey. These topic scores were combined into an overall school climate index separately for students and instructional staff. Maryland is one of the first states to develop a measure of school climate for the state's school accountability system, and its experience may serve as a guiding example for other states and education agencies.
Herrmann, Mariesa, Melissa Clark, Susanne James-Burdumy, Christina Tuttle, Tim Kautz, Virginia Knechtel, Dallas Dotter, Claire Smither Wulsin, and John Deke. (2019). "The Effects of a Principal Professional Development Program Focused on Instructional Leadership." NCEE Report 2020-0002. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. [Abstract]
Abstract
Principals can play a key role in improving instruction and student achievement. The Institute of Education Sciences conducted a random assignment study of a professional development program for elementary school principals to support state and local efforts to improve school leadership. The program focused on helping principals conduct structured observations of teachers’ classroom instruction and provide targeted feedback. It provided nearly 200 hours of professional development over two years, half of it through individualized coaching. Key findings include:
(1) Despite substantially increasing the amount of professional development principals received, the program did not affect student achievement or most teacher or school outcomes. For example, the professional development did not affect school climate or principal retention.
(2) The program did not have the intended effects on principal practices that it targeted, which may explain its lack of effects on key student, teacher, and school outcomes. For example, it decreased the frequency of instructional support and feedback teachers received from principals, and it did not affect the number of teacher observations principals conducted or the usefulness of the feedback as reported by teachers.
Heckman, James J., Tim Kautz, and Charles Tilley. (2019). "Advancing the Measurement of Non-Cognitive Skills: Evidence from Chicago Public Schools." Princeton, NJ: Mathematica. [Abstract]
Abstract
Non-cognitive skills, such as persistence and social awareness, are important determinants of life outcomes and can be shaped through education and interventions. For this reason, schools and districts have started to measure non-cognitive skills, primarily relying on self-reports in which students assess their own skills. However, recent research suggests that these self-reports might suffer from biases. To address these biases, researchers have developed innovative measurement techniques, including advanced survey-based measures and measures based on more objective academic indicators (for example, absences and credits earned). However, no studies have directly compared the properties of standard self-reports, advanced survey-based measures, and measures based on academic indicators. We fill this gap by (1) examining the relationships among these three approaches; (2) calculating their predictive power for later outcomes; and (3) exploring their susceptibility to biases. Our findings suggest that standard self-reported measures of non-cognitive skills suffer from substantive biases, but that innovative measurement approaches can address these biases and yield predictive measures of non-cognitive skills.
Moore, Quinn, Sheena McConnell, Alan Werner, Tim Kautz, Kristen Joyce, Kelley Borradaile, and Bethany Bolland. (2019). "Evaluation of Employment Coaching for TANF and Related Populations: Evaluation Design Report." OPRE Report 2019-65. Washington, DC: Office of Planning, Research, and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services. [Abstract]
Abstract
This report describes the design of the Evaluation of Employment Coaching. It identifies the types of employment coaching interventions that are the focus of this evaluation and how coaching is expected to affect participants' economic security. Next, it discusses the process for selecting employment coaching interventions to study in the evaluation. It provides details on the design of the impact study, including the process for conducting random assignment, data needs and sources, and the analytic approach to estimating intervention impacts. It also describes the implementation study, including the research questions to be addressed, the data collection strategy, and the analytic approach. The report concludes with an overview of the evaluation and reporting schedule.
The research design for the Evaluation of Employment Coaching will provide rigorous estimates of the effectiveness of employment coaching in improving the economic security of low-income populations, and important lessons on the implementation of employment coaching. The impact study will estimate intervention impacts using an experimental design. The implementation study will document and analyze the implementation of the employment coaching interventions with three key purposes: (1) to describe the program design and operations of each employment coaching intervention and the conditions necessary for replication; (2) to help interpret the impact analysis results; and (3) to identify lessons learned for purposes of program refinement and replication.
Kautz, Tim, and Quinn Moore. (2018). "Measuring Self-Regulation Skills in Evaluations of Employment Programs for Low-Income Populations: Challenges and Recommendations." OPRE Report 2018-83. Washington, DC: Office of Planning, Research, and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services. [Abstract]
Abstract
Four challenges arise when measuring self-regulation skills in evaluations of employment programs for low-income populations. First, measures of self-regulation skills can reflect aspects of a person's situation (for example, his or her background or financial resources) in addition to his or her skills. Second, most existing measures were developed for purposes other than program evaluation, such as describing characteristics of populations generally or for diagnosing people with severe problems. Third, most existing measures were not designed for use with low-income populations. Fourth, some measures take a long time to administer or require special technology.
For use in evaluations of employment programs, we suggest that measures of self-regulation should (1) relate to employment outcomes of interest; (2) capture skills that could be influenced by the program; (3) account for confounding factors that affect measurement but not skills; and (4) be feasible to administer in an evaluation. To meet these criteria, we suggest using a set of both general measures of self-regulation and ones that are specific to the employment context, collecting information on other aspects of the participants' situations that can be affected by the program, modifying measures to fit the target population, and conducting analyses to assess the reliability and validity of selected measures.
Kautz, Tim, and Russel Cole. (2017). "Selecting Benchmark and Sensitivity Analyses." Evaluation Technical Assistance Brief. Rockville, MD: Office of Adolescent Health, U.S. Department of Health and Human Services. [Abstract]
Abstract
Despite best efforts to be independent and impartial, and to let the data speak clearly, researchers must make difficult decisions that play a role in the findings that they produce from their impact evaluations. After specifying a research question about the effectiveness of a program, researchers face many decisions about how to operationalize the analysis, for example, how to clean contradictory data or which statistical approach they should use to estimate the program's impact. Such decisions are challenging because there are often several justifiable but competing approaches, each of which can lead to different results. Researchers could stumble on a potentially erroneous result that depends on an arbitrary modeling decision. As a consequence, they might inadvertently highlight a finding that does not reflect the true effect of the program but is instead an artifact of their analytic decisions. Findings that are highly sensitive to research methods are considered less credible (Leamer 1985).
This brief discusses how to estimate and present a set of analyses that reveal how sensitive the results are to the researcher's analytic decisions. We propose using a benchmark analysis that will serve as the primary answer to the research question and a set of sensitivity analyses that will summarize how that answer might change under different assumptions. We draw attention to common situations in which teen pregnancy prevention (TPP) researchers make decisions that could influence their findings, and we highlight how sensitivity analyses can help protect the integrity of the results and paint a more comprehensive picture about the effects of a program. This approach also helps avoid any appearance of "fishing for results" or "p-hacking," which arises when researchers privately conduct many different analyses but publicly report only the most favorable or statistically significant results (Wasserstein and Lazar 2016).
Deke, John, Thomas Wei, and Tim Kautz. (2017). "Asymdystopia: The Threat of Small Biases in Evaluations of Education Interventions that Need to be Powered to Detect Small Impacts." NCEE Report 2018-4002. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. [Abstract]
Abstract
Evaluators of education interventions are increasingly designing studies to detect impacts much smaller than the 0.20 standard deviations that Cohen (1988) characterized as "small." While the need to detect smaller impacts is based on compelling arguments that such impacts are substantively meaningful, the drive to detect smaller impacts may create a new challenge for researchers: the need to guard against smaller biases. The purpose of this paper is twofold. First, we examine the potential for small biases to increase the risk of making false inferences as studies are powered to detect smaller impacts, a phenomenon we refer to as asymdystopia. We examine this potential for two of the most rigorous designs commonly used in education research—randomized controlled trials (RCTs) and regression discontinuity designs (RDDs). Second, we recommend strategies researchers can use to avoid or mitigate these biases.
Kautz, Tim, Peter Z. Schochet, and Charles Tilley. (2017). "Comparing Impact Findings from Design-based and Model-based Methods: An Empirical Investigation." NCEE Report 2017-4026. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. [Abstract]
Abstract
A new design-based theory has recently been developed to estimate impacts for randomized controlled trials (RCTs) and basic quasi-experimental designs (QEDs) for a wide range of designs used in social policy research (Imbens & Rubin, 2015; Schochet, 2016). These methods use the potential outcomes framework and known features of study designs to connect statistical methods to the building blocks of causal inference. They differ from model-based methods that have commonly been used in education research, including hierarchical linear model (HLM) methods and robust cluster standard error (RCSE) methods for clustered designs. In comparison to model-based methods, the design-based methods tend to make fewer assumptions about the nature of the data and also more explicitly account for known information about the experimental and sampling designs. While these theoretical differences suggest the corresponding estimates might differ, it is unclear how much of a practical difference it makes to use design-based methods versus more conventional model-based methods.
This study addresses this question by re-analyzing nine past RCTs in the education area using both design- and model-based methods. The study uses real data, rather than simulated data, to better explore the differences that would arise in practice. In order to investigate the full scope of differences between the methods, the study uses data generated from different types of randomization designs commonly used in social policy research: (1) non-clustered designs in which individuals are randomized, (2) clustered designs in which groups are randomized, (3) non-blocked designs in which randomization is conducted for a single population, and (4) blocked (stratified) designs in which randomization is conducted separately within partitions of the sample. The study conducts the design-based analyses using RCT-YES, a free software package funded by the Institute of Education Sciences (IES) that applies design-based methods to a wide range of RCT designs (www.rct-yes.com).
Kautz, Tim, James J. Heckman, Ron Diris, Bas ter Weel, and Lex Borghans. (2014). "Fostering and Measuring Skills: Improving Cognitive and Non-Cognitive Skills to Promote Lifetime Success." Paris, France: Organisation for Economic Co-operation and Development. [Abstract]
Abstract
IQ tests and achievement tests do not adequately capture non-cognitive skills — personality traits, goals, character, motivations, and preferences that are valued in the labor market, in school, and in many other domains. For many outcomes, their predictive power rivals or exceeds that of cognitive skills. Skills are stable across situations with different incentives, although manifestations of skills vary with incentives. Skills are not immutable over the life cycle; they have a genetic basis but are also shaped by environments, including families, schools, and peers. Skill development is a dynamic process. The early years are important in shaping all skills and in laying the foundations for successful investment and intervention in the later years. During the early years, both cognitive and non-cognitive skills are highly malleable. During the adolescent years, non-cognitive skills are more malleable than cognitive skills. The differential plasticity of different skills by age has important implications for the design of effective policies.
This report reviews a variety of interventions targeted to different stages of the life cycle. We interpret all of the studies we examine within an economic model of skill development. Many effective programs work because they foster non-cognitive skills. Some have annual rates of return that are comparable to those from investments in the stock market. Parental involvement is an important component of successful early interventions just as successful adolescent mentoring is an age-appropriate version of parental involvement. The most successful adolescent remediation programs are not as effective as the most successful early childhood and elementary school programs. Building an early base of skills that promote later-life learning and engagement in school and society is often a better strategy than waiting for problems to occur. Prevention is more effective than remediation if at-risk populations are sufficiently well targeted.
Macurdy, Thomas, Jonathan Gibbs, Tim Kautz, Thomas Deleire, and Margaret O'Brian-Strain. (2009). "Geographic Variation in Drug Prices and Spending in the Part D Program." Baltimore: Centers for Medicare and Medicaid Services. [Abstract]
Abstract
This report investigates the extent to which there is regional variation in drug prices in Medicare's Part D Program. We construct regional price indices for different drug classifications. We examine different percentiles of the price distribution across regions, distinguishing between best available prices and typical prices for each region. There is little evidence of regional price variation in either best available or typical prices. There is some evidence of regional variation in average per-capita drug expenditures. This variation is driven by the most intensive users. Differences in population composition explain more than one-third of the difference in per-capita drug expenditures across regions.
WORKS IN PROGRESS
Lira, Benjamin, Maria Bartlett, Tim Kautz, and Angela L. Duckworth. "Remote Schooling Depresses Course Grades for the Most Vulnerable Students." Under review.