+ Welcome!

Email: melody.huang@yale.edu

Twitter: @melodyyhuang

I'm currently an Assistant Professor of Political Science and Statistics & Data Science at Yale. My research broadly focuses on developing robust statistical methods to credibly estimate causal effects under real-world complications.

Before this, I was a Postdoctoral Fellow at Harvard, working with Kosuke Imai. I received my Ph.D. in Statistics at the University of California, Berkeley, where I was fortunate to be advised by Erin Hartman.

Recent News

  • [Apr. 2024]
My paper on a sensitivity framework for considering overlap violations in external validity is now available on ArXiv (link).
  • [Mar. 2024]
Our new working paper on evaluating the impact of AI-assisted decision-making systems is now available on ArXiv (link).
  • [Feb. 2024]
My paper on sensitivity analysis for generalizability is forthcoming in the Journal of the Royal Statistical Society: Series A!
  • [Jul. 2023]
My paper with Dan Soriano and Sam Pimentel on design sensitivity for weighted observational studies is now available on ArXiv (link).
  • [Mar. 2023]
My paper with Erin Hartman on sensitivity analysis for survey weighting will be appearing in Political Analysis!

+ Research


Variance-based sensitivity analysis for weighted estimators result in more informative bounds
with Sam Pimentel


Weighting methods are popular tools for estimating causal effects; assessing their robustness under unobserved confounding is important in practice. In the following paper, we introduce a new set of sensitivity models called "variance-based sensitivity models." Variance-based sensitivity models characterize the bias from omitting a confounder by bounding the distributional differences that arise in the weights from omitting a confounder, with several notable innovations over existing approaches. First, the variance-based sensitivity models can be parameterized with respect to a simple R^2 parameter that is both standardized and bounded. We introduce a formal benchmarking procedure that allows researchers to use observed covariates to reason about plausible parameter values in an interpretable and transparent way. Second, we show that researchers can estimate valid confidence intervals under a set of variance-based sensitivity models, and provide extensions for researchers to incorporate their substantive knowledge about the confounder to help tighten the intervals. Last, we highlight the connection between our proposed approach and existing sensitivity analyses, and demonstrate, both empirically and theoretically, that variance-based sensitivity models can provide improvements on both the stability and tightness of the estimated confidence intervals over existing methods. We illustrate our proposed approach on a study examining blood mercury levels using the National Health and Nutrition Examination Survey (NHANES).

Design sensitivity and its implication on weighted observational studies
with Dan Soriano and Sam Pimentel


Sensitivity to unmeasured confounding is not typically a primary consideration in designing treated-control comparisons in observational studies. We introduce a framework allowing researchers to optimize robustness to omitted variable bias at the design stage using a measure called design sensitivity. Design sensitivity, which describes the asymptotic power of a sensitivity analysis, allows transparent assessment of the impact of different estimation strategies on sensitivity. We apply this general framework to two commonly-used sensitivity models, the marginal sensitivity model and the variance-based sensitivity model. By comparing design sensitivities, we interrogate how key features of weighted designs, including choices about trimming of weights and model augmentation, impact robustness to unmeasured confounding, and how these impacts may differ for the two different sensitivity models. We illustrate the proposed framework on a study examining drivers of support for the 2016 Colombian peace agreement.

Does AI help humans make better decisions? A methodological framework for experimental evaluation
with Eli Ben-Michael, D. James Greiner, Kosuke Imai, Zhichao Jiang, and Sooahn Shin


The use of Artificial Intelligence (AI) based on data-driven algorithms has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions as compared to a human alone or AI alone. We introduce a new methodological framework that can be used to answer this question experimentally with no additional assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded experimental design, in which the provision of AI-generated recommendations is randomized across cases with a human making final decisions. Under this experimental design, we show how to compare the performance of three alternative decision-making systems--human-alone, human-with-AI, and AI-alone. We apply the proposed methodology to the data from our own randomized controlled trial of a pretrial risk assessment instrument. We find that AI recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Our analysis also shows that AI-alone decisions generally perform worse than human decisions with or without AI assistance. Finally, AI recommendations tend to impose cash bail on non-white arrestees more often than necessary when compared to white arrestees.

Overlap violations in external validity


Estimating externally valid causal effects is a foundational problem in the social and biomedical sciences. Generalizing or transporting causal estimates from an experimental sample to a target population of interest relies on an overlap assumption between the experimental sample and the target population--i.e., all units in the target population must have a non-zero probability of being included in the experiment. In practice, having full overlap between an experimental sample and a target population can be implausible. In the following paper, we introduce a framework for considering external validity in the presence of overlap violations. We introduce a novel bias decomposition that parameterizes the bias from an overlap violation into two components: (1) the proportion of units omitted, and (2) the degree to which omitting the units moderates the treatment effect. The bias decomposition offers an intuitive and straightforward approach to conducting sensitivity analysis to assess robustness to overlap violations. Furthermore, we introduce a suite of sensitivity tools in the form of summary measures and benchmarking, which help researchers consider the plausibility of the overlap violations. We apply the proposed framework on an experiment evaluating the impact of a cash transfer program in Northern Uganda.


Sensitivity analysis for the generalization of experimental results
Journal of the Royal Statistical Society: Series A (2024+)


Randomized controlled trials (RCTs) allow researchers to estimate causal effects in an experimental sample with minimal identifying assumptions. However, to generalize or transport a causal effect from an RCT to a target population, researchers must adjust for a set of treatment effect moderators. In practice, it is impossible to know whether the set of moderators has been properly accounted for. In the following paper, I propose a two-parameter sensitivity analysis for generalizing or transporting experimental results using weighted estimators. The contributions in the paper are three-fold. First, I show that the sensitivity parameters are scale-invariant and standardized, and introduce an estimation approach for researchers to simultaneously account for the bias in their estimates from omitting a moderator, as well as potential changes to their inference. Second, I propose several tools researchers can use to perform sensitivity analysis: (1) different numerical measures to summarize the uncertainty in an estimated effect to unobserved confounding; (2) graphical summary tools for researchers to visualize the sensitivity in their estimated effects, as the confounding strength of the omitted variable changes; and (3) a formal benchmarking approach for researchers to estimate potential sensitivity parameter values using existing data. Finally, I demonstrate that the proposed framework can be easily extended to the class of doubly robust, augmented weighted estimators. The sensitivity analysis framework is applied to a set of Jobs Training Program experiments.

Leveraging population outcomes to improve the generalization of experimental results
with Naoki Egami, Erin Hartman, and Luke Miratrix
Annals of Applied Statistics (2023)

Article Pre-Print

Randomized control trials are often considered the gold standard in causal inference due to their high internal validity. Despite its importance, generalizing experimental results to a target population is challenging in social and biomedical sciences. Recent papers clarify the assumptions necessary for generalization and develop various weighting estimators for the population average treatment effect (PATE). However, in practice, many of these methods result in large variance and little statistical power, thereby limiting the value of the PATE inference. In this article, we propose post-residualized weighting, in which information about the outcome measured in the observational population data is used to improve the efficiency of many existing popular methods without making additional assumptions. We empirically demonstrate the efficiency gains through simulations and apply our proposed method to a set of jobs training program experiments.

Improving precision in the design and analysis of experiments with non-compliance
with Erin Hartman
Political Science Research and Methods (2023)

Article Code

Even in the best-designed experiment, noncompliance with treatment assignment can complicate analysis. Under one-way noncompliance, researchers typically rely on an instrumental variables approach, under an exclusion restriction assumption, to identify the complier average causal effect (CACE). This approach suffers from high variance, particularly when the experiment has a low compliance rate. The following paper suggests blocking designs that can help overcome precision losses in the face of high rates of noncompliance in experiments when a placebo-controlled design is infeasible. We also introduce the principal ignorability assumption and a class of principal score weighted estimators, which are largely absent from the experimental political science literature. We then introduce the "block principal ignorability" assumption which, when combined with a blocking design, suggests a simple difference-in-means estimator for estimating the CACE. We show that blocking can improve precision of both IV and principal score weighting approaches, and further show that our simple, design-based solution has superior performance to both principal score weighting and instrumental variables under blocking. Finally, in a re-evaluation of the Gerber, Green, and Nickerson (2003) study, we find that blocked, principal ignorability approaches to estimation of the CACE, including our blocked difference-in-means and principal score weighting estimators, result in confidence intervals roughly half the size of traditional instrumental variable approaches.

Sensitivity Analysis for Survey Weighting
with Erin Hartman
Political Analysis (2023)

Article Pre-Print Code

Survey weighting allows researchers to account for bias in survey samples, due to unit nonresponse or convenience sampling, using measured demographic covariates. Unfortunately, in practice, it is impossible to know whether the estimated survey weights are sufficient to alleviate concerns about bias due to unobserved confounders or incorrect functional forms used in weighting. In the following paper, we propose two sensitivity analyses for the exclusion of important covariates: (1) a sensitivity analysis for partially observed confounders (i.e., variables measured across the survey sample, but not the target population) and (2) a sensitivity analysis for fully unobserved confounders (i.e., variables not measured in either the survey or the target population). We provide graphical and numerical summaries of the potential bias that arises from such confounders, and introduce a benchmarking approach that allows researchers to quantitatively reason about the sensitivity of their results. We demonstrate our proposed sensitivity analyses using state-level 2020 U.S. Presidential Election polls.

Higher Moments for Optimal Balance Weighting in Causal Estimation
with Brian Vegetabile, Lane Burgette, Claude Setodji, and Beth Ann Griffin
Epidemiology (2022)


We expand upon a simulation study that compared three promising methods for estimating weights for assessing the average treatment effect on the treated for binary treatments: generalized boosted models, covariate-balancing propensity scores, and entropy balance. The original study showed that generalized boosted models can outperform covariate-balancing propensity scores and entropy balance when there are likely to be non-linear associations in both the treatment assignment and outcome models and when the other two models are fine-tuned to obtain balance only on first-order moments. We explore the potential benefit of using higher-order moments in the balancing conditions for covariate-balancing propensity scores and entropy balance. Our findings showcase that these two models should, by default, include higher-order moments, and that focusing only on first moments can result in substantial bias in the treatment effect estimates from both models that could be avoided using higher moments.

+ Teaching

Yale University

  • PLSC 500: Foundations of Statistical Inference (Fall 2024)

University of California, Berkeley (Graduate Student Instructor)

  • STAT 232: Experimental Design (Spring 2023)
  • POLI SCI 236B: Quantitative Methodology in the Social Sciences (Spring 2022)

University of California, Los Angeles (Teaching Assistant)

  • STAT 100C: Linear Models (Spring 2019)
  • ECON 412: Fundamentals of Big Data (Spring 2019)