This is a selection of my recent research projects.

Ph.D. Dissertation: Modeling Preferential Recruitment for Respondent-Driven Sampling

Advisor: Mark S. Handcock

Keywords: social network analysis; survey sampling methodology; network sampling; respondent-driven sampling; rational-choice models; Markov chain Monte Carlo (MCMC)

Abstract: Respondent-driven sampling (RDS) is a network sampling methodology used worldwide to sample key populations at high risk for HIV/AIDS who are not typically reachable by conventional sampling techniques. In RDS, study participants recruit members of their social network to enroll, resulting in a sampling mechanism that is unknown to researchers. Current estimators for RDS data require many assumptions about the sampling process. A common assumption is that recruiters choose people from their network uniformly at random to participate in the study. However, in practice people likely recruit preferentially based on covariates such as age, race, or frequency of interaction.

I develop a sequential two-sided rational-choice framework to model preferential recruitment. At each wave of recruitment, each recruiter has a utility for selecting each peer, and simultaneously each peer has a utility for being recruited by each recruiter. People in the network make choices that maximize their utility. I model the unobserved utilities as functions of observable nodal or dyadic covariates plus unobserved heterogeneities. I develop inference for this model within a Bayesian framework by approximating the posterior distribution of the preference coefficients via Markov chain Monte Carlo (MCMC). The algorithm is a form of constrained Metropolis-Hastings. The framework results in a tractable generative model for the RDS sampling mechanism, which greatly enhances both design-based and model-based inference.
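As a rough illustration of the utility idea only, the sketch below simulates a single recruiter choosing among peers. It is a deliberately simplified, one-sided toy and not the dissertation's model: it omits the peer's side of the choice, the sequential waves, and the MCMC inference. The function names, the single dyadic covariate, the coefficient `beta`, and the standard normal noise term are all assumptions made for this example.

```python
import random


def recruit(peers, beta, rng, n_coupons=1):
    """Pick the peers with the highest utility for one recruiter.

    peers: dict mapping peer id -> observed dyadic covariate
           (hypothetical, e.g. a frequency-of-interaction score).
    Utility is assumed to be beta * covariate + standard normal noise,
    where the noise stands in for unobserved heterogeneity.
    """
    utilities = {
        peer: beta * x + rng.gauss(0.0, 1.0)
        for peer, x in peers.items()
    }
    # The recruiter maximizes utility: rank peers and keep the top ones.
    ranked = sorted(utilities, key=utilities.get, reverse=True)
    return ranked[:n_coupons]


def recruitment_share(peers, beta, n_draws=5000, seed=0):
    """Estimate how often each peer is chosen over repeated noise draws."""
    rng = random.Random(seed)
    counts = {peer: 0 for peer in peers}
    for _ in range(n_draws):
        for chosen in recruit(peers, beta, rng):
            counts[chosen] += 1
    return {peer: c / n_draws for peer, c in counts.items()}


# Three hypothetical peers with increasing covariate values.
peers = {"a": 0.0, "b": 1.0, "c": 2.0}

# beta = 0: the covariate is ignored, so choices look uniform at random.
uniform_like = recruitment_share(peers, beta=0.0)

# beta > 0: peers with larger covariate values are recruited more often.
preferential = recruitment_share(peers, beta=2.0)

print(uniform_like)
print(preferential)
```

Setting `beta` to zero recovers the uniform-recruitment assumption made by common RDS estimators, while a positive `beta` produces preferential recruitment toward peers with larger covariate values. In this toy the coefficient is fixed by hand, whereas in the framework above the preference coefficients are the quantities to be estimated.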

Other Respondent-Driven Sampling Projects

I have also been involved with a variety of other projects in the area of respondent-driven sampling (RDS).

Undergraduate Honors Thesis: Workload Estimates for Risk-Limiting Audits of Large Contests

Advisor: Philip B. Stark

Abstract: We compare the expected number of ballots that must be counted by hand for two risk-limiting auditing methods, Canvass Audits by Sampling and Testing (CAST) and Kaplan-Markov (KM). The methods use different sampling designs to select batches of ballots to count by hand and different test statistics to decide when the audit can stop. The comparisons are based on the 2008 U.S. House of Representatives contests in California. The comparisons include hypothetical errors in the precinct vote totals, but errors are assumed to be small enough that the electoral outcomes are still correct. KM requires auditing fewer ballots than CAST. The workload for CAST can be reduced modestly by optimizing the number of precincts drawn from each county. Stratification by county is necessary for the practical implementation of risk-limiting audit methods in cross-jurisdictional contests. Workload can be reduced substantially, for both KM and CAST, by tallying ballots in batches smaller than precincts: Workload is roughly proportional to the average size of the batches. We discuss several methods to reduce batch sizes using current vote tabulation systems.

Other Projects and Work Experience

Energy Information Administration, U.S. Department of Energy
Child Health and Development Studies, Public Health Institute
zEconomy
California State University, San Marcos