I am an associate professor of statistics at the School of Computational Sciences. My research interests include algorithmic model building for large data, epidemiology and medical data analysis, time series analysis, longitudinal data analysis, case-control studies with complex sampling designs, statistical genetics, survey data analysis, and nonparametric methods.
My coauthors and I have developed three methods for genetic association testing: a new family-based multimarker test that simultaneously tests several genes for association with a disease; an optimal linear combination family-based test that combines information from multiple genes into a single-degree-of-freedom test; and a novel case-control association test that empirically estimates kinship between subjects and uses this information to adjust the test statistic, eliminating the inflation of type I error rates.
My coauthor and I have developed a novel statistical approach for improving the statistical efficiency of second-stage analyses of counter-matched case-control data. Counter-matched designs have efficiency advantages over simple random sampling in first-stage analyses when counter-matching is done with respect to a covariate correlated with the variable of interest, but they lose efficiency when second-stage analyses are performed with respect to variables uncorrelated with the counter-matching variable. Our method, which uses a modified partial likelihood, achieves optimal efficiency in both stages of analysis and gains over 80% in statistical efficiency relative to existing approaches.
I have proposed and implemented a novel mixed-effects modeling approach for a complex but commonly occurring experimental setting: repeated measurements recorded before and after treatment, or repeated-measures longitudinal data more generally. This method properly models the intricate correlation structure induced by repeated measurements recorded on the same subjects over time.
My coauthors and I have developed and implemented one of the first comprehensive approaches for the analysis of large, complexly sampled survey data that ensures effect sizes are properly adjusted for all important confounders. Using an automatic linear regression model selection approach, we analyzed HINT III, a nationwide, weighted, cluster-sampled survey data set with 50 jackknife resamples. We found new predictors of the efficiency of information seeking in the US, including numeracy, education, general health, health care satisfaction, health information dissatisfaction, and psychological distress.