Research Projects

My methodological research has been motivated mostly by problems in genetics, genomics and metagenomics. I have worked on a variety of problems in statistical genetics and genomics, including methods for family-based genetic linkage and association analysis, admixture mapping, genome-wide association analysis, analysis of microarray time-course gene expression data, high-dimensional regression analysis of genomic data, copy number variation analysis and analysis of next-generation sequencing data. I have published statistical methodological research in top statistics/biostatistics journals (JASA, AOS, AOAS, Biometrika, Biometrics, Biostatistics, etc.) and top genetics journals (AJHG, PLoS Genetics, etc.), as well as collaborative research in top scientific journals (Science, NEJM, Nature, Nature Genetics, Nature Methods, PNAS, Developmental Cell, Cancer Cell, etc.).


My research covers several broad areas in statistical genetics, genomics and metagenomics: methods for mapping genes for complex diseases, methods for integrative analysis of genomic data, and methods for microbiome and metagenomic data analysis. From 1998 to 2012, I was supported by an NIH R01 grant (ES009911) to develop methods for mapping genes for complex diseases, including methods for linkage, association, haplotype and admixture mapping. My methodological research on genomic data analysis was supported by an NIH R01 grant focusing on statistical and computational methods for genomic data with graphical structure, and by another R01 grant focusing on the analysis of next-generation sequencing data. I was also the PI of a T32 training grant, "Training in Ophthalmic Statistical Genetics and Bioinformatics", to train investigators in statistical genetics for ophthalmic genetics research.

My current research focuses on:
(1) statistical and computational issues in the analysis of microbiome and metagenomic data.
(2) methods for integrative analysis of genomic data, especially joint analysis of GWAS, eQTL and single-cell RNA-seq data.
(3) high-dimensional statistics and inference.
(4) causal inference and high-dimensional mediation analysis.
(5) transfer learning and domain generalization.
(6) optimal transport and analysis of non-Euclidean object data.

I also collaborate with Penn investigators on genetic and genomic studies of complex diseases and biological systems. My most recent collaborations are in the areas of (1) the microbiome and its association with diseases such as IBD, obesity and cancer; (2) multi-omics studies of chronic kidney disease risk and progression; and (3) precision nutrition.

I have been the PI of the following NIH grants:

NIH R01-ES009911, 1998-2001: Survival Models for Mapping Genes for Complex Diseases.
NIH R01-ES009911, 2002-2006: Survival Models for Mapping Genes for Complex Diseases.
NIH R01-ES009911, 2007-2012: Survival Analysis Methods in Genetic Studies.
NIH P01-AG025532, 2007-2008: A Mitochondrial Longevity Pathway: p66Shc Mechanisms - Biostatistics Core.
NIH R01-CA127334, 2007-2012: Methods for Genomic Data with Graphical Structures.
NIH R01-CA127334, 2012-2017: Methods for Genomic Data with Graphical Structures.
NIH T32-EY021451, 2011-2016: Training in Ophthalmic Statistical Genetics and Bioinformatics.
NIH R01-GM097505, 2012-2016: Statistical Methods for Next-Generation Sequence Data.
NIH R01-GM123056, 2017-2021: Statistical Methods for Microbiome and Metagenomics.
NIH R01-GM129781, 2018-2022: Methods for Integrative Genomic Data Analysis.
NIH R01-GM123056, 2022-2026: Statistical Methods for Microbiome and Metagenomics.
NIH P30-DK050306, 2022-2027: Center for Molecular Studies in Digestive and Liver Diseases - BDSC Core.
NIH R01-GM129781, 2023-2027: Methods for Integrative Genomic Data Analysis.