Welcome to the Statistical Genetics and Genomics Laboratory, headed by Prof. Hongzhe Li. Our lab is part of the Department of Biostatistics, Epidemiology and Informatics (DBEI) at the University of Pennsylvania Perelman School of Medicine. We conduct both methodological and collaborative research in statistical genetics/genomics and metagenomics, with the goal of understanding the genetic and genomic bases of complex biological systems, including the initiation and development of complex human diseases. Our research has been continuously supported by NIH R01 grants since 1998.

The mere formulation of a problem is far more often essential than its solution, which may be merely a matter of mathematical or experimental skill. To raise new questions, new possibilities, to regard old problems from a new angle requires creative imagination and marks real advances in science. -Albert Einstein

Working with Penn collaborators, we are currently developing methods for the analysis of high-throughput genomic and metagenomic data. Our application areas include genome-wide association studies of neuroblastoma, integrative analysis of human heart failure and chronic kidney disease, and metagenomic data analysis of the human gut microbiome. In the area of statistical genomics, our recent research has focused on developing statistical and computational methods for the analysis of genetic pathways and networks, novel methods for the analysis of eQTL data, and methods for the analysis of microbiome and metagenomic data. These collaborations have led to publications in Science, Nature, Nature Genetics, Nature Medicine, Developmental Cell, PNAS, etc., and have motivated many of our methodological research projects.

The focus of our methodological research is to formulate problems in genetics and genomics as interesting statistical problems and to develop novel statistical models and computational methods to solve them. We are particularly interested in developing high-dimensional statistical methods for the analysis of genomic and metagenomic data. Our major methodological contributions include additive genetic gamma frailty models for genetic linkage analysis, sparse signal detection methods for copy number variant analysis, hidden Markov random field models for network-based analysis of genomic data, methods for high-dimensional regression analysis, and methods for the analysis of high-dimensional compositional data. Our recent research focuses on transfer learning, domain adaptation, and causal inference, and their applications in integrative genomic data analysis. We have published statistical methodological and theoretical papers in JASA, JRSS-B, Biometrika, Annals of Applied Statistics, Annals of Statistics, etc.

We are also interested in developing statistical and computational methods for big data, especially in the health data sciences. We have developed methods for the analysis of wearable device data, including physical activity data and continuous glucose monitoring data. Prof. Li is the Director of the Center of Statistics in Biomedical Big Data.

Our lab is actively recruiting students in Biostatistics, Applied Mathematics and Computational Science, and Genomics and Computational Biology to work on the following problems:

Artificial intelligence (AI), machine learning, transfer learning, and high-dimensional statistics
Causal inference, mediation analysis, and analysis of integrative genomics data
Optimal transport and analysis of object data, and analysis of single cell genomics