Amgen Scholars: Announcements of Opportunity

Below are Announcements of Opportunity posted by Caltech faculty for the Amgen Scholars program.

Announcements of Opportunity are posted as they are received. Please check back regularly for new AO submissions! Remember: This is just one way that you can go about identifying a suitable project and/or mentor. For additional tips on identifying a mentor click here.

Please remember:

  • Students pursuing Amgen must be U.S. citizens, U.S. permanent residents, or students with DACA status.
  • Students pursuing Amgen must complete the 10-week program from June 18 - August 23, 2024. Students must commit to these dates. No exceptions will be made.
  • Accepted students must live in provided Caltech housing.




Project:  Using Machine Learning to explore the role of climate change, air pollution, and genetics in the aging of people with HIV.
Disciplines:  Computer Science, or any field, provided the student excels in programming, geocoding, and machine learning and has an interest in genetic data.
Mentor:  R. Michael Alvarez, Professor (HSS), rma@caltech.edu
AO Contact:  Dr. Cong Cao, congc@caltech.edu
Background:  People living with HIV (PLWH) exhibit accelerated aging compared with age-matched people living without HIV (PLWoH). Several studies in environmental epidemiology have explored the role of DNA methylation age and environmental influences. However, to the best of our knowledge, no study has attempted to identify the key genetic and environmental changes associated with aging in PLWH. Genes and environment are not separate factors. Environmental exposures include, but are not limited to, meteorological factors driven by climate change and air pollutants. The individual effects of air pollution, meteorological factors, and genes, together with their interactions, jointly shape the health consequences of aging.
Many studies have confirmed that heat waves increase mortality and have differential health consequences. Other meteorological factors affected by climate change also matter for research on aging in PLWH: in addition to heat, global warming is expected to lead to reduced relative humidity, increased air pressure, extreme precipitation, and changes in global wind speeds and shortwave radiation. A 2020 report in The Lancet proposed that the elderly, as a group, are vulnerable to climate change because they adapt poorly to extremely high temperatures, and many studies have demonstrated a correlation between extreme heat and mortality in the elderly. Climate change is already causing extreme weather, such as high temperatures, that exposes the elderly to higher health risks than the general population.
This research should lend valuable insight into epigenetic-by-environment interactions and help to provide effective policy recommendations. Effective public policy relies on a more comprehensive exploration of this cause-and-effect relationship.
Integrating advances in genetics into public policy research is one of our goals, as many previous policy interventions have not incorporated epigenetic associations effectively. Our research will cover meteorological and air pollution factors, with an emphasis on the causal influence of the interaction of environmental and genetic factors on biological aging, as measured by epigenetics, and with the ultimate goal of informing public policy recommendations.
Existing research on genetic variation and economics is limited by small sample sizes (dozens of samples or fewer) combined with very many variables (tens of thousands), which yields high-dimensional, small-sample datasets. Because collecting new samples is costly and time-consuming, re-running experiments to obtain new gene expression data is impractical. On such data, traditional statistical methods tend to overfit or become computationally expensive; machine learning has the potential to address these problems. Machine learning develops algorithms and statistical models that learn from data and make predictions or decisions based on it. It is important here for three reasons: 1. Data-driven insights: it lets us extract insights and patterns from large and complex datasets, such as the MACS/WIHS Combined Cohort Study (MWCCS), that would be challenging or impossible to uncover through traditional statistical methods; 2. Automation and efficiency: it can automate tasks and processes that would otherwise require significant human effort and time, increasing efficiency and productivity and reducing cost; and 3. Predictive capabilities: it can produce predictions and forecasts about future events or outcomes.
Compared with other types of data, MWCCS-based data has the advantage of containing well-characterized, comprehensive, longitudinal biologic and socioeconomic data. The variation and volume of this dataset are ideal for machine learning. Another advantage is that MWCCS-based data is geographically informed and can therefore be easily linked to environmental data with high spatial resolution. We will correlate existing genome-wide association study (GWAS) and multi-omics data using the new MWCCS clinical registry data, and combine them with air pollution and meteorological data of high temporal frequency and high spatial resolution from publicly available sources (i.e., the United States Environmental Protection Agency and the Gridded Surface Meteorological (gridMET) dataset). Data to be considered in the future include socioeconomic variables such as cognition, sleep, nutrition, diet, and exercise.
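Linking geocoded records to gridded environmental data can be sketched as a nearest-cell lookup. The following is only a minimal illustration, not the project's pipeline: the function names, the 0.5-degree resolution, the grid origin, and the temperature values are all hypothetical, and a regular latitude/longitude grid is assumed.

```python
import math


def nearest_cell(lat, lon, origin_lat, origin_lon, resolution):
    """Return the (row, col) index of the grid cell centred closest to a point.

    Assumes a regular grid whose cell centres start at (origin_lat, origin_lon)
    and are spaced `resolution` degrees apart in both directions.
    """
    row = math.floor((lat - origin_lat) / resolution + 0.5)
    col = math.floor((lon - origin_lon) / resolution + 0.5)
    return row, col


def link_exposure(records, grid, origin_lat, origin_lon, resolution):
    """Attach the gridded value of the nearest cell to each geocoded record."""
    linked = []
    for rec in records:
        cell = nearest_cell(rec["lat"], rec["lon"], origin_lat, origin_lon, resolution)
        # Records that fall outside the grid get None rather than a guess.
        linked.append({**rec, "tmax": grid.get(cell)})
    return linked


# Hypothetical data: two geocoded records and a sparse grid of
# daily maximum temperature (degrees C), keyed by (row, col).
records = [
    {"id": 1, "lat": 34.14, "lon": -118.12},
    {"id": 2, "lat": 45.00, "lon": -100.00},
]
grid = {(8, 4): 31.5, (8, 3): 30.9}
linked = link_exposure(records, grid, origin_lat=30.0, origin_lon=-120.0, resolution=0.5)
```

In practice the grid origin and spacing would come from the metadata of the environmental dataset, and a dedicated geospatial library would likely replace the hand-written index arithmetic.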
The Specific Aims are:
1) Integrate genetic and environmental data to explore key genetic and environmental factors in the aging of PLWH, with an emphasis on the interaction between these factors.
2) Explore the causal effects of weather variables other than heat waves on aging.
3) Use advanced machine learning techniques to examine causal links between environmental and epigenetic factors and aging.
4) While emphasizing causal effects, explore correlations and predictions of epigenetic aging based on machine learning.
5) Understand the impact of climate change on the aging of PLWH.
6) Generate effective policy recommendations to address the role of climate change in the aging of PLWH.
Description:  1) Merge genetic and environmental data from diverse sources, employing machine learning algorithms to fill in missing values. Adapt existing methods to suit biomarker data.
2) Create new features by combining existing ones, generating lag variables, and developing other novel features.
3) Assign varying weights to features across dimensions to find the best feature subset or model.
4) Use advanced causal machine learning methods to investigate the causal relationships among environmental factors, epigenetic factors, and aging. Compare the findings with the results from the model selected in step 3.
5) Analyze correlations with epigenetic aging and predict aging using machine learning. Evaluate the algorithms' predictive performance.
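The first two steps (merging sources, filling in missing values, and generating lag variables) can be sketched as below. This is a minimal illustration under stated assumptions, not the project's method: all function names, column names (`site`, `date`, `pm25`, `epi_age`), and values are hypothetical; simple mean imputation stands in for the machine-learning imputation the description calls for; and the lag counts previous observations within a site rather than calendar days.

```python
from statistics import mean


def impute_mean(rows, column):
    """Fill missing (None) values in `column` with the mean of observed values.

    A deliberately simple stand-in for a machine-learning imputer.
    """
    fill = mean([r[column] for r in rows if r[column] is not None])
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def add_lag(rows, column, lag=1, group="site", order="date"):
    """Add `<column>_lag<lag>`: the value `lag` observations earlier in the same group."""
    history = {}
    out = []
    # ISO-formatted date strings sort chronologically.
    for r in sorted(rows, key=lambda r: (r[group], r[order])):
        past = history.setdefault(r[group], [])
        lagged = past[-lag] if len(past) >= lag else None
        out.append({**r, f"{column}_lag{lag}": lagged})
        past.append(r[column])
    return out


def merge_on_key(left, right, keys=("site", "date")):
    """Left-join `right` onto `left` using the given key columns."""
    index = {tuple(r[k] for k in keys): r for r in right}
    merged = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys), {})
        extra = {k: v for k, v in match.items() if k not in keys}
        merged.append({**row, **extra})
    return merged


# Hypothetical environmental series (PM2.5) and clinic visits with an epigenetic age.
environment = [
    {"site": "A", "date": "2020-01-01", "pm25": 10.0},
    {"site": "A", "date": "2020-01-02", "pm25": None},
    {"site": "A", "date": "2020-01-03", "pm25": 14.0},
    {"site": "B", "date": "2020-01-01", "pm25": 6.0},
]
visits = [
    {"id": 1, "site": "A", "date": "2020-01-03", "epi_age": 52.1},
    {"id": 2, "site": "B", "date": "2020-01-01", "epi_age": 47.3},
]

env = add_lag(impute_mean(environment, "pm25"), "pm25", lag=1)
merged = merge_on_key(visits, env)
```

Here the lag is computed on the environmental series before the join so that each visit carries both the same-day exposure and the preceding observation; other orderings are equally plausible.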
References:  Conn, D., Ngun, T., Li, G., & Ramirez, C. M. (2019). Fuzzy forests: Extending random forest feature selection for correlated, high-dimensional data. Journal of Statistical Software, 91(9). https://doi.org/10.18637/jss.v091.i09
Student Requirements:  Students from any field may apply, but must excel in programming, geocoding, and machine learning and have a keen interest in genetic data.
Programs:  This AO can be done under the following programs:

  Program    Available To
       SURF    Caltech students only 



