Amgen Scholars: Announcements of Opportunity
Below are Announcements of Opportunity posted by Caltech faculty for the Amgen Scholars program.
Announcements of Opportunity (AOs) are posted as they are received, so please check back regularly for new submissions. Remember that this is just one way to identify a suitable project and/or mentor. For additional tips on identifying a mentor, click here.
Please remember:
- Students pursuing Amgen must be U.S. citizens, U.S. permanent residents, or students with DACA status.
- Students pursuing Amgen must complete the 10-week program, which runs from June 21 to August 25, 2023. Students must commit to these dates. No exceptions will be made.
- Accepted students must live in provided Caltech housing.
Project: Transfer Learning Leveraging Large-Scale Transcriptomics to Model Diseases With Limited Data
Disciplines: Computer Science, Biology
Mentor: Christina Theodoris, Assistant Professor (BBE), christina.theodoris@gladstone.ucsf.edu
Mentor URL: https://gladstone.org/people/christina-theodoris
Background:
NOTE: This project is being offered by a Caltech alum and is open only to Caltech students. The project will be conducted at Gladstone Institutes/University of California, San Francisco in San Francisco, California.
Mapping the gene regulatory networks driving human disease enables the design of network-correcting treatments that target the core disease mechanism rather than merely managing symptoms. However, computationally inferring the network map requires large amounts of transcriptomic data to learn the connections between genes. This requirement impedes network-correcting drug discovery in settings with limited data, including rare diseases and diseases affecting clinically inaccessible tissues. Although data remain limited in these settings, recent advances in sequencing technologies have driven a rapid expansion in the amount of transcriptomic data available from human tissues more broadly.
Recently, transfer learning has revolutionized fields such as natural language understanding and computer vision. Deep learning models pretrained on large-scale general datasets can be fine-tuned towards a vast array of downstream tasks using application-specific data that would be too limited to yield meaningful predictions in isolation. To test whether an analogous approach could enable gene network predictions with limited data, we developed a novel deep learning model, Geneformer, and pretrained it on a large-scale corpus we assembled from ~30 million human single cell transcriptomes. The resulting checkpoint can be fine-tuned towards a broad range of downstream applications to accelerate discovery of key network regulators and candidate network-correcting therapies. Geneformer consistently boosted predictive accuracy in a diverse panel of downstream tasks using just a limited set of task-specific training examples.
We will now leverage Geneformer's learned understanding of contextual gene network dynamics to map the dysregulated gene network and discover candidate network-correcting therapeutics for hypertrophic cardiomyopathy, a prototypical rare disease affecting clinically inaccessible tissue where progress has been impeded by limited data.
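For applicants less familiar with the pretrain-then-fine-tune idea described in the Background, the following is a minimal illustrative sketch in plain Python. It is not Geneformer and not the lab's method. The "pretrained encoder" is a hypothetical, hand-picked linear map standing in for a model trained on a large corpus, and the fine-tuning step uses the simplest variant (a frozen encoder with a small logistic-regression head) rather than updating the whole model. All names, weights, and data here are assumptions made up for illustration.

```python
import math

# Hypothetical stand-in for a pretrained encoder: a fixed linear map from
# 4 input features to 2 embedding dimensions. Real pretrained weights would
# come from large-scale pretraining; these values are invented.
PRETRAINED_WEIGHTS = [
    [1.0, -1.0, 0.5, 0.0],
    [0.0, 0.5, -1.0, 1.0],
]

def encode(x):
    """Frozen encoder: embeds an input vector and is never updated."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune_head(examples, labels, epochs=500, lr=0.5):
    """Train only a logistic-regression head on top of the frozen embeddings."""
    embeddings = [encode(x) for x in examples]
    weights = [0.0] * len(embeddings[0])
    bias = 0.0
    for _ in range(epochs):
        for emb, y in zip(embeddings, labels):
            pred = sigmoid(sum(w * e for w, e in zip(weights, emb)) + bias)
            error = pred - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * e for w, e in zip(weights, emb)]
            bias -= lr * error
    return weights, bias

def predict(x, weights, bias):
    """Probability of the positive class for input x."""
    emb = encode(x)
    return sigmoid(sum(w * e for w, e in zip(weights, emb)) + bias)

# A deliberately tiny task-specific training set (4 labelled examples),
# mimicking the limited-data setting. The data are invented.
train_x = [[1, 0, 0, 0], [1, 0, 0, 1], [0, 1, 0, 0], [0, 1, 1, 0]]
train_y = [1, 1, 0, 0]

head_weights, head_bias = fine_tune_head(train_x, train_y)
for x, y in zip(train_x, train_y):
    print(x, "label:", y, "predicted:", round(predict(x, head_weights, head_bias), 3))
```

The design point is that only the two head weights and the bias are learned from the four labelled examples. Everything the encoder "knows" is reused unchanged, which is why so little task-specific data can suffice in this toy setting.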
Description: The proposed SURF project will involve developing multi-task learning approaches to fine-tune the pretrained model so that it captures the contextual dysregulation driving disease in each of the affected cell types within the heart. Interpreting the resulting gene embeddings and attention weights in the disease model will reveal the network rewiring that occurs in hypertrophic cardiomyopathy and accelerate the discovery of a much-needed targeted therapeutic for this life-threatening progressive disease.
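As a rough illustration of what "attention weights" means in the Description, the sketch below computes standard scaled dot-product attention weights over a few toy embeddings and ranks which tokens a given token attends to most. The gene names and embedding values are hypothetical, and for simplicity the same vectors are used as both queries and keys, whereas a real transformer first applies separate learned projections. This is a teaching sketch under those assumptions, not the project's analysis pipeline.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    """Scaled dot-product attention weights: one softmax row per query."""
    dim = len(keys[0])
    rows = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        rows.append(softmax(scores))
    return rows

# Hypothetical gene tokens with invented 3-dimensional embeddings.
genes = ["GENE_A", "GENE_B", "GENE_C"]
embeddings = [
    [1.0, 0.0, 0.5],
    [0.9, 0.1, 0.4],
    [-0.5, 1.0, 0.0],
]

# Simplification: embeddings act as both queries and keys.
attn = attention_weights(embeddings, embeddings)

def top_attended(gene, k=2):
    """Return the k (gene, weight) pairs that `gene` attends to most strongly."""
    row = attn[genes.index(gene)]
    ranked = sorted(zip(genes, row), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

for name, weight in top_attended("GENE_A"):
    print(name, round(weight, 3))
```

Each row of `attn` sums to 1, so a row can be read as how one token distributes its attention over the others. Whether high attention reflects a genuine regulatory relationship is an interpretation question that the toy example does not address.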
Student Requirements:
Required background/skills: machine learning, Python, Bash
Suggested background/skills: deep learning, single cell transcriptomics, PyTorch, cluster computing, distributed GPU training
Programs:
This AO can be done under the following programs: