SURF: Announcements of Opportunity
Below are Announcements of Opportunity posted by Caltech faculty and JPL technical staff for the SURF program. Each AO indicates whether or not it is open to non-Caltech students. If an AO is NOT open to non-Caltech students, please DO NOT contact the mentor. Announcements of Opportunity are posted as they are received. Please check back regularly for new AO submissions!
Remember: This is just one way that you can go about identifying a suitable project and/or mentor. Click here for more tips on finding a mentor. Announcements for external summer programs are listed here.
*Students applying for JPL projects should complete a SURF@JPL application instead of a "regular" SURF application.
*Students pursuing opportunities at JPL must be U.S. citizens or U.S. permanent residents.
|Project:||Transfer Learning Leveraging Large-Scale Transcriptomics to Model Diseases With Limited Data|
|Disciplines:||Computer Science, Biology|
|Mentor:||Christina Theodoris, Assistant Professor (BBE), email@example.com|
|Mentor URL:||https://gladstone.org/people/christina-theodoris|
NOTE: This project is being offered by a Caltech alum and is open only to Caltech students. The project will be conducted at Gladstone Institutes/University of California, San Francisco in San Francisco, California.
Mapping the gene regulatory networks that drive human disease enables the design of network-correcting treatments that target the core disease mechanism rather than merely managing symptoms. However, computationally inferring the network map requires large amounts of transcriptomic data to learn the connections between genes, which impedes network-correcting drug discovery in settings with limited data, including rare diseases and diseases affecting clinically inaccessible tissues. Although data remains limited in these settings, recent advances in sequencing technologies have driven a rapid expansion in the amount of transcriptomic data available from human tissues more broadly. Transfer learning has recently revolutionized fields such as natural language understanding and computer vision: deep learning models pretrained on large-scale general datasets can be fine-tuned towards a vast array of downstream tasks using application-specific data that would be too limited to yield meaningful predictions in isolation. To test whether an analogous approach could enable gene network predictions with limited data, we developed a novel deep learning model, Geneformer, and pretrained it on a large-scale corpus we assembled of ~30 million human single-cell transcriptomes. The resulting checkpoint can be fine-tuned towards a broad range of downstream applications to accelerate discovery of key network regulators and candidate network-correcting therapies. Geneformer consistently boosted predictive accuracy in a diverse panel of downstream tasks using only a limited set of task-specific training examples.
We will now leverage Geneformer’s learned understanding of contextual gene network dynamics to map the dysregulated gene network and discover candidate network-correcting therapeutics for hypertrophic cardiomyopathy, a prototypical rare disease affecting clinically inaccessible tissue where progress has been impeded by limited data.
|Description:||The proposed SURF project will involve developing new multi-task learning approaches to fine-tune the pretrained model to understand the contextual dysregulation driving disease in each of the affected cell types within the heart. Interpreting the resulting gene embeddings and attention weights in the disease model will reveal the network rewiring that occurs in hypertrophic cardiomyopathy and accelerate the discovery of a much-needed targeted therapeutic for this life-threatening progressive disease.|
Required background/skills: machine learning, Python, Bash
Suggested background/skills: deep learning, single-cell transcriptomics, PyTorch, cluster computing, distributed GPU training
This AO can be done under the following programs: