Amgen Scholars: Announcements of Opportunity

Below are Announcements of Opportunity posted by Caltech faculty for the Amgen Scholars program.

Announcements of Opportunity are posted as they are received. Please check back regularly for new AO submissions! Remember: This is just one way that you can go about identifying a suitable project and/or mentor. For additional tips on identifying a mentor click here.

Please remember:

  • Students pursuing Amgen must be U.S. citizens, U.S. permanent residents, or students with DACA status.
  • Students pursuing Amgen must complete the 10-week program from June 21 - August 25, 2023. Students must commit to these dates. No exceptions will be made.
  • Accepted students must live in provided Caltech housing.

Project:  Comparing gene variant detection methods - a Data Science Approach
Disciplines:  Data Science, Biology
Mentor:  Ashish Mahabal, Lead Computational Scientist (PMA), aam@astro.caltech.edu, Phone: 626-395-4201
Mentor URL:  http://www.astro.caltech.edu/~aam
Background:  We are exploring data science methods to compare how well different techniques detect variations in DNA segments, both on and off target. For the base study, mixture schemes consisting of DNA and cell-based control samples were designed and sent to several labs to assess differences in the results reported by the participants. These control samples were validated with digital droplet polymerase chain reaction (ddPCR) and next-generation sequencing (NGS) to evaluate the accuracy of the study participants' reporting on the size, sequence, and frequency of the DNA variants. We would like to go beyond the primary results provided by the methods to determine their limits in terms of base-pair detection, number of variants, samples, and so on.
Description:  Several DNA detection technologies are used for a variety of applications, including forensics, disease diagnosis, and gene expression analysis. Some of the most common are: (1) polymerase chain reaction (PCR), a sensitive method that amplifies specific DNA sequences; (2) NGS, which allows for the detection of specific DNA sequences; (3) DNA microarrays, which can be used to detect the presence or absence of specific DNA sequences; (4) capillary electrophoresis, which separates DNA fragments based on size; (5) fluorescence in situ hybridization (FISH), which uses fluorescently labeled probes to detect specific DNA sequences within cells; and (6) mass spectrometry, which allows for the identification and quantification of DNA molecules based on their mass.
One method to consider is fitting a mixture of probability distributions to genome coverage profiles. From this fit, the Genome Dataset Validity (GDV) score can be computed. The GDV scores can then be regressed to understand the validity of the presence and quantity of variants. Different models can be used for this purpose: Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), ridge linear classifiers, etc.
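As a rough illustration of the mixture-fitting step only, the sketch below fits a two-component Poisson mixture to per-position read depths with expectation-maximization. Everything in it is an assumption made for illustration: the choice of Poisson components, the synthetic depths (700 positions near 30x and 300 near 60x), and the helper names `sample_poisson`, `poisson_logpmf`, and `fit_poisson_mixture`. The announcement does not define how the GDV score is derived from the fit, so no score is computed here.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_poisson_mixture(depths, n_components=2, n_iter=100):
    """Fit a Poisson mixture by EM; returns (weights, rates) sorted by rate."""
    lo, hi = min(depths), max(depths)
    # Spread the initial rates evenly across the observed depth range.
    rates = [max(lo + (hi - lo) * (j + 0.5) / n_components, 1e-6)
             for j in range(n_components)]
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each depth.
        resp = []
        for k in depths:
            logs = [math.log(w) + poisson_logpmf(k, r)
                    for w, r in zip(weights, rates)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: update mixing weights and rates from the responsibilities.
        for j in range(n_components):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / len(depths)
            rates[j] = max(sum(r[j] * k for r, k in zip(resp, depths)) / nj, 1e-6)
    order = sorted(range(n_components), key=lambda j: rates[j])
    return [weights[j] for j in order], [rates[j] for j in order]


# Synthetic coverage profile (assumed data, not from the study).
rng = random.Random(0)
depths = ([sample_poisson(30, rng) for _ in range(700)]
          + [sample_poisson(60, rng) for _ in range(300)])

weights, rates = fit_poisson_mixture(depths)
for w, r in zip(weights, rates):
    print(f"component: weight={w:.2f}, mean depth={r:.1f}")
```

The fitted weights and rates summarize the coverage profile in a few numbers, which is the kind of compact feature set that a downstream model such as a ridge classifier could take as input. Real coverage data is often overdispersed, so a negative binomial or Gaussian mixture may be a better fit than the Poisson components assumed here.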
References:  DNA detection methods mentioned above
Ridge Classifier: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeClassifier.html
Student Requirements:  Proficiency in Python, Jupyter notebooks (Google Colab), and Git. Familiarity with the basics of machine learning and statistics, and knowledge of Linux/Unix. Basic biology knowledge is a plus. Knowledge of deep learning, GPUs, MongoDB, and AWS is also a bonus.
Programs:  This AO can be done under the following programs:

  Program    Available To
       SURF    Caltech students only 
