Announcements of Opportunity

SURF: Announcements of Opportunity
Below are Announcements of Opportunity posted by Caltech faculty and JPL technical staff for the SURF program. Each AO indicates whether or not it is open to non-Caltech students. If an AO is NOT open to non-Caltech students, please DO NOT contact the mentor. Announcements of Opportunity are posted as they are received. Please check back regularly for new AO submissions!
Remember: This is just one way that you can go about identifying a suitable project and/or mentor. Click here for more tips on finding a mentor. Announcements for external summer programs are listed here.
*Students applying for JPL projects should complete a SURF@JPL application instead of a "regular" SURF application.
*Students pursuing opportunities at JPL must be U.S. citizens or U.S. permanent residents.
Project: Comparing gene variant detection methods - a Data Science Approach
Disciplines: Data Science, Biology
Mentor: Ashish Mahabal, Lead Computational Scientist (PMA), aam@astro.caltech.edu
Mentor URL: http://www.astro.caltech.edu/~aam
Background: We are exploring data science methods to compare how well different techniques detect variations in portions of DNA, both on and off target. For the base study, mixture schemes consisting of DNA and cell-based control samples were designed and sent to several labs to assess differences in the participants' results. These control samples were validated with digital droplet polymerase chain reaction (ddPCR) and next-generation sequencing (NGS) to evaluate the accuracy of the study participants' reporting on the size, sequence, and frequency of the DNA variants. We would like to go beyond the primary results provided by the methods to determine the limits of these methods in terms of base-pair detections, number of variants, samples, etc.
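As a rough illustration of the kind of comparison described above, the sketch below scores each lab's reported variant frequencies against validated reference values. The function name, lab and variant labels, and all numbers are hypothetical and are not taken from the study; the real data layout may differ.

```python
from statistics import mean


def frequency_errors(validated, reported):
    """Compare reported variant frequencies with validated ones, per lab.

    validated: dict mapping variant id -> validated allele frequency.
    reported:  dict mapping lab id -> {variant id -> reported frequency}.
    Returns, per lab, the mean absolute error over the variants the lab
    reported and the list of validated variants it did not report.
    """
    summary = {}
    for lab, calls in reported.items():
        diffs = [abs(calls[v] - validated[v]) for v in validated if v in calls]
        missed = [v for v in validated if v not in calls]
        summary[lab] = {
            "mae": mean(diffs) if diffs else None,
            "missed": missed,
        }
    return summary


# Hypothetical example values, for illustration only.
validated = {"var_A": 0.05, "var_B": 0.10}
reported = {
    "lab_1": {"var_A": 0.04, "var_B": 0.12},
    "lab_2": {"var_A": 0.05},
}

for lab, stats in frequency_errors(validated, reported).items():
    print(lab, stats)
```

A per-lab summary of this kind is only a starting point; the project aims to go further and characterize where each method stops detecting variants reliably.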
Description: Several DNA detection technologies are used for a variety of applications, including forensics, disease diagnosis, and gene expression analysis. Some of the most common are:
(1) Polymerase chain reaction (PCR): a sensitive method that amplifies specific DNA sequences.
(2) Next-generation sequencing (NGS): detects specific DNA sequences.
(3) DNA microarrays: detect the presence or absence of specific DNA sequences.
(4) Capillary electrophoresis: separates DNA fragments based on size.
(5) Fluorescence in situ hybridization (FISH): uses fluorescently labeled probes to detect specific DNA sequences within cells.
(6) Mass spectrometry: identifies and quantifies DNA molecules based on their mass.
One method to consider is fitting a mixture of probability distributions to genome coverage profiles. From this fit, the Genome Dataset Validity (GDV) score can be computed. The GDV scores can then be regressed to understand the validity of the presence and quantity of variants. Different models can be used for this, e.g. Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and Ridge linear classifiers.
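The mixture-fitting step mentioned above can be sketched as follows. This is a minimal example under stated assumptions, not the project's actual pipeline: it assumes a two-component Poisson mixture, uses simulated read depths rather than real coverage data, and stops at the fitted mixture because the GDV score is not defined in this announcement. The function names and the simulated rates (30 and 60) are hypothetical.

```python
import math
import random


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function at count k."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def sample_poisson(rng, lam):
    """Draw one Poisson sample (Knuth's method; fine for moderate lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_depths(n=1000, seed=0):
    """Simulated per-position read depths: 70% at rate 30, 30% at rate 60."""
    rng = random.Random(seed)
    return [sample_poisson(rng, 30.0 if rng.random() < 0.7 else 60.0)
            for _ in range(n)]


def fit_poisson_mixture(depths, n_iter=200, tol=1e-6):
    """Fit a two-component Poisson mixture to read depths with EM.

    Returns (weights, rates, log_likelihood), components sorted by rate.
    """
    avg = max(sum(depths) / len(depths), 1e-3)
    rates = [0.5 * avg, 1.5 * avg]  # crude but distinct starting rates
    weights = [0.5, 0.5]
    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(n_iter):
        # E-step: responsibility of each component for each depth value.
        resp, ll = [], 0.0
        for k in depths:
            logs = [math.log(w) + poisson_logpmf(k, r)
                    for w, r in zip(weights, rates)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(v - top) for v in logs))
            ll += norm
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: update mixing weights and rates (floored to stay positive).
        for j in range(2):
            total = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(total / len(depths), 1e-12)
            rates[j] = max(
                sum(r[j] * k for r, k in zip(resp, depths)) / total, 1e-9)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    pairs = sorted(zip(rates, weights))
    return [w for _, w in pairs], [r for r, _ in pairs], ll


depths = simulate_depths()
weights, rates, log_lik = fit_poisson_mixture(depths)
print("weights:", weights)
print("rates:", rates)
print("log-likelihood:", log_lik)
```

In practice the component family and the number of components would be chosen to match the observed coverage profiles; a Poisson mixture is used here only because read depths are counts.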
References: DNA detection methods mentioned above.
Ridge Classifier: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeClassifier.html
Student Requirements: Proficiency in Python, Jupyter notebooks (Google Colab), and git. Familiarity with the basics of machine learning and statistics, and knowledge of Linux/Unix. Basic biology knowledge is a plus. Knowledge of deep learning, GPUs, MongoDB, and AWS is also a bonus.
Programs: This AO can be done under the following programs: