Student-Faculty Programs Office
Summer 2024 Announcements of Opportunity



Project:  Using machine learning to understand the fundamentals of glycosaminoglycan-protein interactions
Disciplines:  Data Science, Biochemistry, Biology, Chemistry, Computer Science
Mentor:  Linda Hsieh-Wilson, Milton and Rosalind Chang Professor of Chemistry (CCE), lhw@caltech.edu, Phone: 626-395-6101
Mentor URL:  http://hsiehwilsonlab.caltech.edu/meet-linda.html
AO Contact:  Hailan Yu, hailanyu@caltech.edu
Background:  Glycosaminoglycans (GAGs) are linear sugar polymers made of repeating disaccharide units (each unit consisting of two sugar monomers) anchored to the cell surface. Able to interact with more than 3,000 proteins, GAGs contribute to a variety of biological processes, such as embryonic development, cancer metastasis, and pathogenic infection. Their immense structural diversity is what enables GAGs to bind proteins with drastically different binding sites. However, this same diversity makes it extremely challenging either to isolate structurally defined GAGs in appreciable quantities from natural sources or to synthesize them in the lab. As a result, a systematic understanding of the structure-activity relationship between GAGs and GAG-binding proteins has not yet been achieved, despite its high biological relevance and therapeutic potential. To address this gap, our lab has synthesized a comprehensive library of 64 heparan sulfate (one type of GAG) tetrasaccharides, encompassing all commonly found modification patterns of natural heparan sulfate tetrasaccharides. We have also collected binding affinity data for various fibroblast growth factors (a class of mitogens indispensable to development and homeostasis) against each of these 64 compounds.
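As a rough illustration of how a library of exactly 64 tetrasaccharides could arise, the sketch below assumes (this is an assumption, not a description of the published library) that each of the two disaccharide units in a tetrasaccharide carries three independent on/off modifications: N-sulfation, 6-O-sulfation, and 2-O-sulfation. Two units with three binary sites each give 2^6 = 64 patterns. The site names, the `enumerate_library` and `label` helpers, and the label format are all hypothetical; the actual modification set should be checked against the reference below.

```python
from itertools import product

# Hypothetical encoding (an assumption, not the lab's scheme): each
# tetrasaccharide is two disaccharide units, each with three on/off sites.
SITES = ("NS", "6S", "2S")  # N-sulfation, 6-O-sulfation, 2-O-sulfation


def enumerate_library(n_units: int = 2) -> list[tuple[int, ...]]:
    """Return every on/off modification pattern as a tuple of 0/1 flags."""
    return list(product((0, 1), repeat=len(SITES) * n_units))


def label(pattern: tuple[int, ...]) -> str:
    """Render a pattern as a readable string, one '|'-separated field per unit."""
    units = []
    for u in range(len(pattern) // len(SITES)):
        flags = pattern[u * len(SITES):(u + 1) * len(SITES)]
        mods = [site for site, on in zip(SITES, flags) if on]
        units.append("-".join(mods) if mods else "none")
    return "|".join(units)


library = enumerate_library()
print(len(library))         # 64
print(label(library[0]))    # none|none
print(label(library[-1]))   # NS-6S-2S|NS-6S-2S
```

A flat 0/1 vector per compound is a convenient starting point for machine learning, since it can be used directly as a feature row.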
Description:  We aim to use and develop machine learning algorithms to uncover the fundamental rules by which GAGs recognize proteins and to provide the field with new workflows for collecting and analyzing GAG-protein interaction data. In this project, the student will (a) refine the code for existing algorithms the lab uses to analyze protein binding data, (b) assist in and develop code for converting the analyzed data into clear visual outputs, and (c) develop new algorithms for mining sequence information from the data.
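One simple baseline for mining structure-activity information, offered only as a sketch and not as the lab's existing algorithm, is an additive model. In this model, each modification site contributes a fixed shift to the binding response, and the per-site contributions are estimated by ordinary least squares with `numpy.linalg.lstsq`. The 6-bit feature encoding, the `fit_additive_model` helper, and the response values are all hypothetical. The "true" weights are made-up toy numbers, not measured binding data, and are used only to check that the fit recovers what was put in.

```python
import itertools

import numpy as np


def fit_additive_model(X: np.ndarray, y: np.ndarray) -> tuple[float, np.ndarray]:
    """Ordinary least squares fit of y ~ intercept + X @ weights.

    Each weight estimates how much switching one site 'on' shifts the
    response, assuming the sites contribute additively (no interactions).
    """
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[0]), coef[1:]


# Hypothetical 6-bit feature matrix covering all 64 on/off patterns.
X = np.array(list(itertools.product((0, 1), repeat=6)), dtype=float)

# Toy response with made-up per-site effects plus small noise
# (NOT measured binding data).
true_weights = np.array([1.0, 0.5, 0.0, 0.8, 0.2, -0.3])
rng = np.random.default_rng(0)
y = 2.0 + X @ true_weights + rng.normal(scale=0.05, size=len(X))

intercept, weights = fit_additive_model(X, y)
print(round(intercept, 2), np.round(weights, 2))
```

An additive model ignores interactions between sites. Capturing context-dependent effects would require interaction terms or a nonlinear learner, which is where more sophisticated machine learning methods come in.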
References:  Previous work on synthesizing the 64-compound library and collecting protein binding data for the 64 compounds: https://www.nature.com/articles/s41557-023-01248-4
Student Requirements:  Experience with and interest in applying machine learning to biological data sets are strongly recommended (relevant languages include Python and R). Interest in biochemistry and biology. A background in biology and chemistry would be a plus.
Programs:  This AO can be done under the following programs:

  Program    Available To
       SURF    Caltech students only 




 

Problems with or questions about submitting an AO?  Call Alexandra Katsas of the Student-Faculty Programs Office at (626) 395-2885.
 