
Amgen Scholars: Announcements of Opportunity

Below are Announcements of Opportunity posted by Caltech faculty for the Amgen Scholars program.

Announcements of Opportunity are posted as they are received, so please check back regularly for new AO submissions. Remember: this is just one way to identify a suitable project and/or mentor. Additional tips on identifying a mentor are available on the Undergraduate Research website.

Please remember:

  • Students applying to Amgen Scholars must be U.S. citizens, U.S. permanent residents, or students with DACA status.
  • Students applying to Amgen Scholars must complete the 10-week program from June 21 to August 25, 2023, and must commit to these dates. No exceptions will be made.
  • Accepted students must live in the provided Caltech housing.




Project:  Using Machine Learning to extract novel information from large and complex astronomical datasets
Disciplines:  Astrophysics, Applied and Computational Mathematics
Mentor:  Dalya Baron, Carnegie-Princeton Postdoctoral Fellow (PMA), dalyabaron@gmail.com, Phone: (626) 483-6407
Mentor URL:  https://www.dalyabaron.com/
Background:  Astronomy is experiencing rapid growth in data size and complexity. This change fosters the development of data-driven science (i.e., “data science”) as a useful companion to the common model-driven data-analysis paradigm: astronomers develop automatic tools to mine datasets and extract novel information from them. In recent years, machine learning algorithms have become increasingly popular among astronomers and are now used for a wide variety of tasks. Unsupervised learning algorithms, which are used for cluster analysis, dimensionality reduction, visualization, and outlier detection, are particularly important for scientific research, since they can extract new knowledge from existing datasets and facilitate new discoveries.
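Outlier detection, one of the unsupervised tasks mentioned above, can be illustrated with a very small example. The sketch below scores each object by the distance to its k-th nearest neighbour in feature space, so isolated objects receive large scores. This is a generic technique chosen for illustration, not a method specified by this project; the function name `knn_outlier_scores`, the choice of k, and the toy feature vectors are all hypothetical.

```python
import math
from typing import List, Sequence


def knn_outlier_scores(points: Sequence[Sequence[float]], k: int = 2) -> List[float]:
    """Score each point by the Euclidean distance to its k-th nearest neighbour.

    Hypothetical illustration: larger scores mean a point sits farther from
    the rest of the data, which is one simple unsupervised way to flag
    candidate outliers.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Sorted distances from point i to every other point (brute force, O(n^2) overall).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors for six objects; object 4 is far from the rest.
    features = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [5.0, 5.0], [1.2, 1.0]]
    scores = knn_outlier_scores(features, k=2)
    most_isolated = max(range(len(scores)), key=scores.__getitem__)
    print("most isolated object:", most_isolated)  # most isolated object: 4
```

This sketch uses plain Python lists and brute-force pairwise distances so that it stays self-contained; datasets with millions of objects would need a spatial index or a library implementation instead.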
Description:  All the projects listed below lie at the intersection of data science, machine learning, and astrophysics. In each project, the student will apply unsupervised machine learning algorithms to astronomical datasets obtained with the most advanced telescopes or drawn from the largest ongoing surveys in the field. The student will learn about and use dimensionality reduction algorithms such as t-SNE, UMAP, and the Sequencer to detect simple but previously unknown structures in the data. Any novel relations found in the data can then be used to constrain the physical processes that govern the observed astronomical objects.

Some sub-projects:

(1) Application of the Sequencer to light curves from the Zwicky Transient Facility (ZTF): ZTF performs a systematic exploration of the variable sky, delivering time series of numerous asteroids, variable stars, supernovae, active galactic nuclei, and more. The size and complexity of the dataset make manual inspection of the different transient phenomena impractical, so automatic tools are needed to visualize and study it. In this project, the student will apply the Sequencer, an algorithm designed to find one-dimensional sequences in datasets, to ZTF light curves, with the goal of finding new correlations between light-curve properties.
(2) Unsupervised clustering of ZTF light curves: the student will use dimensionality reduction algorithms such as t-SNE and UMAP to perform unsupervised clustering of millions of light curves from the Zwicky Transient Facility. The goal is to find a data-driven, low-dimensional representation of the dataset in which different classes of astronomical transients are well separated. A second goal is to identify sources that fall outside the main clusters, which might represent a new type of astronomical transient.
(3) Using the Sequencer to map integral-field-unit (IFU) data into an image: the most advanced astronomical instruments can now measure the light from a galaxy as a function of both wavelength (i.e., the spectrum) and position. These integral field units essentially collect three-dimensional data cubes of a galaxy (two spatial dimensions plus wavelength), which allow scientists to study the gas and stellar properties at different locations within it. These cubes, however, are complex and challenging to analyze, and require automatic exploration tools. In this project, the student will use the Sequencer to order the spectra of a galaxy observed with an IFU, and will use the resulting ordering to visualize the galaxy's properties in a data-driven way.
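To illustrate what ordering the spectra of an IFU cube and mapping that ordering back into an image could look like, here is a deliberately simplified, hypothetical stand-in. It is not the Sequencer and does not use its package API; the Sequencer paper and code listed under References describe a considerably richer procedure. This toy version builds a Euclidean minimum spanning tree over the spectra with Prim's algorithm, walks the tree depth-first starting from one end of its longest path, and writes each spaxel's position in that walk into a 2-D image. The cube layout `cube[y][x][wavelength]`, the function names, and the toy data are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy ordering: NOT the Sequencer algorithm or its API.
Spectrum = Sequence[float]
Adjacency = Dict[int, List[Tuple[int, float]]]


def mst_adjacency(vectors: Sequence[Spectrum]) -> Adjacency:
    """Build a Euclidean minimum spanning tree with Prim's algorithm (O(n^2))."""
    n = len(vectors)
    adj: Adjacency = {i: [] for i in range(n)}
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting each vertex to the tree
    parent = [-1] * n
    if n:
        best[0] = 0.0
    for _ in range(n):
        # Add the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].append((parent[u], best[u]))
            adj[parent[u]].append((u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(vectors[u], vectors[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return adj


def farthest_node(adj: Adjacency, start: int) -> int:
    """Return the node with the largest path length from `start` along the tree."""
    dist = {start: 0.0}
    stack = [start]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return max(dist, key=dist.__getitem__)


def toy_sequence(vectors: Sequence[Spectrum]) -> List[int]:
    """Order vectors by a depth-first walk of the MST from one end of its longest path."""
    if not vectors:
        return []
    adj = mst_adjacency(vectors)
    start = farthest_node(adj, 0)  # one endpoint of the tree's longest path
    order: List[int] = []
    seen = set()
    stack = [start]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        order.append(u)
        # Push longer edges first so the shortest unvisited edge is explored next.
        for v, _ in sorted(adj[u], key=lambda e: e[1], reverse=True):
            if v not in seen:
                stack.append(v)
    return order


def rank_image(cube: Sequence[Sequence[Spectrum]]) -> List[List[int]]:
    """Map each spaxel of cube[y][x][wavelength] to its position in the toy ordering."""
    coords = [(y, x) for y in range(len(cube)) for x in range(len(cube[y]))]
    spectra = [cube[y][x] for y, x in coords]
    image = [[-1] * len(row) for row in cube]
    for rank, idx in enumerate(toy_sequence(spectra)):
        y, x = coords[idx]
        image[y][x] = rank
    return image


if __name__ == "__main__":
    # Hypothetical 2x2 cube with a single wavelength bin per spaxel.
    cube = [[[0.0], [1.0]], [[3.0], [2.0]]]
    print(rank_image(cube))  # [[3, 2], [0, 1]]
```

For spectra that vary smoothly along a single direction, the tree is close to a simple path and the walk recovers that direction, up to a reversal, since which end the walk starts from is arbitrary. Where the tree branches, a single walk like this becomes ambiguous.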

This SURF research project will be hosted at Carnegie Observatories, located roughly a mile north of the Caltech campus. Carnegie hosts undergraduate summer research students from a variety of colleges and universities across Southern California. In addition to research, Carnegie summer interns (including those from the SURF program) participate in a wide variety of professional development activities, including a coding bootcamp at the beginning of the summer, scientific communication workshops throughout the program, and seminars on issues related to diversity, equity, and inclusion in science. Upon successful completion of the program, all students will also be given the opportunity to attend the American Astronomical Society meeting the following January and present their research there. More information about the Carnegie summer program can be found at https://obs.carnegiescience.edu/CASSI.
References:  Machine Learning applications in astronomy: https://arxiv.org/abs/1904.07248
The Sequencer algorithm: https://arxiv.org/abs/2006.13948, http://sequencer.org/, https://github.com/dalya/Sequencer
Zwicky Transient Facility: https://www.ztf.caltech.edu/
Student Requirements:  Position available to Caltech students only. Research will be conducted at Carnegie Observatories in Pasadena as part of the Carnegie Astrophysics Summer Student Internship (CASSI) program, which runs from June 19 to August 25. Students must be present for the full duration of the program.

A background in Python programming is required, ideally at least at the level of Caltech's CS 1 course. Prior knowledge of algorithms and machine learning is ideal, but students new to the field who are interested in learning these subjects are also very welcome.
Programs:  This AO can be done under the following programs:

  Program    Available To
       SURF    Caltech students only 



