SURF: Announcements of Opportunity
Below are Announcements of Opportunity posted by Caltech faculty and JPL technical staff for the SURF program.
Each AO indicates whether or not it is open to non-Caltech students. If an AO is NOT open to non-Caltech students, please DO NOT contact the mentor.
Announcements of Opportunity are posted as they are received. Please check back regularly for new AO submissions! Remember: This is just one way that you can go about identifying a suitable project and/or mentor.
New for 2021: Students applying for JPL projects should complete a SURF@JPL application instead of a "regular" SURF application.
Students pursuing opportunities at JPL must be U.S. citizens or U.S. permanent residents.
|Project:||Continued Fractions, Machine Learning and Classification – SURF@Newcastle|
|Disciplines:||Computation and Neural Systems, Astronomy, Math, Computer Science, ACM, Ph|
|Mentor URL:||https://www.newcastle.edu.au/profile/pablo-moscato|
NOTE: This project is being offered by a Caltech alum and will be conducted at the University of Newcastle in Newcastle, Australia. Only Caltech students are eligible for this project.
Since the 1950s and 1960s, machine learning methods have separated points in high-dimensional spaces using hyperplanes, or networks of hyperplanes.
These “connexionist” approaches have been very successful, but they tend to produce large “black boxes” of adjustable parameters that hide what the network is “really doing”.
To address the need for more interpretable models, we started a collaboration that has involved several Caltech SURF students in 2018, 2019, and 2020. Having started with regression problems in machine learning, we now aim to tackle classification problems. Our approach uses analytic continued fractions. Have you ever heard of them? Or of Padé approximants? You may be surprised to learn that their mathematical foundations go back to 1748 or even earlier, as some key results are attributed to Euler and his predecessors.
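An analytic continued fraction can be evaluated from the innermost term outward. The sketch below assumes the general form f(x) = g_0(x) + h_0(x) / (g_1(x) + h_1(x) / (... + h_{d-1}(x) / g_d(x))), with every g_i and h_i a linear function of the input vector x; this parameterization is an illustrative assumption, and the exact representation in the cited papers may differ. The names `Linear`, `linear`, and `eval_cf` are hypothetical. Each truncated fraction of this kind is a rational function of x, the same class of functions as Padé approximants.

```python
from typing import Sequence, Tuple

# A linear term (weights, bias) evaluated as w . x + b (assumed representation).
Linear = Tuple[Sequence[float], float]


def linear(term: Linear, x: Sequence[float]) -> float:
    """Evaluate one linear term at the input vector x."""
    weights, bias = term
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def eval_cf(x: Sequence[float], g: Sequence[Linear], h: Sequence[Linear]) -> float:
    """Evaluate g0(x) + h0(x) / (g1(x) + h1(x) / (... + h_{d-1}(x) / g_d(x))).

    g holds d + 1 linear terms and h holds d linear terms, so d is the depth.
    A zero denominator (a pole of the model) raises ZeroDivisionError.
    """
    if len(g) != len(h) + 1:
        raise ValueError("expected len(g) == len(h) + 1")
    # Backward recurrence: start from the innermost denominator g_d(x)
    # and fold outward one level at a time.
    value = linear(g[-1], x)
    for g_i, h_i in zip(reversed(g[:-1]), reversed(h)):
        value = linear(g_i, x) + linear(h_i, x) / value
    return value
```

As a quick check, with every g_i and h_i equal to the constant 1 the depth-d truncation equals the Fibonacci ratio F_{d+2}/F_{d+1}, which converges to the golden ratio (1 + sqrt(5))/2.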
As of January 2021, several publications have arisen from the work with Caltech SURF students, and some manuscripts have been submitted [1,2,3,4]. In addition, a new manuscript in the area of computational stylistics (involving works by Shakespeare and his peers) is being prepared.
We now have several datasets of great interest on which continued fractions could be explored as a classification tool, extending our current methods from regression to classification.
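One simple way to turn a real-valued continued-fraction model into a binary classifier, offered here only as an assumption and not as the project's chosen approach, is to pass its output through a logistic function and threshold the resulting probability; the coefficients could then be fitted by minimizing log-loss instead of squared error. The names `predict_proba`, `predict_label`, and `log_loss` are hypothetical, and the evaluator repeats the assumed linear-term form so the block stands alone.

```python
import math
from typing import Sequence, Tuple

Linear = Tuple[Sequence[float], float]  # (weights, bias), assumed representation


def linear(term: Linear, x: Sequence[float]) -> float:
    weights, bias = term
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def eval_cf(x, g, h) -> float:
    # Backward evaluation of g0 + h0 / (g1 + h1 / (... + h_{d-1} / g_d)).
    value = linear(g[-1], x)
    for g_i, h_i in zip(reversed(g[:-1]), reversed(h)):
        value = linear(g_i, x) + linear(h_i, x) / value
    return value


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, g, h) -> float:
    """Probability of the positive class: logistic of the continued-fraction output."""
    return sigmoid(eval_cf(x, g, h))


def predict_label(x, g, h, threshold: float = 0.5) -> int:
    """Hard 0/1 label obtained by thresholding the probability."""
    return int(predict_proba(x, g, h) >= threshold)


def log_loss(X, y, g, h, eps: float = 1e-12) -> float:
    """Mean binary cross-entropy, a possible fitness to minimize instead of squared error."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = min(max(predict_proba(x_i, g, h), eps), 1.0 - eps)
        total -= y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return total / len(y)
```

The same search that fits regression coefficients could then be reused unchanged, with `log_loss` as its fitness; multi-class problems would need a different output layer.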
Some of the datasets we have available for this project include, but are not limited to:
a) Identifying the author of works from Shakespeare's time (see for instance ) based only on the probability of word use (the use of word pairs has not yet been explored).
b) Large astronomical datasets from the Zwicky Transient Facility, a large optical survey in multiple filters that produces hundreds of thousands of transient alerts per night. These data pose a myriad of different classification problems.
c) Other machine learning datasets with hard classification problems, either available in the literature or of interest to the candidate.
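For dataset a), each text must first be reduced to a numeric feature vector. The sketch below, a minimal illustration under our own assumptions, computes the relative frequencies of a chosen list of words and, optionally, of adjacent word pairs; the tokenizer, the word lists, and the names `tokenize`, `word_probabilities`, `pair_probabilities`, and `feature_vector` are hypothetical choices, not the project's pipeline. The resulting vector could serve as the input x of a classifier.

```python
import re
from collections import Counter
from typing import List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep runs of letters and apostrophes (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def word_probabilities(text: str, vocabulary: Sequence[str]) -> List[float]:
    """Relative frequency of each vocabulary word among all tokens of the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    n = len(tokens)
    return [counts[w] / n if n else 0.0 for w in vocabulary]


def pair_probabilities(text: str, pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Relative frequency of each adjacent word pair among all bigrams of the text."""
    tokens = tokenize(text)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = max(len(tokens) - 1, 0)
    return [bigrams[p] / n if n else 0.0 for p in pairs]


def feature_vector(text: str, vocabulary: Sequence[str],
                   pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Concatenate word and word-pair features into one input vector x."""
    return word_probabilities(text, vocabulary) + pair_probabilities(text, pairs)
```

For example, `word_probabilities("The king and the queen and the fool", ["the", "and", "thou"])` gives `[0.375, 0.25, 0.0]`, since the text has 8 tokens.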
Students will explore the limitations of different machine learning regression methods and help develop a new methodology based on analytic continued fractions. Students will work on individual subprojects but also as a team: in close collaboration with the other 2021 Caltech SURF students, with other Caltech students still interested in the project, with current postdocs and PhD students in Newcastle, and with partners in Italy, Spain, and Australia.
The student will continue the ongoing development of open-source code for memetic algorithms for machine learning problems, mainly regression but with an extension to classification, based on a representation that exploits the power of analytic continued fractions.
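A memetic algorithm combines a population-based evolutionary search with local improvement of individual solutions. The following is a generic, minimal sketch under our own assumptions (one input variable, linear terms g_i(x) = a_i x + alpha_i and h_i(x) = b_i x + beta_i, uniform crossover, Gaussian mutation, and stochastic hill climbing as the local search); it is not the algorithm from the cited papers, and the names `eval_cf_1d`, `mse`, `local_search`, `memetic_fit` and all hyperparameter values are hypothetical. Poles of the model (zero denominators) are penalized with infinite error.

```python
import math
import random
from typing import List, Sequence, Tuple

Params = List[float]
Data = Sequence[Tuple[float, float]]


def eval_cf_1d(params: Params, x: float, depth: int) -> float:
    """Evaluate g0 + h0 / (g1 + h1 / (... + h_{d-1} / g_d)) at a scalar x.

    Assumed layout: [a0, alpha0, b0, beta0, ..., a_d, alpha_d] (4 * depth + 2 values),
    with g_i(x) = a_i * x + alpha_i and h_i(x) = b_i * x + beta_i.
    """
    value = params[4 * depth] * x + params[4 * depth + 1]
    for i in range(depth - 1, -1, -1):
        a, alpha, b, beta = params[4 * i : 4 * i + 4]
        value = a * x + alpha + (b * x + beta) / value
    return value


def mse(params: Params, data: Data, depth: int) -> float:
    """Mean squared error; a zero denominator (pole) or NaN counts as infinitely bad."""
    total = 0.0
    for x, y in data:
        try:
            err = eval_cf_1d(params, x, depth) - y
        except ZeroDivisionError:
            return math.inf
        total += err * err
    result = total / len(data)
    return math.inf if math.isnan(result) else result


def local_search(params, data, depth, rng, steps=20, scale=0.1):
    """Stochastic hill climbing: keep a one-coordinate Gaussian tweak only if it lowers the error."""
    best, best_err = list(params), mse(params, data, depth)
    for _ in range(steps):
        cand = list(best)
        cand[rng.randrange(len(cand))] += rng.gauss(0.0, scale)
        err = mse(cand, data, depth)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err


def memetic_fit(data, depth=1, pop_size=20, generations=50, seed=0):
    """Evolve continued-fraction coefficients; every new individual is refined by local search."""
    rng = random.Random(seed)
    n = 4 * depth + 2
    population = [local_search([rng.uniform(-1.0, 1.0) for _ in range(n)], data, depth, rng)
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: ind[1])  # lower error first
        parents = population[: pop_size // 2]    # truncation selection keeps the best half
        children = []
        while len(children) < pop_size - len(parents):
            (p1, _), (p2, _) = rng.sample(parents, 2)
            child = [u if rng.random() < 0.5 else v for u, v in zip(p1, p2)]  # uniform crossover
            child[rng.randrange(n)] += rng.gauss(0.0, 0.5)                   # Gaussian mutation
            children.append(local_search(child, data, depth, rng))           # memetic refinement
        population = parents + children
    return min(population, key=lambda ind: ind[1])
```

On noiseless data from y = 2x + 1 with depth 0 (a plain line), the search should drive the error close to zero; larger depths add rational terms at the cost of a harder, pole-prone search landscape.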
In this particular project, the student will look at datasets of clear astronomical interest and will also use datasets such as the Shakespeare authorship one as a methodological control and to help develop the methodology.
The method will be tested on a number of datasets of interest that are available for experimentation. A comparison with other machine learning approaches is expected; the deliverables may therefore help the team continue the collaboration after SURF and take part in international competitions dedicated to this area, such as those sponsored by Kaggle and other international groups.
Candidates interested in an ongoing collaboration with the mentors could continue working in this research area after returning to Caltech (as has happened in the past). A number of other approaches could also be explored during the SURF project, including implementing the algorithms on GPUs, TPUs, and future hardware systems (such as Intel’s Nervana or Graphcore IPUs) and running the method on them. We expect to gain access to some of these systems soon. The internship may provide the time needed to communicate the core problems effectively and to find a first solution, which may result in at least one journal publication.
1) A memetic algorithm for symbolic regression, H. Sun and P. Moscato, in Proc. of IEEE Conference on Evolutionary Computation 2019, pp. 2167-2174 (2019).
2) Analytic Continued Fractions for Regression: Results on 352 datasets from the physical sciences, P. Moscato, H. Sun and M.N. Haque, in Proc. of IEEE Conference on Evolutionary Computation 2020, pp. 1-8 (2020).
3) Analytic Continued Fractions for Regression: A Memetic Algorithm Approach, P. Moscato, H. Sun and M.N. Haque (2020), https://arxiv.org/abs/2001.00624
4) Learning to extrapolate using continued fractions: Predicting the critical temperature of superconductor materials, P. Moscato, M.N. Haque, K. Huang, J. Sloan and J.C. de Oliveira (2020).
5) Continued fractions meet the classics or ‘My kingdom for a continued fraction!’, P. Moscato, H. Craig, G. Egan, M.N. Haque, K. Huang, J. Sloan and J.C. de Oliveira (to appear, 2021).
6) Language Individuation and Marker Words: Shakespeare and His Maxwell's Demon, J. Marsden, D. Budden, H. Craig, and P. Moscato, https://journals.plos.org/plosone/article/authors?id=10.1371/journal.pone.0066813
7) Machine Learning for the Zwicky Transient Facility, A. Mahabal et al., Publications of the Astronomical Society of the Pacific 131, 038002 (2019).
8) Handbook of Memetic Algorithms, F. Neri, C. Cotta and P. Moscato (Eds.), Springer, 2012.
9) Memetic Algorithms for Business Analytics and Data Science: A Brief Survey, P. Moscato and L. Mathieson, in Business and Consumer Analytics: New Ideas, P. Moscato and N.J. de Vries (Eds.), pp. 545-608, https://link.springer.com/chapter/10.1007/978-3-030-06222-4_13
|Student Requirements:||High-level programming skills and an interest in scientific computing, machine learning, or artificial intelligence. Experience in HPC and GPU computing and knowledge of symbolic regression and its applications are also a plus.|