
SURF: Announcements of Opportunity

Below are Announcements of Opportunity posted by Caltech faculty and JPL technical staff for the SURF program. Additional AOs for the Amgen Scholars program can be found here.

Specific GROWTH projects being offered for summer 2019 can be found here.

Each AO indicates whether or not it is open to non-Caltech students. If an AO is NOT open to non-Caltech students, please DO NOT contact the mentor.

Announcements of Opportunity are posted as they are received. Please check back regularly for new AO submissions! Remember: This is just one way that you can go about identifying a suitable project and/or mentor.

Announcements for external summer programs are listed here.

Students pursuing opportunities at JPL must be U.S. citizens or U.S. permanent residents.



Project:  Multiobjective Memetic Algorithms for Symbolic Regression – SURF@Newcastle
Disciplines:  Mathematics, Computation and Neural Systems
Mentor:  Pablo Moscato, Professor, (PMA), Pablo.Moscato@newcastle.edu.au, Phone: +61 2 434216209 (mobile)
Mentor URL:  http://www.newcastle.edu.au/profile/pablo-moscato
AO Contact:  Pablo Moscato, Pablo.Moscato@newcastle.edu.au
Background:  NOTE: This project is being offered by a Caltech alum and will be conducted at The University of Newcastle, City Campus, Newcastle, Australia.

How can we use problem domain knowledge to deliver a new truly multiobjective memetic approach for symbolic regression?

Can the use of multiple objective functions help to guide model building towards producing better generalization outcomes and more interpretable solutions for AI?

The project builds on successful work done in 2018 as part of a previous SURF project. In general, methods for symbolic regression, such as Genetic Programming (GP), represent mathematical models as tree structures. Following standard graph notation, edges connect pairs of vertices; some vertices are leaves and the others are internal vertices. Each internal vertex has an associated operator, and the set of operators depends on the problem domain. They can be arithmetic operators, e.g. addition (‘+’), subtraction (‘-’), multiplication (‘*’) and division (‘/’); logical operators (e.g. AND, NOT, OR, XOR); mathematical functions (e.g. trigonometric functions, exponentiation); or even primitive algorithms, among many others. Each leaf has an associated predictor variable from the study for which data are available.
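The tree representation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Leaf`, `Node`, `OPS` and `evaluate` are hypothetical and are not taken from the project code, and only the four arithmetic operators are included.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Union

# Hypothetical operator table for the internal vertices (arithmetic only).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


@dataclass(frozen=True)
class Leaf:
    """A leaf holds the name of one predictor variable."""
    name: str


@dataclass(frozen=True)
class Node:
    """An internal vertex holds an operator and two subtrees."""
    op: str
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def evaluate(tree: Tree, sample: Dict[str, float]) -> float:
    """Recursively evaluate the model encoded by `tree` on one data sample."""
    if isinstance(tree, Leaf):
        return sample[tree.name]
    return OPS[tree.op](evaluate(tree.left, sample), evaluate(tree.right, sample))


# The model (x1 + x2) * x1 encoded as a tree.
model = Node("*", Node("+", Leaf("x1"), Leaf("x2")), Leaf("x1"))
result = evaluate(model, {"x1": 2.0, "x2": 3.0})
print(result)  # (2.0 + 3.0) * 2.0 = 10.0
```

Logical operators or mathematical functions would be added by extending the operator table, with unary operators needing a one-child vertex type.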

Research in symbolic regression methods such as GP has not sufficiently explored how to use problem-domain information. In contrast, a Memetic Algorithm approach, by definition, is primarily concerned with exploiting the mathematical knowledge built up over many decades in the area of function approximation. We made a good start last year. In 2018, working with one Caltech SURF awardee, we showed that representing models as analytic continued fractions provides the basis of a new representation that can dramatically change the field. This representation was shown to provide good models for fitting one-dimensional functions in several real-world regression problems normally used to test both GP methods and artificial neural networks. The results have been collected in a research manuscript, which has been submitted for publication at an international conference in the field.
This indicates that within a memetic computing approach the new representation is very useful.
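As a rough illustration of what a continued-fraction model looks like, the sketch below evaluates a truncated continued fraction g0(x) + h0(x) / (g1(x) + h1(x) / (g2(x) + ...)). The choice of linear terms for every g_i and h_i, and the names `linear` and `eval_continued_fraction`, are assumptions made for this example; the exact form used in the 2018 work is not specified here.

```python
from typing import Sequence, Tuple

# Assumption: each term is linear in x, stored as (intercept, slope).
Linear = Tuple[float, float]


def linear(coeffs: Linear, x: float) -> float:
    """Evaluate intercept + slope * x."""
    intercept, slope = coeffs
    return intercept + slope * x


def eval_continued_fraction(g: Sequence[Linear], h: Sequence[Linear], x: float) -> float:
    """Evaluate g0 + h0 / (g1 + h1 / (g2 + ...)) at x, truncated at depth len(h).

    The fraction is evaluated from the innermost term outwards.
    A zero denominator raises ZeroDivisionError; a real fitter would
    have to penalise or repair such models.
    """
    if len(g) != len(h) + 1:
        raise ValueError("need exactly one more g term than h terms")
    value = linear(g[-1], x)
    for g_i, h_i in zip(reversed(g[:-1]), reversed(h)):
        value = linear(g_i, x) + linear(h_i, x) / value
    return value


# Constant terms only: 1 + 1 / (2 + 1 / 2), which is approximately 1.4.
constant_example = eval_continued_fraction(
    g=[(1.0, 0.0), (2.0, 0.0), (2.0, 0.0)],
    h=[(1.0, 0.0), (1.0, 0.0)],
    x=0.0,
)

# Depth one with a term that depends on x: x + 1 / 1 evaluated at x = 3.
depth_one_example = eval_continued_fraction(
    g=[(0.0, 1.0), (1.0, 0.0)],
    h=[(1.0, 0.0)],
    x=3.0,
)
print(constant_example, depth_one_example)
```

Truncating at a finite depth and clearing the nested denominators yields a ratio of two polynomials in x, so a model of this kind is a compact way of encoding a rational function.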
There are a number of extensions now of interest. In particular, can the use of several objective functions help the population of agents? We also need to develop new local search strategies to accelerate the creation of models and to improve the efficiency of the method.
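One common way to compare candidate models under several objectives is Pareto dominance. The sketch below assumes two objectives that are both minimised, a fitting error and a model size; these objectives, the scores, and the function names are hypothetical and only illustrate the idea, not the method the project will adopt.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one (all objectives are minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of the candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (error, model size) scores for four candidate models.
scores = [(0.10, 12), (0.25, 4), (0.30, 9), (0.10, 15)]
front = pareto_front(scores)
print(front)  # [0, 1]
```

Candidates 2 and 3 are dropped because another candidate is at least as good on both objectives, while candidates 0 and 1 represent different trade-offs between accuracy and simplicity.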
This project has a wide range of applications. It is flexible enough to accommodate a problem domain of interest to the student, and it offers the possibility of further extensions and research collaborations with the mentors.
Description:  The student will develop open-source code for a Multiobjective Memetic Algorithm for symbolic regression, based on a representation via ratios of polynomials.

This is likely to lead to a heuristic for the problem in which some variables are selected and a non-linear optimization problem must be solved to identify the contribution of these variables to fitting a particular function, given experimental data.
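To make the non-linear fitting step concrete, here is a minimal sketch that fits a low-order ratio of polynomials to data with a simple stochastic local search. Everything in it is an assumption made for illustration: the model form (a0 + a1*x) / (1 + b1*x), the single selected variable x, the synthetic data, and the hill-climbing rule. It stands in for, and does not reproduce, whatever optimizer the project will use.

```python
import random
from typing import List, Sequence, Tuple

Data = Sequence[Tuple[float, float]]


def rational(params: Sequence[float], x: float) -> float:
    """Hypothetical model: (a0 + a1 * x) / (1 + b1 * x)."""
    a0, a1, b1 = params
    return (a0 + a1 * x) / (1.0 + b1 * x)


def mse(params: Sequence[float], data: Data) -> float:
    """Mean squared error; a model with a pole on a data point scores infinity."""
    total = 0.0
    for x, y in data:
        try:
            total += (rational(params, x) - y) ** 2
        except ZeroDivisionError:
            return float("inf")
    return total / len(data)


def local_search(params: Sequence[float], data: Data, steps: int = 2000,
                 scale: float = 0.1, seed: int = 0) -> Tuple[List[float], float]:
    """Hill climbing: perturb all parameters with Gaussian noise and keep
    the candidate only if it lowers the error."""
    rng = random.Random(seed)
    best = list(params)
    best_err = mse(best, data)
    for _ in range(steps):
        candidate = [p + rng.gauss(0.0, scale) for p in best]
        err = mse(candidate, data)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err


# Synthetic data generated from (1 + 2x) / (1 + 0.5x) on x in [0, 1.9].
data = [(i / 10, (1 + 2 * (i / 10)) / (1 + 0.5 * (i / 10))) for i in range(20)]
start = [0.0, 0.0, 0.0]
start_err = mse(start, data)
fitted, fitted_err = local_search(start, data)
print(start_err, fitted_err)
```

Because a candidate is accepted only when it lowers the error, the final error can never exceed the starting error. Within a memetic algorithm, a step of this kind would be applied to individual candidate models between the population-level search steps.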

The method will be tested on a number of datasets of interest that are available for experimentation. A comparison with other Genetic Programming approaches is expected, so the deliverables may help to compete in international events dedicated to this area.

Candidates could continue developing this research area after returning to Caltech, if interested in an ongoing collaboration with the mentors and the previous SURF student who worked on the project.

References:  1) John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA, 1992.
2) John R. Koza. Human-competitive results produced by genetic programming. Genetic Programming and Evolvable Machines, 11(3/4):251–284, September 2010. Tenth Anniversary Issue: Progress in Genetic Programming and Evolvable Machines.
3) Handbook of Memetic Algorithms, F. Neri, C. Cotta and P. Moscato (Eds.), Springer, 2012.
4) Multiobjective memetic algorithms, A Jaszkiewicz, H Ishibuchi, Q Zhang, Handbook of Memetic Algorithms, 201-217, 2012.
5) Machine-assisted discovery of relationships in astronomy, Graham, Matthew J., et al. arXiv preprint arXiv:1302.5129 Mon. Not. R. Astron. Soc. (2013).
6) http://en.wikipedia.org/wiki/Generalized_continued_fraction
7) www.genetic-programming.org
8) A Memetic Algorithm for Symbolic Regression, Haoyuan Sun and Pablo Moscato, submitted, Jan. 2019.
9) Distilling Freeform Natural Laws from Experimental Data, http://www.youtube.com/watch?v=lmiAugo1CJI
Student Requirements:  High-level programming skills, interest in scientific computing/machine learning/artificial intelligence. Experience in non-linear optimization, HPC and GPU computing, and/or knowledge of symbolic regression and its applications is also a plus.
Programs:  This AO can be done under the following programs:

  Program    Available To
       SURF    Caltech students only 


