Student-Faculty Programs Office
Summer 2025 Announcements of Opportunity



Project:  Precision Machine Learning With Continued Fractions - SURF@Newcastle in 2025
Disciplines:  Computation and Neural Systems, Mathematics, CS, Applied Math, Physics
Mentor:  Pablo Moscato, Professor, (EAS), pablo.moscato@newcastle.edu.au, Phone: +61 2 424216209
Mentor URL:  https://www.newcastle.edu.au/profile/pablo-moscato
Background:  NOTE: This project is being offered by a Caltech alum and is open only to Caltech students. The project will be conducted at the University of Newcastle in Newcastle, Australia.

Did you know that many of the machine learning methods currently in practice have notorious problems when they need to produce results “outside the domain” defined by their training sets?

In fact, many multivariate regression techniques perform well when they are, in some sense, only interpolating, but they have important drawbacks when extrapolating.
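The gap between interpolation and extrapolation can be illustrated with a small sketch. It is not one of the project's methods: a Lagrange interpolating polynomial stands in for a generic regression model, and the target function, the training points, and all names (`lagrange_predict`, `target`, `train_x`) are assumptions chosen only for illustration.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def target(x):
    """Hypothetical 'true' relationship that the model tries to learn."""
    return 1.0 / (1.0 + x)


# Training data covers only the interval [0, 1].
train_x = [0.0, 0.25, 0.5, 0.75, 1.0]
train_y = [target(x) for x in train_x]

# Query once inside the training domain and once far outside it.
inside_error = abs(lagrange_predict(train_x, train_y, 0.6) - target(0.6))
outside_error = abs(lagrange_predict(train_x, train_y, 5.0) - target(5.0))

print(f"error at x = 0.6 (interpolation): {inside_error:.2e}")
print(f"error at x = 5.0 (extrapolation): {outside_error:.2e}")
```

In this toy setting the model is accurate between the training points and badly wrong far outside them, which is the behaviour the project sets out to measure and reduce.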

This project aims to explore the extent of these problems and to find ways to remedy them.

We aim to generate extremely high-precision analytical models of scientific problems of interest.

The project stems from a collaboration with Caltech SURF students in 2018, 2019, 2020, and 2021, so this would be the fifth year of a successful enterprise in this area. We are developing innovative new machine learning methods.

As of January 2022, several publications have resulted from the work with Caltech SURF students, and some manuscripts have already been published or are close to acceptance [1,2,3,4]. In addition, a new manuscript is being prepared in the area of computational stylistics (involving work by Shakespeare and his contemporaries) [5].

That said, this project has a wide range of applications and is flexible enough to accommodate a problem domain that interests the student and offers the possibility of further extensions and research collaborations with the mentors.

This project involves the continuation of this work [1,2,3,4,5] and the acceleration of the codes (currently one in Matlab, another in C++), with emphasis on improving several aspects of the existing memetic algorithm [3,6,7].

At the moment we also need to improve the performance of the non-linear optimization components of our algorithm. Significant experimentation and coding will be part of the project and, ideally, familiarity with the existing tools and methods should be gained before coming to work at Newcastle. Although significant progress was achieved in 2021, [3] remains the key publication for understanding the methodology developed so far.

Students will explore the limitations of nearly 40 different machine learning regression methods and explore the approach based on analytic continued fractions. Students will work on individual subprojects, but also as a team (in close collaboration with other Caltech SURF students in 2022) and with other Caltech students still interested in the project, current postdocs and PhD students in Newcastle, and partners in Italy, Spain and Australia.

[This project may involve more than one student, so collaborators are invited to apply as a team.]
Description:  The student will continue the ongoing development of open source codes for memetic algorithms for machine learning problems, mainly in regression but with extension to classification, which will be based on a representation that exploits the power of analytic continued fractions.
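As a minimal sketch of what a continued-fraction representation looks like in code, the snippet below evaluates a truncated fraction of the form g0(x) + h0(x) / (g1(x) + h1(x) / (g2(x) + ...)) from the innermost term outwards. This general form is an assumption made for illustration; the exact parametrisation used in the project is described in [3]. The function name `evaluate_continued_fraction` is hypothetical, and Lambert's classical continued fraction for tan(x) is used only as a known test case.

```python
import math


def evaluate_continued_fraction(g, h, x):
    """Evaluate g[0](x) + h[0](x) / (g[1](x) + h[1](x) / (... + g[d](x))).

    g holds d + 1 callables and h holds d callables, where d is the depth.
    """
    value = g[-1](x)
    # Work from the innermost term back to the outermost one.
    for g_i, h_i in zip(reversed(g[:-1]), reversed(h)):
        value = g_i(x) + h_i(x) / value
    return value


# Lambert's continued fraction: tan(x) = x / (1 - x^2 / (3 - x^2 / (5 - ...)))
depth = 8
g = [lambda x: 0.0] + [lambda x, c=2 * k - 1: float(c) for k in range(1, depth + 1)]
h = [lambda x: x] + [lambda x: -x * x for _ in range(depth - 1)]

approximation = evaluate_continued_fraction(g, h, 1.0)
print(approximation, math.tan(1.0))
```

In a regression setting the terms g_i and h_i would carry fitted coefficients rather than fixed constants; the bottom-up evaluation stays the same.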

This is likely to lead to a powerful new method to address the problem in which some variables are selected and a non-linear optimization problem needs to be solved to identify the contribution of these variables to fitting a particular function given experimental data.
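The two-level problem described above (choose some variables, then solve a non-linear fit for their coefficients) can be sketched on toy data. Everything here is an assumption made for illustration: the depth-1 model g0(x) + 1 / g1(x) with affine g0 and g1, the synthetic data, the names `fraction_model`, `compass_search` and `fit_subset`, and the simple compass (pattern) search, which merely stands in for whatever optimizer the project's codes use.

```python
def fraction_model(params, x, selected):
    """Depth-1 fraction g0(x) + 1 / g1(x), affine in the selected variables."""
    k = len(selected)
    a0, a = params[0], params[1:1 + k]
    b0, b = params[1 + k], params[2 + k:]
    g0 = a0 + sum(w * x[j] for w, j in zip(a, selected))
    g1 = b0 + sum(w * x[j] for w, j in zip(b, selected))
    return g0 + 1.0 / g1


def mean_squared_error(params, data, selected):
    try:
        return sum((fraction_model(params, x, selected) - y) ** 2
                   for x, y in data) / len(data)
    except (ZeroDivisionError, OverflowError):
        return float("inf")  # treat a pole on the data as an infeasible fit


def compass_search(loss, start, step=0.5, tol=1e-6, max_iter=10000):
    """Derivative-free local search: probe +/- step on each coordinate."""
    best, best_loss = list(start), loss(start)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                trial_loss = loss(trial)
                if trial_loss < best_loss:
                    best, best_loss, improved = trial, trial_loss, True
        if not improved:
            step /= 2.0  # no probe helped, so refine the step size
    return best, best_loss


def fit_subset(data, selected):
    """Fit the model on the chosen variables; return (initial loss, result)."""
    k = len(selected)
    start = [0.0] * (1 + k) + [1.0] + [0.0] * k  # g0 = 0 and g1 = 1
    loss = lambda p: mean_squared_error(p, data, selected)
    return loss(start), compass_search(loss, start)


# Toy data: y depends on x0 only; x1 is an irrelevant variable.
data = [((x0, x1), 2.0 + 1.0 / (1.0 + x0))
        for x0 in (0.0, 0.5, 1.0, 1.5, 2.0) for x1 in (0.0, 1.0, 2.0)]

initial_loss, (params, final_loss) = fit_subset(data, selected=(0,))
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.6f}")
```

Calling `fit_subset` with different `selected` tuples gives one loss per candidate subset. A search over subsets would compare those losses, typically with some penalty on the number of variables used.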

The method will be tested on a number of datasets of interest that are available for experimentation. A comparison with other machine learning approaches is expected, so the deliverables may help the team continue the collaboration after SURF and take part in ongoing competitions at international events dedicated to this area, or in those sponsored by Kaggle and other international groups.

We expect that candidates could continue developing this research area after returning to Caltech, if interested in an ongoing collaboration with the mentors (as has happened in the past).

The internship may provide the time needed to communicate effectively what the core problems are and to find a first solution, which may result in at least one journal publication.
References:  1) A memetic algorithm for symbolic regression,
H. Sun and P. Moscato, in Proc. of IEEE Conference on Evolutionary Computation 2019, pp. 2167-2174, (2019)
https://ieeexplore.ieee.org/document/8789889
2) Analytic Continued Fractions for Regression: Results on 352 datasets from the physical sciences, P. Moscato, H. Sun, M.N. Haque, in Proc. of IEEE Conference on Evolutionary Computation 2020, pp. 1-8. (2020)
https://ieeexplore.ieee.org/abstract/document/9185564
3) Analytic Continued Fractions for Regression: A Memetic Algorithm Approach, P. Moscato, H. Sun and M.N. Haque, in Expert Syst. Appl. 179: 115018 (2021) (see also
https://arxiv.org/abs/2001.00624)
4) Learning to extrapolate using continued fractions: Predicting the critical temperature of superconductor materials, P. Moscato, M.N. Haque, K. Huang, J. Sloan and J.C. de Oliveira, in Algorithms 16 (8), 382 (2023)
https://www.mdpi.com/1999-4893/16/8/382
5) Multiple regression techniques for modeling dates of first performances of Shakespeare-era plays, P. Moscato, H. Craig, G. Egan, M.N. Haque, K. Huang, J. Sloan and J.C. de Oliveira, in Expert Systems with Applications 200, 116903.
https://www.sciencedirect.com/science/article/abs/pii/S0957417422003414
6) Handbook of Memetic Algorithms, F. Neri, C. Cotta and P. Moscato (Eds.), Springer, 2012.
https://www.springer.com/gp/book/9783642232466
7) Memetic Algorithms for Business Analytics and Data Science: A Brief Survey, Pablo Moscato and Luke Mathieson, in Business and Consumer Analytics: New Ideas, Pablo Moscato and Natalie Jane de Vries (Eds), pp 545-608, https://link.springer.com/chapter/10.1007/978-3-030-06222-4_13
8) Padé approximant, by G.A. Baker Jr. in Scholarpedia, http://www.scholarpedia.org/article/Padé_approximant
9) Distilling Freeform Natural Laws from Experimental Data, https://www.youtube.com/watch?v=lmiAugo1CJI
10) John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA, 1992.
11) John R. Koza. Human-competitive results produced by genetic programming. Genetic Programming and Evolvable Machines, 11(3/4):251–284, September 2010. Tenth Anniversary Issue: Progress in Genetic Programming and Evolvable Machines.
12) Gene Expression Programming: A Survey, Jinghui Zhong, Liang Feng, Yew-Soon Ong, http://ieeexplore.ieee.org/abstract/document/7983467/
13) Machine-assisted discovery of relationships in astronomy, Graham, Matthew J., et al. arXiv preprint arXiv:1302.5129 Mon. Not. R. Astron. Soc. (2013).
14) www.genetic-programming.org
Student Requirements:  High-level programming skills and an interest in scientific computing, machine learning, or artificial intelligence. Experience in HPC and GPU computing and knowledge of symbolic regression and its applications are also a plus.
Programs:  This AO can be done under the following programs:

  Program    Available To
       SURF    Caltech students only 




 
