Student-Faculty Programs Office
Summer 2024 Announcements of Opportunity



Project:  SURF@Newcastle, Australia - Advancing Generative AI in the Image Domain Through New Machine Learning and Analytics
Disciplines:  Computation and Neural Systems, Math, CS, ACM, Ph, other majors also possible with course(s) in optimization
Mentor:  Pablo Moscato, Professor, (EAS), pablo.moscato@newcastle.edu.au, Phone: +61 2 434216209
Mentor URL:  https://www.newcastle.edu.au/profile/pablo-moscato
Background:  NOTE: This project is being offered by a Caltech alum and is open only to Caltech students. The project will take place at University of Newcastle, Australia, in Newcastle, Australia.

Diffusion-based probabilistic methods are very powerful in the area of generative AI and have given rise to companies like Leonardo AI and competitors like Midjourney and RunwayML, yet they are considered to be in their infancy.

There is a need to establish these techniques on a more solid mathematical foundation. While their usefulness is not in question, there are anecdotal reports of problems these techniques have in creating some images from prompts given by users.

There are five main areas that could give rise to possible projects, so without going into much detail we can comment on some of them:

Generative AI and LLMs AND Diffusion models for generative AI (i.e. Stable Diffusion)

Images are generated from natural-language text prompts. This is facilitated by large-scale training on labelled image-text pairs, which in turn requires curation and processing at roughly petascale. It is essential for useful generation that “the right prompts” are used. Leonardo AI has been successful by leveraging some LLM approaches for prompt engineering.

This approach may need further exploration, because it currently produces a sequence of images through repeated redefinition of the prompt with a user-in-the-loop strategy.

Work on this particular project would need somebody who is willing to understand the concepts and perhaps propose alternatives that are more algorithmically driven (built from existing available techniques) as opposed to mathematically driven. This project is therefore suitable for Computer Science students with experience in AI and Machine Learning.

Large-scale machine learning AND Artificial Neural Networks and Deep Learning strategies for Image Analysis

Leonardo AI must be doing massive training of artificial neural networks to develop the new models used in its production pipeline (to generate images).

New mathematical techniques and algorithms that help reduce the computational burden without losing quality could potentially be of use. In turn, these techniques can be applied to other problem domains, so again we expect that the lessons learned in this project could help the community at large by reducing the practical complexity of the training.

GPU-based high performance computing

By and large, all the methods used for training systems, as well as the Stable Diffusion methods, use GPU-based computing. Foundational model training is very costly but essential for the ongoing business.

There is room for innovation here. If, for instance, it were found that the same kind of computations could be done with the computing systems available in academia, this would be a massive breakthrough both for the company and the University. At present, none of the training done by Leonardo AI can be done in academic settings.
Description:  See above.
References:  1. Denoising Diffusion Probabilistic Models by UC Berkeley: Introduced Denoising Diffusion Probabilistic Models (DDPMs) for generating high-quality images from random noise. Builds on a connection to denoising score matching: a forward diffusion process gradually adds noise to images, and a learned reverse process transforms noise back into images. Demonstrated the suitability of diffusion models for generating high-quality image samples and meaningful high-level attributes in latent variables. https://proceedings.neurips.cc/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf

2. Diffusion Models Beat GANs on Image Synthesis by OpenAI: Showed that diffusion models can generate higher-quality images than Generative Adversarial Networks (GANs). Improved the model architecture and proposed classifier guidance, which uses gradients from a classifier to trade off diversity for fidelity and enhance sample quality in conditional image synthesis. https://arxiv.org/abs/2105.05233

3. DALL-E 2 by OpenAI: Enhanced the original DALL-E's capabilities for text-guided image synthesis by training on a dataset of 400 million image-text pairs. Aimed to synthesize realistic images from text descriptions and enable language-guided image manipulations. Implemented safety mitigation measures to address potential risks and limitations of diffusion models. https://openai.com/dall-e-2

4. Imagen by Google: Introduced Imagen, a text-to-image diffusion model, focused on achieving unprecedented photorealism in output images. Utilized a large T5 language model to encode input text into embeddings and a conditional diffusion model to generate images. Produced 1024x1024 samples with high photorealism and image-text alignment, preferred by human raters over other models. https://arxiv.org/abs/2205.11487

5. ControlNet by Stanford: Introduced ControlNet, an architecture to control pretrained large diffusion models and support additional input conditions. Cloned weights of a large diffusion model into trainable and locked copies, enabling more control over output images through conditional inputs. Demonstrated robust learning even with small training datasets and facilitated various image generation applications. https://arxiv.org/abs/2302.05543
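
The forward diffusion process mentioned in reference 1 can be illustrated with a minimal sketch. It assumes a linear noise schedule over 1000 steps and a toy 64-value "image"; the function names, schedule endpoints, and sizes are illustrative choices, not taken from any Leonardo AI or project code. It only shows how a sample is progressively noised in closed form, not how a model is trained.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced noise variances beta_1..beta_T (endpoints are assumptions)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bar, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def forward_diffuse(x0, t, alpha_bar, rng):
    """Draw x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    signal = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + sigma * e for x, e in zip(x0, noise)]
    return xt, noise


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bar = cumulative_alpha(betas)

# Toy flattened "image" with pixel values in [-1, 1].
x0 = [rng.uniform(-1.0, 1.0) for _ in range(64)]

x_early, _ = forward_diffuse(x0, 10, alpha_bar, rng)   # still close to x0
x_late, _ = forward_diffuse(x0, 999, alpha_bar, rng)   # almost pure noise
```

At an early step the sample stays close to the original, while at the last step almost none of the original signal remains. A generative model is trained to undo this noising step by step.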
Student Requirements:  High-level programming skills and an interest in scientific computing, machine learning, or artificial intelligence. Experience in HPC and GPU computing and knowledge of symbolic regression and its applications are also a plus.
Programs:  This AO can be done under the following programs:

  Program    Available To
       SURF    Caltech students only 




 

Problems with or questions about submitting an AO?  Call Alexandra Katsas of the Student-Faculty Programs Office at (626) 395-2885.
 