SimCenter Data Science Interns – Fall 2021

November 8, 2021

This Fall semester, four UC Berkeley undergraduate students are completing an exciting internship with the NHERI SimCenter on the “Deep-learning for compressing scientific data and generating real-time predictions of storm-surge” project. The project ties into the SimCenter’s Hydro-UQ software. Its aim is to develop a neural-network-based algorithm that is trained on high-fidelity data and then used to predict storm-surge scenarios. Such algorithms have shown potential to reliably compress and decompress scientific data, which reduces data storage for large regional hazard simulations. The goal is to create neural-network algorithms that predict an event’s impact efficiently and accurately. These algorithms would offer a faster alternative to the computationally and data-intensive storm-surge simulations currently in use.

SimCenter Fall 2021 Data Science Intern Program (L to R): Rohil Kanwar, Ajay B. Harish, Michael Leite-Garcia, Michelle Gu, Maxwell Liu

Undergraduate student interns Rohil Kanwar, Michael Leite-Garcia, Michelle Gu, and Maxwell Liu are exploring different machine learning techniques and neural network architectures for the project. The internship gives them hands-on experience applying data science to a real-world problem. They work under the guidance of Dr. Ajay B. Harish of the SimCenter development team and Dr. Matt Schoettler, SimCenter Associate Director.

Dr. Harish stated that “Rohil is exploring the prediction using similarity scores to train Convolutional Neural Networks (CNNs), Michael is developing Physics-Informed Neural Networks (PINN), Michelle is developing Generative Adversarial Networks (GANs) which are often used in computer vision, and Maxwell is combining Gaussian processes with neural networks. As a part of the Data Discovery Program, we are exploring a wide range of techniques, and we believe that each of these methods will have their own pros and cons. We are very excited to see how the techniques compare towards enhancing the predictive modeling capabilities for natural hazard events.”

Rohil Kanwar is a fourth-year student majoring in Data Science with an interest in deep learning and Natural Language Processing solutions. He intends to build on the deep learning experience gained in this project toward a career in Machine Learning, especially building efficient neural nets for massive datasets. “This project seeks to actually save lives using real-time intelligent information dispersal using the technology that we are building, and that thought empowers me to think about innovative solutions where deep learning could bring about life-altering impact.”

Michael Leite-Garcia is a senior majoring in physics and graduating this Fall. “This project has given me insight on what to be aware of working with big data and how to approach large-scale implementation. For future projects I will be able to better identify areas of interest with handling large datasets, how to construct questions that will best help me understand and produce good results, and how to work as a member of a team to accomplish a technical task.”

Michelle Gu is a third-year student majoring in Data Science with a domain emphasis in Computational Biology. “I want to work on projects that make a true impact, and this opportunity with the SimCenter aligned perfectly with that goal and my high school background in Global Ecology. I have gained a tremendous amount of knowledge in data science and machine learning from interacting with the other team members and Dr. Harish, and I'm excited about the possibilities with all the projects Dr. Harish is working on. I can't wait to see all our final products in action and making an impact.”

Maxwell Liu is a fourth-year student majoring in Computer Science and minoring in Data Science. “This project has taught me how to work with large data sets and extract meaningful information from sparse data sets. This has made me more confident to work with similar data sets and the ambiguity of real-life data in the future. The SimCenter has given me the opportunity to learn and grow as an aspiring Data Scientist and researcher, applying Data Science and Computer Science concepts to meaningful problems.”

The internship was made possible by UC Berkeley’s Data Science Discovery Program, which brings together multidisciplinary teams for cutting-edge data research. More information about the project “Deep-learning for compressing scientific data and generating real-time prediction of storm-surge” can be found at the Discovery Application Portal.