FURI | Spring 2025

Optimizing Earth Science Observations: Developing Reinforcement Learning Techniques for Autonomously Determining Priority Observations in a Dynamic Environment


Dynamic Targeting is an extensively researched concept in which satellites point their instruments in the direction expected to produce the greatest scientific yield. Conventional approaches, however, rely on scheduling systems to determine which points on the map are worth observing. Auto-scheduler systems are effective at selecting priority observations, but they are neither fully accurate nor run in real time, and incorrect decisions waste power and produce redundant data. This research aims to enable satellites to determine valid observation locations dynamically, in real time.

Using state features such as geolocation, sun angle, ground status, and convective precipitation, the project applies PyTorch to test several model-free reinforcement learning algorithms, including PPO, Double-DQN, and C51, to optimize Earth science observations. The research also proposes an effective reward function for a binary action space in which the agent decides whether or not the satellite instrument observes. Central to the project is a reinforcement learning environment that emulates the satellite's decisions at candidate observation points. The work involves large-scale simulated environmental data from the NASA GEOS-5 dataset, model selection and training, testing and tuning of the model, and evaluation through comparison with a random forest classifier. The research will be considered a success if its results agree with the simulated convective precipitation storm data and achieve higher precision and recall than predictive supervised learning models. The ultimate goal is to enable satellites to autonomously prioritize areas of high atmospheric activity, enhancing environmental monitoring and providing deeper insight into Earth's atmospheric dynamics.
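The binary-action environment described above can be sketched as follows. This is a toy illustration only: the feature ranges, the storm threshold, and all reward values are hypothetical stand-ins chosen here for clarity, not the project's actual design or the GEOS-5 data.

```python
import numpy as np

class ObservationEnv:
    """Toy sketch of the satellite observation-decision environment.

    State features (hypothetical proxies for those named in the abstract):
    geolocation (lat, lon), sun angle, ground status, convective precipitation.
    Action space is binary: 0 = do not observe, 1 = observe.
    """

    def __init__(self, seed=0, horizon=50):
        self.rng = np.random.default_rng(seed)
        self.horizon = horizon
        self.t = 0

    def _features(self):
        lat = self.rng.uniform(-90.0, 90.0)
        lon = self.rng.uniform(-180.0, 180.0)
        sun_angle = self.rng.uniform(0.0, 90.0)     # degrees, illustrative
        ground_ok = float(self.rng.integers(0, 2))  # 1 = usable surface view
        precip = self.rng.exponential(scale=2.0)    # convective precip proxy
        return np.array([lat, lon, sun_angle, ground_ok, precip])

    def reset(self):
        self.t = 0
        self.state = self._features()
        return self.state

    def step(self, action):
        # Hypothetical reward shaping: observing a storm point pays off,
        # observing a quiet point wastes power, skipping a storm misses data.
        precip = self.state[4]
        storm = precip > 2.0  # illustrative threshold
        if action == 1:
            reward = 1.0 if storm else -0.5  # power cost of a wasted look
        else:
            reward = -1.0 if storm else 0.1  # small credit for saved power
        self.t += 1
        done = self.t >= self.horizon
        self.state = self._features()
        return self.state, reward, done
```

An agent (e.g. a PPO or DQN policy) would call `reset()` once per episode and `step(action)` repeatedly, learning to observe only when the state features suggest high atmospheric activity.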

Glossary: PPO (Proximal Policy Optimization); Double-DQN (Double Deep Q-Network); C51 (Categorical Deep Q-Network, a distributional reinforcement learning algorithm)
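The abstract evaluates the learned policy by its precision and recall against a supervised baseline. As a minimal sketch of that metric computation (the helper name and the example decisions are illustrative, not from the project):

```python
import numpy as np

def precision_recall(pred, truth):
    """Precision and recall for binary observe/skip decisions.

    pred  : the agent's decisions (1 = observed the point)
    truth : ground-truth labels (1 = the point actually held a storm)
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)    # observed a real storm
    fp = np.sum(pred & ~truth)   # wasted observation (no storm)
    fn = np.sum(~pred & truth)   # missed storm
    precision = float(tp / (tp + fp)) if tp + fp else 0.0
    recall = float(tp / (tp + fn)) if tp + fn else 0.0
    return precision, recall

# Illustrative comparison: the same metric applied to an RL policy's
# decisions and a random forest classifier's predictions over the
# same simulated points would decide which method "wins".
p, r = precision_recall([1, 1, 0, 1, 0], [1, 0, 0, 1, 1])
```

Here the RL policy is "successful" in the abstract's sense if both its precision and its recall exceed those of the random forest baseline on the held-out simulated storm data.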

Student researcher

Shashwat Raj

Computer systems engineering

Hometown: Tempe, Arizona, United States

Graduation date: Spring 2025