MORE | Spring 2020

Efficient Policy Iteration Architecture for Learning Rollout Policy in POMDP


The research project considers an infinite-horizon discounted dynamic programming problem with finite state and control spaces under partial observability (a POMDP). Such problems are computationally hard because of the curse of dimensionality. The work uses a policy iteration algorithm to learn a rollout policy that combines multi-step lookahead, truncated rollout, and terminal cost function approximation, while exploiting distributed computation. Future work aims to use aggregation to further reduce the size of the state space and the complexity of the problem, and to explore the efficacy of different neural network architectures as approximators. These methods have been applied in simulation to a class of search-and-rescue problems.
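The abstract does not spell out the algorithm. What follows is a minimal tabular sketch of the general idea: rollout acts as one policy-improvement step, and repeating it gives approximate policy iteration. It rests on several simplifying assumptions that depart from the project. It uses a small, fully observed toy MDP in place of a POMDP, where the "state" would be a belief state. It computes exact expectations instead of simulating trajectories. It uses a fixed terminal cost `J_tilde` instead of a trained neural-network approximator. It runs serially rather than distributed. All names (`q_factor`, `truncated_rollout`, `bellman`, `rollout_policy`, the toy chain) are hypothetical.

```python
"""Tabular sketch of rollout-based approximate policy iteration.

Assumptions (hypothetical, for illustration only): a small fully observed MDP
stands in for the POMDP (there the state would be a belief state), expectations
are computed exactly rather than by Monte Carlo simulation, and the terminal
cost J_tilde is a fixed vector rather than a trained approximator.
"""
from typing import List, Sequence

# P[x][u][y]: probability of moving from state x to y under control u.
# G[x][u][y]: stage cost incurred on that transition.
Table3 = Sequence[Sequence[Sequence[float]]]


def q_factor(P: Table3, G: Table3, alpha: float, x: int, u: int,
             J: Sequence[float]) -> float:
    """Expected stage cost plus discounted cost-to-go J of the next state."""
    return sum(p * (G[x][u][y] + alpha * J[y])
               for y, p in enumerate(P[x][u]) if p > 0)


def truncated_rollout(P: Table3, G: Table3, alpha: float, mu: Sequence[int],
                      J_tilde: Sequence[float], m: int) -> List[float]:
    """Cost of applying base policy mu for m steps, then terminal cost J_tilde."""
    J = list(J_tilde)
    for _ in range(m):
        J = [q_factor(P, G, alpha, x, mu[x], J) for x in range(len(P))]
    return J


def bellman(P: Table3, G: Table3, alpha: float, J: Sequence[float]) -> List[float]:
    """One application of the Bellman operator (minimization over controls)."""
    return [min(q_factor(P, G, alpha, x, u, J) for u in range(len(P[x])))
            for x in range(len(P))]


def rollout_policy(P: Table3, G: Table3, alpha: float, mu: Sequence[int],
                   J_tilde: Sequence[float], lookahead: int, m: int) -> List[int]:
    """Lookahead minimization whose leaves are scored by truncated rollout of mu."""
    J = truncated_rollout(P, G, alpha, mu, J_tilde, m)
    for _ in range(lookahead - 1):  # the first stage is the argmin below
        J = bellman(P, G, alpha, J)
    # Ties are broken toward the lowest-index control.
    return [min(range(len(P[x])), key=lambda u: q_factor(P, G, alpha, x, u, J))
            for x in range(len(P))]


# Toy chain (hypothetical): control 1 moves forward, control 0 stays put,
# state 2 is a cost-free absorbing goal, every other move costs 1.
P = [[[1, 0, 0], [0, 1, 0]],
     [[0, 1, 0], [0, 0, 1]],
     [[0, 0, 1], [0, 0, 1]]]
G = [[[1, 1, 1], [1, 1, 1]],
     [[1, 1, 1], [1, 1, 1]],
     [[0, 0, 0], [0, 0, 0]]]

mu = [0, 0, 0]  # base policy: never move toward the goal
for k in range(3):
    mu = rollout_policy(P, G, 0.9, mu, [0.0, 0.0, 0.0], lookahead=1, m=2)
    print(k, mu)
# prints: 0 [0, 1, 0], then 1 [1, 1, 0], then 2 [1, 1, 0]
```

With one-step lookahead, each rollout step improves the policy state by state until it stops changing. With `lookahead=2`, the same base policy is improved to `[1, 1, 0]` in a single step, at the price of a larger minimization. The per-state computations in `rollout_policy` are independent of one another, which is one natural place to distribute work across processors. The sketch itself runs serially.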