Abstract

An agent navigates a latent graph by performing actions that take it from one node to another. The chosen action determines the probability distribution over the next visited node. At each node, the agent receives an observation, but the observation is not unique to that node and therefore does not identify it, making the problem \emph{aliased}. The purpose of this work is three-fold: we want to (a) provide a mechanism to recover the latent graph from a sequence of observation-action pairs in the presence of aliasing; (b) provide a policy that approximately maximizes exploration efficiency (i.e., how well the graph is recovered for a given exploration budget); and (c) introduce measures that are adequate to quantify performance in this type of problem. In the unaliased case, we show improved performance w.r.t.\ state-of-the-art reinforcement learning baselines. For the aliased case we are not aware of suitable baselines; instead, we show faster recovery than a random policy for a wide variety of topologies, and exponentially faster recovery for challenging topologies. We dub the algorithm eFeX (from eFficient eXploration).
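For concreteness, the setting described above can be pictured as a latent graph whose action-conditioned transition distributions govern movement, together with an observation function that maps several nodes to the same symbol. The sketch below is a hypothetical illustration of such an aliased environment (class and variable names are ours, not the paper's); it illustrates the problem setup only, not the eFeX algorithm.

\begin{verbatim}
import numpy as np

class AliasedGraphEnv:
    """Toy environment: latent nodes, action-conditioned transitions,
    and observations that may be shared across nodes (aliasing)."""

    def __init__(self, transitions, observations, start_node=0, seed=0):
        # transitions[node][action] is a probability vector over next nodes
        # observations[node] is the (non-unique) observation emitted there
        self.transitions = np.asarray(transitions, dtype=float)
        self.observations = list(observations)
        self.node = start_node
        self.rng = np.random.default_rng(seed)

    def step(self, action):
        # Sample the next node from the action-conditioned distribution,
        # then return the (possibly aliased) observation at that node.
        probs = self.transitions[self.node, action]
        self.node = int(self.rng.choice(len(probs), p=probs))
        return self.observations[self.node]

# Example: 3 latent nodes, 2 actions, only 2 distinct observations, so
# observation "x" does not tell nodes 0 and 2 apart.
env = AliasedGraphEnv(
    transitions=[[[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],   # from node 0
                 [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],   # from node 1
                 [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]],  # from node 2
    observations=["x", "y", "x"],
)
print(env.step(action=0))
\end{verbatim}

Recovering the latent graph then amounts to inferring the nodes and transition structure from the resulting observation-action sequence alone.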