Abstract: Policy synthesis in Markov decision processes takes a known reward function and computes a policy that maximizes the expected cumulative reward. However, onlookers may infer reward functions by observing agents, which can ...