Reinforcement learning through global stochastic search in N-MDPs