Improving Sample Efficiency in Behavior Learning by Using Sub-optimal Planners for Robots