Reinforcement Learning for Lane Following
Overview
Train a reinforcement learning (RL) policy in simulation to keep a virtual racecar centered in its lane using a lightweight environment such as Donkey Car Gym or a custom Gazebo world. You will configure the simulator, implement a simple reward function, train the agent, and evaluate transfer potential to the real vehicle.
Prerequisites
- Completed Train and Deploy an MLP Steering Policy
- Python RL stack installed (`uv pip install stable-baselines3[extra] gymnasium`)
- Donkey Car Gym environment or equivalent ROS 2-compatible simulator
- GPU recommended for faster policy training
Learning Objectives
- Set up and run a basic reinforcement learning training loop in simulation
- Formulate lane following as a Markov Decision Process (MDP)
- Implement a reward function balancing lane centering and speed
- Train an RL agent (e.g., PPO) and monitor convergence
- Evaluate policy robustness and plan sim-to-real transfer steps
1. Set Up the Simulator
- Install Donkey Car Gym:

  ```bash
  git clone https://github.com/tawnkramer/donkey_gym.git
  cd donkey_gym
  pip install -e .
  ```

- Launch the simulator:

  ```bash
  python scripts/donkey_sim.py --headless 0 --level 1
  ```

- Confirm the control loop frequency (~20 Hz) and the observation structure (a sanity-check sketch follows this list).
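To check the loop rate and observation layout, a quick script like the one below can help. It is a sketch: it assumes donkey_gym exposes the legacy gym API (`reset()` returns the observation, `step()` returns a 4-tuple) and reuses the constructor arguments from the wrapper in the next section; adjust it if your version differs.

```python
# check_sim.py -- rough sanity check of simulator loop rate and observation shape (sketch).
import time

import numpy as np
from donkey_gym.envs.donkey_env import DonkeyEnv

env = DonkeyEnv(level=1, frame_skip=1, camera_height=80, camera_width=160)
obs = env.reset()
print('observation shape:', np.asarray(obs).shape, 'dtype:', np.asarray(obs).dtype)

n_steps = 100
start = time.perf_counter()
for _ in range(n_steps):
    # Drive straight at modest throttle: action = [steering, throttle].
    obs, reward, done, info = env.step(np.array([0.0, 0.3], dtype=np.float32))
    if done:
        obs = env.reset()
elapsed = time.perf_counter() - start

print(f'control loop rate: {n_steps / elapsed:.1f} Hz')
print('info keys:', sorted(info.keys()))  # expect entries such as "cte" and "speed"

env.close()
```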
2. Define the Environment Wrapper
Create `fri_rl/envs/donkey_lane.py`:
```python
import gymnasium as gym
import numpy as np
from donkey_gym.envs.donkey_env import DonkeyEnv


class LaneFollowEnv(gym.Env):
    """Gymnasium wrapper around the Donkey simulator for lane following."""

    metadata = {'render_modes': ['human']}

    def __init__(self):
        self.env = DonkeyEnv(level=1, frame_skip=1, camera_height=80, camera_width=160)
        # RGB camera image (H x W x C).
        self.observation_space = gym.spaces.Box(low=0, high=255, shape=(80, 160, 3), dtype=np.uint8)
        # Action: [steering in [-1, 1], throttle in [0, 1]].
        self.action_space = gym.spaces.Box(low=np.array([-1.0, 0.0]), high=np.array([1.0, 1.0]), dtype=np.float32)

    def step(self, action):
        # The underlying env uses the legacy gym API (4-tuple); convert to gymnasium's 5-tuple
        # and replace its reward with our own shaping.
        obs, _, done, info = self.env.step(action)
        lane_error = info.get('cte', 0.0)
        speed = info.get('speed', 0.0)
        reward = lane_reward(lane_error, speed)
        return obs, reward, done, False, info

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self.env.reset(), {}

    def render(self):
        return self.env.viewer.render(return_rgb_array=True)

    def close(self):
        self.env.close()
```
Add a helper function `lane_reward(cte, speed)` that returns `1.0 - abs(cte) - 0.1 * (1.0 - speed / speed_ref)`, where `speed_ref` is the reference (target) speed. A minimal sketch follows.
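The sketch below implements that formula; the `SPEED_REF` value is an assumption to tune for your track and simulator units.

```python
# fri_rl/envs/donkey_lane.py (continued)

SPEED_REF = 2.0  # assumed target cruising speed in simulator units; tune for your track


def lane_reward(cte: float, speed: float) -> float:
    """Reward lane centering, with a small penalty for driving below the reference speed."""
    centering_term = 1.0 - abs(cte)
    speed_penalty = 0.1 * (1.0 - speed / SPEED_REF)
    return centering_term - speed_penalty
```

You may also want to clip the result (or terminate the episode) when the car leaves the lane entirely, so a single large `cte` spike does not dominate the return.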
3. Train a PPO Agent
- Script `train_ppo.py`:

  ```python
  from stable_baselines3 import PPO
  from stable_baselines3.common.callbacks import EvalCallback

  from fri_rl.envs.donkey_lane import LaneFollowEnv

  env = LaneFollowEnv()
  # tensorboard_log points at the directory used by the TensorBoard command below.
  model = PPO('CnnPolicy', env, learning_rate=3e-4, n_steps=2048, batch_size=256,
              tensorboard_log='logs/', verbose=1)

  # Periodically evaluate on a separate env and keep the best checkpoint.
  eval_env = LaneFollowEnv()
  callback = EvalCallback(eval_env, best_model_save_path='artifacts/',
                          eval_freq=10000, deterministic=True)

  model.learn(total_timesteps=1_000_000, callback=callback)
  model.save('artifacts/ppo_lane_follow')
  ```

- Monitor training with TensorBoard (`tensorboard --logdir logs/`).
- Expect convergence within 0.5–1.0 million steps, depending on reward shaping.
4. Evaluate and Visualize
- Run scripted evaluations to compute mean cross-track error (CTE) and episode length (a sketch follows this list).
- Record videos (`model.get_env().render()`) to present qualitative results.
- Analyze failure cases (sharp turns, lighting changes) and adjust the reward or the training curriculum.
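A scripted evaluation for the first bullet might look like this sketch; the episode count and artifact path are placeholders, and it relies on the `LaneFollowEnv` wrapper defined earlier.

```python
# evaluate_policy.py -- sketch: roll out the trained policy and report mean |CTE| and episode length.
import numpy as np
from stable_baselines3 import PPO

from fri_rl.envs.donkey_lane import LaneFollowEnv

env = LaneFollowEnv()
model = PPO.load('artifacts/ppo_lane_follow')

cte_values, episode_lengths = [], []
for _ in range(10):  # number of evaluation episodes (placeholder)
    obs, _ = env.reset()
    steps = 0
    done = False
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        cte_values.append(abs(info.get('cte', 0.0)))
        steps += 1
    episode_lengths.append(steps)

print(f'mean |CTE|: {np.mean(cte_values):.3f}')
print(f'mean episode length: {np.mean(episode_lengths):.1f} steps')

env.close()
```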
5. Plan Sim-to-Real Transfer
- Compare observation modalities: if the real car uses grayscale images, replicate in sim.
- Apply domain randomization (textures, lighting) during training to improve robustness; an observation-level approximation is sketched after this list.
- Export the policy (`model.policy`) and convert it to TorchScript for ROS 2 deployment (see the export sketch below).
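Texture randomization generally has to happen inside the simulator itself, but lighting-style randomization can be approximated at the observation level. The wrapper below is a sketch of that idea (per-episode brightness jitter plus pixel noise); the class name and parameter ranges are illustrative, not part of any library API.

```python
# fri_rl/envs/randomize.py -- sketch: observation-level domain randomization.
import gymnasium as gym
import numpy as np


class RandomizeImageWrapper(gym.ObservationWrapper):
    """Randomly perturb image brightness and add pixel noise, resampled every reset."""

    def __init__(self, env, brightness_range=(0.7, 1.3), noise_std=5.0):
        super().__init__(env)
        self.brightness_range = brightness_range
        self.noise_std = noise_std
        self.rng = np.random.default_rng()  # unseeded for brevity
        self._gain = 1.0

    def reset(self, **kwargs):
        # Resample the perturbation once per episode so each rollout sees a slightly different "world".
        self._gain = self.rng.uniform(*self.brightness_range)
        return super().reset(**kwargs)

    def observation(self, obs):
        noisy = obs.astype(np.float32) * self._gain
        noisy += self.rng.normal(0.0, self.noise_std, size=obs.shape)
        return np.clip(noisy, 0, 255).astype(np.uint8)
```

To use it during training, wrap the env before handing it to PPO, e.g. `env = RandomizeImageWrapper(LaneFollowEnv())` in `train_ppo.py`.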
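For the TorchScript export, one workable route is to wrap the SB3 policy in a small tensor-in / tensor-out module and trace it. The sketch below makes several assumptions (the `DeterministicActor` wrapper, file paths, and dummy-input construction are not part of the SB3 API), so verify the traced output against `model.predict` before deploying.

```python
# export_policy.py -- sketch: export the trained actor to TorchScript for ROS 2 deployment.
import torch
from stable_baselines3 import PPO


class DeterministicActor(torch.nn.Module):
    """Tensor-in / tensor-out wrapper around the SB3 policy so it can be traced."""

    def __init__(self, policy):
        super().__init__()
        self.policy = policy

    def forward(self, obs):
        # SB3's ActorCriticPolicy.forward returns (actions, values, log_probs); keep only actions.
        actions, _, _ = self.policy(obs, deterministic=True)
        return actions


model = PPO.load('artifacts/ppo_lane_follow', device='cpu')
actor = DeterministicActor(model.policy).eval()

# Build a dummy observation matching what the policy was trained on
# (SB3 transposes image observations to channel-first internally, so read the shape back).
obs_shape = model.policy.observation_space.shape
dummy_obs = torch.zeros((1, *obs_shape), dtype=torch.uint8)

with torch.no_grad():
    scripted = torch.jit.trace(actor, dummy_obs)
scripted.save('artifacts/ppo_lane_follow_actor.pt')

# Note: unlike model.predict, the raw policy output is not clipped to the action
# space; clip steering/throttle bounds on the ROS 2 side before sending commands.
```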
6. Safety Considerations
- Before real-world trials, test the RL policy in a hardware-in-the-loop setup with velocity caps.
- Keep manual override ready and limit gains to reduce aggressive maneuvers.
- Use the MLP policy as a fallback if the RL controller diverges.
Wrap-Up
Archive training logs, best-performing checkpoints, and environment configuration files. Document lessons learned on reward shaping and domain randomization to inform future sim-to-real experiments.