Reinforcement Learning for Lane Following

Overview

Train a reinforcement learning (RL) policy in simulation to keep a virtual racecar centered in its lane using a lightweight environment such as Donkey Car Gym or a custom Gazebo world. You will configure the simulator, implement a simple reward function, train the agent, and evaluate transfer potential to the real vehicle.

Prerequisites

  • Completed Train and Deploy an MLP Steering Policy
  • Python RL stack installed (uv pip install "stable-baselines3[extra]" gymnasium)
  • Donkey Car Gym environment or equivalent ROS 2-compatible simulator
  • GPU recommended for faster policy training

Learning Objectives

  • Set up and run a basic reinforcement learning training loop in simulation
  • Formulate lane following as a Markov Decision Process (MDP)
  • Implement a reward function balancing lane centering and speed
  • Train an RL agent (e.g., PPO) and monitor convergence
  • Evaluate policy robustness and plan sim-to-real transfer steps

1. Set Up the Simulator

  1. Install Donkey Car Gym:
    git clone https://github.com/tawnkramer/donkey_gym.git
    cd donkey_gym
    pip install -e .
    
  2. Launch the simulator:
    python scripts/donkey_sim.py --headless 0 --level 1
    
  3. Confirm the control loop frequency (~20 Hz) and the observation structure; a quick check is sketched below.
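
A quick sanity check (a sketch only), assuming the DonkeyEnv constructor used in the next section; exact keyword arguments and the [steering, throttle] action format may differ between donkey_gym versions:

import time
import numpy as np
from donkey_gym.envs.donkey_env import DonkeyEnv

env = DonkeyEnv(level=1, frame_skip=1, camera_height=80, camera_width=160)
obs = env.reset()
print('observation shape:', np.asarray(obs).shape)  # expect (80, 160, 3)

# Step with a neutral action and estimate the effective control rate
start = time.time()
n_steps = 100
for _ in range(n_steps):
    env.step(np.array([0.0, 0.0], dtype=np.float32))
print('approx. control rate: %.1f Hz' % (n_steps / (time.time() - start)))
env.close()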

2. Define the Environment Wrapper

Create fri_rl/envs/donkey_lane.py:

import gymnasium as gym
import numpy as np
from donkey_gym.envs.donkey_env import DonkeyEnv

class LaneFollowEnv(gym.Env):
    metadata = {'render_modes': ['rgb_array']}

    def __init__(self):
        super().__init__()
        self.env = DonkeyEnv(level=1, frame_skip=1, camera_height=80, camera_width=160)
        # Raw RGB camera frames from the simulator
        self.observation_space = gym.spaces.Box(low=0, high=255, shape=(80, 160, 3), dtype=np.uint8)
        # Continuous action: [steering in [-1, 1], throttle in [0, 1]]
        self.action_space = gym.spaces.Box(low=np.array([-1.0, 0.0]), high=np.array([1.0, 1.0]), dtype=np.float32)

    def step(self, action):
        # The underlying DonkeyEnv follows the old Gym API and returns a 4-tuple
        obs, _, done, info = self.env.step(action)
        # Recompute the reward from cross-track error and speed via the lane_reward helper below
        lane_error = info.get('cte', 0.0)
        speed = info.get('speed', 0.0)
        reward = lane_reward(lane_error, speed)
        # Gymnasium expects (obs, reward, terminated, truncated, info)
        return obs, reward, done, False, info

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # DonkeyEnv.reset() returns only the observation; Gymnasium expects (obs, info)
        return self.env.reset(), {}

    def render(self):
        return self.env.viewer.render(return_rgb_array=True)

    def close(self):
        self.env.close()

Add a helper function lane_reward(cte, speed) that returns 1.0 - abs(cte) - 0.1 * (1.0 - speed / speed_ref), where speed_ref is the target cruising speed for the track.
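
A minimal sketch of the helper; the default speed_ref of 1.0 is an assumed placeholder to tune for your track:

def lane_reward(cte: float, speed: float, speed_ref: float = 1.0) -> float:
    """Reward staying centered (small |cte|) while penalizing speeds below speed_ref."""
    return 1.0 - abs(cte) - 0.1 * (1.0 - speed / speed_ref)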

3. Train a PPO Agent

  1. Script train_ppo.py:
    from stable_baselines3 import PPO
    from stable_baselines3.common.callbacks import EvalCallback
    from fri_rl.envs.donkey_lane import LaneFollowEnv
    
    # Separate simulator instances for training and periodic evaluation
    env = LaneFollowEnv()
    eval_env = LaneFollowEnv()
    
    # CnnPolicy consumes the raw 80x160x3 camera frames; log to logs/ so TensorBoard (step 2) can read the run
    model = PPO('CnnPolicy', env, learning_rate=3e-4, n_steps=2048, batch_size=256,
                tensorboard_log='logs/', verbose=1)
    
    # Evaluate every 10k steps and keep the best checkpoint in artifacts/
    callback = EvalCallback(eval_env, best_model_save_path='artifacts/', eval_freq=10000, deterministic=True)
    model.learn(total_timesteps=1_000_000, callback=callback)
    model.save('artifacts/ppo_lane_follow')
    
  2. Monitor training with TensorBoard (tensorboard --logdir logs/).
  3. Expect convergence within 0.5–1.0 million steps depending on reward shaping.

4. Evaluate and Visualize

  • Run scripted evaluations to calculate mean cross-track error (CTE) and episode length (see the sketch after this list).
  • Record videos (model.get_env().render()) to present qualitative results.
  • Analyze failure cases (sharp turns, lighting changes) and adjust reward or curriculum.
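
A minimal evaluation loop, assuming the wrapper and checkpoint from the previous sections (the episode count of 10 and the 2000-step cap are arbitrary choices):

import numpy as np
from stable_baselines3 import PPO
from fri_rl.envs.donkey_lane import LaneFollowEnv

env = LaneFollowEnv()
model = PPO.load('artifacts/ppo_lane_follow')

episode_lengths, episode_ctes = [], []
for _ in range(10):
    obs, _ = env.reset()
    done, steps, ctes = False, 0, []
    while not done and steps < 2000:  # step cap in case the episode never terminates
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        ctes.append(abs(info.get('cte', 0.0)))
        steps += 1
    episode_lengths.append(steps)
    episode_ctes.append(np.mean(ctes))

print('mean episode length:', np.mean(episode_lengths))
print('mean |CTE|:', np.mean(episode_ctes))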

5. Plan Sim-to-Real Transfer

  • Compare observation modalities: if the real car uses grayscale images, replicate in sim.
  • Apply domain randomization (textures, lighting) during training to improve robustness.
  • Export the policy (model.policy) and convert it to TorchScript for ROS 2 deployment; a tracing sketch follows.
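
One possible export path, sketched under the assumption that tracing stable-baselines3's actor-critic policy works for this model (the PolicyWrapper class and output filename are illustrative; tracing details can vary across SB3 and PyTorch versions):

import torch
from stable_baselines3 import PPO

model = PPO.load('artifacts/ppo_lane_follow')
policy = model.policy.to('cpu').eval()

class PolicyWrapper(torch.nn.Module):
    """Expose only the deterministic action output for deployment."""
    def __init__(self, policy):
        super().__init__()
        self.policy = policy

    def forward(self, obs):
        # ActorCriticPolicy.forward returns (actions, values, log_prob); keep the actions
        actions, _, _ = self.policy(obs, deterministic=True)
        return actions

# Dummy input matching the policy's channel-first image layout (batch, C, H, W)
example_obs = torch.zeros(1, 3, 80, 160, dtype=torch.uint8)
traced = torch.jit.trace(PolicyWrapper(policy), example_obs)
traced.save('artifacts/ppo_lane_follow_ts.pt')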

6. Safety Considerations

  • Before real-world trials, test the RL policy in a hardware-in-the-loop setup with velocity caps.
  • Keep manual override ready and limit gains to reduce aggressive maneuvers.
  • Use the MLP policy as a fallback if the RL controller diverges; a rough sketch of this gating logic follows.
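
A rough illustration of the velocity-cap and fallback ideas (the thresholds and the safety_gate/mlp_action names are hypothetical placeholders, not part of the lab code):

import numpy as np

THROTTLE_CAP = 0.3   # hypothetical hardware-in-the-loop throttle limit
MAX_ABS_CTE = 2.0    # hypothetical divergence threshold, in the simulator's cte units

def safety_gate(rl_action, mlp_action, cte):
    """Clamp throttle and hand control to the MLP fallback if the RL policy diverges."""
    steering, throttle = float(rl_action[0]), float(rl_action[1])
    if abs(cte) > MAX_ABS_CTE:
        # RL controller has drifted too far from the lane center; use the fallback command
        steering, throttle = float(mlp_action[0]), float(mlp_action[1])
    return np.array([steering, min(throttle, THROTTLE_CAP)], dtype=np.float32)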

Wrap-Up

Archive training logs, best-performing checkpoints, and environment configuration files. Document lessons learned on reward shaping and domain randomization to inform future sim-to-real experiments.