Long-Horizon Global Trajectory Planning Environment#
The LongHorizonGlobalTrajectoryPlanningEnv is a simple trajectory planning
environment that should be understood as an example of how trajectory planning with
MagLev systems can look like. This environment is therefore intended for
parameter or algorithm tests.
The aim is to learn to control multiple movers that start from random (x,y) positions and continuously receive new goal positions as they complete their current tasks. When a mover reaches its goal, it is immediately assigned a new goal position, allowing it to continue working without waiting for other movers. The environment runs indefinitely until either a collision occurs or no mover reaches any goal within a specified timeout period. A collision is detected, if any two movers collide or if at least one mover leaves the tiles (think of this as a collision with a wall). Control is achieved by specifying either the jerk or the acceleration. In this environment, positions, velocities, accelerations, and jerks have the units m, m/s, m/s² and m/s³, respectively.
Observation Space#
The observation space of this environment is a dictionary containing the following keys and values:
Key |
Value |
|---|---|
observation |
|
achieved_goal |
a numpy array of shape (num_movers*2,) containing the current (x,y)-positions of all movers w.r.t. the frame ((x,y)-pos mover 1, (x,y)-pos mover 2, …) |
desired_goal |
a numpy array of shape (num_movers*2,) containing the desired (x,y)-positions of all movers w.r.t the base frame ((x,y) goal pos mover 1, (x,y) goal pos mover 2, …) |
Action Space#
The action space is continuous. If learn_jerk=True, an action
represents the desired jerks for each mover in x and y direction of the base frame (unit: m/s³), where
j_max is the maximum possible jerk (see environment parameters) and n denotes the number of movers.
Accordingly, if learn_jerk=False, an action
represents the accelerations for each mover in x and y direction of the base frame (unit: m/s²), where
a_max is the maximum possible acceleration (see environment parameters) and n denotes the number of movers.
Immediate Rewards#
The agent receives a reward similar to number of movers that reached their goal positions without collisions within this timestep. In case of a collision, the agent receives a reward of -10. If in a timestep no mover has reached its goal and no collisions have been detected, a reward of -1 is given.
Episode Termination and Truncation#
Episodes are designed to run indefinitely with goals being continuously resampled. An episode only terminates under two conditions:
- Collision:
If there is a collision, the episode terminates immediately.
- Goal timeout:
If no mover reaches any goal within a specified time limit (defined by the
timeout_stepsparameter, which defaults to 50 steps), the episode terminates. The timeout counter resets each time any mover reaches a goal.
When any individual mover reaches its current goal, a new goal is automatically generated for that specific mover and the episode continues indefinitely, unless one of the termination conditions above is met.
Environment Reset#
When the environment is reset, all movers are randomly positioned within the defined workspace boundaries, and their corresponding goal positions are also randomly sampled within the same workspace.
Basic Usage#
The following example shows how to train an agent using Stable-Baselines3. To use the example, please install Stable-Baselines3 as described in the documentation.
Note
This is a simplified example that is not guaranteed to converge, as the default parameters are used. However, it is important to note that
the parameter copy_info_dict is set to True. This way, it is not necessary to check for collision again to compute the reward when a
transition is relabeled by HER, since the information is already available in the info-dict.
import numpy as np
import gymnasium as gym
from stable_baselines3 import SAC, HerReplayBuffer
import magbotsim
gym.register_envs(magbotsim)
render_mode = None
mover_params = {'size': np.array([0.113 / 2, 0.113 / 2, 0.012 / 2]), 'mass': 0.628}
collision_params = {'shape': 'box', 'size': np.array([0.113 / 2 + 1e-6, 0.113 / 2 + 1e-6]), 'offset': 0.0, 'offset_wall': 0.0}
env_params = {
'layout_tiles_list': [np.ones((5,5))],
'num_movers': 5,
'show_2D_plot': False,
'mover_params': mover_params,
'collision_params': collision_params,
'render_mode': render_mode
}
env = gym.make('LongHorizonGlobalTrajectoryPlanningEnv-v0', **env_params)
# copy_info_dict=True, as information about collisions is stored in the info dictionary to avoid
# computationally expensive collision checking calculations when the data is relabeled (HER)
model = SAC(
policy='MultiInputPolicy',
env=env,
replay_buffer_class=HerReplayBuffer,
replay_buffer_kwargs={'copy_info_dict': True},
verbose=1
)
model.learn(total_timesteps=int(1e6))
Version History#
v0: initial version of the environment
Parameters#
- class magbotsim.rl_envs.trajectory_planning.long_horizon_global_trajectory_planning_env.LongHorizonGlobalTrajectoryPlanningEnv(layout_tiles: ndarray, num_movers: int, show_2D_plot: bool, mover_colors_2D_plot: list[str] | None = None, tile_params: dict[str, Any] | None = None, mover_params: dict[str, Any] | None = None, initial_mover_zpos: float = 0.003, std_noise: ndarray | float = 1e-05, render_mode: str | None = 'human', render_every_cycle: bool = False, num_cycles: int = 40, collision_params: dict[str, Any] | None = None, v_max: float = 2.0, a_max: float = 10.0, j_max: float = 100.0, learn_jerk: bool = False, threshold_pos: float = 0.1, use_mj_passive_viewer: bool = False, timeout_steps: int | None = 50, enable_energy_tracking: bool = False)[source]
Bases:
BasicMagBotSingleAgentEnvA simple planning environment.
- Parameters:
layout_tiles – a numpy arrays of shape (num_tiles_x, num_tiles_y) indicating where to add a tile (use 1 to add a tile and 0 to leave cell empty). The x-axis and y-axis correspond to the axes of the numpy array, so the origin of the base frame is in the upper left corner.
num_movers – the number of movers in the environment
show_2D_plot – whether to show a 2D matplotlib plot (useful for debugging)
mover_colors_2D_plot – a list of matplotlib colors, one for each mover (only used if
show_2D_plot=True), defaults to None. None is only accepted ifshow_2D_plot = False.tile_params –
a dictionary that can be used to specify the mass and size of a tile using the keys ‘mass’ or ‘size’, defaults to None. Since one MagLev system usually only contains tiles of one type, i.e. with the same mass and size, the mass is a single float value and the size must be specified as a numpy array of shape (3,). If set to None or only one key is specified, both mass and size or the missing value are set to the following default values:
mass: 5.6 [kg]
size: [0.24/2, 0.24/2, 0.0352/2] (x,y,z) [m] (note: half-size)
mover_params –
a dictionary that can be used to specify the mass and size of each mover using the keys ‘mass’ or ‘size’, defaults to None. To use the same mass and size for each mover, the mass can be specified as a single float value and the size as a numpy array of shape (3,). However, the movers can also be of different types, i.e. different masses and sizes. In this case, the mass and size should be specified as numpy arrays of shapes (num_movers,) and (num_movers,3), respectively. If set to None or only one key is specified, both mass and size or the missing value are set to the following default values:
mass: 1.24 [kg]
size: [0.155/2, 0.155/2, 0.012/2] (x,y,z) [m] (note: half-size)
initial_mover_zpos – the initial distance between the bottom of the mover and the top of a tile, defaults to 0.003 [m]
std_noise – the standard deviation of a Gaussian with zero mean used to add noise, defaults to 1e-5. The standard deviation can be used to add noise to the mover’s position, velocity and acceleration. If you want to use different standard deviations for position, velocity and acceleration use a numpy array of shape (3,); otherwise use a single float value, meaning the same standard deviation is used for all three values.
render_mode – the mode that is used to render the frames (‘human’, ‘rgb_array’ or None), defaults to ‘human’. If set to None, no viewer is initialized and used, i.e. no rendering. This can be useful to speed up training.
render_every_cycle – whether to call
render()after each integrator step in thestep()method, defaults to False. Rendering every cycle leads to a smoother visualization of the scene, but can also be computationally expensive. Thus, this parameter provides the possibility to speed up training and evaluation. Regardless of this parameter, the scene is always rendered afternum_cycleshave been executed ifrender_mode != None.num_cycles – the number of control cycles for which to apply the same action, defaults to 40
collision_params –
a dictionary that can be used to specify the following collision parameters, defaults to None:
collision shape (key: ‘shape’): can be ‘box’ or ‘circle’, defaults to ‘circle’
- size of the collision shape (key: ‘size’), defaults to 0.11 [m]:
- collision shape ‘circle’:
a single float value which corresponds to the radius of the circle, or a numpy array of shape (num_movers,) to specify individual values for each mover
- collision shape ‘box’:
a numpy array of shape (2,) to specify x and y half-size of the box, or a numpy array of shape (num_movers, 2) to specify individual sizes for each mover
- additional size offset (key: ‘offset’), defaults to 0.0 [m]: an additional safety offset that is added to the size of the
collision shape. Think of this offset as increasing the size of a mover by a safety margin.
- additional wall offset (key: ‘offset_wall’), defaults to 0.0 [m]: an additional safety offset that is added to the size
of the collision shape to detect wall collisions. Think of this offset as moving the wall, i.e. the edge of a tile without an adjacent tile, closer to the center of the tile.
v_max – the maximum velocity, defaults to 2.0 [m/s]
a_max – the maximum acceleration, defaults to 10.0 [m/s²]
j_max – the maximum jerk (only used if
learn_jerk=True), defaults to 100.0 [m/s³]learn_jerk – whether to learn the jerk, defaults to False. If set to False, the acceleration is learned, i.e. the policy output.
threshold_pos – the position threshold used to determine whether a mover has reached its goal position, defaults to 0.1 [m]
use_mj_passive_viewer – whether the MuJoCo passive_viewer should be used, defaults to False. If set to False, the Gymnasium MuJoCo WindowViewer with custom overlays is used.
timeout_steps – the number of steps after which the episode is terminated if no goal is reached by any mover within that specified time limit, defaults to 50
enable_energy_tracking – whether to track energy metrics for benchmarking purposes, defaults to False
- _before_mujoco_step_callback(action: ndarray) None[source]
Apply the next action, i.e. it sets the jerk or acceleration, ensuring the minimum and maximum velocity and acceleration (for one cycle).
- Parameters:
action – a numpy array of shape (num_movers * 2,), which specifies the next action (jerk or acceleration)
- _custom_xml_string_callback(custom_model_xml_strings: dict | None) dict[str, str][source]
For each mover, this callback adds actuators to the
custom_model_xml_strings-dict, depending on whether the jerk or acceleration is the output of the policy.- Parameters:
custom_model_xml_strings – the current
custom_model_xml_strings-dict which is modified by this callback- Returns:
the modified
custom_model_xml_strings-dict
- _on_step_end_callback(observation: dict[str, ndarray] | ndarray) None[source]
Callback executed at the end of every step to handle goal completion and resampling.
This callback increments the step counter and checks if any mover has reached its goal. When a mover reaches its goal (distance <= threshold_pos), new goal positions are automatically sampled for those movers and the episode continues indefinitely. The callback also updates the timestamp of when the last goal was reached, which is used for timeout detection.
- Parameters:
observation – the current observation dictionary containing ‘achieved_goal’ and ‘desired_goal’ arrays
- _render_callback() None[source]
Update the Matplotlib2DViewer if
show_2D_plot=True.
- _reset_callback(options: dict[str, Any] | None = None) None[source]
Reset the start and goal positions of all movers and reload the model. It is also checked whether the start positions are collision-free (mover and wall collisions) and whether the new goals can be reached without mover or wall collisions.
- Parameters:
options – not used in this environment
- close() None[source]
Close the environment.
- compute_reward(achieved_goal: ndarray, desired_goal: ndarray, info: dict[str, Any] | None = None) ndarray | float[source]
Compute the immediate reward.
- Parameters:
achieved_goal – a numpy array of shape (batch_size, length achieved_goal) or (length achieved_goal,) containing the (x,y)-positions already achieved
desired_goal – a numpy array of shape (batch_size, length desired_goal) or (length desired_goal,) containing the (x,y) goal positions of all movers
info – a dictionary containing auxiliary information, defaults to None
- Returns:
a single float value or a numpy array of shape (batch_size,) containing the immediate rewards
- compute_terminated(achieved_goal: ndarray, desired_goal: ndarray, info: dict[str, Any] | None = None) ndarray | bool[source]
Check whether a terminal state is reached. A state is terminal when there is a collision between two movers or between a mover and a wall.
- Parameters:
achieved_goal – a numpy array of shape (batch_size, length achieved_goal) or (length achieved_goal,) containing the (x,y)-positions already achieved
desired_goal – a numpy array of shape (batch_size, length desired_goal) or (length desired_goal,) containing the (x,y) goal positions of all movers
info – a dictionary containing auxiliary information, defaults to None
- Returns:
- if batch_size > 1:
a numpy array of shape (batch_size,). An entry is True if the state is terminal, False otherwise
- if batch_size = 1:
True if the state is terminal, False otherwise
- compute_truncated(achieved_goal: ndarray, desired_goal: ndarray, info: dict[str, Any] | None = None) ndarray | bool[source]
Check whether the truncation condition is satisfied. Since the environment continues indefinitely, there is no truncation, unless the
timeout_stepsparameter is set, in which case an episode is truncated if no mover has reached any goal within the lasttimeout_stepssteps.- Parameters:
achieved_goal – a numpy array of shape (batch_size, length achieved_goal) or (length achieved_goal,) containing the (x,y)-positions already achieved
desired_goal – a numpy array of shape (batch_size, length desired_goal) or (length desired_goal,) containing the (x,y) goal positions of all movers
info – a dictionary containing auxiliary information, defaults to None
- Returns:
- if batch_size > 1:
a numpy array of shape (batch_size,) indicating whether truncation should occur
- if batch_size = 1:
True if truncation should occur due to timeout, False otherwise
- ensure_max_dyn_val(current_values: ndarray, max_value: float, next_derivs: ndarray) tuple[ndarray, ndarray][source]
Ensure the minimum and maximum dynamic values.
- Parameters:
current_values – the current velocity or acceleration specified as a numpy array of shape (2,) or (num_checks,2)
max_value – the maximum velocity or acceleration (float)
next_derivs – the next derivative (acceleration or jerk) used for one integrator step specified as a numpy array of shape (2,) or (num_checks,2)
- Returns:
the next velocity or acceleration and the next derivative (acceleration or jerk) corresponding to the next action that must be applied to ensure the minimum and maximum dynamics (each of shape (num_checks,2))
- reload_model(mover_start_xy_pos: ndarray, mover_goal_xy_pos: ndarray) None[source]
Generate a new model XML string with new start and goal positions and reload the model. In this environment, it is necessary to reload the model to ensure that the actuators work as expected.
- Parameters:
mover_start_xy_pos – a numpy array of shape (num_movers,2) containing the (x,y) starting positions of each mover.
mover_goal_xy_pos – a numpy array of shape (num_movers_with_goals,2) containing the (x,y) goal positions of the movers (num_movers_with_goals <= num_movers)
- update_cached_actuator_mujoco_data() None[source]
Update all cached information about MuJoCo actuators, such as names and ids.
- class magbotsim.rl_envs.trajectory_planning.long_horizon_global_trajectory_planning_env.LongHorizonGlobalTrajectoryPlanningEnvB0(show_2D_plot: bool, mover_colors_2D_plot: list[str] | None = None, tile_params: dict[str, Any] | None = None, mover_params: dict[str, Any] | None = None, initial_mover_zpos: float = 0.003, std_noise: ndarray | float = 1e-05, render_mode: str | None = 'human', render_every_cycle: bool = False, num_cycles: int = 40, collision_params: dict[str, Any] | None = None, v_max: float = 2, a_max: float = 10, j_max: float = 100, learn_jerk: bool = False, threshold_pos: float = 0.1, use_mj_passive_viewer: bool = False, timeout_steps: int | None = 50, enable_energy_tracking: bool = False)[source]
Bases:
LongHorizonGlobalTrajectoryPlanningEnv
- class magbotsim.rl_envs.trajectory_planning.long_horizon_global_trajectory_planning_env.LongHorizonGlobalTrajectoryPlanningEnvB1(show_2D_plot: bool, mover_colors_2D_plot: list[str] | None = None, tile_params: dict[str, Any] | None = None, mover_params: dict[str, Any] | None = None, initial_mover_zpos: float = 0.003, std_noise: ndarray | float = 1e-05, render_mode: str | None = 'human', render_every_cycle: bool = False, num_cycles: int = 40, collision_params: dict[str, Any] | None = None, v_max: float = 2, a_max: float = 10, j_max: float = 100, learn_jerk: bool = False, threshold_pos: float = 0.1, use_mj_passive_viewer: bool = False, timeout_steps: int | None = 50, enable_energy_tracking: bool = False)[source]
Bases:
LongHorizonGlobalTrajectoryPlanningEnv
- class magbotsim.rl_envs.trajectory_planning.long_horizon_global_trajectory_planning_env.LongHorizonGlobalTrajectoryPlanningEnvB2(show_2D_plot: bool, mover_colors_2D_plot: list[str] | None = None, tile_params: dict[str, Any] | None = None, mover_params: dict[str, Any] | None = None, initial_mover_zpos: float = 0.003, std_noise: ndarray | float = 1e-05, render_mode: str | None = 'human', render_every_cycle: bool = False, num_cycles: int = 40, collision_params: dict[str, Any] | None = None, v_max: float = 2, a_max: float = 10, j_max: float = 100, learn_jerk: bool = False, threshold_pos: float = 0.1, use_mj_passive_viewer: bool = False, timeout_steps: int | None = 50, enable_energy_tracking: bool = False)[source]
Bases:
LongHorizonGlobalTrajectoryPlanningEnv
- class magbotsim.rl_envs.trajectory_planning.long_horizon_global_trajectory_planning_env.LongHorizonGlobalTrajectoryPlanningEnvB3(show_2D_plot: bool, mover_colors_2D_plot: list[str] | None = None, tile_params: dict[str, Any] | None = None, mover_params: dict[str, Any] | None = None, initial_mover_zpos: float = 0.003, std_noise: ndarray | float = 1e-05, render_mode: str | None = 'human', render_every_cycle: bool = False, num_cycles: int = 40, collision_params: dict[str, Any] | None = None, v_max: float = 2, a_max: float = 10, j_max: float = 100, learn_jerk: bool = False, threshold_pos: float = 0.1, use_mj_passive_viewer: bool = False, timeout_steps: int | None = 50, enable_energy_tracking: bool = False)[source]
Bases:
LongHorizonGlobalTrajectoryPlanningEnv
- class magbotsim.rl_envs.trajectory_planning.long_horizon_global_trajectory_planning_env.LongHorizonGlobalTrajectoryPlanningEnvB4(show_2D_plot: bool, mover_colors_2D_plot: list[str] | None = None, tile_params: dict[str, Any] | None = None, mover_params: dict[str, Any] | None = None, initial_mover_zpos: float = 0.003, std_noise: ndarray | float = 1e-05, render_mode: str | None = 'human', render_every_cycle: bool = False, num_cycles: int = 40, collision_params: dict[str, Any] | None = None, v_max: float = 2, a_max: float = 10, j_max: float = 100, learn_jerk: bool = False, threshold_pos: float = 0.1, use_mj_passive_viewer: bool = False, timeout_steps: int | None = 50, enable_energy_tracking: bool = False)[source]
Bases:
LongHorizonGlobalTrajectoryPlanningEnv
- class magbotsim.rl_envs.trajectory_planning.long_horizon_global_trajectory_planning_env.LongHorizonGlobalTrajectoryPlanningEnvB5(show_2D_plot: bool, mover_colors_2D_plot: list[str] | None = None, tile_params: dict[str, Any] | None = None, mover_params: dict[str, Any] | None = None, initial_mover_zpos: float = 0.003, std_noise: ndarray | float = 1e-05, render_mode: str | None = 'human', render_every_cycle: bool = False, num_cycles: int = 40, collision_params: dict[str, Any] | None = None, v_max: float = 2, a_max: float = 10, j_max: float = 100, learn_jerk: bool = False, threshold_pos: float = 0.1, use_mj_passive_viewer: bool = False, timeout_steps: int | None = 50, enable_energy_tracking: bool = False)[source]
Bases:
LongHorizonGlobalTrajectoryPlanningEnv
- class magbotsim.rl_envs.trajectory_planning.long_horizon_global_trajectory_planning_env.LongHorizonGlobalTrajectoryPlanningEnvB6(show_2D_plot: bool, mover_colors_2D_plot: list[str] | None = None, tile_params: dict[str, Any] | None = None, mover_params: dict[str, Any] | None = None, initial_mover_zpos: float = 0.003, std_noise: ndarray | float = 1e-05, render_mode: str | None = 'human', render_every_cycle: bool = False, num_cycles: int = 40, collision_params: dict[str, Any] | None = None, v_max: float = 2, a_max: float = 10, j_max: float = 100, learn_jerk: bool = False, threshold_pos: float = 0.1, use_mj_passive_viewer: bool = False, timeout_steps: int | None = 50, enable_energy_tracking: bool = False)[source]
Bases:
LongHorizonGlobalTrajectoryPlanningEnv
- class magbotsim.rl_envs.trajectory_planning.long_horizon_global_trajectory_planning_env.LongHorizonGlobalTrajectoryPlanningEnvB7(show_2D_plot: bool, mover_colors_2D_plot: list[str] | None = None, tile_params: dict[str, Any] | None = None, mover_params: dict[str, Any] | None = None, initial_mover_zpos: float = 0.003, std_noise: ndarray | float = 1e-05, render_mode: str | None = 'human', render_every_cycle: bool = False, num_cycles: int = 40, collision_params: dict[str, Any] | None = None, v_max: float = 2, a_max: float = 10, j_max: float = 100, learn_jerk: bool = False, threshold_pos: float = 0.1, use_mj_passive_viewer: bool = False, timeout_steps: int | None = 50, enable_energy_tracking: bool = False)[source]
Bases:
LongHorizonGlobalTrajectoryPlanningEnv
- class magbotsim.rl_envs.trajectory_planning.long_horizon_global_trajectory_planning_env.LongHorizonGlobalTrajectoryPlanningEnvB8(show_2D_plot: bool, mover_colors_2D_plot: list[str] | None = None, tile_params: dict[str, Any] | None = None, mover_params: dict[str, Any] | None = None, initial_mover_zpos: float = 0.003, std_noise: ndarray | float = 1e-05, render_mode: str | None = 'human', render_every_cycle: bool = False, num_cycles: int = 40, collision_params: dict[str, Any] | None = None, v_max: float = 2, a_max: float = 10, j_max: float = 100, learn_jerk: bool = False, threshold_pos: float = 0.1, use_mj_passive_viewer: bool = False, timeout_steps: int | None = 50, enable_energy_tracking: bool = False)[source]
Bases:
LongHorizonGlobalTrajectoryPlanningEnv
- class magbotsim.rl_envs.trajectory_planning.long_horizon_global_trajectory_planning_env.LongHorizonGlobalTrajectoryPlanningEnvB9(show_2D_plot: bool, mover_colors_2D_plot: list[str] | None = None, tile_params: dict[str, Any] | None = None, mover_params: dict[str, Any] | None = None, initial_mover_zpos: float = 0.003, std_noise: ndarray | float = 1e-05, render_mode: str | None = 'human', render_every_cycle: bool = False, num_cycles: int = 40, collision_params: dict[str, Any] | None = None, v_max: float = 2, a_max: float = 10, j_max: float = 100, learn_jerk: bool = False, threshold_pos: float = 0.1, use_mj_passive_viewer: bool = False, timeout_steps: int | None = 50, enable_energy_tracking: bool = False)[source]
Bases:
LongHorizonGlobalTrajectoryPlanningEnv