Benchmark Planning Environment#
The BenchmarkPlanningEnv
is a simple motion planning environment that should be understood as an example of how motion planning
with planar motor systems can look like. This environment is therefore intended for parameter or algorithm tests.
The aim is to learn to move all movers from random (x,y) start positions to variable (x,y) goal positions without collisions by specifying either the jerk or the acceleration. In this environment, positions, velocities, accelerations, and jerks have the units m, m/s, m/s² and m/s³, respectively.
Observation Space#
The observation space of this environment is a dictionary containing the following keys and values:
key |
value |
---|---|
observation |
|
achieved_goal |
a numpy array of shape (num_movers*2,) containing the current (x,y)-positions of all movers w.r.t. the frame ((x,y)-pos mover 1, (x,y)-pos mover 2, …) |
desired_goal |
a numpy array of shape (num_movers*2,) containing the desired (x,y)-positions of all movers w.r.t the base frame ((x,y) goal pos mover 1, (x,y) goal pos mover 2, …) |
Action Space#
The action space is continuous. If learn_jerk=True
, an action
represents the desired jerks for each mover in x and y direction of the base frame (unit: m/s³), where
j_max
is the maximum possible jerk (see environment parameters) and n denotes the number of movers.
Accordingly, if learn_jerk=False
, an action
represents the accelerations for each mover in x and y direction of the base frame (unit: m/s²), where
a_max
is the maximum possible acceleration (see environment parameters) and n denotes the number of movers.
Immediate Rewards#
The agent receives a reward of 50 if all movers reach their goals without collisions. In case of a collision either with another mover or with a wall, the agent receives a reward of -50. For each timestep in which at least one mover has not reached its goal and in which there is no collision, the environment emits the following immediate reward: number of movers that have not reached their goals * (-1)
Episode Termination#
Each episode has a time limit of 50 environment steps. If the time limit is reached, the episode is truncated. Thus, each episode has 50 environment steps, except that all movers have reached their goals or there has been a collision. In these cases, the episode terminates immediately, regardless of the time limit.
Version History#
v0: initial version of the environment
Parameters#
- class gymnasium_planar_robotics.envs.planning.benchmark_planning_env.BenchmarkPlanningEnv(layout_tiles: ndarray, num_movers: int, show_2D_plot: bool, mover_colors_2D_plot: list[str] | None = None, tile_params: dict[str, any] | None = None, mover_params: dict[str, any] | None = None, initial_mover_zpos: float = 0.003, std_noise: ndarray | float = 1e-05, render_mode: str | None = 'human', render_every_cycle: bool = False, num_cycles: int = 40, collision_params: dict[str, any] | None = None, v_max: float = 2.0, a_max: float = 10.0, j_max: float = 100.0, learn_jerk: bool = False, threshold_pos: float = 0.1, use_mj_passive_viewer: bool = False)[source]
A simple planning environment.
- Parameters:
layout_tiles – a numpy array of shape (num_tiles_x, num_tiles_y) indicating where to add a tile (use 1 to add a tile and 0 to leave cell empty). The x-axis and y-axis correspond to the axes of the numpy array, so the origin of the base frame is in the upper left corner.
num_movers – the number of movers to add
show_2D_plot – whether to show a 2D matplotlib plot (useful for debugging)
mover_colors_2D_plot – a list of matplotlib colors, one for each mover (only used if
show_2D_plot=True
), defaults to None. None is only accepted ifshow_2D_plot = False
.tile_params –
a dictionary that can be used to specify the mass and size of a tile using the keys ‘mass’ or ‘size’, defaults to None. Since one planar motor system usually only contains tiles of one type, i.e. with the same mass and size, the mass is a single float value and the size must be specified as a numpy array of shape (3,). If set to None or only one key is specified, both mass and size or the missing value are set to the following default values:
mass: 5.6 [kg]
size: [0.24/2, 0.24/2, 0.0352/2] (x,y,z) [m] (note: half-size)
mover_params –
a dictionary that can be used to specify the mass and size of each mover using the keys ‘mass’ or ‘size’, defaults to None. To use the same mass and size for each mover, the mass can be specified as a single float value and the size as a numpy array of shape (3,). However, the movers can also be of different types, i.e. different masses and sizes. In this case, the mass and size should be specified as numpy arrays of shapes (num_movers,) and (num_movers,3), respectively. If set to None or only one key is specified, both mass and size or the missing value are set to the following default values:
mass: 1.24 [kg]
size: [0.155/2, 0.155/2, 0.012/2] (x,y,z) [m] (note: half-size)
initial_mover_zpos – the initial distance between the bottom of the mover and the top of a tile, defaults to 0.003 [m]
std_noise – the standard deviation of a Gaussian with zero mean used to add noise, defaults to 1e-5. The standard deviation can be used to add noise to the mover’s position, velocity and acceleration. If you want to use different standard deviations for position, velocity and acceleration use a numpy array of shape (3,); otherwise use a single float value, meaning the same standard deviation is used for all three values.
render_mode – the mode that is used to render the frames (‘human’, ‘rgb_array’ or None), defaults to ‘human’. If set to None, no viewer is initialized and used, i.e. no rendering. This can be useful to speed up training.
render_every_cycle – whether to call ‘render’ after each integrator step in the
step()
method, defaults to False. Rendering every cycle leads to a smoother visualization of the scene, but can also be computationally expensive. Thus, this parameter provides the possibility to speed up training and evaluation. Regardless of this parameter, the scene is always rendered after ‘num_cycles’ have been executed if ‘render_mode != None’.num_cycles – the number of control cycles for which to apply the same action, defaults to 40
collision_params –
a dictionary that can be used to specify the following collision parameters, defaults to None:
collision shape (key: ‘shape’): can be ‘box’ or ‘circle’, defaults to ‘circle’
size of the collision shape (key: ‘size’), defaults to 0.11 [m]:
- collision shape ‘circle’:
a single float value which corresponds to the radius of the circle, or a numpy array of shape (num_movers,) to specify individual values for each mover
- collision shape ‘box’:
a numpy array of shape (2,) to specify x and y half-size of the box, or a numpy array of shape (num_movers, 2) to specify individual sizes for each mover
- additional size offset (key: ‘offset’), defaults to 0.0 [m]: an additional safety offset that is added to the size of the
collision shape. Think of this offset as increasing the size of a mover by a safety margin.
- additional wall offset (key: ‘offset_wall’), defaults to 0.0 [m]: an additional safety offset that is added to the size
of the collision shape to detect wall collisions. Think of this offset as moving the wall, i.e. the edge of a tile without an adjacent tile, closer to the center of the tile.
v_max – the maximum velocity, defaults to 2.0 [m/s]
a_max – the maximum acceleration, defaults to 10.0 [m/s²]
j_max – the maximum jerk (only used if
learn_jerk=True
), defaults to 100.0 [m/s³]learn_jerk – whether to learn the jerk, defaults to False. If set to False, the acceleration is learned, i.e. the policy output.
threshold_pos – the position threshold used to determine whether a mover has reached its goal position, defaults to 0.1 [m]
use_mj_passive_viewer – whether the MuJoCo passive_viewer should be used, defaults to False. If set to False, the Gymnasium MuJoCo WindowViewer with custom overlays is used.
- check_mover_collision(mover_names: list[str], c_size: float | ndarray, add_safety_offset: bool = False, mover_qpos: ndarray | None = None, add_qpos_noise: bool = False) bool
Check whether two movers specified in
mover_names
collide. In case of collision shape ‘box’, this method takes the orientation of the movers into account.- Parameters:
mover_names – a list of mover names that should be checked (correspond to the body name of the mover in the MuJoCo model)
c_size –
the size of the collision shape of the movers
- collision_shape = ‘circle’:
use a single float value to specify the same size for all movers and a numpy array of shape (num_movers,) to specify individual sizes for each mover
- collision_shape = ‘box’:
use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_movers,2) to specify individual sizes for each mover
add_safety_offset – whether to add the size offset (can be specified using: collision_params[“offset”]), defaults to False. Note that the same size offset is added for both movers.
mover_qpos – the qpos of the movers specified as a numpy array of shape (num_movers,7) (x_p,y_p,z_p,w_o,x_o,y_o,z_o). If set to None, the current qpos of the movers in the MuJoCo model is used; defaults to None
add_qpos_noise – whether to add Gaussian noise to the qpos of the movers, defaults to False. Only used if mover_qpos is not None.
- Returns:
True if the movers collide, False otherwise
- check_wall_collision(mover_names: list[str], c_size: float | ndarray, add_safety_offset: bool = False, mover_qpos: ndarray | None = None, add_qpos_noise: bool = False) ndarray
Check whether the qpos of the movers listed in
mover_names
are valid, i.e. no wall collisions.- Parameters:
mover_names – a list of mover names that should be checked (correspond to the body name of the mover in the MuJoCo model)
c_size –
the size of the collision shape
- collision_shape = ‘circle’:
use a single float value to specify the same size for all movers and a numpy array of shape (num_movers,) to specify individual sizes for each mover
- collision_shape = ‘box’:
use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_movers,2) to specify individual sizes for each mover
add_safety_offset – whether to add the size offset (can be specified using: collision_params[“offset”]), defaults to False. Note that the same size offset is added for all movers.
mover_qpos – a numpy array of shape (num_qpos,7) containing the qpos (x_p,y_p,z_p,w_o,x_o,y_o,z_o) of each mover or None. If set to None, the current qpos of each mover in the MuJoCo model is used; defaults to None
add_qpos_noise – whether to add Gaussian noise to the qpos of the movers, defaults to False. Only used if mover_qpos is not None.
- Returns:
a numpy array of shape (num_movers,), where an element is 1 if the qpos is valid (no wall collision), otherwise 0
- close() None [source]
Close the environment.
- compute_reward(achieved_goal: ndarray, desired_goal: ndarray, info: dict[str, any] | None = None) ndarray | float [source]
Compute the immediate reward.
- Parameters:
achieved_goal – a numpy array of shape (batch_size, length achieved_goal) or (length achieved_goal,) containing the (x,y)-positions already achieved
desired_goal – a numpy array of shape (batch_size, length desired_goal) or (length desired_goal,) containing the (x,y) goal positions of all movers
info – a dictionary containing auxiliary information, defaults to None
- Returns:
a single float value or a numpy array of shape (batch_size,) containing the immediate rewards
- compute_terminated(achieved_goal: ndarray, desired_goal: ndarray, info: dict[str, any] | None = None) ndarray | bool [source]
Check whether a terminal state is reached. A state is terminal when there is a collision between two movers or between a mover and a wall or when all movers have reached their goals without collisions.
- Parameters:
achieved_goal – a numpy array of shape (batch_size, length achieved_goal) or (length achieved_goal,) containing the (x,y)-positions already achieved
desired_goal – a numpy array of shape (batch_size, length desired_goal) or (length desired_goal,) containing the (x,y) goal positions of all movers
info – a dictionary containing auxiliary information, defaults to None
- Returns:
- if batch_size > 1:
a numpy array of shape (batch_size,). An entry is True if the state is terminal, False otherwise
- if batch_size = 1:
True if the state is terminal, False otherwise
- compute_truncated(achieved_goal: ndarray, desired_goal: ndarray, info: dict[str, any] | None = None) ndarray | bool [source]
Check whether the truncation condition is satisfied. The truncation condition (a time limit in this environment) is automatically checked by the Gymnasium TimeLimit Wrapper, which is why this method always returns False.
- Parameters:
achieved_goal – a numpy array of shape (batch_size, length achieved_goal) or (length achieved_goal,) containing the (x,y)-positions already achieved
desired_goal – a numpy array of shape (batch_size, length desired_goal) or (length desired_goal,) containing the (x,y) goal positions of all movers
info – a dictionary containing auxiliary information, defaults to None
- Returns:
- if batch_size > 1:
a numpy array of shape (batch_size,) in which all entries are False
- if batch_size = 1:
False
- ensure_max_dyn_val(current_values: ndarray, max_value: float, next_derivs: ndarray) tuple[ndarray, ndarray] [source]
Ensure the minimum and maximum dynamic values.
- Parameters:
current_values – the current velocity or acceleration specified as a numpy array of shape (2,) or (num_checks,2)
max_value – the maximum velocity or acceleration (float)
next_derivs – the next derivative (acceleration or jerk) used for one integrator step specified as a numpy array of shape (2,) or (num_checks,2)
- Returns:
the next velocity or acceleration and the next derivative (acceleration or jerk) corresponding to the next action that must be applied to ensure the minimum and maximum dynamics (each of shape (num_checks,2))
- generate_model_xml_string(mover_start_xy_pos: ndarray | None = None, mover_goal_xy_pos: ndarray | None = None, custom_xml_strings: dict[str, str] = None) str
Generate a MuJoCo model xml string based on the mover-tile configuration of the environment.
- Parameters:
mover_start_xy_pos – a numpy array of shape (num_movers,2) containing the (x,y) starting positions of each mover. If set to None, the movers will be placed in the center of a tile, i.e. the number of tiles must be >= the number of movers; defaults to None.
mover_goal_xy_pos – a numpy array of shape (num_movers_with_goals,2) containing the (x,y) goal positions of the movers (num_movers_with_goals <= num_movers). Note that only the first 6 movers have different colors to make the movers clearly distinguishable. Movers without goals are shown in gray. If set to None, no goals will be displayed and all movers are colored in gray; defaults to None
custom_xml_strings –
a dictionary containing additional xml strings to provide the ability to add actuators, sensors, objects, robots, etc. to the model. The keys determine where to add a string in the xml structure and the values contain the xml string to add. The following keys are accepted:
- ’custom_compiler_xml_str’:
A custom ‘compiler’ xml element. Note that the entire default ‘compiler’ element is replaced.
- ’custom_visual_xml_str’:
A custom ‘visual’ xml element. Note that the entire default ‘visual’ element is replaced.
- ’custom_option_xml_str’:
A custom ‘option’ xml element. Note that the entire default ‘option’ element is replaced.
- ’custom_assets_xml_str’:
This xml string adds elements to the ‘asset’ grouping element.
- ’custom_default_xml_str’:
This xml string adds elements to the ‘default’ grouping element.
- ’custom_worldbody_xml_str’:
This xml string adds elements to the ‘worldbody’ grouping element.
- ’custom_outworldbody_xml_str’:
This xml string should be used to include files or add elements other than ‘compiler’, ‘visual’, ‘option’, ‘asset’, ‘default’ or ‘worldbody’.
If set to None, only the basic xml string is generated, containing tiles, movers (excluding actuators), and possibly goals; defaults to None
- Returns:
MuJoCo model xml string
- get_c_size_arr(c_size: float | ndarray, num_reps: int) ndarray
Return the size of the collision shape as a numpy array of shape (num_reps,1) or (num_reps,2) depending on the collision shape. This method should be used to obtain the appropriate c_size_arr if the same size is to be used for all movers.
- Parameters:
c_size –
the size of the collision shape:
- collision_shape = ‘circle’:
use a single float value to specify the same size for all movers and a numpy array of shape (num_reps,) to specify individual sizes for each mover
- collision_shape = ‘box’:
use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_reps,2) to specify individual sizes for each mover
num_reps – the number of repetitions of c_size if the same size of collision shape is to be used for all movers. Otherwise, this value is ignored.
- Returns:
the collision shape sizes as a numpy array of a suitable shape:
- collision_shape = ‘circle’:
a numpy array of shape (num_reps,1)
- collision_shape = ‘box’:
a numpy array of shape (num_reps,2) if c_size is a numpy array of shape (2,). Otherwise, c_size is not modified.
- get_mover_qacc(mover_name: str, add_noise: bool = False) ndarray
Returns the linear and angular acceleration (qacc) of the desired mover.
- Parameters:
mover_name – name of the mover for which the acceleration should be returned (corresponds to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise, defaults to False
- Returns:
linear and angular acceleration of the mover (x,y,z,a,b,c)
- get_mover_qacc_arr(mover_names: list[str], add_noise: bool = False) ndarray
Return the qacc of several movers as a numpy array of shape (num_movers,6).
- Parameters:
mover_names – a list of mover names for which the qacc should be returned (correspond to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise to the qacc of the movers, defaults to False
- Returns:
a numpy array of shape (num_movers,6) containing the qacc (x,y,z,a,b,c) of each mover. The order of the qacc corresponds to the order of the mover names.
- get_mover_qpos(mover_name: str, add_noise: bool = False) ndarray
Returns the position and orientation of the desired mover. The orientation is returned as a quaternion (w,x,y,z). Note that the z-pos is the distance between the bottom of the mover and the top of a tile. In contrast, the z-pos in the MuJoCo model is the previously mentioned distance + half the height of a mover.
- Parameters:
mover_name – name of the mover for which the position and orientation should be returned (corresponds to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise, defaults to False
- Returns:
position and orientation of the desired mover (x_p,y_p,z_p,w_o,x_o,y_o,z_o)
- get_mover_qpos_arr(mover_names: list[str], add_noise: bool = False) ndarray
Return the qpos of several movers as a numpy array of shape (num_movers,7).
- Parameters:
mover_names – a list of mover names for which the qpos should be returned (correspond to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise to the qpos of the movers, defaults to False
- Returns:
a numpy array of shape (num_movers,7) containing the qpos (x_p,y_p,z_p,w_o,x_o,y_o,z_o) of each mover. The order of the qpos corresponds to the order of the mover names.
- get_mover_qvel(mover_name: str, add_noise: bool = False) ndarray
Return the linear and angular velocities (qvel) of the desired mover.
- Parameters:
mover_name – name of the mover for which the velocity should be returned (corresponds to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise, defaults to False
- Returns:
linear and angular velocities of the mover (x,y,z,a,b,c)
- get_mover_qvel_arr(mover_names: list[str], add_noise: bool = False) ndarray
Return the qvel of several movers as a numpy array of shape (num_movers,6).
- Parameters:
mover_names – a list of mover names for which the qvel should be returned (correspond to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise to the qvel of the movers, defaults to False
- Returns:
a numpy array of shape (num_movers,6) containing the qvel (x,y,z,a,b,c) of each mover. The order of the qvel corresponds to the order of the mover names.
- get_tile_indices_mask(mask: ndarray) tuple[ndarray, ndarray]
Find the x and y indices of tiles that correspond to the specified structure (the mask) in the tile layout. Note that the indices of the top left tile in the mask are returned.
- Parameters:
mask – a 2D numpy array containing only 0 and 1 which specifies the structure to be found in the tile layout
- Returns:
the x and y indices of the tiles in separate numpy arrays, each of shape (num_mask_found,)
- get_tile_xy_pos() tuple[ndarray, ndarray]
Find the (x,y)-positions of the tiles. The position of a tile in the tile layout with index (i_x,i_y), can be found using
(x-pos[i_x,i_y], y-pos[i_x,i_y])
, where x-pos and y-pos are returned by this method. Note that the base frame is in the upper left corner.- Returns:
the x and y positions of the tiles in separate numpy arrays, each of shape (num_tiles_x, num_tiles_y)
- get_wrapper_attr(name: str) Any
Gets the attribute name from the environment.
- property np_random: Generator
Returns the environment’s internal
_np_random
that if not set will initialise with a random seed.- Returns:
Instances of np.random.Generator
- qpos_is_valid(qpos: ndarray, c_size: float | ndarray, add_safety_offset: bool = False) ndarray
Check whether qpos is valid. This method considers the edges as imaginary walls if there is no other tile next to that edge. A position is valid if it is above a tile and the distance to the walls is greater that the required safety margin, i.e. no collision with a wall. This also ensures that the position is reachable in case the specified position is a goal position.
This method allows to check multiple qpos at the same time, where the movers can be of different sizes. The orientation of the mover is taken into account if collision_shape = ‘box’, otherwise (collision_shape = ‘circle’) the orientation of the mover is ignored.
- Parameters:
qpos – a numpy array of shape (num_qpos,7) containing the qpos (x_p,y_p,z_p,w_o,x_o,y_o,z_o) to be checked
c_size –
the size of the collision shape
- collision_shape = ‘circle’:
use a single float value to specify the same size for all movers and a numpy array of shape (num_qpos,) to specify individual sizes for each mover
- collision_shape = ‘box’:
use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_qpos,2) to specify individual sizes for each mover
add_safety_offset – whether to add the size offset (can be specified using: collision_params[“offset”]), defaults to False. Note that the same size offset is added for all movers.
- Returns:
a numpy array of shape (num_qpos,), where an element is 1 if the qpos is valid, otherwise 0
- reload_model(mover_start_xy_pos: ndarray, mover_goal_xy_pos: ndarray) None [source]
Generate a new model xml string with new start and goal positions and reload the model. In this environment, it is necessary to reload the model to ensure that the actuators work as expected.
- Parameters:
mover_start_xy_pos – a numpy array of shape (num_movers,2) containing the (x,y) starting positions of each mover.
mover_goal_xy_pos – a numpy array of shape (num_movers_with_goals,2) containing the (x,y) goal positions of the movers (num_movers_with_goals <= num_movers)
- render() ndarray | None
Compute frames depending on the initially specified
render_mode
. Before the corresponding viewer is updated, the_render_callback()
is called to give the opportunity to add more functionality.- Returns:
returns a numpy array if render_mode != ‘human’, otherwise it returns None (render_mode ‘human’)
- reset(seed: int | None = None, options: dict[str, any] | None = None) tuple[dict[str, ndarray], dict[str, any]]
Reset the environment returning an initial observation and auxiliary information. More detailed information about the parameters and return values can be found in the Gymnasium documentation: https://gymnasium.farama.org/api/env/#gymnasium.Env.reset.
This method performs the following steps:
reset RNG, if desired
call
_reset_callback(option)
to give the user the opportunity to add more functionalitycall
mj_forward()
check whether there are mover or wall collisions
call
render()
get initial observation and info dictionary
- Parameters:
seed – if set to None, the RNG is not reset; if int, sets the desired seed; defaults to None
options – a dictionary that can be used to specify additional reset options, e.g. object parameters; defaults to None
- Returns:
initial observation and auxiliary information contained in the ‘info’ dictionary
- step(action: int | ndarray) tuple[dict[str, ndarray], float, bool, bool, dict[str, any]]
Execute one step of the environment’s dynamics applying the given action. Note that the environment executes as many MuJoCo simulation steps as the number of cycles specified for this environment (
num_cycles
). The duration of one cycle is determined by the cycle time, which must be specified in the MuJoCo xml string using theoption/timestep
parameter. The same action is applied for all cycles.This method performs the following steps:
check whether the dimension of the action matches the dimension of the action space
if the action space does not contain the specified action, the action is clipped to the interval edges of the action space
call
_step_callback(action)
to give the user the opportunity to add more functionalityexecute MuJoCo simulation steps (
mj_step()
). After each simulation step, it is checked whether there are mover or wall collisions. In case of a collision, mover_collision or wall_collision will be True and no further simulation steps are performed, as a real system would typically stop as well due to position lag errors. In addition,render()
can be called after each simulation step to provide a smooth visualization of the movement (setrender_every_cycle=True
). The callback_mujoco_step_callback(action)
can be used to add functionality BEFORE the next simulation step is executed. This can be useful, for example, to ensure velocity or acceleration limits within each cycle.call
render()
get return values
More detailed information about the parameters and return values can be found in the Gymnasium documentation: https://gymnasium.farama.org/api/env/#gymnasium.Env.step.
- Parameters:
action – the action to apply
- Returns:
the next observation
the immediate reward for taking the action
whether a terminal state is reached
whether the truncation condition is satisfied
auxiliary information contained in the ‘info’ dictionary
- property unwrapped: Env[ObsType, ActType]
Returns the base non-wrapped environment.
- Returns:
Env: The base non-wrapped
gymnasium.Env
instance
- window_viewer_is_running() bool
Check whether the window viewer (render_mode ‘human’) is active, i.e. the window is open.
- Returns:
True if the window is open, False otherwise