Benchmark Pushing Environment#

The BenchmarkPushingEnv is a simple pushing environment that should be understood as an example of how object manipulation with planar motor systems can look like. This environment is therefore intended for parameter or algorithm tests.

The aim is to push an object with the mover to the desired goal position without collisions between a mover and a wall by specifying either the jerk or the acceleration of the mover. The starting positions of the mover and object as well as the object goal position are chosen randomly at the start of a new episode. This environment contains only one object and one mover. In addition, the tile layout is set to a 3x3 layout. In this environment, positions, velocities, accelerations, and jerks have the units m, m/s, m/s² and m/s³, respectively.

Observation Space#

The observation space of this environment is a dictionary containing the following keys and values:

key	value
observation	if `learn_jerk=True`: a numpy array of shape (2*2,) containing the (x,y)-position, (x,y)-velocities and (x,y)-accelerations of the mover if `learn_jerk=False`: a numpy array of shape (2,) containing the (x,y)-position and (x,y)-velocities of the mover
achieved_goal	a numpy array of shape (2,) containing the current (x,y)-position of the object w.r.t. the base frame
desired_goal	a numpy array of shape (2,) containing the desired (x,y)-position of the object w.r.t. the base frame

Action Space#

The action space is continuous and 2-dimensional. If learn_jerk=True, an action

\[a_j = [a_{jx}, a_{jy}] \in [-j_{max},j_{max}]²\]

represents the desired jerks in x and y direction of the base frame (unit: m/s³), where j_max is the maximum possible jerk (see environment parameters).

Accordingly, if learn_jerk=False, an action

\[a_a = [a_{ax}, a_{ay}] \in [-a_{max},a_{max}]²\]

represents the accelerations in x and y direction of the base frame (unit: m/s²), where a_max is the maximum possible acceleration (see environment parameters).

Immediate Rewards#

The agent receives a reward of 0 if the object has reached its goal without a collision between the mover and a wall. For each timestep in which the object has not reached its goal and in which there is no collision between the mover and a wall, the environment emits a small negative reward of -1. In the case of a collision between the mover and a wall, the agent receives a large negative reward of -50.

Episode Termination and Truncation#

Each episode has a time limit of 50 environment steps. If the time limit is reached, the episode is truncated. Thus, each episode has 50 environment steps, except that the mover collides with a wall. In this case, the episode terminates immediately regardless of the time limit. An episode is not terminated when the object reaches its goal position, as the object has to remain close to the desired goal position until the end of an episode.

Note

The term ‘episode steps’ refers to the number of calls of env.steps(). The number of MuJoCo simulation steps, i.e. control cycles, is a multiple of the number of episode steps.

Environment Reset#

When the environment is reset, new (x,y) starting positions for the mover and the object as well as a new goal position are chosen at random. It is ensured that the new start position of the mover is collision-free, i.e. no wall collision and no collision with the object. In addition, the object’s start position is chosen such that the mover fits between the wall and the object. This is important to ensure that the object can be pushed in all directions.

Version History#

v0: initial version of the environment

Parameters#

class gymnasium_planar_robotics.envs.manipulation.benchmark_pushing_env.BenchmarkPushingEnv(mover_params: dict[str, Any] | None = None, initial_mover_zpos: float = 0.003, std_noise: ndarray | float = 1e-05, render_mode: str | None = 'human', render_every_cycle: bool = False, num_cycles: int = 40, collision_params: dict[str, Any] | None = None, v_max: float = 2.0, a_max: float = 10.0, j_max: float = 100.0, learn_jerk: bool = False, threshold_pos: float = 0.05, use_mj_passive_viewer: bool = False)[source]

A simple object pushing environment.

Parameters:

mover_params –
a dictionary that can be used to specify the mass and size of each mover using the keys ‘mass’ or ‘size’, defaults to None. To use the same mass and size for each mover, the mass can be specified as a single float value and the size as a numpy array of shape (3,). However, the movers can also be of different types, i.e. different masses and sizes. In this case, the mass and size should be specified as numpy arrays of shapes (num_movers,) and (num_movers,3), respectively. If set to None or only one key is specified, both mass and size or the missing value are set to the following default values:
- mass: 1.24 [kg]
- size: [0.155/2, 0.155/2, 0.012/2] (x,y,z) [m] (note: half-size)
initial_mover_zpos – the initial distance between the bottom of the mover and the top of a tile, defaults to 0.003
std_noise – the standard deviation of a Gaussian with zero mean used to add noise, defaults to 1e-5. The standard deviation can be used to add noise to the mover’s position, velocity and acceleration. If you want to use different standard deviations for position, velocity and acceleration use a numpy array of shape (3,); otherwise use a single float value, meaning the same standard deviation is used for all three values.
render_mode – the mode that is used to render the frames (‘human’, ‘rgb_array’ or None), defaults to ‘human’. If set to None, no viewer is initialized and used, i.e. no rendering. This can be useful to speed up training.
render_every_cycle – whether to call ‘render’ after each integrator step in the step() method, defaults to False. Rendering every cycle leads to a smoother visualization of the scene, but can also be computationally expensive. Thus, this parameter provides the possibility to speed up training and evaluation. Regardless of this parameter, the scene is always rendered after ‘num_cycles’ have been executed if render_mode != None.
num_cycles – the number of control cycles for which to apply the same action, defaults to 40
collision_params –
a dictionary that can be used to specify the following collision parameters, defaults to None:
- collision shape (key: ‘shape’): can be ‘box’ or ‘circle’, defaults to ‘circle’
- size of the collision shape (key: ‘size’), defaults to 0.11 [m]:
  - collision shape ‘circle’:
    a single float value which corresponds to the radius of the circle, or a numpy array of shape (num_movers,) to specify individual values for each mover
  - collision shape ‘box’:
    a numpy array of shape (2,) to specify x and y half-size of the box, or a numpy array of shape (num_movers, 2) to specify individual sizes for each mover
- additional size offset (key: ‘offset’), defaults to 0.0 [m]: an additional safety offset that is added to the size of the
  collision shape. Think of this offset as increasing the size of a mover by a safety margin.
- additional wall offset (key: ‘offset_wall’), defaults to 0.0 [m]: an additional safety offset that is added to the size
  of the collision shape to detect wall collisions. Think of this offset as moving the wall, i.e. the edge of a tile without an adjacent tile, closer to the center of the tile.
v_max – the maximum velocity, defaults to 2.0 [m/s]
a_max – the maximum acceleration, defaults to 10.0 [m/s²]
j_max – the maximum jerk (only used if ‘learn_jerk=True’), defaults to 100.0 [m/s³]
learn_jerk – whether to learn the jerk, defaults to False. If set to False, the acceleration is learned, i.e. the policy output.
threshold_pos – the position threshold used to determine whether a mover has reached its goal position, defaults to 0.05 [m]
use_mj_passive_viewer – whether the MuJoCo passive_viewer should be used, defaults to False. If set to False, the Gymnasium MuJoCo WindowViewer with custom overlays is used.

check_mover_collision(mover_names: list[str], c_size: float | ndarray, add_safety_offset: bool = False, mover_qpos: ndarray | None = None, add_qpos_noise: bool = False) → bool

Check whether two movers specified in mover_names collide. In case of collision shape ‘box’, this method takes the orientation of the movers into account.

Parameters:

mover_names – a list of mover names that should be checked (correspond to the body name of the mover in the MuJoCo model)
c_size –
the size of the collision shape of the movers
- collision_shape = ‘circle’:
  use a single float value to specify the same size for all movers and a numpy array of shape (num_movers,) to specify individual sizes for each mover
- collision_shape = ‘box’:
  use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_movers,2) to specify individual sizes for each mover
add_safety_offset – whether to add the size offset (can be specified using: collision_params[“offset”]), defaults to False. Note that the same size offset is added for both movers.
mover_qpos – the qpos of the movers specified as a numpy array of shape (num_movers,7) (x_p,y_p,z_p,w_o,x_o,y_o,z_o). If set to None, the current qpos of the movers in the MuJoCo model is used; defaults to None
add_qpos_noise – whether to add Gaussian noise to the qpos of the movers, defaults to False. Only used if mover_qpos is not None.

Returns:

True if the movers collide, False otherwise

check_wall_collision(mover_names: list[str], c_size: float | ndarray, add_safety_offset: bool = False, mover_qpos: ndarray | None = None, add_qpos_noise: bool = False) → ndarray

Check whether the qpos of the movers listed in mover_names are valid, i.e. no wall collisions.

Parameters:

mover_names – a list of mover names that should be checked (correspond to the body name of the mover in the MuJoCo model)
c_size –
the size of the collision shape
- collision_shape = ‘circle’:
  use a single float value to specify the same size for all movers and a numpy array of shape (num_movers,) to specify individual sizes for each mover
- collision_shape = ‘box’:
  use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_movers,2) to specify individual sizes for each mover
add_safety_offset – whether to add the size offset (can be specified using: collision_params[“offset”]), defaults to False. Note that the same size offset is added for all movers.
mover_qpos – a numpy array of shape (num_qpos,7) containing the qpos (x_p,y_p,z_p,w_o,x_o,y_o,z_o) of each mover or None. If set to None, the current qpos of each mover in the MuJoCo model is used; defaults to None
add_qpos_noise – whether to add Gaussian noise to the qpos of the movers, defaults to False. Only used if mover_qpos is not None.

Returns:

a numpy array of shape (num_movers,), where an element is 1 if the qpos is valid (no wall collision), otherwise 0

close() → None[source]: Close the environment.

compute_reward(achieved_goal: ndarray, desired_goal: ndarray, info: dict[str, Any] | None = None) → ndarray | float[source]

Compute the immediate reward.

Parameters:

achieved_goal – a numpy array of shape (batch_size, length achieved_goal) or (length achieved_goal,) containing the already achieved (x,y)-positions of an object
desired_goal – a numpy array of shape (batch_size, length desired_goal) or (length desired_goal,) containing the (x,y) goal positions of an object
info – a dictionary containing auxiliary information, defaults to None

Returns:

a single float value or a numpy array of shape (batch_size,) containing the immediate rewards

compute_terminated(achieved_goal: ndarray, desired_goal: ndarray, info: dict[str, Any] | None = None) → ndarray | bool[source]

Check whether a terminal state is reached. A state is terminal when the mover collides with a wall.

Parameters:

achieved_goal – a numpy array of shape (batch_size, length achieved_goal) or (length achieved_goal,) containing the already achieved (x,y)-positions of an object
desired_goal – a numpy array of shape (batch_size, length desired_goal) or (length desired_goal,) containing the (x,y) goal positions of an object
info – a dictionary containing auxiliary information, defaults to None

Returns:

if batch_size > 1:
a numpy array of shape (batch_size,). An entry is True if the state is terminal, False otherwise
if batch_size = 1:
True if the state is terminal, False otherwise

compute_truncated(achieved_goal: ndarray, desired_goal: ndarray, info: dict[str, Any] | None = None) → ndarray | bool[source]

Check whether the truncation condition is satisfied. The truncation condition (a time limit in this environment) is automatically checked by the Gymnasium TimeLimit Wrapper, which is why this method always returns False.

Parameters:

achieved_goal – a numpy array of shape (batch_size, length achieved_goal) or (length achieved_goal,) containing the already achieved (x,y)-positions of an object
desired_goal – a numpy array of shape (batch_size, length desired_goal) or (length desired_goal,) containing the (x,y) goal positions of an object
info – a dictionary containing auxiliary information, defaults to None

Returns:

if batch_size > 1:
a numpy array of shape (batch_size,) in which all entries are False
if batch_size = 1:
False

ensure_max_dyn_val(current_values: ndarray, max_value: float, next_derivs: ndarray) → tuple[ndarray, ndarray][source]

Ensure the minimum and maximum dynamic values.

Parameters:

current_values – the current velocity or acceleration specified as a numpy array of shape (2,) or (num_checks,2)
max_value – the maximum velocity or acceleration (float)
next_derivs – the next derivative (acceleration or jerk) used for one integrator step specified as a numpy array of shape (2,) or (num_checks,2)

Returns:

the next velocity or acceleration and the next derivative (acceleration or jerk) corresponding to the next action that must be applied to ensure the minimum and maximum dynamics (each of shape (num_checks,2))

generate_model_xml_string(mover_start_xy_pos: ndarray | None = None, mover_goal_xy_pos: ndarray | None = None, custom_xml_strings: dict[str, str] | None = None) → str

Generate a MuJoCo model xml string based on the mover-tile configuration of the environment.

Parameters:

mover_start_xy_pos – a numpy array of shape (num_movers,2) containing the (x,y) starting positions of each mover. If set to None, the movers will be placed in the center of a tile, i.e. the number of tiles must be >= the number of movers; defaults to None.
mover_goal_xy_pos – a numpy array of shape (num_movers_with_goals,2) containing the (x,y) goal positions of the movers (num_movers_with_goals <= num_movers). Note that only the first 6 movers have different colors to make the movers clearly distinguishable. Movers without goals are shown in gray. If set to None, no goals will be displayed and all movers are colored in gray; defaults to None
custom_xml_strings –
a dictionary containing additional xml strings to provide the ability to add actuators, sensors, objects, robots, etc. to the model. The keys determine where to add a string in the xml structure and the values contain the xml string to add. The following keys are accepted:
- ’custom_compiler_xml_str’:
  A custom ‘compiler’ xml element. Note that the entire default ‘compiler’ element is replaced.
- ’custom_visual_xml_str’:
  A custom ‘visual’ xml element. Note that the entire default ‘visual’ element is replaced.
- ’custom_option_xml_str’:
  A custom ‘option’ xml element. Note that the entire default ‘option’ element is replaced.
- ’custom_assets_xml_str’:
  This xml string adds elements to the ‘asset’ grouping element.
- ’custom_default_xml_str’:
  This xml string adds elements to the ‘default’ grouping element.
- ’custom_worldbody_xml_str’:
  This xml string adds elements to the ‘worldbody’ grouping element.
- ’custom_outworldbody_xml_str’:
  This xml string should be used to include files or add elements other than ‘compiler’, ‘visual’, ‘option’, ‘asset’, ‘default’ or ‘worldbody’.
If set to None, only the basic xml string is generated, containing tiles, movers (excluding actuators), and possibly goals; defaults to None

Returns:

MuJoCo model xml string

get_c_size_arr(c_size: float | ndarray, num_reps: int) → ndarray

Return the size of the collision shape as a numpy array of shape (num_reps,1) or (num_reps,2) depending on the collision shape. This method should be used to obtain the appropriate c_size_arr if the same size is to be used for all movers.

Parameters:

c_size –
the size of the collision shape:
- collision_shape = ‘circle’:
  use a single float value to specify the same size for all movers and a numpy array of shape (num_reps,) to specify individual sizes for each mover
- collision_shape = ‘box’:
  use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_reps,2) to specify individual sizes for each mover
num_reps – the number of repetitions of c_size if the same size of collision shape is to be used for all movers. Otherwise, this value is ignored.

Returns:

the collision shape sizes as a numpy array of a suitable shape:

collision_shape = ‘circle’:
a numpy array of shape (num_reps,1)
collision_shape = ‘box’:
a numpy array of shape (num_reps,2) if c_size is a numpy array of shape (2,). Otherwise, c_size is not modified.

get_mover_qacc(mover_name: str, add_noise: bool = False) → ndarray

Returns the linear and angular acceleration (qacc) of the desired mover.

Parameters:

mover_name – name of the mover for which the acceleration should be returned (corresponds to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise, defaults to False

Returns:

linear and angular acceleration of the mover (x,y,z,a,b,c)

get_mover_qacc_arr(mover_names: list[str], add_noise: bool = False) → ndarray

Return the qacc of several movers as a numpy array of shape (num_movers,6).

Parameters:

mover_names – a list of mover names for which the qacc should be returned (correspond to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise to the qacc of the movers, defaults to False

Returns:

a numpy array of shape (num_movers,6) containing the qacc (x,y,z,a,b,c) of each mover. The order of the qacc corresponds to the order of the mover names.

get_mover_qpos(mover_name: str, add_noise: bool = False) → ndarray

Returns the position and orientation of the desired mover. The orientation is returned as a quaternion (w,x,y,z). Note that the z-pos is the distance between the bottom of the mover and the top of a tile. In contrast, the z-pos in the MuJoCo model is the previously mentioned distance + half the height of a mover.

Parameters:

mover_name – name of the mover for which the position and orientation should be returned (corresponds to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise, defaults to False

Returns:

position and orientation of the desired mover (x_p,y_p,z_p,w_o,x_o,y_o,z_o)

get_mover_qpos_arr(mover_names: list[str], add_noise: bool = False) → ndarray

Return the qpos of several movers as a numpy array of shape (num_movers,7).

Parameters:

mover_names – a list of mover names for which the qpos should be returned (correspond to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise to the qpos of the movers, defaults to False

Returns:

a numpy array of shape (num_movers,7) containing the qpos (x_p,y_p,z_p,w_o,x_o,y_o,z_o) of each mover. The order of the qpos corresponds to the order of the mover names.

get_mover_qvel(mover_name: str, add_noise: bool = False) → ndarray

Return the linear and angular velocities (qvel) of the desired mover.

Parameters:

mover_name – name of the mover for which the velocity should be returned (corresponds to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise, defaults to False

Returns:

linear and angular velocities of the mover (x,y,z,a,b,c)

get_mover_qvel_arr(mover_names: list[str], add_noise: bool = False) → ndarray

Return the qvel of several movers as a numpy array of shape (num_movers,6).

Parameters:

mover_names – a list of mover names for which the qvel should be returned (correspond to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise to the qvel of the movers, defaults to False

Returns:

a numpy array of shape (num_movers,6) containing the qvel (x,y,z,a,b,c) of each mover. The order of the qvel corresponds to the order of the mover names.

get_tile_indices_mask(mask: ndarray) → tuple[ndarray, ndarray]

Find the x and y indices of tiles that correspond to the specified structure (the mask) in the tile layout. Note that the indices of the top left tile in the mask are returned.

Parameters:: mask – a 2D numpy array containing only 0 and 1 which specifies the structure to be found in the tile layout
Returns:: the x and y indices of the tiles in separate numpy arrays, each of shape (num_mask_found,)

get_tile_xy_pos() → tuple[ndarray, ndarray]

Find the (x,y)-positions of the tiles. The position of a tile in the tile layout with index (i_x,i_y), can be found using (x-pos[i_x,i_y], y-pos[i_x,i_y]), where x-pos and y-pos are returned by this method. Note that the base frame is in the upper left corner.

Returns:: the x and y positions of the tiles in separate numpy arrays, each of shape (num_tiles_x, num_tiles_y)

get_wrapper_attr(name: str) → Any: Gets the attribute name from the environment.

has_wrapper_attr(name: str) → bool: Checks if the attribute name exists in the environment.

property np_random: Generator

Returns the environment’s internal _np_random that if not set will initialise with a random seed.

Returns:: Instances of np.random.Generator

property np_random_seed: int

Returns the environment’s internal _np_random_seed that if not set will first initialise with a random int as seed.

If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.

Returns:: int: the seed of the current np_random or -1, if the seed of the rng is unknown

qpos_is_valid(qpos: ndarray, c_size: float | ndarray, add_safety_offset: bool = False) → ndarray

Check whether qpos is valid. This method considers the edges as imaginary walls if there is no other tile next to that edge. A position is valid if it is above a tile and the distance to the walls is greater that the required safety margin, i.e. no collision with a wall. This also ensures that the position is reachable in case the specified position is a goal position.

This method allows to check multiple qpos at the same time, where the movers can be of different sizes. The orientation of the mover is taken into account if collision_shape = ‘box’, otherwise (collision_shape = ‘circle’) the orientation of the mover is ignored.

Parameters:

qpos – a numpy array of shape (num_qpos,7) containing the qpos (x_p,y_p,z_p,w_o,x_o,y_o,z_o) to be checked
c_size –
the size of the collision shape
- collision_shape = ‘circle’:
  use a single float value to specify the same size for all movers and a numpy array of shape (num_qpos,) to specify individual sizes for each mover
- collision_shape = ‘box’:
  use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_qpos,2) to specify individual sizes for each mover
add_safety_offset – whether to add the size offset (can be specified using: collision_params[“offset”]), defaults to False. Note that the same size offset is added for all movers.

Returns:

a numpy array of shape (num_qpos,), where an element is 1 if the qpos is valid, otherwise 0

reload_model(mover_start_xy_pos: ndarray | None = None) → None[source]

Generate a new model xml string with new start positions for mover and object and a new object goal position and reload the model. In this environment, it is necessary to reload the model to ensure that the actuators work as expected.

Parameters:: mover_start_xy_pos – None or a numpy array of shape (num_movers,2) containing the (x,y) starting positions of each mover, defaults to None. If set to None, the movers will be placed in the center of the tiles that are added to the xml string first.

render() → ndarray | None

Compute frames depending on the initially specified render_mode. Before the corresponding viewer is updated, the _render_callback() is called to give the opportunity to add more functionality.

Returns:: returns a numpy array if render_mode != ‘human’, otherwise it returns None (render_mode ‘human’)

reset(seed: int | None = None, options: dict[str, Any] | None = None) → tuple[dict[str, ndarray], dict[str, Any]]

Reset the environment returning an initial observation and auxiliary information. More detailed information about the parameters and return values can be found in the Gymnasium documentation: https://gymnasium.farama.org/api/env/#gymnasium.Env.reset.

This method performs the following steps:

reset RNG, if desired
call _reset_callback(option) to give the user the opportunity to add more functionality
call mj_forward()
check whether there are mover, wall, or other collisions, e.g. collisions with an obstacle
call render()
get initial observation and info dictionary

Parameters:

seed – if set to None, the RNG is not reset; if int, sets the desired seed; defaults to None
options – a dictionary that can be used to specify additional reset options, e.g. object parameters; defaults to None

Returns:

initial observation and auxiliary information contained in the ‘info’ dictionary

set_wrapper_attr(name: str, value: Any, *, force: bool = True) → bool: Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.

step(action: int | ndarray) → tuple[dict[str, ndarray], float, bool, bool, dict[str, Any]]

Execute one step of the environment’s dynamics applying the given action. Note that the environment executes as many MuJoCo simulation steps as the number of cycles specified for this environment (num_cycles). The duration of one cycle is determined by the cycle time, which must be specified in the MuJoCo xml string using the option/timestep parameter. The same action is applied for all cycles.

This method performs the following steps:

check whether the dimension of the action matches the dimension of the action space
if the action space does not contain the specified action, the action is clipped to the interval edges of the action space
call _step_callback(action) to give the user the opportunity to add more functionality
execute MuJoCo simulation steps (mj_step()). After each simulation step, it is checked whether there are mover, wall, or other collisions, e.g. collisions with an obstacle. To check for other collisions besides mover and wall collisions the _check_for_other_collisions_callback() is called. In case of a collision, no further simulation steps are performed, as a real system would typically stop as well due to position lag errors. In addition, render() can be called after each simulation step to provide a smooth visualization of the movement (set render_every_cycle=True). The callback _mujoco_step_callback(action) can be used to add functionality BEFORE the next simulation step is executed. This can be useful, for example, to ensure velocity or acceleration limits within each cycle.
call render()
get return values

More detailed information about the parameters and return values can be found in the Gymnasium documentation: https://gymnasium.farama.org/api/env/#gymnasium.Env.step.

Parameters:

action – the action to apply

Returns:

the next observation
the immediate reward for taking the action
whether a terminal state is reached
whether the truncation condition is satisfied
auxiliary information contained in the ‘info’ dictionary

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns:: Env: The base non-wrapped gymnasium.Env instance

window_viewer_is_running() → bool

Check whether the window viewer (render_mode ‘human’) is active, i.e. the window is open.

Returns:: True if the window is open, False otherwise