Basic Planar Robotics Single-Agent Environment

class gymnasium_planar_robotics.envs.basic_envs.BasicPlanarRoboticsSingleAgentEnv(layout_tiles: ndarray, num_movers: int, tile_params: dict[str, Any] | None = None, mover_params: dict[str, Any] | None = None, initial_mover_zpos: float = 0.005, table_height: float = 0.4, std_noise: ndarray | float = 1e-05, render_mode: str | None = 'human', render_every_cycle: bool = False, default_cam_config: dict[str, Any] | None = None, width_no_camera_specified: int = 1240, height_no_camera_specified: int = 1080, num_cycles: int = 40, collision_params: dict[str, Any] | None = None, initial_mover_start_xy_pos: ndarray | None = None, initial_mover_goal_xy_pos: ndarray | None = None, custom_model_xml_strings: dict[str, str] | None = None, use_mj_passive_viewer: bool = False)

A base class for single-agent reinforcement learning environments in the field of planar robotics that follow the Gymnasium API. A more detailed explanation of all parameters can be found in the documentation of the BasicPlanarRoboticsEnv.

Parameters:

layout_tiles – the tile layout
num_movers – the number of movers
tile_params – tile parameters such as the size and mass, defaults to None
mover_params – mover parameters such as the size and mass, defaults to None
initial_mover_zpos – the initial distance between the bottom of the mover and the top of a tile, defaults to 0.005 [m]
table_height – the height of a table on which the tiles are placed, defaults to 0.4 [m]
std_noise – the standard deviation of a Gaussian with zero mean used to add noise, defaults to 1e-5
render_mode – the mode that is used to render the frames (‘human’, ‘rgb_array’ or None), defaults to ‘human’
render_every_cycle – whether to call render() after each integrator step in the step() method, defaults to False. Rendering every cycle gives a smoother visualization of the scene but is computationally more expensive, so leaving it at False speeds up training and evaluation. Regardless of this parameter, the scene is always rendered after ‘num_cycles’ have been executed if ‘render_mode != None’.
default_cam_config – dictionary with attribute values of the viewer’s default camera (see https://mujoco.readthedocs.io/en/latest/XMLreference.html?highlight=camera#visual-global), defaults to None
width_no_camera_specified – if render_mode != ‘human’ and no width is specified, this value is used, defaults to 1240
height_no_camera_specified – if render_mode != ‘human’ and no height is specified, this value is used, defaults to 1080
num_cycles – the number of control cycles for which to apply the same action, defaults to 40
collision_params – a dictionary that can be used to specify collision parameters, defaults to None
initial_mover_start_xy_pos – the initial (x,y) starting positions of the movers, defaults to None
initial_mover_goal_xy_pos – the initial (x,y) goal positions of the movers, defaults to None
custom_model_xml_strings – a dictionary containing additional xml strings to provide the ability to add actuators, sensors, objects, robots, etc. to the model, defaults to None
use_mj_passive_viewer – whether the MuJoCo passive_viewer should be used, defaults to False. If set to False, the Gymnasium MuJoCo WindowViewer with custom overlays is used.
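
As a rough illustration of how the constructor arguments fit together, the sketch below only assembles a keyword dictionary and does not import the package. The 2 x 3 layout, the encoding of the layout (1 = tile, 0 = no tile), the two movers, and the hypothetical subclass name `MyEnv` are assumptions made for the example; a concrete subclass that implements the abstract methods would be needed before the dictionary could be passed on.

```python
import numpy as np

# 2 x 3 layout; assumed encoding: 1 = tile present, 0 = no tile.
layout_tiles = np.array([[1, 1, 1],
                         [1, 0, 1]])

env_kwargs = dict(
    layout_tiles=layout_tiles,
    num_movers=2,
    initial_mover_zpos=0.005,   # [m], documented default
    table_height=0.4,           # [m], documented default
    std_noise=1e-5,             # documented default
    render_mode="rgb_array",    # 'human', 'rgb_array' or None
    render_every_cycle=False,
    num_cycles=40,              # documented default
)

# Without explicit start positions the movers are placed on tile centres,
# so the layout needs at least as many tiles as movers.
num_tiles = int(layout_tiles.sum())
assert num_tiles >= env_kwargs["num_movers"]

# Hypothetical usage with a user-defined subclass `MyEnv`:
# env = MyEnv(**env_kwargs)
```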

check_mover_collision(mover_names: list[str], c_size: float | ndarray, add_safety_offset: bool = False, mover_qpos: ndarray | None = None, add_qpos_noise: bool = False) → bool

Check whether two movers specified in mover_names collide. In case of collision shape ‘box’, this method takes the orientation of the movers into account.

Parameters:

mover_names – a list of mover names that should be checked (correspond to the body name of the mover in the MuJoCo model)
c_size –
the size of the collision shape of the movers
- collision_shape = ‘circle’:
  use a single float value to specify the same size for all movers and a numpy array of shape (num_movers,) to specify individual sizes for each mover
- collision_shape = ‘box’:
  use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_movers,2) to specify individual sizes for each mover
add_safety_offset – whether to add the size offset (can be specified using: collision_params[“offset”]), defaults to False. Note that the same size offset is added for both movers.
mover_qpos – the qpos of the movers specified as a numpy array of shape (num_movers,7) (x_p,y_p,z_p,w_o,x_o,y_o,z_o). If set to None, the current qpos of the movers in the MuJoCo model is used; defaults to None
add_qpos_noise – whether to add Gaussian noise to the qpos of the movers, defaults to False. Only used if mover_qpos is not None.

Returns:

True if the movers collide, False otherwise
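
For the collision shape ‘circle’, the idea behind such a check can be sketched with plain NumPy. This is an illustrative stand-in, not the library's implementation: the helper name `circles_collide`, the radii, and the way the offset is added to both radii are assumptions.

```python
import numpy as np

def circles_collide(xy_a, xy_b, r_a, r_b, offset=0.0):
    """Return True if two circular collision shapes overlap.

    `offset` mimics a safety offset that is added to both radii (assumption).
    """
    dist = float(np.linalg.norm(np.asarray(xy_a) - np.asarray(xy_b)))
    return dist < (r_a + offset) + (r_b + offset)

# qpos rows follow the documented layout (x_p, y_p, z_p, w_o, x_o, y_o, z_o).
qpos = np.array([[0.10, 0.10, 0.005, 1.0, 0.0, 0.0, 0.0],
                 [0.30, 0.10, 0.005, 1.0, 0.0, 0.0, 0.0]])

# The centres are 0.20 m apart.
no_offset = circles_collide(qpos[0, :2], qpos[1, :2], 0.09, 0.09)
with_offset = circles_collide(qpos[0, :2], qpos[1, :2], 0.09, 0.09, offset=0.02)
```

With radii of 0.09 m the movers do not touch, whereas an additional offset of 0.02 m per mover makes the same configuration count as a collision.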

check_wall_collision(mover_names: list[str], c_size: float | ndarray, add_safety_offset: bool = False, mover_qpos: ndarray | None = None, add_qpos_noise: bool = False) → ndarray

Check whether the qpos of the movers listed in mover_names are valid, i.e. no wall collisions.

Parameters:

mover_names – a list of mover names that should be checked (correspond to the body name of the mover in the MuJoCo model)
c_size –
the size of the collision shape
- collision_shape = ‘circle’:
  use a single float value to specify the same size for all movers and a numpy array of shape (num_movers,) to specify individual sizes for each mover
- collision_shape = ‘box’:
  use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_movers,2) to specify individual sizes for each mover
add_safety_offset – whether to add the size offset (can be specified using: collision_params[“offset”]), defaults to False. Note that the same size offset is added for all movers.
mover_qpos – a numpy array of shape (num_qpos,7) containing the qpos (x_p,y_p,z_p,w_o,x_o,y_o,z_o) of each mover or None. If set to None, the current qpos of each mover in the MuJoCo model is used; defaults to None
add_qpos_noise – whether to add Gaussian noise to the qpos of the movers, defaults to False. Only used if mover_qpos is not None.

Returns:

a numpy array of shape (num_movers,), where an element is 1 if the qpos is valid (no wall collision), otherwise 0

close() → None: Close the environment.

abstractmethod compute_reward(achieved_goal: ndarray | None = None, desired_goal: ndarray | None = None, info: dict[str, Any] | None = None) → ndarray | float

Compute the immediate reward. This method is required by the stable-baselines3 implementation of Hindsight Experience Replay (HER) (for more information, see https://stable-baselines3.readthedocs.io/en/master/modules/her.html).

Parameters:

achieved_goal – a numpy array of shape (batch_size, length achieved_goal) or (length achieved_goal,) containing the goals already achieved (goal-conditioned RL); defaults to None (standard RL)
desired_goal – a numpy array of shape (batch_size, length desired_goal) or (length desired_goal,) containing the desired goals (goal-conditioned RL); defaults to None (standard RL)
info – a dictionary containing auxiliary information, defaults to None

Returns:

a single float value or a numpy array of shape (batch_size,) containing the immediate rewards
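
Because the method has to accept both a single goal and a batch of goals, a subclass implementation might be structured like the sketch below. The sparse reward (0 when the goal is reached, -1 otherwise), the distance threshold of 0.05, and the function name `sparse_reward` are illustrative assumptions, not part of this class.

```python
import numpy as np

def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    """Return 0.0 where the goal is reached and -1.0 otherwise.

    Accepts goals of shape (goal_len,) or (batch_size, goal_len).
    """
    achieved_goal = np.asarray(achieved_goal, dtype=float)
    desired_goal = np.asarray(desired_goal, dtype=float)
    dist = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    reward = np.where(dist <= threshold, 0.0, -1.0)
    # A single goal yields a 0-d array; convert it to a plain float.
    return float(reward) if reward.ndim == 0 else reward

single = sparse_reward([0.10, 0.10], [0.12, 0.10])
batch = sparse_reward([[0.1, 0.1], [0.5, 0.5]], [[0.12, 0.1], [0.1, 0.1]])
```

Reducing over the last axis keeps one code path for both input shapes, which matches the two documented return types.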

abstractmethod compute_terminated(achieved_goal: ndarray | None = None, desired_goal: ndarray | None = None, info: dict[str, Any] | None = None) → ndarray | bool

Check whether a terminal state is reached. This method can be used for both goal-conditioned RL and standard RL. Since Hindsight Experience Replay (HER) is commonly used in goal-conditioned RL, this method receives the ‘achieved_goal’ and ‘desired_goal’ corresponding to the requirements of the HER implementation of stable-baselines3 (for more information, see https://stable-baselines3.readthedocs.io/en/master/modules/her.html).

Parameters:

achieved_goal – a numpy array of shape (batch_size, length achieved_goal) or (length achieved_goal,) containing the goals already achieved (goal-conditioned RL); defaults to None (standard RL)
desired_goal – a numpy array of shape (batch_size, length desired_goal) or (length desired_goal,) containing the desired goals (goal-conditioned RL); defaults to None (standard RL)
info – a dictionary containing auxiliary information, defaults to None

Returns:

a single bool value or a numpy array of shape (batch_size,) containing Boolean values, where True indicates that a terminal state has been reached

abstractmethod compute_truncated(achieved_goal: ndarray | None = None, desired_goal: ndarray | None = None, info: dict[str, Any] | None = None) → ndarray | bool

Check whether the truncation condition is satisfied. This method can be used for both goal-conditioned RL and standard RL. Since Hindsight Experience Replay (HER) is commonly used in goal-conditioned RL, this method receives the ‘achieved_goal’ and ‘desired_goal’ corresponding to the requirements of the HER implementation of stable-baselines3 (for more information, see https://stable-baselines3.readthedocs.io/en/master/modules/her.html).

Parameters:

achieved_goal – a numpy array of shape (batch_size, length achieved_goal) or (length achieved_goal,) containing the goals already achieved (goal-conditioned RL); defaults to None (standard RL)
desired_goal – a numpy array of shape (batch_size, length desired_goal) or (length desired_goal,) containing the desired goals (goal-conditioned RL); defaults to None (standard RL)
info – a dictionary containing auxiliary information, defaults to None

Returns:

a single bool value or a numpy array of shape (batch_size,) containing Boolean values, where True indicates that the truncation condition is satisfied

generate_model_xml_string(mover_start_xy_pos: ndarray | None = None, mover_goal_xy_pos: ndarray | None = None, custom_xml_strings: dict[str, str] | None = None) → str

Generate a MuJoCo model xml string based on the mover-tile configuration of the environment.

Parameters:

mover_start_xy_pos – a numpy array of shape (num_movers,2) containing the (x,y) starting positions of each mover. If set to None, the movers will be placed in the center of a tile, i.e. the number of tiles must be >= the number of movers; defaults to None.
mover_goal_xy_pos – a numpy array of shape (num_movers_with_goals,2) containing the (x,y) goal positions of the movers (num_movers_with_goals <= num_movers). Note that only the first 6 movers have different colors to make the movers clearly distinguishable. Movers without goals are shown in gray. If set to None, no goals will be displayed and all movers are colored in gray; defaults to None
custom_xml_strings –
a dictionary containing additional xml strings to provide the ability to add actuators, sensors, objects, robots, etc. to the model. The keys determine where to add a string in the xml structure and the values contain the xml string to add. The following keys are accepted:
- ’custom_compiler_xml_str’:
  A custom ‘compiler’ xml element. Note that the entire default ‘compiler’ element is replaced.
- ’custom_visual_xml_str’:
  A custom ‘visual’ xml element. Note that the entire default ‘visual’ element is replaced.
- ’custom_option_xml_str’:
  A custom ‘option’ xml element. Note that the entire default ‘option’ element is replaced.
- ’custom_assets_xml_str’:
  This xml string adds elements to the ‘asset’ grouping element.
- ’custom_default_xml_str’:
  This xml string adds elements to the ‘default’ grouping element.
- ’custom_worldbody_xml_str’:
  This xml string adds elements to the ‘worldbody’ grouping element.
- ’custom_outworldbody_xml_str’:
  This xml string should be used to include files or add elements other than ‘compiler’, ‘visual’, ‘option’, ‘asset’, ‘default’ or ‘worldbody’.
If set to None, only the basic xml string is generated, containing tiles, movers (excluding actuators), and possibly goals; defaults to None

Returns:

MuJoCo model xml string
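
A dictionary for custom_xml_strings could look like the following sketch. The keys are taken from the list above; the timestep value and the obstacle body are illustrative assumptions. The well-formedness check with the standard library is only a convenience and works here because each fragment has a single root element.

```python
import xml.etree.ElementTree as ET

custom_model_xml_strings = {
    # Replaces the entire default 'option' element.
    "custom_option_xml_str": '<option timestep="0.001"/>',
    # Added to the 'worldbody' grouping element (illustrative obstacle).
    "custom_worldbody_xml_str": (
        '<body name="obstacle" pos="0.36 0.36 0.45">'
        '<geom type="box" size="0.05 0.05 0.05"/>'
        "</body>"
    ),
}

# Cheap sanity check: every fragment should at least be well-formed XML.
root_tags = {key: ET.fromstring(xml).tag
             for key, xml in custom_model_xml_strings.items()}
```

A fragment that contains several top-level elements would have to be wrapped in a temporary root element before it can be parsed this way.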

get_c_size_arr(c_size: float | ndarray, num_reps: int) → ndarray

Return the size of the collision shape as a numpy array of shape (num_reps,1) or (num_reps,2) depending on the collision shape. This method should be used to obtain the appropriate c_size_arr if the same size is to be used for all movers.

Parameters:

c_size –
the size of the collision shape:
- collision_shape = ‘circle’:
  use a single float value to specify the same size for all movers and a numpy array of shape (num_reps,) to specify individual sizes for each mover
- collision_shape = ‘box’:
  use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_reps,2) to specify individual sizes for each mover
num_reps – the number of repetitions of c_size if the same size of collision shape is to be used for all movers. Otherwise, this value is ignored.

Returns:

the collision shape sizes as a numpy array of a suitable shape:

- collision_shape = ‘circle’:
  a numpy array of shape (num_reps,1)
- collision_shape = ‘box’:
  a numpy array of shape (num_reps,2) if c_size is a numpy array of shape (2,). Otherwise, c_size is not modified.
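
The described broadcasting can be reproduced with a few lines of NumPy. The function below is a sketch of the documented behaviour under stated assumptions: the collision shape is passed explicitly as an argument (the real method presumably reads it from the environment), and individual circle sizes are reshaped into a column.

```python
import numpy as np

def c_size_arr_sketch(c_size, num_reps, collision_shape):
    """Broadcast a shared collision size to one row per mover."""
    if collision_shape == "circle":
        if np.isscalar(c_size):
            return np.full((num_reps, 1), float(c_size))
        # Individual sizes of shape (num_reps,) become a column (assumption).
        return np.asarray(c_size, dtype=float).reshape(-1, 1)
    # collision_shape == 'box'
    c_size = np.asarray(c_size, dtype=float)
    if c_size.shape == (2,):
        return np.tile(c_size, (num_reps, 1))
    return c_size  # already (num_reps, 2): returned unchanged

circle = c_size_arr_sketch(0.11, 3, "circle")
box = c_size_arr_sketch(np.array([0.08, 0.12]), 3, "box")
```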

get_mover_qacc(mover_name: str, add_noise: bool = False) → ndarray

Returns the linear and angular acceleration (qacc) of the desired mover.

Parameters:

mover_name – name of the mover for which the acceleration should be returned (corresponds to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise, defaults to False

Returns:

linear and angular acceleration of the mover (x,y,z,a,b,c)

get_mover_qacc_arr(mover_names: list[str], add_noise: bool = False) → ndarray

Return the qacc of several movers as a numpy array of shape (num_movers,6).

Parameters:

mover_names – a list of mover names for which the qacc should be returned (correspond to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise to the qacc of the movers, defaults to False

Returns:

a numpy array of shape (num_movers,6) containing the qacc (x,y,z,a,b,c) of each mover. The order of the qacc corresponds to the order of the mover names.

get_mover_qpos(mover_name: str, add_noise: bool = False) → ndarray

Returns the position and orientation of the desired mover. The orientation is returned as a quaternion (w,x,y,z). Note that the z-pos is the distance between the bottom of the mover and the top of a tile. In contrast, the z-pos in the MuJoCo model is the previously mentioned distance + half the height of a mover.

Parameters:

mover_name – name of the mover for which the position and orientation should be returned (corresponds to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise, defaults to False

Returns:

position and orientation of the desired mover (x_p,y_p,z_p,w_o,x_o,y_o,z_o)
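
The two conventions mentioned above (the z-pos offset and the quaternion order) can be made concrete with a short worked example. The mover height of 0.012 m and both helper names are assumptions chosen for illustration; the yaw formula is the standard conversion for a unit quaternion.

```python
import math

def model_zpos(zpos, mover_height):
    """z-pos of the mover body in the MuJoCo model (distance + half height)."""
    return zpos + mover_height / 2.0

def yaw_from_quat(w, x, y, z):
    """Rotation about the z-axis encoded in a unit quaternion (w, x, y, z)."""
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))

# Example qpos: mover rotated by 90 degrees about z, hovering 5 mm above a tile.
half = math.sqrt(0.5)
qpos = (0.12, 0.12, 0.005, half, 0.0, 0.0, half)

z_model = model_zpos(qpos[2], mover_height=0.012)   # mover height is assumed
yaw = yaw_from_quat(*qpos[3:])
```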

get_mover_qpos_arr(mover_names: list[str], add_noise: bool = False) → ndarray

Return the qpos of several movers as a numpy array of shape (num_movers,7).

Parameters:

mover_names – a list of mover names for which the qpos should be returned (correspond to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise to the qpos of the movers, defaults to False

Returns:

a numpy array of shape (num_movers,7) containing the qpos (x_p,y_p,z_p,w_o,x_o,y_o,z_o) of each mover. The order of the qpos corresponds to the order of the mover names.

get_mover_qvel(mover_name: str, add_noise: bool = False) → ndarray

Return the linear and angular velocities (qvel) of the desired mover.

Parameters:

mover_name – name of the mover for which the velocity should be returned (corresponds to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise, defaults to False

Returns:

linear and angular velocities of the mover (x,y,z,a,b,c)

get_mover_qvel_arr(mover_names: list[str], add_noise: bool = False) → ndarray

Return the qvel of several movers as a numpy array of shape (num_movers,6).

Parameters:

mover_names – a list of mover names for which the qvel should be returned (correspond to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise to the qvel of the movers, defaults to False

Returns:

a numpy array of shape (num_movers,6) containing the qvel (x,y,z,a,b,c) of each mover. The order of the qvel corresponds to the order of the mover names.

get_tile_indices_mask(mask: ndarray) → tuple[ndarray, ndarray]

Find the x and y indices of tiles that correspond to the specified structure (the mask) in the tile layout. Note that the indices of the top left tile in the mask are returned.

Parameters:

mask – a 2D numpy array containing only 0 and 1 which specifies the structure to be found in the tile layout

Returns:

the x and y indices of the tiles in separate numpy arrays, each of shape (num_mask_found,)
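
One way to picture the search is a sliding window over the tile layout. The sketch below assumes that a match means the window equals the mask exactly and that the layout uses 1 for a tile and 0 for no tile; the library may define a match differently.

```python
import numpy as np

def find_mask(layout, mask):
    """Return x and y indices of the top-left tile of every exact mask match."""
    layout, mask = np.asarray(layout), np.asarray(mask)
    m_x, m_y = mask.shape
    idx_x, idx_y = [], []
    for i in range(layout.shape[0] - m_x + 1):
        for j in range(layout.shape[1] - m_y + 1):
            if np.array_equal(layout[i:i + m_x, j:j + m_y], mask):
                idx_x.append(i)
                idx_y.append(j)
    return np.array(idx_x, dtype=int), np.array(idx_y, dtype=int)

layout = np.array([[1, 1, 0],
                   [1, 1, 1],
                   [0, 1, 1]])
# Search for all 2 x 2 blocks that are completely covered by tiles.
x_idx, y_idx = find_mask(layout, np.ones((2, 2), dtype=int))
```

In this layout the 2 x 2 block occurs twice, with top-left tiles at (0,0) and (1,1).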

get_tile_xy_pos() → tuple[ndarray, ndarray]

Find the (x,y)-positions of the tiles. The position of the tile with index (i_x,i_y) in the tile layout is (x-pos[i_x,i_y], y-pos[i_x,i_y]), where x-pos and y-pos are the arrays returned by this method. Note that the base frame is in the upper left corner.

Returns:

the x and y positions of the tiles in separate numpy arrays, each of shape (num_tiles_x, num_tiles_y)
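
The indexing scheme can be illustrated with a small grid. The sketch assumes that the returned positions are tile centres, that the base frame sits exactly at the corner of tile (0,0), and that tiles measure 0.24 m; none of these values is taken from the library.

```python
import numpy as np

def tile_xy_pos_sketch(num_tiles_x, num_tiles_y, tile_size_x, tile_size_y):
    """Centre positions of all grid cells as two (num_tiles_x, num_tiles_y) arrays."""
    x = (np.arange(num_tiles_x) + 0.5) * tile_size_x
    y = (np.arange(num_tiles_y) + 0.5) * tile_size_y
    return np.meshgrid(x, y, indexing="ij")

x_pos, y_pos = tile_xy_pos_sketch(2, 3, tile_size_x=0.24, tile_size_y=0.24)

# Position of the tile with index (i_x, i_y) = (1, 2).
centre = (x_pos[1, 2], y_pos[1, 2])
```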

get_wrapper_attr(name: str) → Any: Gets the attribute name from the environment.

has_wrapper_attr(name: str) → bool: Checks if the attribute name exists in the environment.

property np_random: Generator

Return the environment’s internal _np_random generator. If it has not been set, it is initialised with a random seed.

Returns:

an instance of np.random.Generator

property np_random_seed: int

Return the environment’s internal _np_random_seed. If it has not been set, the generator is first initialised with a random int as seed.

If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.

Returns:

int: the seed of the current np_random or -1, if the seed of the rng is unknown

qpos_is_valid(qpos: ndarray, c_size: float | ndarray, add_safety_offset: bool = False) → ndarray

Check whether qpos is valid. This method considers the edges as imaginary walls if there is no other tile next to that edge. A position is valid if it is above a tile and the distance to the walls is greater than the required safety margin, i.e. no collision with a wall. This also ensures that the position is reachable in case the specified position is a goal position.

This method can check multiple qpos at the same time, where the movers can be of different sizes. The orientation of the mover is taken into account if collision_shape = ‘box’, otherwise (collision_shape = ‘circle’) the orientation of the mover is ignored.

Parameters:

qpos – a numpy array of shape (num_qpos,7) containing the qpos (x_p,y_p,z_p,w_o,x_o,y_o,z_o) to be checked
c_size –
the size of the collision shape
- collision_shape = ‘circle’:
  use a single float value to specify the same size for all movers and a numpy array of shape (num_qpos,) to specify individual sizes for each mover
- collision_shape = ‘box’:
  use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_qpos,2) to specify individual sizes for each mover
add_safety_offset – whether to add the size offset (can be specified using: collision_params[“offset”]), defaults to False. Note that the same size offset is added for all movers.

Returns:

a numpy array of shape (num_qpos,), where an element is 1 if the qpos is valid, otherwise 0
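
For the collision shape ‘circle’, the notion of imaginary walls can be sketched as a distance test against every missing grid cell, including a border of missing cells around the layout. This is a simplified illustration and not the library's algorithm; square tiles of 0.24 m, the layout encoding (1 = tile, 0 = no tile), and a base frame at the corner of tile (0,0) are assumptions.

```python
import numpy as np

def circle_qpos_is_valid(xy, radius, layout, tile_size):
    """Return 1 if a circle at `xy` lies completely above tiles, else 0.

    Assumes radius < tile_size, so a one-cell border of missing tiles suffices.
    """
    padded = np.pad(np.asarray(layout), 1)      # surround the layout with "no tile"
    for i, j in np.argwhere(padded == 0):
        # Axis-aligned bounds of the missing cell in layout coordinates.
        lo = np.array([i - 1, j - 1]) * tile_size
        hi = lo + tile_size
        # Closest point of that cell to the circle centre.
        nearest = np.clip(xy, lo, hi)
        if np.linalg.norm(np.asarray(xy) - nearest) <= radius:
            return 0
    return 1

layout = np.array([[1, 1],
                   [1, 0]])
valid = [circle_qpos_is_valid(xy, 0.08, layout, 0.24)
         for xy in ([0.12, 0.12], [0.12, 0.05], [0.36, 0.36])]
```

The first position is the centre of a tile and is valid, the second is closer than the radius to the outer edge, and the third lies above a missing tile.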

render() → ndarray | None

Compute frames depending on the initially specified render_mode. Before the corresponding viewer is updated, the _render_callback() is called to give the opportunity to add more functionality.

Returns:

a numpy array if render_mode != ‘human’, otherwise None (render_mode ‘human’)

reset(seed: int | None = None, options: dict[str, Any] | None = None) → tuple[dict[str, ndarray], dict[str, Any]]

Reset the environment returning an initial observation and auxiliary information. More detailed information about the parameters and return values can be found in the Gymnasium documentation: https://gymnasium.farama.org/api/env/#gymnasium.Env.reset.

This method performs the following steps:

1. reset RNG, if desired
2. call _reset_callback(option) to give the user the opportunity to add more functionality
3. call mj_forward()
4. check whether there are mover, wall, or other collisions, e.g. collisions with an obstacle
5. call render()
6. get initial observation and info dictionary
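
The order of these steps can be mimicked with a toy class that merely records which step runs when. It is a stand-in for illustration only: the class name `ResetOrderSketch` is hypothetical, the MuJoCo call, the collision check, and rendering are replaced by log entries, and Python's `random` module stands in for the environment's generator.

```python
import random

class ResetOrderSketch:
    """Toy stand-in that only records the order of the documented reset steps."""

    def __init__(self):
        self.calls = []
        self.rng = random.Random()

    def _reset_callback(self, options):
        self.calls.append("reset_callback")   # user hook, e.g. place movers

    def reset(self, seed=None, options=None):
        self.calls.clear()
        if seed is not None:                  # the RNG is only reset if a seed is given
            self.rng = random.Random(seed)
            self.calls.append("reseed")
        self._reset_callback(options)
        self.calls.append("mj_forward")       # placeholder for the MuJoCo call
        self.calls.append("collision_check")
        self.calls.append("render")
        observation, info = {"observation": []}, {}
        return observation, info

env = ResetOrderSketch()
obs, info = env.reset(seed=42)
calls_with_seed = list(env.calls)
env.reset()                                   # seed=None keeps the current RNG
calls_without_seed = list(env.calls)
```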

Parameters:

seed – if set to None, the RNG is not reset; if int, sets the desired seed; defaults to None
options – a dictionary that can be used to specify additional reset options, e.g. object parameters; defaults to None

Returns:

initial observation and auxiliary information contained in the ‘info’ dictionary

set_wrapper_attr(name: str, value: Any, *, force: bool = True) → bool: Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.

step(action: int | ndarray) → tuple[dict[str, ndarray], float, bool, bool, dict[str, Any]]

Execute one step of the environment’s dynamics applying the given action. Note that the environment executes as many MuJoCo simulation steps as the number of cycles specified for this environment (num_cycles). The duration of one cycle is determined by the cycle time, which must be specified in the MuJoCo xml string using the option/timestep parameter. The same action is applied for all cycles.

This method performs the following steps:

1. check whether the dimension of the action matches the dimension of the action space
2. if the action space does not contain the specified action, the action is clipped to the interval edges of the action space
3. call _step_callback(action) to give the user the opportunity to add more functionality
4. execute MuJoCo simulation steps (mj_step()). After each simulation step, it is checked whether there are mover, wall, or other collisions, e.g. collisions with an obstacle. To check for other collisions besides mover and wall collisions the _check_for_other_collisions_callback() is called. In case of a collision, no further simulation steps are performed, as a real system would typically stop as well due to position lag errors. In addition, render() can be called after each simulation step to provide a smooth visualization of the movement (set render_every_cycle=True). The callback _mujoco_step_callback(action) can be used to add functionality BEFORE the next simulation step is executed. This can be useful, for example, to ensure velocity or acceleration limits within each cycle.
5. call render()
6. get return values

More detailed information about the parameters and return values can be found in the Gymnasium documentation: https://gymnasium.farama.org/api/env/#gymnasium.Env.step.
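
The core of the procedure described above (dimension check, clipping, repeated simulation steps with an early stop on collision) can be sketched without MuJoCo. The function `step_sketch`, the two callables passed to it, and the one-dimensional toy dynamics are assumptions for illustration; callbacks and rendering are omitted.

```python
import numpy as np

def step_sketch(action, low, high, num_cycles, simulate_cycle, has_collision):
    """Apply one clipped action for up to `num_cycles` simulation steps."""
    action = np.asarray(action, dtype=float)
    if action.shape != np.shape(low):
        raise ValueError("action dimension does not match the action space")
    action = np.clip(action, low, high)       # clip to the action-space bounds
    cycles_run = 0
    for _ in range(num_cycles):
        simulate_cycle(action)                # placeholder for mj_step()
        cycles_run += 1
        if has_collision():                   # stop early, like a real system
            break
    return action, cycles_run

# Toy 1-D "simulation": the state integrates the action and collides at x >= 1.
state = {"x": 0.0}

def simulate_cycle(a):
    state["x"] += 0.125 * a[0]

def has_collision():
    return state["x"] >= 1.0

applied, cycles = step_sketch([5.0], low=np.array([-2.0]), high=np.array([2.0]),
                              num_cycles=40, simulate_cycle=simulate_cycle,
                              has_collision=has_collision)
```

The requested action of 5.0 is clipped to 2.0, and the loop ends after four cycles because the toy state reaches the collision threshold long before the 40 cycles are used up.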

Parameters:

action – the action to apply

Returns:

the next observation
the immediate reward for taking the action
whether a terminal state is reached
whether the truncation condition is satisfied
auxiliary information contained in the ‘info’ dictionary

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns:

Env: The base non-wrapped gymnasium.Env instance

window_viewer_is_running() → bool

Check whether the window viewer (render_mode ‘human’) is active, i.e. the window is open.

Returns:

True if the window is open, False otherwise