Basic Planar Robotics Multi-Agent Environment

class gymnasium_planar_robotics.envs.basic_envs.BasicPlanarRoboticsMultiAgentEnv(layout_tiles: ndarray, num_movers: int, tile_params: dict[str, any] | None = None, mover_params: dict[str, any] | None = None, initial_mover_zpos: float = 0.005, table_height: float = 0.4, std_noise: ndarray | float = 1e-05, render_mode: str | None = 'human', default_cam_config: dict[str, any] | None = None, width_no_camera_specified: int = 1240, height_no_camera_specified: int = 1080, collision_params: dict[str, any] | None = None, initial_mover_start_xy_pos: ndarray | None = None, initial_mover_goal_xy_pos: ndarray | None = None, custom_model_xml_strings: dict[str, str] | None = None, use_mj_passive_viewer: bool = False)

A base class for multi-agent reinforcement learning environments in the field of planar robotics that follow the PettingZoo API. A more detailed explanation of all parameters can be found in the documentation of the BasicPlanarRoboticsEnv.

Parameters:
  • layout_tiles – the tile layout

  • num_movers – the number of movers

  • tile_params – tile parameters such as the size and mass, defaults to None

  • mover_params – mover parameters such as the size and mass, defaults to None

  • initial_mover_zpos – the initial distance between the bottom of the mover and the top of a tile, defaults to 0.005 [m]

  • table_height – the height of a table on which the tiles are placed, defaults to 0.4 [m]

  • std_noise – the standard deviation of a Gaussian with zero mean used to add noise, defaults to 1e-5

  • render_mode – the mode that is used to render the frames (‘human’, ‘rgb_array’ or None), defaults to ‘human’

  • default_cam_config – dictionary with attribute values of the viewer’s default camera (see https://mujoco.readthedocs.io/en/latest/XMLreference.html?highlight=camera#visual-global), defaults to None

  • width_no_camera_specified – if render_mode != ‘human’ and no width is specified, this value is used, defaults to 1240

  • height_no_camera_specified – if render_mode != ‘human’ and no height is specified, this value is used, defaults to 1080

  • collision_params – a dictionary that can be used to specify collision parameters, defaults to None

  • initial_mover_start_xy_pos – the initial (x,y) starting positions of the movers, defaults to None

  • initial_mover_goal_xy_pos – the initial (x,y) goal positions of the movers, defaults to None

  • custom_model_xml_strings – a dictionary containing additional xml strings to provide the ability to add actuators, sensors, objects, robots, etc. to the model, defaults to None

  • use_mj_passive_viewer – whether the MuJoCo passive_viewer should be used, defaults to False. If set to False, the Gymnasium MuJoCo WindowViewer with custom overlays is used.
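
As a rough illustration of the parameter list above, the following sketch assembles a set of constructor arguments. Only parameter names from the signature are used; the concrete values (a 3x3 layout with a hole, two movers) and the convention that 1 marks a tile are assumptions made for this example, and a nested list stands in for the numpy array the class expects.

```python
# Illustrative values only; the class expects a numpy array for layout_tiles.
layout_tiles = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]  # assumption: 1 marks a tile, 0 marks an empty cell

env_kwargs = {
    "layout_tiles": layout_tiles,
    "num_movers": 2,
    "initial_mover_zpos": 0.005,  # [m], default from the signature
    "table_height": 0.4,          # [m], default from the signature
    "std_noise": 1e-5,            # default from the signature
    "render_mode": None,          # no rendering, e.g. during training
    "use_mj_passive_viewer": False,
}

# If no start positions are given, movers are placed at tile centers,
# so there must be at least as many tiles as movers.
num_tiles = sum(sum(row) for row in layout_tiles)
assert num_tiles >= env_kwargs["num_movers"]

# With the package installed and layout_tiles converted via numpy.array,
# a subclass could forward these arguments:
# env = BasicPlanarRoboticsMultiAgentEnv(**env_kwargs)
```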

action_space(agent: AgentID) → Space

Takes in agent and returns the action space for that agent.

MUST return the same value for the same agent name.

The default implementation returns the agent’s entry in the action_spaces dict.

check_mover_collision(mover_names: list[str], c_size: float | ndarray, add_safety_offset: bool = False, mover_qpos: ndarray | None = None, add_qpos_noise: bool = False) → bool

Check whether two movers specified in mover_names collide. In case of collision shape ‘box’, this method takes the orientation of the movers into account.

Parameters:
  • mover_names – a list of mover names that should be checked (correspond to the body name of the mover in the MuJoCo model)

  • c_size

    the size of the collision shape of the movers

    • collision_shape = ‘circle’:

      use a single float value to specify the same size for all movers and a numpy array of shape (num_movers,) to specify individual sizes for each mover

    • collision_shape = ‘box’:

      use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_movers,2) to specify individual sizes for each mover

  • add_safety_offset – whether to add the size offset (can be specified using: collision_params[“offset”]), defaults to False. Note that the same size offset is added for both movers.

  • mover_qpos – the qpos of the movers specified as a numpy array of shape (num_movers,7) (x_p,y_p,z_p,w_o,x_o,y_o,z_o). If set to None, the current qpos of the movers in the MuJoCo model is used; defaults to None

  • add_qpos_noise – whether to add Gaussian noise to the qpos of the movers, defaults to False. Only used if mover_qpos is not None.

Returns:

True if the movers collide, False otherwise
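
To make the ‘circle’ case concrete, here is a minimal sketch of a circle-circle overlap test. It is not the library's implementation: it assumes that c_size is the circle radius, that touching circles do not count as a collision, and the helper name circles_collide is hypothetical.

```python
import math


def circles_collide(pos_a, pos_b, radius_a, radius_b, offset=0.0):
    """Return True if two circular collision shapes overlap.

    The same safety offset is added to both radii, mirroring the note on
    add_safety_offset above. Assumes c_size is a radius (an assumption).
    """
    distance = math.hypot(pos_a[0] - pos_b[0], pos_a[1] - pos_b[1])
    return distance < (radius_a + offset) + (radius_b + offset)


# Two movers whose centers are 0.2 m apart:
overlap_large = circles_collide((0.0, 0.0), (0.2, 0.0), 0.11, 0.11)
overlap_small = circles_collide((0.0, 0.0), (0.2, 0.0), 0.09, 0.09)
overlap_offset = circles_collide((0.0, 0.0), (0.2, 0.0), 0.09, 0.09, offset=0.02)
```

With radius 0.11 m the circles overlap; with radius 0.09 m they do not, unless a 0.02 m safety offset is added to each mover.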

check_wall_collision(mover_names: list[str], c_size: float | ndarray, add_safety_offset: bool = False, mover_qpos: ndarray | None = None, add_qpos_noise: bool = False) → ndarray

Check whether the qpos of the movers listed in mover_names are valid, i.e. no wall collisions.

Parameters:
  • mover_names – a list of mover names that should be checked (correspond to the body name of the mover in the MuJoCo model)

  • c_size

    the size of the collision shape

    • collision_shape = ‘circle’:

      use a single float value to specify the same size for all movers and a numpy array of shape (num_movers,) to specify individual sizes for each mover

    • collision_shape = ‘box’:

      use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_movers,2) to specify individual sizes for each mover

  • add_safety_offset – whether to add the size offset (can be specified using: collision_params[“offset”]), defaults to False. Note that the same size offset is added for all movers.

  • mover_qpos – a numpy array of shape (num_movers,7) containing the qpos (x_p,y_p,z_p,w_o,x_o,y_o,z_o) of each mover or None. If set to None, the current qpos of each mover in the MuJoCo model is used; defaults to None

  • add_qpos_noise – whether to add Gaussian noise to the qpos of the movers, defaults to False. Only used if mover_qpos is not None.

Returns:

a numpy array of shape (num_movers,), where an element is 1 if the qpos is valid (no wall collision), otherwise 0

close() → None

Close the environment.

generate_model_xml_string(mover_start_xy_pos: ndarray | None = None, mover_goal_xy_pos: ndarray | None = None, custom_xml_strings: dict[str, str] = None) → str

Generate a MuJoCo model xml string based on the mover-tile configuration of the environment.

Parameters:
  • mover_start_xy_pos – a numpy array of shape (num_movers,2) containing the (x,y) starting positions of each mover. If set to None, the movers will be placed in the center of a tile, i.e. the number of tiles must be >= the number of movers; defaults to None.

  • mover_goal_xy_pos – a numpy array of shape (num_movers_with_goals,2) containing the (x,y) goal positions of the movers (num_movers_with_goals <= num_movers). Note that only the first 6 movers have different colors to make the movers clearly distinguishable. Movers without goals are shown in gray. If set to None, no goals will be displayed and all movers are colored in gray; defaults to None

  • custom_xml_strings

    a dictionary containing additional xml strings to provide the ability to add actuators, sensors, objects, robots, etc. to the model. The keys determine where to add a string in the xml structure and the values contain the xml string to add. The following keys are accepted:

    • ’custom_compiler_xml_str’:

      A custom ‘compiler’ xml element. Note that the entire default ‘compiler’ element is replaced.

    • ’custom_visual_xml_str’:

      A custom ‘visual’ xml element. Note that the entire default ‘visual’ element is replaced.

    • ’custom_option_xml_str’:

      A custom ‘option’ xml element. Note that the entire default ‘option’ element is replaced.

    • ’custom_assets_xml_str’:

      This xml string adds elements to the ‘asset’ grouping element.

    • ’custom_default_xml_str’:

      This xml string adds elements to the ‘default’ grouping element.

    • ’custom_worldbody_xml_str’:

      This xml string adds elements to the ‘worldbody’ grouping element.

    • ’custom_outworldbody_xml_str’:

      This xml string should be used to include files or add elements other than ‘compiler’, ‘visual’, ‘option’, ‘asset’, ‘default’ or ‘worldbody’.

    If set to None, only the basic xml string is generated, containing tiles, movers (excluding actuators), and possibly goals; defaults to None

Returns:

MuJoCo model xml string
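
The accepted keys can be illustrated with a small custom_xml_strings dictionary. The keys are taken from the list above; the XML snippets themselves (a shorter timestep and a small box named "obstacle") are made-up examples, and the well-formedness check uses Python's standard XML parser rather than MuJoCo.

```python
import xml.etree.ElementTree as ET

custom_xml_strings = {
    # Replaces the entire default 'option' element.
    "custom_option_xml_str": '<option timestep="0.001"/>',
    # Adds an extra body to the 'worldbody' grouping element.
    "custom_worldbody_xml_str": (
        '<body name="obstacle" pos="0.3 0.3 0.45">'
        '<geom type="box" size="0.02 0.02 0.02"/>'
        "</body>"
    ),
}

# Cheap sanity check: every snippet must at least be well-formed XML.
root_tags = {
    key: ET.fromstring(value).tag for key, value in custom_xml_strings.items()
}
print(root_tags)
```

Such a dictionary would be passed as custom_xml_strings here, or as custom_model_xml_strings to the constructor.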

get_c_size_arr(c_size: float | ndarray, num_reps: int) → ndarray

Return the size of the collision shape as a numpy array of shape (num_reps,1) or (num_reps,2) depending on the collision shape. This method should be used to obtain the appropriate c_size_arr if the same size is to be used for all movers.

Parameters:
  • c_size

    the size of the collision shape:

    • collision_shape = ‘circle’:

      use a single float value to specify the same size for all movers and a numpy array of shape (num_reps,) to specify individual sizes for each mover

    • collision_shape = ‘box’:

      use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_reps,2) to specify individual sizes for each mover

  • num_reps – the number of repetitions of c_size if the same size of collision shape is to be used for all movers. Otherwise, this value is ignored.

Returns:

the collision shape sizes as a numpy array of a suitable shape:

  • collision_shape = ‘circle’:

    a numpy array of shape (num_reps,1)

  • collision_shape = ‘box’:

    a numpy array of shape (num_reps,2) if c_size is a numpy array of shape (2,). Otherwise, c_size is not modified.
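
The shape rules above can be sketched without numpy, using nested lists in place of arrays. The helper c_size_rows and its explicit collision_shape argument are hypothetical; the actual method takes the collision shape from the environment's configuration.

```python
def c_size_rows(c_size, num_reps, collision_shape):
    """Expand c_size into one row per mover (nested lists instead of arrays)."""
    if collision_shape == "circle":
        if isinstance(c_size, (int, float)):
            # Same size for all movers: scalar -> (num_reps, 1)
            return [[float(c_size)] for _ in range(num_reps)]
        # Individual sizes: (n,) -> (n, 1); num_reps is ignored
        return [[float(size)] for size in c_size]
    if collision_shape == "box":
        if len(c_size) == 2 and all(isinstance(v, (int, float)) for v in c_size):
            # Same size for all movers: (2,) -> (num_reps, 2)
            return [list(c_size) for _ in range(num_reps)]
        # Individual sizes already have shape (n, 2): returned unmodified
        return c_size
    raise ValueError(f"unknown collision shape: {collision_shape!r}")


same_circle = c_size_rows(0.1, 3, "circle")
same_box = c_size_rows([0.1, 0.05], 2, "box")
per_mover_box = [[0.1, 0.05], [0.2, 0.1]]
unchanged = c_size_rows(per_mover_box, 5, "box")
```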

get_mover_qacc(mover_name: str, add_noise: bool = False) → ndarray

Returns the linear and angular acceleration (qacc) of the desired mover.

Parameters:
  • mover_name – name of the mover for which the acceleration should be returned (corresponds to the body name of the mover in the MuJoCo model)

  • add_noise – whether to add Gaussian noise, defaults to False

Returns:

linear and angular acceleration of the mover (x,y,z,a,b,c)

get_mover_qacc_arr(mover_names: list[str], add_noise: bool = False) → ndarray

Return the qacc of several movers as a numpy array of shape (num_movers,6).

Parameters:
  • mover_names – a list of mover names for which the qacc should be returned (correspond to the body name of the mover in the MuJoCo model)

  • add_noise – whether to add Gaussian noise to the qacc of the movers, defaults to False

Returns:

a numpy array of shape (num_movers,6) containing the qacc (x,y,z,a,b,c) of each mover. The order of the qacc corresponds to the order of the mover names.

get_mover_qpos(mover_name: str, add_noise: bool = False) → ndarray

Returns the position and orientation of the desired mover. The orientation is returned as a quaternion (w,x,y,z). Note that the z-pos is the distance between the bottom of the mover and the top of a tile. In contrast, the z-pos in the MuJoCo model is the previously mentioned distance + half the height of a mover.

Parameters:
  • mover_name – name of the mover for which the position and orientation should be returned (corresponds to the body name of the mover in the MuJoCo model)

  • add_noise – whether to add Gaussian noise, defaults to False

Returns:

position and orientation of the desired mover (x_p,y_p,z_p,w_o,x_o,y_o,z_o)
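
A short sketch of how a returned qpos might be unpacked. The yaw formula is the standard conversion for a unit quaternion in (w,x,y,z) order; the qpos values and the mover height of 0.012 m are hypothetical and only serve to show the z-pos relation described above.

```python
import math


def yaw_from_quaternion(w, x, y, z):
    """Rotation about the z-axis (yaw) of a unit quaternion (w, x, y, z)."""
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))


# One qpos: (x_p, y_p, z_p, w_o, x_o, y_o, z_o); mover rotated 90 degrees about z.
qpos = (0.12, 0.36, 0.005, math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
x_p, y_p, z_p, w_o, x_o, y_o, z_o = qpos

yaw = yaw_from_quaternion(w_o, x_o, y_o, z_o)

mover_height = 0.012  # hypothetical mover height in metres
# z-pos in the MuJoCo model = returned z-pos + half the mover height
z_model = z_p + mover_height / 2
```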

get_mover_qpos_arr(mover_names: list[str], add_noise: bool = False) → ndarray

Return the qpos of several movers as a numpy array of shape (num_movers,7).

Parameters:
  • mover_names – a list of mover names for which the qpos should be returned (correspond to the body name of the mover in the MuJoCo model)

  • add_noise – whether to add Gaussian noise to the qpos of the movers, defaults to False

Returns:

a numpy array of shape (num_movers,7) containing the qpos (x_p,y_p,z_p,w_o,x_o,y_o,z_o) of each mover. The order of the qpos corresponds to the order of the mover names.

get_mover_qvel(mover_name: str, add_noise: bool = False) → ndarray

Return the linear and angular velocities (qvel) of the desired mover.

Parameters:
  • mover_name – name of the mover for which the velocity should be returned (corresponds to the body name of the mover in the MuJoCo model)

  • add_noise – whether to add Gaussian noise, defaults to False

Returns:

linear and angular velocities of the mover (x,y,z,a,b,c)

get_mover_qvel_arr(mover_names: list[str], add_noise: bool = False) → ndarray

Return the qvel of several movers as a numpy array of shape (num_movers,6).

Parameters:
  • mover_names – a list of mover names for which the qvel should be returned (correspond to the body name of the mover in the MuJoCo model)

  • add_noise – whether to add Gaussian noise to the qvel of the movers, defaults to False

Returns:

a numpy array of shape (num_movers,6) containing the qvel (x,y,z,a,b,c) of each mover. The order of the qvel corresponds to the order of the mover names.
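
The add_noise flag of the getters above refers to zero-mean Gaussian noise with standard deviation std_noise. The following sketch shows the idea for a single scalar std_noise and a plain list; the helper name is hypothetical, and how the library applies an array-valued std_noise is not covered here.

```python
import random


def add_gaussian_noise(values, std_noise, rng):
    """Add independent zero-mean Gaussian noise to each entry."""
    return [value + rng.gauss(0.0, std_noise) for value in values]


rng = random.Random(0)                 # seeded so the run is reproducible
qvel = [0.1, 0.0, 0.0, 0.0, 0.0, 0.0]  # (x, y, z, a, b, c)
noisy_qvel = add_gaussian_noise(qvel, 1e-5, rng)

# With std_noise = 1e-5 the perturbation stays tiny compared to the signal.
max_deviation = max(abs(n - v) for n, v in zip(noisy_qvel, qvel))
```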

get_tile_indices_mask(mask: ndarray) → tuple[ndarray, ndarray]

Find the x and y indices of tiles that correspond to the specified structure (the mask) in the tile layout. Note that the indices of the top left tile in the mask are returned.

Parameters:

mask – a 2D numpy array containing only 0 and 1 which specifies the structure to be found in the tile layout

Returns:

the x and y indices of the tiles in separate numpy arrays, each of shape (num_mask_found,)
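
One way to picture the mask search is a sliding-window comparison. The sketch below assumes that a match means the mask equals the corresponding sub-block of the layout exactly and that the first index is the x index; both are assumptions, and nested lists stand in for numpy arrays.

```python
def find_mask(layout, mask):
    """Return (x_indices, y_indices) of every exact occurrence of mask in layout."""
    n_x, n_y = len(layout), len(layout[0])
    m_x, m_y = len(mask), len(mask[0])
    x_indices, y_indices = [], []
    for i_x in range(n_x - m_x + 1):
        for i_y in range(n_y - m_y + 1):
            if all(
                layout[i_x + d_x][i_y + d_y] == mask[d_x][d_y]
                for d_x in range(m_x)
                for d_y in range(m_y)
            ):
                # Record the index of the top left tile of the match.
                x_indices.append(i_x)
                y_indices.append(i_y)
    return x_indices, y_indices


layout = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
mask = [
    [1, 1],
    [1, 1],
]
x_idx, y_idx = find_mask(layout, mask)
```

In this layout the 2x2 block of tiles occurs twice, with top left indices (0,0) and (1,1).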

get_tile_xy_pos() → tuple[ndarray, ndarray]

Find the (x,y)-positions of the tiles. The position of the tile with index (i_x,i_y) in the tile layout can be found using (x-pos[i_x,i_y], y-pos[i_x,i_y]), where x-pos and y-pos are returned by this method. Note that the base frame is in the upper left corner.

Returns:

the x and y positions of the tiles in separate numpy arrays, each of shape (num_tiles_x, num_tiles_y)
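
The indexing scheme can be sketched as follows. The example assumes that the returned positions are tile centers, that tiles are laid out contiguously starting at the base frame, and that a tile measures 0.24 m along each axis; none of this is stated above, so treat the numbers as placeholders.

```python
import math


def tile_center_positions(num_tiles_x, num_tiles_y, tile_size_x, tile_size_y):
    """Return x-pos and y-pos grids, each of shape (num_tiles_x, num_tiles_y)."""
    x_pos = [
        [(i_x + 0.5) * tile_size_x for _ in range(num_tiles_y)]
        for i_x in range(num_tiles_x)
    ]
    y_pos = [
        [(i_y + 0.5) * tile_size_y for i_y in range(num_tiles_y)]
        for _ in range(num_tiles_x)
    ]
    return x_pos, y_pos


x_pos, y_pos = tile_center_positions(2, 3, 0.24, 0.24)

# Position of the tile with index (i_x, i_y) = (1, 2):
tile_xy = (x_pos[1][2], y_pos[1][2])
```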

observation_space(agent: AgentID) → Space

Takes in agent and returns the observation space for that agent.

MUST return the same value for the same agent name.

The default implementation returns the agent’s entry in the observation_spaces dict.

qpos_is_valid(qpos: ndarray, c_size: float | ndarray, add_safety_offset: bool = False) → ndarray

Check whether qpos is valid. This method treats a tile edge as an imaginary wall if there is no other tile next to that edge. A position is valid if it is above a tile and the distance to the walls is greater than the required safety margin, i.e. no collision with a wall. This also ensures that the position is reachable in case the specified position is a goal position.

This method can check multiple qpos at the same time, where the movers can be of different sizes. The orientation of the mover is taken into account if collision_shape = ‘box’; otherwise (collision_shape = ‘circle’) the orientation of the mover is ignored.

Parameters:
  • qpos – a numpy array of shape (num_qpos,7) containing the qpos (x_p,y_p,z_p,w_o,x_o,y_o,z_o) to be checked

  • c_size

    the size of the collision shape

    • collision_shape = ‘circle’:

      use a single float value to specify the same size for all movers and a numpy array of shape (num_qpos,) to specify individual sizes for each mover

    • collision_shape = ‘box’:

      use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_qpos,2) to specify individual sizes for each mover

  • add_safety_offset – whether to add the size offset (can be specified using: collision_params[“offset”]), defaults to False. Note that the same size offset is added for all movers.

Returns:

a numpy array of shape (num_qpos,), where an element is 1 if the qpos is valid, otherwise 0
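
The validity rule for the ‘circle’ case can be sketched for a single position. This is a simplified illustration, not the library's algorithm: it assumes square tiles of a hypothetical size, a radius smaller than a tile, a layout indexed as layout[i_x][i_y] with 1 marking a tile, and it ignores diagonal neighbours.

```python
import math


def circle_position_is_valid(x, y, radius, layout, tile_size):
    """Check one (x, y) position of a circular mover against imaginary walls."""
    n_x, n_y = len(layout), len(layout[0])
    i_x, i_y = math.floor(x / tile_size), math.floor(y / tile_size)
    if not (0 <= i_x < n_x and 0 <= i_y < n_y) or layout[i_x][i_y] == 0:
        return False  # the position is not above a tile

    def has_tile(j_x, j_y):
        return 0 <= j_x < n_x and 0 <= j_y < n_y and layout[j_x][j_y] == 1

    # (distance to a tile edge, whether a tile adjoins that edge)
    checks = [
        (x - i_x * tile_size, has_tile(i_x - 1, i_y)),
        ((i_x + 1) * tile_size - x, has_tile(i_x + 1, i_y)),
        (y - i_y * tile_size, has_tile(i_x, i_y - 1)),
        ((i_y + 1) * tile_size - y, has_tile(i_x, i_y + 1)),
    ]
    # An edge without a neighbouring tile acts as a wall.
    return all(neighbour or distance > radius for distance, neighbour in checks)


layout = [
    [1, 1],
    [0, 1],
]
center_ok = circle_position_is_valid(0.12, 0.12, 0.1, layout, 0.24)
too_close = circle_position_is_valid(0.05, 0.12, 0.1, layout, 0.24)
no_tile = circle_position_is_valid(0.36, 0.12, 0.1, layout, 0.24)
near_shared_edge = circle_position_is_valid(0.12, 0.25, 0.1, layout, 0.24)
```

The last case lies close to an edge shared with a neighbouring tile, so that edge is not a wall and the position remains valid.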

render() → ndarray | None

Compute frames depending on the initially specified render_mode. Before the corresponding viewer is updated, the _render_callback() is called to give the opportunity to add more functionality.

Returns:

a numpy array if render_mode != ‘human’, otherwise None (render_mode ‘human’)

reset(seed: int | None = None, options: dict | None = None) → tuple[dict[AgentID, ObsType], dict[AgentID, dict]]

Resets the environment and returns a dictionary of observations and a dictionary of infos, each keyed by the agent name.

state() → ndarray

Returns the state.

The state is a global view of the environment, appropriate for centralized-training, decentralized-execution methods such as QMIX.

step(actions: dict[AgentID, ActionType]) → tuple[dict[AgentID, ObsType], dict[AgentID, float], dict[AgentID, bool], dict[AgentID, bool], dict[AgentID, dict]]

Receives a dictionary of actions keyed by the agent name.

Returns the observation dictionary, reward dictionary, terminated dictionary, truncated dictionary and info dictionary, where each dictionary is keyed by the agent.
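
The reset/step contract can be exercised with a small rollout loop. Because this base class is meant to be subclassed, the sketch uses StubParallelEnv, a hypothetical stand-in that only mimics the two signatures above; the agent names, the constant reward, and the placeholder policy are all invented for the example.

```python
class StubParallelEnv:
    """Hypothetical stand-in that mimics the reset/step signatures above."""

    def __init__(self, num_agents, max_steps):
        self.agents = [f"mover_{i}" for i in range(num_agents)]
        self.max_steps = max_steps
        self._steps = 0

    def reset(self, seed=None, options=None):
        self._steps = 0
        observations = {agent: 0.0 for agent in self.agents}
        infos = {agent: {} for agent in self.agents}
        return observations, infos

    def step(self, actions):
        self._steps += 1
        observations = {agent: actions[agent] for agent in self.agents}
        rewards = {agent: -1.0 for agent in self.agents}
        terminations = {agent: False for agent in self.agents}
        truncations = {agent: self._steps >= self.max_steps for agent in self.agents}
        infos = {agent: {} for agent in self.agents}
        return observations, rewards, terminations, truncations, infos


env = StubParallelEnv(num_agents=2, max_steps=3)
observations, infos = env.reset(seed=0)
returns = {agent: 0.0 for agent in env.agents}

done = False
while not done:
    # Placeholder policy: every agent sends the same constant action.
    actions = {agent: 0.5 for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
    for agent, reward in rewards.items():
        returns[agent] += reward
    done = all(terminations[a] or truncations[a] for a in env.agents)
```

Every dictionary is keyed by agent, so per-agent bookkeeping such as the accumulated return reduces to a dictionary update.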

window_viewer_is_running() → bool

Check whether the window viewer (render_mode ‘human’) is active, i.e. the window is open.

Returns:

True if the window is open, False otherwise