Basic Magnetic Robotics Single-Agent Environment

class magbotsim.rl_envs.basic_single_agent_env.BasicMagBotSingleAgentEnv(layout_tiles: ndarray, num_movers: int, tile_params: dict[str, Any] | None = None, mover_params: dict[str, Any] | None = None, initial_mover_zpos: float = 0.005, table_height: float = 0.4, std_noise: ndarray | float = 1e-05, render_mode: str | None = 'human', render_every_cycle: bool = False, default_cam_config: dict[str, Any] | None = None, width_no_camera_specified: int = 1240, height_no_camera_specified: int = 1080, num_cycles: int = 40, collision_params: dict[str, Any] | None = None, initial_mover_start_xy_pos: ndarray | None = None, initial_mover_goal_xy_pos: ndarray | None = None, custom_model_xml_strings: dict[str, str] | None = None, use_mj_passive_viewer: bool = False)

Bases: BasicMagBotEnv, Env, ABC

A base class for single-agent reinforcement learning environments in the field of Magnetic Robotics that follow the Gymnasium API. A more detailed explanation of all parameters can be found in the documentation of the BasicMagBotEnv.

Parameters:

layout_tiles – the tile layout
num_movers – the number of movers
tile_params – tile parameters such as the size and mass, defaults to None
mover_params – mover parameters such as the size and mass, defaults to None
initial_mover_zpos – the initial distance between the bottom of the mover and the top of a tile, defaults to 0.005 [m]
table_height – the height of a table on which the tiles are placed, defaults to 0.4 [m]
std_noise – the standard deviation of a Gaussian with zero mean used to add noise, defaults to 1e-5
render_mode – the mode that is used to render the frames (‘human’, ‘rgb_array’ or None), defaults to ‘human’
render_every_cycle – whether to call render() after each integrator step in the step() method, defaults to False. Rendering every cycle yields a smoother visualization of the scene but can be computationally expensive, so leaving this parameter set to False speeds up training and evaluation. Regardless of this parameter, the scene is always rendered once all ‘num_cycles’ cycles have been executed if ‘render_mode != None’.
default_cam_config – dictionary with attribute values of the viewer’s default camera (see MuJoCo docs), defaults to None
width_no_camera_specified – if render_mode != ‘human’ and no width is specified, this value is used, defaults to 1240
height_no_camera_specified – if render_mode != ‘human’ and no height is specified, this value is used, defaults to 1080
num_cycles – the number of control cycles for which to apply the same action, defaults to 40
collision_params – a dictionary that can be used to specify collision parameters, defaults to None
initial_mover_start_xy_pos – the initial (x,y) starting positions of the movers, defaults to None
initial_mover_goal_xy_pos – the initial (x,y) goal positions of the movers, defaults to None
custom_model_xml_strings –
a dictionary containing additional XML strings to provide the ability to add actuators, sensors, objects, robots, etc. to the model. The keys determine where to add a string in the XML structure and the values contain the XML string to add. The following keys are accepted:
- custom_compiler_xml_str:
  A custom ‘compiler’ XML section. Note that the entire default ‘compiler’ section is replaced.
- custom_visual_xml_str:
  A custom ‘visual’ XML section. Note that the entire default ‘visual’ section is replaced.
- custom_option_xml_str:
  A custom ‘option’ XML section. Note that the entire default ‘option’ section is replaced.
- custom_assets_xml_str:
  This XML string adds elements to the ‘asset’ section.
- custom_default_xml_str:
  This XML string adds elements to the ‘default’ section.
- custom_worldbody_xml_str:
  This XML string adds elements to the ‘worldbody’ section.
- custom_contact_xml_str:
  This XML string adds elements to the ‘contact’ section.
- custom_outworldbody_xml_str:
  This XML string should be used to include files or add sections.
- custom_mover_body_xml_str_list:
  This list of XML strings should be used to attach objects to a mover. Note that this is a list of length num_movers. If nothing is attached to a mover, add None at the corresponding mover index.
If set to None, only the basic XML string is generated, containing tiles, movers (excluding actuators), and possibly goals; defaults to None. This dictionary can be further modified using the _custom_xml_string_callback().
use_mj_passive_viewer – whether the MuJoCo passive_viewer should be used, defaults to False. If set to False, the Gymnasium MuJoCo WindowViewer with custom overlays is used.
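The following sketch shows how a custom_model_xml_strings dictionary could be assembled before it is passed to the constructor. Only the dictionary keys come from the list above; the XML snippets (an extra light, a 1 ms timestep, and a small box attached to the first mover) are hypothetical examples, and the exact form expected for a replaced section (here a complete `<option .../>` element) is an assumption.

```python
import xml.etree.ElementTree as ET


def build_custom_xml_strings(num_movers: int) -> dict:
    """Build an illustrative custom_model_xml_strings dictionary.

    The XML snippets are hypothetical; only the keys follow the documentation.
    """
    # One entry per mover; None means "attach nothing to this mover".
    mover_attachments = [None] * num_movers
    mover_attachments[0] = (
        '<geom name="payload_0" type="box" size="0.02 0.02 0.02" '
        'pos="0 0 0.03" rgba="0.8 0.2 0.2 1"/>'
    )
    return {
        # Appended to the 'worldbody' section.
        "custom_worldbody_xml_str": '<light name="extra_light" pos="0 0 2" dir="0 0 -1"/>',
        # Replaces the entire default 'option' section (sets the cycle time).
        "custom_option_xml_str": '<option timestep="0.001"/>',
        # Objects attached to the individual movers.
        "custom_mover_body_xml_str_list": mover_attachments,
    }


custom = build_custom_xml_strings(num_movers=3)
# Sanity check: every single-string snippet is well-formed XML on its own.
for key, value in custom.items():
    if isinstance(value, str):
        ET.fromstring(value)
```

The resulting dictionary would then be passed as `custom_model_xml_strings=custom` when constructing a concrete subclass of BasicMagBotSingleAgentEnv.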

_after_mujoco_step_callback() → None: A callback that should be used to add further functionality to the step() method (see documentation of the step() method for more information about when the callback is called).

_before_mujoco_step_callback(action: int | ndarray) → None

A callback that should be used to add further functionality to the step() method (see documentation of the step() method for more information about when the callback is called).

Parameters: action – the action to apply
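The step() description mentions enforcing velocity or acceleration limits within each cycle as a typical use of the per-cycle callbacks. A minimal sketch of such a limit is shown below, assuming the action is interpreted as one desired (a_x, a_y) acceleration per mover (the actual action semantics depend on the concrete environment). Each vector is scaled so that its norm does not exceed a hypothetical bound a_max; an overridden _before_mujoco_step_callback(action) could apply a function like this before the resulting force is written to the model.

```python
import numpy as np


def limit_acceleration(acc_xy: np.ndarray, a_max: float) -> np.ndarray:
    """Scale each (a_x, a_y) row so that its Euclidean norm is at most a_max.

    acc_xy: array of shape (num_movers, 2); a_max: hypothetical limit in m/s^2.
    """
    acc_xy = np.atleast_2d(np.asarray(acc_xy, dtype=float))
    norms = np.linalg.norm(acc_xy, axis=1, keepdims=True)
    # Factor is 1 for rows within the limit; the maximum avoids division by zero.
    scale = np.minimum(1.0, a_max / np.maximum(norms, 1e-12))
    return acc_xy * scale


limited = limit_acceleration(np.array([[3.0, 4.0], [0.3, 0.4]]), a_max=2.0)
```

Scaling the whole vector (rather than clipping each component) preserves the direction of the commanded acceleration.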

_check_for_other_collisions_callback() → tuple[bool, dict[str, Any] | None]

A callback intended to check for collisions other than mover or wall collisions, e.g. collisions with obstacles.

Returns:

whether there is a collision (bool)
a dictionary that is intended to contain additional information about the collision (can be None)
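To illustrate the expected return format, here is a sketch of an obstacle check that one might call from an overridden _check_for_other_collisions_callback(). It assumes circular movers and a single static circular obstacle; the obstacle geometry and the info key `obstacle_collision_mover_indices` are hypothetical. Inside the environment, mover_xy could be taken from `self.get_mover_qpos(mover_names)[:, :2]`.

```python
from __future__ import annotations

from typing import Any

import numpy as np


def check_obstacle_collision(
    mover_xy: np.ndarray,
    mover_radius: float,
    obstacle_xy: np.ndarray,
    obstacle_radius: float,
) -> tuple[bool, dict[str, Any] | None]:
    """Circle-vs-circle test between every mover and one static obstacle.

    Returns a (collision, info) pair in the format of
    _check_for_other_collisions_callback(); the geometry is hypothetical.
    """
    dists = np.linalg.norm(mover_xy - obstacle_xy, axis=1)
    hits = np.flatnonzero(dists < mover_radius + obstacle_radius)
    if hits.size == 0:
        return False, None
    # Report which movers hit the obstacle (hypothetical info key).
    return True, {"obstacle_collision_mover_indices": hits.tolist()}
```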

_custom_xml_string_callback(custom_model_xml_strings: dict[str, str] | None = None) → dict[str, str] | None

A callback that should be used to add further functionality to the __init__() method. This callback should be used to modify the custom XML strings in the custom_model_xml_strings dictionary after the tile, mover and collision parameters have been preprocessed and checked, but before the MuJoCo model XML string is generated. This allows adding custom XML strings based on the tile or mover configuration, e.g. to add actuators for each mover.

Parameters: custom_model_xml_strings – a dictionary containing additional XML strings to provide the ability to add actuators, sensors, objects, robots, etc. to the model; defaults to None (see documentation of the __init__() method for more detailed information). Note that this dictionary may be modified within this method.
Returns: the possibly modified dictionary with additional XML strings
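A possible use of this callback is sketched below: it appends an `<actuator>` section with one x-force and one y-force motor per mover. Because the documented qpos has 7 and qvel 6 entries per mover, the sketch assumes each mover has a free joint, for which MuJoCo interprets `gear` as a 6D force/torque direction. The joint names are hypothetical, and routing the section through custom_outworldbody_xml_str is an assumption based on that key's description ("include files or add sections").

```python
from __future__ import annotations


def add_mover_actuators(
    custom_model_xml_strings: dict[str, str] | None,
    joint_names: list[str],
) -> dict[str, str]:
    """Append an <actuator> section with x/y force motors for each mover joint.

    joint_names is hypothetical; a real environment would take the joint names
    of its movers from the model configuration.
    """
    # Work on a copy so that the caller's dictionary stays unchanged.
    xml_strings = dict(custom_model_xml_strings or {})
    motors = []
    for joint in joint_names:
        # On a free joint, gear is a 6D vector (fx, fy, fz, tx, ty, tz).
        motors.append(f'<motor name="{joint}_fx" joint="{joint}" gear="1 0 0 0 0 0"/>')
        motors.append(f'<motor name="{joint}_fy" joint="{joint}" gear="0 1 0 0 0 0"/>')
    actuator_section = "<actuator>\n  " + "\n  ".join(motors) + "\n</actuator>"
    # Keep any content that was already present under this key.
    xml_strings["custom_outworldbody_xml_str"] = (
        xml_strings.get("custom_outworldbody_xml_str", "") + "\n" + actuator_section
    )
    return xml_strings
```

In a subclass, the override of _custom_xml_string_callback() could simply return `add_mover_actuators(custom_model_xml_strings, joint_names)`.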

_on_step_end_callback(observation: dict[str, ndarray] | ndarray) → None

A callback that should be used to add further functionality to the step() method (see documentation of the step() method for more information about when the callback is called).

Parameters: observation – the next observation after the action was applied

_render_callback() → None: A callback that should be used to add further functionality to the render() method (see documentation of the render() method for more information about when the callback is called).

_reset_callback(options: dict[str, Any] | None = None) → None

A callback that should be used to add further functionality to the reset() method (see documentation of the reset() method for more information about when the callback is called).

Parameters: options – a dictionary that can be used to specify additional reset options, e.g. object parameters; defaults to None

_step_callback(action: int | ndarray) → None

A callback that should be used to add further functionality to the step() method (see documentation of the step() method for more information about when the callback is called).

Parameters: action – the action to apply

check_mover_collision(mover_names: list[str], c_size: float | ndarray, add_safety_offset: bool = False, mover_qpos: ndarray | None = None, add_qpos_noise: bool = False) → bool

Check whether two movers specified in mover_names collide. In case of collision shape ‘box’, this method takes the orientation of the movers into account.

Parameters:

mover_names – a list of mover names that should be checked (correspond to the body name of the mover in the MuJoCo model)
c_size –
the size of the collision shape of the movers
- collision_shape = ‘circle’:
  use a single float value to specify the same size for all movers and a numpy array of shape (num_movers,) to specify individual sizes for each mover
- collision_shape = ‘box’:
  use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_movers,2) to specify individual sizes for each mover
add_safety_offset – whether to add the size offset (can be specified using: collision_params[“offset”]), defaults to False. Note that the same size offset is added for both movers.
mover_qpos – the qpos of the movers specified as a numpy array of shape (num_movers,7) (x_p,y_p,z_p,w_o,x_o,y_o,z_o). If set to None, the current qpos of the movers in the MuJoCo model is used; defaults to None
add_qpos_noise – whether to add Gaussian noise to the qpos of the movers, defaults to False. Only used if mover_qpos is not None.

Returns:

True if the movers collide, False otherwise
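For collision_shape = ‘circle’, the check reduces to comparing the distance between the two mover centres with the sum of their (possibly enlarged) sizes. The sketch below assumes that c_size is the circle radius and that the safety offset enlarges each radius by the same amount; it does not cover the oriented ‘box’ case, which the documented method handles separately.

```python
import numpy as np


def circles_collide(
    xy_a: np.ndarray, xy_b: np.ndarray, r_a: float, r_b: float, offset: float = 0.0
) -> bool:
    """Return True if two circular collision shapes overlap.

    Assumes c_size is the radius and the same safety offset is added to both.
    """
    dist = float(np.linalg.norm(np.asarray(xy_a) - np.asarray(xy_b)))
    return dist < (r_a + offset) + (r_b + offset)
```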

check_wall_collision(mover_names: list[str], c_size: float | ndarray, add_safety_offset: bool = False, mover_qpos: ndarray | None = None, add_qpos_noise: bool = False) → ndarray

Check whether the qpos of the movers listed in mover_names is valid, i.e. whether the movers are free of wall collisions.

Parameters:

mover_names – a list of mover names that should be checked (correspond to the body name of the mover in the MuJoCo model)
c_size –
the size of the collision shape
- collision_shape = ‘circle’:
  use a single float value to specify the same size for all movers and a numpy array of shape (num_movers,) to specify individual sizes for each mover
- collision_shape = ‘box’:
  use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_movers,2) to specify individual sizes for each mover
add_safety_offset – whether to add the size offset (can be specified using: collision_params[“offset”]), defaults to False. Note that the same size offset is added for all movers.
mover_qpos – a numpy array of shape (num_movers,7) containing the qpos (x_p,y_p,z_p,w_o,x_o,y_o,z_o) of each mover or None. If set to None, the current qpos of each mover in the MuJoCo model is used; defaults to None
add_qpos_noise – whether to add Gaussian noise to the qpos of the movers, defaults to False. Only used if mover_qpos is not None.

Returns:

a numpy array of shape (num_movers,), where an element is 1 if the qpos is valid (no wall collision), otherwise 0

close() → None: Close the environment.

abstractmethod compute_reward(achieved_goal: ndarray | None = None, desired_goal: ndarray | None = None, info: dict[str, Any] | None = None) → ndarray | float

Compute the immediate reward. This method is required by the stable-baselines3 implementation of Hindsight Experience Replay (HER) (for more information, see https://stable-baselines3.readthedocs.io/en/master/modules/her.html).

Parameters:

achieved_goal – a numpy array of shape (batch_size, length achieved_goal) or (length achieved_goal,) containing the goals already achieved (goal-conditioned RL); defaults to None (standard RL)
desired_goal – a numpy array of shape (batch_size, length desired_goal) or (length desired_goal,) containing the desired goals (goal-conditioned RL); defaults to None (standard RL)
info – a dictionary containing auxiliary information, defaults to None

Returns:

a single float value or a numpy array of shape (batch_size,) containing the immediate rewards
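Because HER relabels goals in batches, compute_reward() has to accept both a single goal and a batch of goals. A minimal sketch of a sparse reward with this property is shown below as a free function (in a subclass, it would form the body of the overridden method); the 1 cm threshold is a hypothetical value and info is ignored.

```python
import numpy as np


def compute_reward(achieved_goal, desired_goal, info=None, threshold=0.01):
    """Sparse reward: 0 if within threshold of the goal, otherwise -1.

    Accepts goals of shape (goal_dim,) or (batch_size, goal_dim);
    threshold is a hypothetical value in metres.
    """
    dist = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal), axis=-1)
    reward = np.where(dist > threshold, -1.0, 0.0)
    # Plain float for a single goal, array of shape (batch_size,) for a batch.
    return float(reward) if reward.ndim == 0 else reward
```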

abstractmethod compute_terminated(achieved_goal: ndarray | None = None, desired_goal: ndarray | None = None, info: dict[str, Any] | None = None) → ndarray | bool

Check whether a terminal state is reached. This method can be used for both goal-conditioned RL and standard RL. Since Hindsight Experience Replay (HER) is commonly used in goal-conditioned RL, this method receives the ‘achieved_goal’ and ‘desired_goal’ corresponding to the requirements of the HER implementation of stable-baselines3 (for more information, see https://stable-baselines3.readthedocs.io/en/master/modules/her.html).

Parameters:

achieved_goal – a numpy array of shape (batch_size, length achieved_goal) or (length achieved_goal,) containing the goals already achieved (goal-conditioned RL); defaults to None (standard RL)
desired_goal – a numpy array of shape (batch_size, length desired_goal) or (length desired_goal,) containing the desired goals (goal-conditioned RL); defaults to None (standard RL)
info – a dictionary containing auxiliary information, defaults to None

Returns:

a single bool value or a numpy array of shape (batch_size,) containing Boolean values, where True indicates that a terminal state has been reached
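A matching termination criterion can be sketched in the same way: the episode terminates once the achieved goal lies within a (hypothetical) distance threshold of the desired goal, again for both single and batched inputs.

```python
import numpy as np


def compute_terminated(achieved_goal, desired_goal, info=None, threshold=0.01):
    """Terminate once the achieved goal is within threshold of the desired goal.

    Accepts goals of shape (goal_dim,) or (batch_size, goal_dim);
    threshold is a hypothetical value in metres.
    """
    dist = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal), axis=-1)
    terminated = dist <= threshold
    # Plain bool for a single goal, Boolean array for a batch.
    return bool(terminated) if terminated.ndim == 0 else terminated
```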

abstractmethod compute_truncated(achieved_goal: ndarray | None = None, desired_goal: ndarray | None = None, info: dict[str, Any] | None = None) → ndarray | bool

Check whether the truncation condition is satisfied. This method can be used for both goal-conditioned RL and standard RL. Since Hindsight Experience Replay (HER) is commonly used in goal-conditioned RL, this method receives the ‘achieved_goal’ and ‘desired_goal’ corresponding to the requirements of the HER implementation of stable-baselines3 (for more information, see https://stable-baselines3.readthedocs.io/en/master/modules/her.html).

Parameters:

achieved_goal – a numpy array of shape (batch_size, length achieved_goal) or (length achieved_goal,) containing the goals already achieved (goal-conditioned RL); defaults to None (standard RL)
desired_goal – a numpy array of shape (batch_size, length desired_goal) or (length desired_goal,) containing the desired goals (goal-conditioned RL); defaults to None (standard RL)
info – a dictionary containing auxiliary information, defaults to None

Returns:

a single bool value or a numpy array of shape (batch_size,) containing Boolean values, where True indicates that the truncation condition is satisfied

generate_model_xml_string(mover_start_xy_pos: ndarray | None = None, mover_goal_xy_pos: ndarray | None = None, custom_xml_strings: dict[str, str] | None = None) → str

Generate a MuJoCo model XML string based on the mover-tile configuration of the environment.

Parameters:

mover_start_xy_pos – a numpy array of shape (num_movers,2) containing the (x,y) starting positions of each mover. If set to None, the movers will be placed in the center of a tile, i.e. the number of tiles must be >= the number of movers; defaults to None.
mover_goal_xy_pos – a numpy array of shape (num_movers_with_goals,2) containing the (x,y) goal positions of the movers (num_movers_with_goals <= num_movers). Note that only the first 6 movers have different colors to make the movers clearly distinguishable. Movers without goals are shown in gray. If set to None, no goals will be displayed and all movers are colored in gray; defaults to None
custom_xml_strings –
a dictionary containing additional XML strings to provide the ability to add actuators, sensors, objects, robots, etc. to the model. The keys determine where to add a string in the XML structure and the values contain the XML string to add. The following keys are accepted:
- custom_compiler_xml_str:
  A custom ‘compiler’ XML section. Note that the entire default ‘compiler’ section is replaced.
- custom_visual_xml_str:
  A custom ‘visual’ XML section. Note that the entire default ‘visual’ section is replaced.
- custom_option_xml_str:
  A custom ‘option’ XML section. Note that the entire default ‘option’ section is replaced.
- custom_assets_xml_str:
  This XML string adds elements to the ‘asset’ section.
- custom_default_xml_str:
  This XML string adds elements to the ‘default’ section.
- custom_worldbody_xml_str:
  This XML string adds elements to the ‘worldbody’ section.
- custom_contact_xml_str:
  This XML string adds elements to the ‘contact’ section.
- custom_outworldbody_xml_str:
  This XML string should be used to include files or add sections.
- custom_mover_body_xml_str_list:
  This list of XML strings should be used to attach objects to a mover. Note that this is a list of length num_movers. If nothing is attached to a mover, add None at the corresponding mover index.
If set to None, only the basic XML string is generated, containing tiles, movers (excluding actuators), and possibly goals; defaults to None

Returns:

MuJoCo model XML string

get_c_size_arr(c_size: float | ndarray, num_reps: int) → ndarray

Return the size of the collision shape as a numpy array of shape (num_reps,1) or (num_reps,2) depending on the collision shape. This method should be used to obtain the appropriate c_size_arr if the same size is to be used for all movers.

Parameters:

c_size –
the size of the collision shape:
- collision_shape = ‘circle’:
  use a single float value to specify the same size for all movers and a numpy array of shape (num_reps,) to specify individual sizes for each mover
- collision_shape = ‘box’:
  use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_reps,2) to specify individual sizes for each mover
num_reps – the number of repetitions of c_size if the same size of collision shape is to be used for all movers. Otherwise, this value is ignored.

Returns:

the collision shape sizes as a numpy array of a suitable shape:
- collision_shape = ‘circle’:
  a numpy array of shape (num_reps,1)
- collision_shape = ‘box’:
  a numpy array of shape (num_reps,2) if c_size is a numpy array of shape (2,). Otherwise, c_size is returned unmodified.
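The shape normalisation described above can be sketched as follows. Unlike the documented method, this sketch takes collision_shape as an explicit argument instead of reading it from the environment's collision parameters.

```python
import numpy as np


def c_size_arr(c_size, num_reps: int, collision_shape: str) -> np.ndarray:
    """Normalise a collision size to shape (num_reps, 1) or (num_reps, 2)."""
    c_size = np.asarray(c_size, dtype=float)
    if collision_shape == "circle":
        if c_size.ndim == 0:
            # One size for all movers.
            return np.full((num_reps, 1), float(c_size))
        # Individual sizes of shape (n,) -> (n, 1); num_reps is ignored.
        return c_size.reshape(-1, 1)
    if collision_shape == "box":
        if c_size.shape == (2,):
            # Same (size_x, size_y) for all movers.
            return np.tile(c_size, (num_reps, 1))
        # Already of shape (n, 2): returned unchanged.
        return c_size
    raise ValueError(f"unknown collision shape: {collision_shape!r}")
```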

get_mover_qacc(mover_names: str | list[str], add_noise: bool = False) → ndarray

Return the qacc of several movers as a numpy array of shape (num_movers,6).

Parameters:

mover_names – a single mover name or a list of mover names for which the qacc should be returned (correspond to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise to the qacc of the movers, defaults to False

Returns:

a numpy array of shape (num_movers,6) containing the qacc (x,y,z,a,b,c) of each mover. The order of the qacc corresponds to the order of the mover names.

get_mover_qpos(mover_names: str | list[str], add_noise: bool = False) → ndarray

Return the qpos of several movers as a numpy array of shape (num_movers,7).

Parameters:

mover_names – a single mover name or a list of mover names for which the qpos should be returned (correspond to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise to the qpos of the movers, defaults to False

Returns:

a numpy array of shape (num_movers,7) containing the qpos (x_p,y_p,z_p,w_o,x_o,y_o,z_o) of each mover. The order of the qpos corresponds to the order of the mover names.
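The orientation part of the qpos is a unit quaternion in MuJoCo's (w, x, y, z) order. For planar motion, the rotation about the z-axis is often the quantity of interest; a sketch that extracts this yaw angle from an array shaped like the one returned by get_mover_qpos() is shown below.

```python
import numpy as np


def yaw_from_qpos(qpos: np.ndarray) -> np.ndarray:
    """Return the rotation about the z-axis (yaw, in radians) for each qpos row.

    qpos: shape (num_movers, 7) laid out as (x_p, y_p, z_p, w_o, x_o, y_o, z_o).
    """
    qpos = np.atleast_2d(qpos)
    w, x, y, z = qpos[:, 3], qpos[:, 4], qpos[:, 5], qpos[:, 6]
    # Standard yaw (ZYX Euler angle) formula for a unit quaternion (w, x, y, z).
    return np.arctan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
```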

get_mover_qvel(mover_names: str | list[str], add_noise: bool = False) → ndarray

Return the qvel of several movers as a numpy array of shape (num_movers,6).

Parameters:

mover_names – a single mover name or a list of mover names for which the qvel should be returned (correspond to the body name of the mover in the MuJoCo model)
add_noise – whether to add Gaussian noise to the qvel of the movers, defaults to False

Returns:

a numpy array of shape (num_movers,6) containing the qvel (x,y,z,a,b,c) of each mover. The order of the qvel corresponds to the order of the mover names.

get_tile_indices_mask(mask: ndarray) → tuple[ndarray, ndarray]

Find the x and y indices of tiles that correspond to the specified structure (the mask) in the tile layout. Note that the indices of the top left tile in the mask are returned.

Parameters: mask – a 2D numpy array containing only 0 and 1 which specifies the structure to be found in the tile layout
Returns: the x and y indices of the tiles in separate numpy arrays, each of shape (num_mask_found,)
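The search for a structure in the tile layout can be sketched with a sliding window. The matching rule is an assumption: positions where the mask is 1 must contain a tile, whereas entries that are 0 place no constraint (the documented method might instead require an exact match).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def find_mask(layout: np.ndarray, mask: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the (x, y) indices of the top-left tile of every mask occurrence.

    Assumption: a mask entry of 1 requires a tile; a 0 entry is ignored.
    """
    # Shape (nx - mx + 1, ny - my + 1, mx, my): one window per top-left index.
    windows = sliding_window_view(layout, mask.shape)
    required = mask.astype(bool)
    # A window matches if every required position holds a tile.
    match = np.all(windows[..., required] == 1, axis=-1)
    idx_x, idx_y = np.nonzero(match)
    return idx_x, idx_y
```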

get_tile_xy_pos() → tuple[ndarray, ndarray]

Find the (x,y)-positions of the tiles. The position of the tile with index (i_x,i_y) in the tile layout is given by (x_pos[i_x,i_y], y_pos[i_x,i_y]), where x_pos and y_pos are the arrays returned by this method. Note that the base frame is located in the upper left corner.

Returns: the x and y positions of the tiles in separate numpy arrays, each of shape (num_tiles_x, num_tiles_y)
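To illustrate the indexing convention (x_pos[i_x,i_y], y_pos[i_x,i_y]), the following sketch computes tile centres for a rectangular grid of square tiles. The tile size and the direction of the axes relative to the upper-left base frame are assumptions, so the values need not coincide with those returned by get_tile_xy_pos().

```python
import numpy as np


def tile_centers(num_tiles_x: int, num_tiles_y: int, tile_size: float):
    """Return x_pos, y_pos of shape (num_tiles_x, num_tiles_y) with tile centres.

    Assumes square tiles of edge length tile_size, the origin at the outer
    corner of tile (0, 0), and coordinates growing with the tile index.
    """
    # indexing="ij" makes the first array axis correspond to i_x.
    i_x, i_y = np.meshgrid(np.arange(num_tiles_x), np.arange(num_tiles_y), indexing="ij")
    x_pos = (i_x + 0.5) * tile_size
    y_pos = (i_y + 0.5) * tile_size
    return x_pos, y_pos
```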

get_wrapper_attr(name: str) → Any: Gets the attribute name from the environment.

has_wrapper_attr(name: str) → bool: Checks if the attribute name exists in the environment.

property np_random: Generator

Returns the environment’s internal _np_random that if not set will initialise with a random seed.

Returns: Instances of np.random.Generator

property np_random_seed: int

Returns the environment’s internal _np_random_seed that if not set will first initialise with a random int as seed.

If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.

Returns: int: the seed of the current np_random or -1, if the seed of the rng is unknown

qpos_is_valid(qpos: ndarray, c_size: float | ndarray, add_safety_offset: bool = False) → ndarray

Check whether qpos is valid. Tile edges without an adjacent tile are treated as imaginary walls. A position is valid if it is above a tile and its distance to the walls is greater than the required safety margin, i.e. there is no collision with a wall. If the specified position is a goal position, this also ensures that it is reachable.

This method can check multiple qpos at once, and the movers may have different sizes. The orientation of the mover is taken into account if collision_shape = ‘box’; if collision_shape = ‘circle’, the orientation is ignored.

Parameters:

qpos – a numpy array of shape (num_qpos,7) containing the qpos (x_p,y_p,z_p,w_o,x_o,y_o,z_o) to be checked
c_size –
the size of the collision shape
- collision_shape = ‘circle’:
  use a single float value to specify the same size for all movers and a numpy array of shape (num_qpos,) to specify individual sizes for each mover
- collision_shape = ‘box’:
  use a numpy array of shape (2,) to specify the same size for all movers and a numpy array of shape (num_qpos,2) to specify individual sizes for each mover
add_safety_offset – whether to add the size offset (can be specified using: collision_params[“offset”]), defaults to False. Note that the same size offset is added for all movers.

Returns:

a numpy array of shape (num_qpos,), where an element is 1 if the qpos is valid, otherwise 0
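For a single circular collision shape, the validity check can be sketched exactly: the circle must stay strictly inside the outer boundary of the tile grid and keep a positive distance to every grid cell without a tile. The grid convention (tile (i_x, i_y) covering [i_x*s, (i_x+1)*s] x [i_y*s, (i_y+1)*s]) is an assumption, and the documented method additionally vectorises over num_qpos and handles rotated boxes; a safety offset could be emulated by passing radius + offset.

```python
import numpy as np


def circle_is_over_tiles(xy, radius, layout, tile_size):
    """Return True if a circle of the given radius lies entirely over tiles.

    Assumptions: layout[i_x, i_y] == 1 marks a tile, and every tile edge
    without a neighbouring tile acts as a wall.
    """
    x, y = xy
    nx, ny = layout.shape
    # The outer boundary of the grid is always a wall.
    if (x - radius <= 0 or y - radius <= 0
            or x + radius >= nx * tile_size or y + radius >= ny * tile_size):
        return False
    # The circle must keep a positive distance to every cell without a tile.
    for i_x, i_y in zip(*np.nonzero(layout == 0)):
        # Closest point of the empty cell to the circle centre.
        cx = np.clip(x, i_x * tile_size, (i_x + 1) * tile_size)
        cy = np.clip(y, i_y * tile_size, (i_y + 1) * tile_size)
        if np.hypot(x - cx, y - cy) <= radius:
            return False
    return True
```

Measuring the distance to each empty cell (rather than only to the four neighbouring edges) also catches diagonal gaps at tile corners.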

render() → ndarray | None

Compute frames depending on the initially specified render_mode. Before the corresponding viewer is updated, the _render_callback() is called to give the opportunity to add more functionality.

Returns: a numpy array if render_mode != ‘human’, otherwise None (render_mode ‘human’)

reset(seed: int | None = None, options: dict[str, Any] | None = None) → tuple[dict[str, ndarray], dict[str, Any]]

Reset the environment returning an initial observation and auxiliary information. More detailed information about the parameters and return values can be found in the Gymnasium documentation: https://gymnasium.farama.org/api/env/#gymnasium.Env.reset.

This method performs the following steps:

- reset the RNG, if desired
- call _reset_callback(options) to give the user the opportunity to add more functionality
- call mj_forward()
- check whether there are mover, wall, or other collisions, e.g. collisions with an obstacle
- call render()
- get the initial observation and info dictionary

Parameters:

seed – if set to None, the RNG is not reset; if int, sets the desired seed; defaults to None
options – a dictionary that can be used to specify additional reset options, e.g. object parameters; defaults to None

Returns:

initial observation and auxiliary information contained in the ‘info’ dictionary
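The options argument is passed on to _reset_callback(). The sketch below shows one way such a callback might use it: goal positions are taken from a hypothetical "goal_xy_pos" option if present and sampled uniformly otherwise. Inside the environment, rng would be self.np_random, so that reset(seed=...) makes the sampling reproducible; the bounds and the option key are assumptions, and sampled goals would still need to be checked, e.g. with qpos_is_valid().

```python
from __future__ import annotations

from typing import Any

import numpy as np


def sample_goals(
    rng: np.random.Generator,
    num_movers: int,
    xy_low: np.ndarray,
    xy_high: np.ndarray,
    options: dict[str, Any] | None = None,
) -> np.ndarray:
    """Return goal positions of shape (num_movers, 2).

    Uses the hypothetical option "goal_xy_pos" if given; otherwise draws goals
    uniformly from [xy_low, xy_high).
    """
    if options is not None and "goal_xy_pos" in options:
        return np.asarray(options["goal_xy_pos"], dtype=float).reshape(num_movers, 2)
    # low/high of shape (2,) broadcast against size (num_movers, 2).
    return rng.uniform(xy_low, xy_high, size=(num_movers, 2))
```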

set_wrapper_attr(name: str, value: Any, *, force: bool = True) → bool: Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.

step(action: int | ndarray) → tuple[dict[str, ndarray], float, bool, bool, dict[str, Any]]

Execute one step of the environment’s dynamics applying the given action. Note that the environment executes as many MuJoCo simulation steps as the number of cycles specified for this environment (num_cycles). The duration of one cycle is determined by the cycle time, which must be specified in the MuJoCo XML string using the option/timestep parameter. The same action is applied for all cycles.

This method performs the following steps:

- check whether the dimension of the action matches the dimension of the action space
- if the action is not contained in the action space, clip it to the bounds of the action space
- call _step_callback(action) to give the user the opportunity to add more functionality
- execute the MuJoCo simulation steps (mj_step()). After each simulation step, check whether there are mover, wall, or other collisions, e.g. collisions with an obstacle; to detect collisions other than mover and wall collisions, _check_for_other_collisions_callback() is called. In case of a collision, no further simulation steps are performed, as a real system would typically stop as well due to position lag errors. In addition, render() can be called after each simulation step to provide a smooth visualization of the movement (set render_every_cycle=True). The callbacks _before_mujoco_step_callback(action) and _after_mujoco_step_callback() are executed before and after each call of mujoco.mj_step(self.model, self.data, nstep=1) and can be used to add functionality, for example to enforce velocity or acceleration limits within each cycle.
- call render()
- get the return values
- call _on_step_end_callback(observation) to give the user the opportunity to add more functionality

More detailed information about the parameters and return values can be found in the Gymnasium documentation: https://gymnasium.farama.org/api/env/#gymnasium.Env.step.

Parameters:

action – the action to apply

Returns:

the next observation
the immediate reward for taking the action
whether a terminal state is reached
whether the truncation condition is satisfied
auxiliary information contained in the ‘info’ dictionary
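The control flow of step() can be illustrated with a toy point-mass model: a clipped acceleration action is applied for up to num_cycles integration steps, and the loop stops at the first collision. Semi-implicit Euler integration stands in for mj_step(), and the collision test is a user-supplied function; none of these names belong to MagBotSim. With the default num_cycles = 40 and, for example, a timestep of 0.001 s, one call to step() would advance the simulation by 0.04 s.

```python
import numpy as np


def run_cycles(pos, vel, action, num_cycles, dt, low, high, is_collision):
    """Toy version of the documented step() loop for a single point-mass mover.

    The action (an xy-acceleration) is clipped to [low, high] and applied for
    up to num_cycles steps of length dt; the loop stops as soon as
    is_collision(pos) reports a collision.
    """
    action = np.clip(np.asarray(action, dtype=float), low, high)
    pos = np.asarray(pos, dtype=float).copy()
    vel = np.asarray(vel, dtype=float).copy()
    collision = False
    cycles_run = 0
    for _ in range(num_cycles):
        # Stand-in for mj_step(): semi-implicit Euler integration.
        vel += action * dt
        pos += vel * dt
        cycles_run += 1
        if is_collision(pos):
            # No further cycles, as a real system would stop as well.
            collision = True
            break
    return pos, vel, collision, cycles_run
```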

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns: Env: The base non-wrapped gymnasium.Env instance

update_cached_mover_mujoco_data() → None: Update all cached information about MuJoCo objects, such as mover names, mover joint names, mover goal site names, etc.

window_viewer_is_running() → bool

Check whether the window viewer (render_mode ‘human’) is active, i.e. the window is open.

Returns: True if the window is open, False otherwise