Outer Environments

Outer environments are wrappers that contain an inner environment together with state and/or observation representations. Their main purpose is to hide the internal logic of inner environments, which is based on Python object manipulation, and to provide an interface based exclusively on raw numeric data.

class OuterEnv(env, *, state_representation=None, observation_representation=None)

Outer environment

Outer environments provide an interface primarily based on numeric data, with states and observations represented by dictionaries of numpy.ndarray, and actions by Action.

An outer environment has mostly the same responsibilities as the corresponding inner environment, with the main difference being the data format for states and observations:

(re)set the initial state (as dictionary of numpy.ndarray),
update the state (as dictionary of numpy.ndarray),
return the observation (as dictionary of numpy.ndarray),
return the reward (as float),
return the terminal signal (as bool).

Note

The conversion between inner and outer data representations can only be performed in one direction: from inner to outer. That is, while it is possible to convert state/observation Python objects into raw numeric states/observations, it is not possible to convert raw numeric states/observations back into state/observation objects. As a consequence, outer environments do not provide a functional interface.
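The one-way nature of the conversion can be illustrated with a minimal sketch. Everything in it is hypothetical and for illustration only: `ToyState`, `CELL_INDEX`, and `represent_state` are not part of the library, and the actual representations may encode states differently. The sketch only shows an object-based state being turned into a dictionary of arrays, with no function provided for the reverse direction.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

import numpy as np


@dataclass
class ToyState:
    """Hypothetical inner state: a Python object, not numeric data."""

    agent_position: Tuple[int, int]
    grid: Tuple[Tuple[str, ...], ...]  # cell names, e.g. "floor" or "wall"


# Hypothetical encoding of cell names as integers (an assumption).
CELL_INDEX = {"floor": 0, "wall": 1}


def represent_state(state: ToyState) -> Dict[str, np.ndarray]:
    """Convert the object-based state into a dictionary of arrays."""
    return {
        "grid": np.array(
            [[CELL_INDEX[cell] for cell in row] for row in state.grid],
            dtype=np.int64,
        ),
        "agent": np.array(state.agent_position, dtype=np.int64),
    }


state = ToyState(
    agent_position=(1, 0),
    grid=(("floor", "wall"), ("floor", "floor")),
)
arrays = represent_state(state)
```

Here `arrays` holds only integers. Code that receives it has no `ToyState` object to pass to functions that expect one, which is the situation the note describes for outer environments.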

Non-Functional Interface

The non-functional interface operates on the environment's internal state. Since outer environments have no functional interface of their own, the calls are passed on to the wrapped inner environment:

OuterEnv.reset()

Resets the state

Return type: None

OuterEnv.step(action)

Runs the dynamics for one timestep and returns the reward and the terminal signal.

Parameters: action (Action) – agent’s action
Returns: (reward, terminality)
Return type: Tuple[float, bool]

OuterEnv.state

Returns the representation of the current state.

Return type: Dict[str, numpy.ndarray]

OuterEnv.observation

Returns the representation of the current observation.

Return type: Dict[str, numpy.ndarray]
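The interaction pattern implied by the four members above (reset, step, state, observation) can be sketched with a toy stand-in. This is not the library's implementation: `ToyAction`, `ToyInnerEnv`, `ToyOuterEnv`, and `run_episode` are hypothetical names, and the corridor dynamics are invented purely so that the example is self-contained. Only the shape of the interface follows the documentation: `reset()` returns nothing, `step(action)` returns a `(reward, terminality)` pair, and `state` and `observation` are dictionaries of arrays.

```python
from enum import Enum
from typing import Dict, Tuple

import numpy as np


class ToyAction(Enum):
    """Stand-in for the library's Action type (an assumption)."""

    LEFT = 0
    RIGHT = 1


class ToyInnerEnv:
    """Hypothetical inner environment: a corridor of `length` cells."""

    def __init__(self, length: int = 4):
        self.length = length
        self.position = 0

    def reset(self) -> None:
        self.position = 0

    def step(self, action: ToyAction) -> Tuple[float, bool]:
        delta = 1 if action is ToyAction.RIGHT else -1
        # Clamp the position so the agent stays inside the corridor.
        self.position = min(max(self.position + delta, 0), self.length - 1)
        done = self.position == self.length - 1
        return (1.0 if done else 0.0), done


class ToyOuterEnv:
    """Mimics the documented interface: reset, step, state, observation."""

    def __init__(self, env: ToyInnerEnv):
        self.inner_env = env

    def reset(self) -> None:
        self.inner_env.reset()

    def step(self, action: ToyAction) -> Tuple[float, bool]:
        return self.inner_env.step(action)

    @property
    def state(self) -> Dict[str, np.ndarray]:
        return {"position": np.array([self.inner_env.position])}

    @property
    def observation(self) -> Dict[str, np.ndarray]:
        # One-hot view of the corridor, as an example numeric observation.
        one_hot = np.zeros(self.inner_env.length)
        one_hot[self.inner_env.position] = 1.0
        return {"corridor": one_hot}


def run_episode(env: ToyOuterEnv, max_steps: int = 10) -> float:
    """Reset, then step until the terminal signal or the step limit."""
    env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        reward, done = env.step(ToyAction.RIGHT)
        total_reward += reward
        if done:
            break
    return total_reward


env = ToyOuterEnv(ToyInnerEnv(length=4))
episode_return = run_episode(env)
```

The loop never touches the inner environment's objects directly. An agent written against this interface sees only floats, booleans, and arrays, which is the separation the outer environment is meant to provide.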

The following figure depicts the inner working of a generic outer environment.

Fig. 5 Schematic of the outer environment (dark theme): ../../_images/outer-env-design-dark.png

Fig. 6 Schematic of the outer environment (light theme): ../../_images/outer-env-design-light.png