Outer Environments
Outer environments are wrappers that contain an inner environment together with state and/or observation representations. The main purpose of outer environments is to hide the internal logic of inner environments, which is based on Python object manipulation, and to provide an interface based exclusively on raw numeric data.
- class OuterEnv(env, *, state_representation=None, observation_representation=None)
Outer environment
Outer environments provide an interface primarily based on numeric data, with states and observations represented by ndarray, and actions by Action.
An outer environment has mostly the same responsibilities as the corresponding inner environment, the main difference being the data format for states and observations:
- (re)set the initial state (as a dictionary of numpy.ndarray),
- update the state (as a dictionary of numpy.ndarray),
- return the observation (as a dictionary of numpy.ndarray),
- return the reward (as float),
- return the terminal signal (as bool).
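To make this division of responsibilities concrete, here is a minimal sketch of an outer environment wrapper. It is not the library's implementation: `ToyState`, `ToyInnerEnv`, `PositionRepresentation`, and `SketchOuterEnv` are hypothetical stand-ins, assuming only that an inner environment keeps a Python-object state and that a representation exposes a `convert` method turning such an object into a dictionary of numpy.ndarray. Actions are plain integers here for brevity, whereas OuterEnv uses Action.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

import numpy as np


@dataclass
class ToyState:
    """Hypothetical Python-object state held by the inner environment."""
    position: int
    goal: int


class ToyInnerEnv:
    """Hypothetical inner environment: a 1-D corridor manipulated via objects."""

    def __init__(self, length: int = 5):
        self.length = length
        self.state = ToyState(position=0, goal=length - 1)

    def reset(self) -> None:
        self.state = ToyState(position=0, goal=self.length - 1)

    def step(self, action: int) -> Tuple[float, bool]:
        # action is expected to be -1 (left) or +1 (right); position is clipped
        new_pos = min(max(self.state.position + action, 0), self.length - 1)
        self.state = ToyState(position=new_pos, goal=self.state.goal)
        done = new_pos == self.state.goal
        return (1.0 if done else -0.1), done


class PositionRepresentation:
    """Hypothetical representation: state object -> dict of numpy arrays."""

    def convert(self, state: ToyState) -> Dict[str, np.ndarray]:
        return {
            "position": np.array([state.position], dtype=np.int64),
            "goal": np.array([state.goal], dtype=np.int64),
        }


class SketchOuterEnv:
    """Wraps an inner env and exposes only numeric data (sketch)."""

    def __init__(self, env: ToyInnerEnv, *, state_representation=None):
        self.inner_env = env
        self.state_representation = state_representation

    def reset(self) -> None:
        self.inner_env.reset()

    def step(self, action: int) -> Tuple[float, bool]:
        # dynamics, reward, and terminal signal are delegated to the inner env
        reward, done = self.inner_env.step(action)
        return float(reward), bool(done)

    @property
    def state(self) -> Dict[str, np.ndarray]:
        # sketch choice: fail loudly if no representation was provided
        if self.state_representation is None:
            raise RuntimeError("no state representation available")
        # the inner Python object is converted into raw numeric arrays
        return self.state_representation.convert(self.inner_env.state)
```

For example, `SketchOuterEnv(ToyInnerEnv(), state_representation=PositionRepresentation())` can be reset and stepped, and its `state` is always a plain dictionary of arrays; the `ToyState` object never leaves the wrapper.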
Note
The conversion between inner and outer data representations can only be performed in one direction: from inner to outer. That is, while it is possible to convert state/observation Python objects into raw numeric states/observations, it is not possible to convert raw numeric states/observations back into state/observation objects. As a consequence, outer environments do not provide a functional interface.
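One reason such a conversion may not be invertible is that a representation can discard information. The sketch below assumes a hypothetical `GridState` object and a hypothetical `agent_only_representation` that keeps only the agent's position; two distinct state objects then map to identical arrays, so no inverse conversion could tell them apart.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

import numpy as np


@dataclass(frozen=True)
class GridState:
    """Hypothetical state object: agent position plus a set of wall cells."""
    agent: Tuple[int, int]
    walls: frozenset


def agent_only_representation(state: GridState) -> Dict[str, np.ndarray]:
    """Hypothetical inner -> outer conversion that drops the walls."""
    return {"agent": np.array(state.agent, dtype=np.int64)}


def same_representation(a: Dict[str, np.ndarray], b: Dict[str, np.ndarray]) -> bool:
    """True if both dicts have the same keys and element-wise equal arrays."""
    return a.keys() == b.keys() and all(np.array_equal(a[k], b[k]) for k in a)


s1 = GridState(agent=(1, 2), walls=frozenset())
s2 = GridState(agent=(1, 2), walls=frozenset({(0, 0), (3, 3)}))

# Different objects, identical numeric data: the mapping is not injective,
# so there is no well-defined way to rebuild a GridState from the arrays.
print(s1 == s2)  # False
print(same_representation(agent_only_representation(s1), agent_only_representation(s2)))  # True
```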
Non-Functional Interface
The non-functional interface uses the internal state of the wrapped inner environment, and internally refers back to the inner environment's functional interface by providing that state:
- OuterEnv.step(action)
Runs the dynamics for one timestep and returns the reward and the done flag.
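A typical interaction loop with the non-functional interface calls `step` repeatedly and accumulates rewards until the done flag is raised. The sketch below uses a hypothetical `CountdownEnv` that only mimics the `step(action) -> (reward, done)` contract described above; it is not the library's class, and its actions are plain integers rather than Action. The `run_episode` helper is likewise hypothetical.

```python
from typing import Callable, Tuple


class CountdownEnv:
    """Hypothetical env with the same step contract: returns (reward, done)."""

    def __init__(self, horizon: int = 3):
        self.horizon = horizon
        self.t = 0

    def reset(self) -> None:
        self.t = 0

    def step(self, action: int) -> Tuple[float, bool]:
        # reward equals the action value; episode ends after `horizon` steps
        self.t += 1
        done = self.t >= self.horizon
        return float(action), done


def run_episode(env, policy: Callable[[], int], max_steps: int = 100) -> Tuple[float, int]:
    """Roll out one episode; returns (total reward, number of steps taken)."""
    env.reset()
    total, steps = 0.0, 0
    for _ in range(max_steps):
        reward, done = env.step(policy())
        total += reward
        steps += 1
        if done:
            break
    return total, steps


total, steps = run_episode(CountdownEnv(horizon=3), policy=lambda: 1)
print(total, steps)  # 3.0 3
```

The `max_steps` cap is a common safeguard against episodes that never raise the done flag.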
- OuterEnv.state
Returns the representation of the current state.
- Return type
Dict[str, numpy.ndarray]
- OuterEnv.observation
Returns the representation of the current observation.
- Return type
Dict[str, numpy.ndarray]
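Since both `state` and `observation` return a `Dict[str, numpy.ndarray]`, downstream code such as a learning agent often flattens them into a single vector. The sketch below assumes a hypothetical `flatten_representation` helper, and the keys and shapes in the example observation are made up for illustration; they are not the library's actual representation keys.

```python
from typing import Dict

import numpy as np


def flatten_representation(data: Dict[str, np.ndarray]) -> np.ndarray:
    """Concatenate all arrays into one 1-D float32 vector (hypothetical helper).

    Keys are sorted so the layout does not depend on dict insertion order.
    """
    return np.concatenate(
        [np.asarray(data[k], dtype=np.float32).ravel() for k in sorted(data)]
    )


# Example observation with made-up keys and shapes
observation = {
    "cells": np.zeros((2, 2, 3), dtype=np.int64),
    "agent": np.array([1, 0], dtype=np.int64),
}
vector = flatten_representation(observation)
print(vector.shape)  # (14,)
```

Sorting the keys is a design choice that keeps the vector layout stable across calls, which matters if the vector is fed to a model with fixed input positions.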
The following figure depicts the inner workings of a generic outer environment.