Inner Environments

An inner environment represents the logic of the environment dynamics in purely object-oriented terms. Its methods receive and return Action, State, and Observation objects directly.

class InnerEnv(state_space, action_space, observation_space)

Inner environment

Inner environments provide an interface based primarily on Python objects, with states represented by State, observations by Observation, and actions by Action.

An inner environment has the following responsibilities:

(re)set the initial state (as State),
update the state (as State),
return the observation (as Observation),
return the reward (as float),
return the terminal signal (as bool).

An inner environment provides these functionalities using two types of interfaces: a functional one and a non-functional one.

Functional Interface

The functional interface ignores the environment’s internal state and requires callers to provide their own states:

abstract InnerEnv.functional_reset()

Returns a new state

Return type: State

abstract InnerEnv.functional_step(state, action)

Returns next state, reward, and done flag

Return type: Tuple[State, float, bool]

abstract InnerEnv.functional_observation(state)

Returns observation

Return type: Observation
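The calling pattern of the functional interface can be sketched with a toy class. This is a minimal illustration, not GV code: `ToyFunctionalEnv` is a hypothetical stand-in for an InnerEnv subclass, and plain ints and strings replace State, Action, and Observation. The point it shows is that the caller holds the state and passes it back in on every call.

```python
from typing import Tuple


class ToyFunctionalEnv:
    """Hypothetical stand-in for an InnerEnv subclass (not part of GV).

    Ints replace State and Action; strings replace Observation.
    """

    def functional_reset(self) -> int:
        # Returns a new state; nothing is stored on the environment.
        return 0

    def functional_step(self, state: int, action: int) -> Tuple[int, float, bool]:
        # Computes next state, reward, and done flag from the caller's state.
        next_state = state + action
        reward = 1.0 if next_state == 3 else 0.0
        done = next_state >= 3
        return next_state, reward, done

    def functional_observation(self, state: int) -> str:
        # Computes an observation from the caller's state.
        return f"position={state}"


env = ToyFunctionalEnv()

# The caller owns the state and threads it through every call.
state = env.functional_reset()
trajectory = [state]
done = False
while not done:
    state, reward, done = env.functional_step(state, 1)
    trajectory.append(state)
```

Because the environment stores nothing between calls, the same instance can be used to roll out several independent trajectories, which is convenient for planning or search.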

Non-Functional Interface

The non-functional interface uses the environment’s internal state, and internally refers back to the functional interface by providing that state:

InnerEnv.reset()

Resets the state

Internally calls functional_reset() to reset the state; also resets the observation, so that an updated observation will be generated upon request.

InnerEnv.step(action)

Runs the dynamics for one timestep, and returns reward and done flag

Internally calls functional_step() to update the state; also resets the observation, so that an updated observation will be generated upon request.

Parameters: action (Action) – the chosen action to apply
Returns: reward and terminal signal
Return type: Tuple[float, bool]

InnerEnv.state

Returns the current state

Return type: State

InnerEnv.observation

Returns the current observation

Internally calls functional_observation() to generate the current observation based on the current state. The observation is generated lazily, such that at most one observation is generated for each state. As a consequence, this will return the same observation until the state is reset/updated, even if the observation function is stochastic.

Return type: Observation
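The relationship between the two interfaces, including the lazy observation, can be sketched as follows. This is an illustrative toy under stated assumptions, not the GV implementation: `ToyInnerEnv`, its private attributes `_state` and `_observation`, and the seeded `random.Random` are all hypothetical, and ints and floats stand in for State, Action, and Observation.

```python
import random
from typing import Optional, Tuple


class ToyInnerEnv:
    """Hypothetical sketch of both interfaces (not part of GV)."""

    def __init__(self, seed: int = 0):
        self._rng = random.Random(seed)
        self._state: Optional[int] = None  # set by reset()
        self._observation: Optional[float] = None  # lazily generated

    # Functional interface: never reads self._state.
    def functional_reset(self) -> int:
        return 0

    def functional_step(self, state: int, action: int) -> Tuple[int, float, bool]:
        next_state = state + action
        return next_state, float(action), next_state >= 2

    def functional_observation(self, state: int) -> float:
        # Stochastic observation: the state plus noise in [0, 1).
        return state + self._rng.random()

    # Non-functional interface: supplies the internal state.
    def reset(self) -> None:
        self._state = self.functional_reset()
        self._observation = None  # discard the stale observation

    def step(self, action: int) -> Tuple[float, bool]:
        # Assumes reset() has been called first.
        self._state, reward, done = self.functional_step(self._state, action)
        self._observation = None  # discard the stale observation
        return reward, done

    @property
    def state(self) -> Optional[int]:
        return self._state

    @property
    def observation(self) -> float:
        # Generated at most once per state, then reused.
        if self._observation is None:
            self._observation = self.functional_observation(self._state)
        return self._observation


env = ToyInnerEnv()
env.reset()
first = env.observation
second = env.observation  # same cached value, despite the noise
reward, done = env.step(1)
third = env.observation  # regenerated for the new state
```

In this sketch, repeated reads of `observation` agree until `reset()` or `step()` clears the cached value, which mirrors the behavior described above for stochastic observation functions.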

The following figure depicts the inner working of a generic inner environment.

Fig. 3 Schematic of the inner environment (../../_images/inner-env-design-dark.png).

GridWorld

InnerEnv is a pure interface: it provides no concrete implementation, only a set of methods that concrete classes must implement. Currently, GV provides a single implementation of this interface (GridWorld), which makes specific assumptions about the implementation of the functional methods. Other implementations can (and eventually will) be provided, and the rest of this section would not necessarily hold for them.

class GridWorld(state_space, action_space, observation_space, reset_function, transition_function, observation_function, reward_function, termination_function)

Implementation of the InnerEnv interface.

Initializes a GridWorld from the given components.

Parameters

state_space (StateSpace) –
action_space (ActionSpace) –
observation_space (ObservationSpace) –
reset_function (ResetFunction) –
transition_function (TransitionFunction) –
observation_function (ObservationFunction) –
reward_function (RewardFunction) –
termination_function (TerminatingFunction) –
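One plausible way to read this constructor is that the functional methods are assembled from the five supplied functions. The sketch below illustrates that composition idea only. It is an assumption, not GridWorld's actual code: `ToyComposedEnv` and every function signature in it are hypothetical, ints stand in for State, Action, and Observation, and the real ResetFunction, TransitionFunction, and related types may take different arguments.

```python
from typing import Callable, Tuple

# Hypothetical signatures, chosen for this sketch only.
ResetFn = Callable[[], int]
TransitionFn = Callable[[int, int], int]  # (state, action) -> next state
ObservationFn = Callable[[int], int]  # state -> observation
RewardFn = Callable[[int, int, int], float]  # (state, action, next state)
TerminationFn = Callable[[int, int, int], bool]  # (state, action, next state)


class ToyComposedEnv:
    """Hypothetical environment assembled from functions (not part of GV)."""

    def __init__(
        self,
        reset_function: ResetFn,
        transition_function: TransitionFn,
        observation_function: ObservationFn,
        reward_function: RewardFn,
        termination_function: TerminationFn,
    ):
        self._reset_function = reset_function
        self._transition_function = transition_function
        self._observation_function = observation_function
        self._reward_function = reward_function
        self._termination_function = termination_function

    def functional_reset(self) -> int:
        return self._reset_function()

    def functional_step(self, state: int, action: int) -> Tuple[int, float, bool]:
        # Each component handles exactly one responsibility.
        next_state = self._transition_function(state, action)
        reward = self._reward_function(state, action, next_state)
        done = self._termination_function(state, action, next_state)
        return next_state, reward, done

    def functional_observation(self, state: int) -> int:
        return self._observation_function(state)


# A 5-cell corridor: reaching cell 4 ends the episode with a reward.
env = ToyComposedEnv(
    reset_function=lambda: 0,
    transition_function=lambda s, a: min(s + a, 4),
    observation_function=lambda s: s % 2,
    reward_function=lambda s, a, ns: 1.0 if ns == 4 else -0.1,
    termination_function=lambda s, a, ns: ns == 4,
)
state = env.functional_reset()
state, reward, done = env.functional_step(state, 4)
```

Splitting the dynamics into separate functions lets each piece be swapped independently, for example changing the reward while keeping the same transitions.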