Inner Environments
Inner environments represent the logic of the environment dynamics purely in
object-oriented terms. Their methods receive and return Action, State, and
Observation objects directly.
- class InnerEnv(state_space, action_space, observation_space)
Inner environment
Inner environments provide an interface primarily based on Python objects, with states represented by State, observations by Observation, and actions by Action.
An inner environment has the following responsibilities:
- (re)set the initial state (as State),
- update the state (as State),
- return the observation (as Observation),
- return the reward (as float),
- return the terminal signal (as bool).
An inner environment provides this functionality through two interfaces: a functional one and a non-functional one.
Functional Interface
The functional interface ignores the environment's internal state, and requires method callers to provide their own states:
- abstract InnerEnv.functional_step(state, action)
Returns next state, reward, and done flag
- abstract InnerEnv.functional_observation(state)
Returns observation
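The calling pattern this implies can be sketched with a toy counter environment. This is only an illustration: `ToyFunctionalEnv`, its integer states, string observations, and the argument-free `functional_reset` are hypothetical stand-ins, not the library's classes, whose methods operate on State, Action, and Observation objects.

```python
from typing import Tuple


class ToyFunctionalEnv:
    """Hypothetical stand-in for an inner environment (not the library's InnerEnv)."""

    def functional_reset(self) -> int:
        # Assumption: returns a fresh initial state and stores nothing.
        return 0

    def functional_step(self, state: int, action: int) -> Tuple[int, float, bool]:
        # Next state, reward, and done flag depend only on the arguments.
        next_state = state + action
        reward = 1.0 if next_state == 3 else 0.0
        done = next_state >= 3
        return next_state, reward, done

    def functional_observation(self, state: int) -> str:
        return f"counter={state}"


env = ToyFunctionalEnv()

# The caller owns the state and threads it through every call.
state = env.functional_reset()
trajectory = [env.functional_observation(state)]
total_reward = 0.0
done = False
while not done:
    state, reward, done = env.functional_step(state, 1)
    total_reward += reward
    trajectory.append(env.functional_observation(state))

# Since nothing is stored in the environment, any state can be revisited.
replay_a = env.functional_step(1, 1)
replay_b = env.functional_step(1, 1)
```

Because the environment object holds no state here, the same state can be stepped from repeatedly, which is convenient for planning or search.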
Non-Functional Interface
The non-functional interface uses the environment’s internal state, and internally refers back to the functional interface by providing that state:
- InnerEnv.reset()
Resets the state
Internally calls functional_reset() to reset the state; also resets the observation, so that an updated observation will be generated upon request.
- InnerEnv.step(action)
Runs the dynamics for one timestep, and returns reward and done flag
Internally calls functional_step() to update the state; also resets the observation, so that an updated observation will be generated upon request.
- InnerEnv.state
Returns the current state
- InnerEnv.observation
Returns the current observation
Internally calls functional_observation() to generate the current observation based on the current state. The observation is generated lazily, such that at most one observation is generated for each state. As a consequence, this will return the same observation until the state is reset/updated, even if the observation function is stochastic.
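The relationship between the two interfaces, including the lazy observation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `ToyInnerEnv`, its integer states, float observations, and the private attribute names are all hypothetical.

```python
import random
from typing import Optional, Tuple


class ToyInnerEnv:
    """Hypothetical stand-in illustrating the pattern (not the library's InnerEnv)."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._state: Optional[int] = None
        self._observation: Optional[float] = None

    # Functional interface: the environment state is passed in by the caller.
    def functional_reset(self) -> int:
        return 0

    def functional_step(self, state: int, action: int) -> Tuple[int, float, bool]:
        next_state = state + action
        return next_state, float(action), next_state >= 3

    def functional_observation(self, state: int) -> float:
        # Stochastic observation: the state plus noise in [0, 1).
        return state + self._rng.random()

    # Non-functional interface: delegates to the functional one using self._state.
    def reset(self) -> None:
        self._state = self.functional_reset()
        self._observation = None  # invalidate the cached observation

    def step(self, action: int) -> Tuple[float, bool]:
        self._state, reward, done = self.functional_step(self._state, action)
        self._observation = None  # invalidate the cached observation
        return reward, done

    @property
    def state(self) -> int:
        return self._state

    @property
    def observation(self) -> float:
        # Generated lazily: at most once per state, then served from the cache.
        if self._observation is None:
            self._observation = self.functional_observation(self._state)
        return self._observation


env = ToyInnerEnv()
env.reset()
first = env.observation
second = env.observation  # same state, so the cached value is returned
reward, done = env.step(1)
third = env.observation  # state changed, so a new observation is generated
```

In this sketch, `first` and `second` are identical even though `functional_observation` is noisy, while `third` is drawn anew after the step.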
The following figure depicts the inner working of a generic inner environment.
GridWorld
InnerEnv is a pure interface, in the sense that it provides no concrete
implementation, only a set of methods which concrete classes should implement.
Currently, GV provides a single implementation of this interface (GridWorld),
which makes specific assumptions about the implementation of the functional
methods. Technically, other implementations can (and will, eventually) be
provided, for which the rest of this section would not necessarily hold.
- class GridWorld(state_space, action_space, observation_space, reset_function, transition_function, observation_function, reward_function, termination_function)
Implementation of the InnerEnv interface.
Initializes a GridWorld from the given components.
- Parameters
state_space (StateSpace)
action_space (ActionSpace)
observation_space (ObservationSpace)
reset_function (ResetFunction)
transition_function (TransitionFunction)
observation_function (ObservationFunction)
reward_function (RewardFunction)
termination_function (TerminatingFunction)
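The constructor signature suggests an environment assembled from independent callables. A minimal sketch of that composition pattern follows. It does not use the real GridWorld: `ComposedEnv` and the toy functions are hypothetical, the call signatures are assumptions (the actual ResetFunction, TransitionFunction, and related types may take other arguments), and the transition function is assumed to return a new state.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ComposedEnv:
    """Hypothetical environment built from component functions (not GridWorld)."""

    reset_function: Callable[[], int]
    transition_function: Callable[[int, int], int]
    observation_function: Callable[[int], str]
    reward_function: Callable[[int, int, int], float]
    termination_function: Callable[[int, int, int], bool]

    def functional_reset(self) -> int:
        return self.reset_function()

    def functional_step(self, state: int, action: int) -> Tuple[int, float, bool]:
        # Each piece of the dynamics is delegated to its own component.
        next_state = self.transition_function(state, action)
        reward = self.reward_function(state, action, next_state)
        done = self.termination_function(state, action, next_state)
        return next_state, reward, done

    def functional_observation(self, state: int) -> str:
        return self.observation_function(state)


# Toy components: walk along a line and terminate on reaching position 2.
env = ComposedEnv(
    reset_function=lambda: 0,
    transition_function=lambda s, a: s + a,
    observation_function=lambda s: f"position={s}",
    reward_function=lambda s, a, s2: 10.0 if s2 >= 2 else -1.0,
    termination_function=lambda s, a, s2: s2 >= 2,
)

state = env.functional_reset()
step_1 = env.functional_step(state, 1)
step_2 = env.functional_step(step_1[0], 1)
```

Keeping each component separate means that, for example, a different reward function can be swapped in without touching the transition logic.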