gym-gridverse

Gridworld domains for fully and partially observable reinforcement learning

Free software: MIT license
Documentation: https://gym-gridverse.readthedocs.io.

Features

Customization

GridVerse is highly customizable; while many components are provided out-of-the-box, it is designed such that you can create your own components programmatically, including your own objects, starting states, transition functions, reward functions, observation functions, terminating functions, etc.

The following GridObjects are provided:

Floor – An empty tile.
Wall – An opaque wall.
Exit – An exit tile.
Door – A door which can be opened/closed.
Key – An item to open a locked Door.
MovingObstacle – An obstacle which moves autonomously.
Box – A container of other GridObjects.
Telepod – A teleporting tile.

The following transition functions are provided:

move_agent – Moves the agent.
turn_agent – Turns the agent.
pickndrop – Lets agent pick and/or drop an object.
actuate_door – Opens/closes a Door.
actuate_box – Opens a Box.
move_obstacles – Lets MovingObstacle objects move.
teleport – Teleports the agent across the Telepods.

The following reward functions are provided:

reduce_sum – A sum of other rewards.
living_reward – A constant reward.
reach_exit – A reward for reaching an Exit.
overlap – A reward for standing on/off a GridObject type.
proportional_to_distance – Reward based on distance from a GridObject type.
getting_closer – Rewards for moving closer to/further from a GridObject type.
actuate_door – Rewards for actuating a Door.
pickndrop – Rewards for picking and/or dropping GridObject types.
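
As a rough illustration of how a `reduce_sum`-style function can combine simpler rewards, the following self-contained sketch mimics the idea in plain Python. It does not use the gym-gridverse API: the `State` type, the `(state, action, next_state)` signature, and the reward values are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Position = Tuple[int, int]


@dataclass(frozen=True)
class State:
    """Hypothetical minimal state (not a gym-gridverse type)."""
    agent: Position
    exit: Position


# Assumed signature for this sketch: (state, action, next_state) -> reward
RewardFunction = Callable[[State, str, State], float]


def living_reward(state: State, action: str, next_state: State,
                  *, reward: float = -1.0) -> float:
    """Constant reward, independent of the transition."""
    return reward


def reach_exit(state: State, action: str, next_state: State,
               *, reward_on: float = 5.0, reward_off: float = 0.0) -> float:
    """Reward for ending the step on the exit tile."""
    return reward_on if next_state.agent == next_state.exit else reward_off


def reduce_sum(state: State, action: str, next_state: State,
               *, reward_functions: Sequence[RewardFunction]) -> float:
    """Sum the rewards returned by each of the given reward functions."""
    return sum(rf(state, action, next_state) for rf in reward_functions)


before = State(agent=(1, 1), exit=(1, 2))
after = State(agent=(1, 2), exit=(1, 2))
total = reduce_sum(before, "move_forward", after,
                   reward_functions=[living_reward, reach_exit])
```

Because each function in the sketch depends only on the transition and not on the time step, any sum of them is also independent of time.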

The following observation functions are provided:

from_visibility – Observability determined by custom visibility functions.
full_observation – Observability which is unblocked by Walls.
partial_observation – Observability which is blocked by Walls.
raytracing_observation – Observability determined by direct line of sight.
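
To make "direct line of sight" concrete, here is a minimal sketch of a wall-blocked visibility check on a character grid. It is a generic Bresenham-based illustration, not the library's raytracing algorithm; the grid encoding (`#` for a Wall), the helper names, and the rule that a wall tile itself remains visible are assumptions for this example.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]  # (row, column)


def line_cells(start: Cell, end: Cell) -> List[Cell]:
    """Cells on a Bresenham line from start to end, both inclusive."""
    (y0, x0), (y1, x1) = start, end
    dy, dx = abs(y1 - y0), abs(x1 - x0)
    sy = 1 if y1 >= y0 else -1
    sx = 1 if x1 >= x0 else -1
    err = dx - dy
    y, x = y0, x0
    cells = [(y, x)]
    while (y, x) != (y1, x1):
        e2 = 2 * err
        if e2 > -dy:  # step along the column axis
            err -= dy
            x += sx
        if e2 < dx:  # step along the row axis
            err += dx
            y += sy
        cells.append((y, x))
    return cells


def is_visible(grid: Sequence[str], agent: Cell, target: Cell) -> bool:
    """A target is visible if no wall lies strictly between it and the agent."""
    between = line_cells(agent, target)[1:-1]
    return all(grid[y][x] != "#" for y, x in between)


# Hypothetical 2x4 grid: '#' is a wall, '.' is floor; the agent stands at (0, 0).
grid = [
    "..#.",
    "....",
]
agent = (0, 0)
visible = {
    (y, x)
    for y in range(len(grid))
    for x in range(len(grid[0]))
    if is_visible(grid, agent, (y, x))
}
```

In this sketch the wall at `(0, 2)` is itself visible, while the floor tile behind it at `(0, 3)` is hidden from the agent.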

The following terminating functions are provided:

reduce_any – Terminates if any of the given terminating functions are satisfied.
reduce_all – Terminates if all of the given terminating functions are satisfied.
overlap – Terminates if the agent is standing on a GridObject type.
reach_exit – Terminates if the agent reaches an Exit.
bump_moving_obstacle – Terminates if the agent bumps into a MovingObstacle.
bump_into_wall – Terminates if the agent bumps into a Wall.
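
The `reduce_any` and `reduce_all` combinators can be pictured as logical OR and AND over other terminating conditions. The sketch below illustrates that idea in plain Python; the `Transition` type and the function signatures are hypothetical and are not taken from the gym-gridverse API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Position = Tuple[int, int]


@dataclass(frozen=True)
class Transition:
    """Hypothetical summary of one step (not a gym-gridverse type)."""
    agent: Position
    exit: Position
    bumped_wall: bool


# Assumed signature for this sketch: transition -> episode is over?
TerminatingFunction = Callable[[Transition], bool]


def reach_exit(transition: Transition) -> bool:
    """True if the agent ended the step on the exit tile."""
    return transition.agent == transition.exit


def bump_into_wall(transition: Transition) -> bool:
    """True if the agent bumped into a wall during the step."""
    return transition.bumped_wall


def reduce_any(transition: Transition,
               *, terminating_functions: Sequence[TerminatingFunction]) -> bool:
    """Terminate if at least one condition holds."""
    return any(f(transition) for f in terminating_functions)


def reduce_all(transition: Transition,
               *, terminating_functions: Sequence[TerminatingFunction]) -> bool:
    """Terminate only if every condition holds."""
    return all(f(transition) for f in terminating_functions)


functions = [reach_exit, bump_into_wall]
at_exit = Transition(agent=(2, 2), exit=(2, 2), bumped_wall=False)
any_done = reduce_any(at_exit, terminating_functions=functions)
all_done = reduce_all(at_exit, terminating_functions=functions)
```

Here `any_done` is true because the agent reached the exit, while `all_done` is false because no wall was bumped.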

YAML Configuration Files

Aside from letting you define your own environments programmatically, GridVerse allows you to create and share YAML configuration files which fully describe the components of an environment. This is a convenient way to build an environment from existing components and share it with others. The yaml/ folder contains a number of environments defined using the YAML configuration format.

Suitable for Fully/Partially Observable Control Problems for Learning/Planning

Depending on your research interests, most GridVerse components can be used to form either fully observable or partially observable control problems. Further, GridVerse environments provide both a stateful and a functional interface, depending on whether you are addressing learning or planning problems.
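
The difference between the two interface styles can be sketched on a toy one-dimensional corridor. This is a conceptual illustration only: `functional_step`, `StatefulEnv`, and the toy dynamics are hypothetical names and assumptions, not the gym-gridverse API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical corridor state: agent position and goal position."""
    position: int
    goal: int


def functional_step(state: State, action: int) -> Tuple[State, float, bool]:
    """Functional style: a pure function from (state, action) to the outcome.

    Nothing is mutated, so a planner can call it from any state it likes.
    """
    position = min(max(state.position + action, 0), state.goal)
    done = position == state.goal
    reward = 1.0 if done else 0.0
    return State(position, state.goal), reward, done


class StatefulEnv:
    """Stateful style: the environment owns the current state."""

    def __init__(self, initial: State) -> None:
        self._initial = initial
        self.state = initial

    def reset(self) -> State:
        self.state = self._initial
        return self.state

    def step(self, action: int) -> Tuple[float, bool]:
        self.state, reward, done = functional_step(self.state, action)
        return reward, done


env = StatefulEnv(State(position=0, goal=2))
first = env.step(1)  # advances the environment's internal state

# A planner can look ahead from the current state without advancing the env.
lookahead = functional_step(env.state, 1)
```

A learning agent typically interacts through the stateful object, whereas a planning method needs the functional form so that it can simulate transitions from arbitrary states.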

Future work / in progress

100% test coverage
Multi-agent support
Benchmark performance of reinforcement learning and planning algorithms

Examples

yaml/gv_crossing.7x7.yaml
State: gv_crossing.7x7.state.gif
Observations: gv_crossing.7x7.observation.montage.gif

yaml/gv_dynamic_obstacles.7x7.yaml
State: gv_dynamic_obstacles.7x7.state.gif
Observations: gv_dynamic_obstacles.7x7.observation.montage.gif

yaml/gv_empty.8x8.yaml
State: gv_empty.8x8.state.gif
Observations: gv_empty.8x8.observation.montage.gif

yaml/gv_four_rooms.9x9.yaml
State: gv_four_rooms.9x9.state.gif
Observations: gv_four_rooms.9x9.observation.montage.gif

yaml/gv_keydoor.5x5.yaml
State: gv_keydoor.5x5.state.gif
Observations: gv_keydoor.5x5.observation.montage.gif

yaml/gv_nine_rooms.13x13.yaml
State: gv_nine_rooms.13x13.state.gif
Observations: gv_nine_rooms.13x13.observation.montage.gif

yaml/gv_teleport.7x7.yaml
State: gv_teleport.7x7.state.gif
Observations: gv_teleport.7x7.observation.montage.gif

Similar Projects

The GridVerse project takes heavy inspiration from MiniGrid, and was designed to address a few shortcomings which limited our ability to use it fully:

Customization and Configurability: Our design philosophy is primarily based on user customization. We provide interfaces for you to design your own objects, state dynamics, reward functions, observability, etc. We also provide a YAML-based configuration format which allows you to conveniently share environments with others.
Time-Invariant Reward Functions: Our reward functions satisfy the formal time-invariance property of Markov decision processes.
Full Observability: We provide a full observability interface which satisfies the formal property of Markov decision processes.
Functional Interface: We provide a functional interface which enables the use of planning methods, e.g., MCTS, POMCP.

MiniWorld is a 3D variant similar to MiniGrid by the same authors.

While GridVerse provides functionality which we found useful and/or necessary for our needs, each project provides something unique compared to the others, e.g., MiniGrid includes tasks which involve natural language comprehension, and MiniWorld incorporates a whole third dimension. Make sure to browse all projects to get a clearer picture of which best suits your needs.

Table 1: Project Comparison
	GridVerse	MiniGrid	MiniWorld
2D Environments	✔	✔
3D Environments			✔
Partial Observability	✔	✔	✔
Full Observability	✔	1
RGB Observability		✔	✔
Natural Language Tasks		✔
Customizable	✔		✔
YAML-Configurable	✔

1: While MiniGrid provides FullyObsWrapper, which extends the agent’s observation range, it does not represent true full-state observability.

Citation

If you use gym-gridverse, please cite it:

@misc{baisero2021gym-gridverse,
    author = {Andrea Baisero and Sammie Katt},
    title = {gym-gridverse: Gridworld domains for fully and partially observable reinforcement learning},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/abaisero/gym-gridverse}},
}

Credits

This package was inspired by MiniGrid, and created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.