Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] reward_lose not correctly set at timeout #3

Open
samvelyan opened this issue Feb 12, 2025 · 1 comment
Open

[BUG] reward_lose not correctly set at timeout #3

samvelyan opened this issue Feb 12, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@samvelyan
Copy link
Owner

This is a migration copy of the original bug report here: facebookresearch#103.


When setting max_episode_steps, while gym correctly triggers a reset, MiniHack (and perhaps even NLE) does not set StepStatus.ABORTED.

As a consequence, the reward for Timeout is not reward_lose.

To Reproduce

import gym
import gym.vector
import minihack

MAX_STEPS = 20

env = gym.make(
    "MiniHack-KeyRoom-S5-v0",
    reward_lose=-1.0,
    max_episode_steps=MAX_STEPS,
)
timestep = env.reset()
env.render()


for i in range(MAX_STEPS):
    timestep = env.step(3) 
    reward = timestep[1]
    info = timestep[-1]
    print(reward, info["end_status"])

assert int(info["end_status"]) == -1
Expected behavior
int(info["end_status"]) should be -1

Potential reasons

gym.make accepts max_episode_steps as argument.
For this reason, max_episode_steps is not included in kwargs and it is not passed through to the MiniHack constructor.

@samvelyan samvelyan added the bug Something isn't working label Feb 12, 2025
@samvelyan
Copy link
Owner Author

Copying a relevant comment from the same Issue.


You can temporarily patch it with this

class PatchTimeoutWrapper(gym.Wrapper):
    def __init__(self, env: gym.Env):
        super().__init__(env)
        self.unwrapped._max_episode_steps = self._max_episode_steps

And use:

env = gym.make(
    "MiniHack-KeyRoom-S5-v0",
    reward_lose=-1.0,
    max_episode_steps=MAX_STEPS,
)
env = PatchTimeoutWrapper(env)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant