Environments and Wrappers
This guide covers creating custom environments, composing wrappers, rendering, and running vectorized environments.
The Env.t Type
An environment ('obs, 'act, 'render) Env.t is parameterized by its
observation type, action type, and render type. The type system ensures that
policies, wrappers, and collection utilities all agree on these types.
The lifecycle is strict:
- Call Env.reset to get the initial observation
- Call Env.step with an action to advance one timestep
- When terminated or truncated is true, call Env.reset again
- Call Env.close when done (optional, releases resources)
Calling step before reset, or after a terminal step without resetting,
raises Invalid_argument.
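The lifecycle rule can be pictured as a small state machine. The following is a stdlib-only sketch, not Fehu's implementation: the lifecycle record and the make_lifecycle, on_reset, on_step, and step_is_rejected helpers are hypothetical names introduced here for illustration. Only the Invalid_argument behavior is taken from the text above.

```ocaml
(* Hypothetical model of the reset/step rule; not Fehu's internals. *)
type lifecycle = { mutable needs_reset : bool }

(* A fresh environment has not been reset yet. *)
let make_lifecycle () = { needs_reset = true }

(* Resetting makes the environment ready to step. *)
let on_reset lc = lc.needs_reset <- false

(* Reject a step before reset or after a terminal step; a terminal
   step marks the environment as needing another reset. *)
let on_step lc ~terminated ~truncated =
  if lc.needs_reset then invalid_arg "step: call reset first";
  if terminated || truncated then lc.needs_reset <- true

(* Probe with a non-terminal step and report whether it was refused. *)
let step_is_rejected lc =
  match on_step lc ~terminated:false ~truncated:false with
  | () -> false
  | exception Invalid_argument _ -> true
```

A caller that wants to avoid the exception can track the terminated and truncated flags itself and reset as soon as either one is true.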
Creating Custom Environments
Use Env.create to build an environment from reset and step functions.
Both receive the environment handle as their first argument, which provides
access to spaces and lifecycle state. Random keys are drawn from the implicit
RNG scope (see below).
open Fehu
(* A simple counting environment: agent must choose action 1 *)
let make_counter () =
  let count = ref 0 in
  Env.create
    ~id:"Counter-v0"
    ~observation_space:(Space.Discrete.create 100)
    ~action_space:(Space.Discrete.create 2)
    ~reset:(fun _env ?options:_ () ->
      count := 0;
      Space.Discrete.of_int 0, Info.empty)
    ~step:(fun _env action ->
      let a = Space.Discrete.to_int action in
      if a = 1 then incr count else count := 0;
      let obs = Space.Discrete.of_int !count in
      let terminated = !count >= 10 in
      Env.step_result ~observation:obs
        ~reward:(if a = 1 then 1.0 else -1.0)
        ~terminated ())
    ()
RNG Management
Environments draw random keys from the implicit RNG scope established by
Nx.Rng.run. Any call to Space.sample or other random operations inside
reset and step callbacks will use this scope automatically:
let make_noisy_env () =
  Env.create
    ~observation_space:(Space.Box.create
      ~low:[| 0.0 |] ~high:[| 1.0 |])
    ~action_space:(Space.Discrete.create 2)
    ~reset:(fun env ?options:_ () ->
      let obs = Space.sample (Env.observation_space env) in
      obs, Info.empty)
    ~step:(fun env _action ->
      let obs = Space.sample (Env.observation_space env) in
      Env.step_result ~observation:obs
        ~reward:1.0 ())
    ()
Wrappers
Wrappers transform an environment's observations, actions, or rewards without modifying the inner environment. They compose: wrap a wrapper to stack transformations.
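To make the composition idea concrete, here is a stdlib-only sketch. It assumes a toy environment record that is far simpler than Env.t (no info dictionary, no spaces, no truncation), and the map_observation and map_reward functions below are hypothetical stand-ins that only mirror the shape of the Fehu wrappers of the same name.

```ocaml
(* Toy environment record; hypothetical, much simpler than Env.t. *)
type ('obs, 'act) env = {
  reset : unit -> 'obs;
  step : 'act -> 'obs * float * bool;  (* observation, reward, terminated *)
}

(* Inner environment: a counter that terminates once it reaches 3. *)
let counter () =
  let n = ref 0 in
  { reset = (fun () -> n := 0; !n);
    step = (fun () -> incr n; (!n, 1.0, !n >= 3)) }

(* Apply [f] to every observation coming out of reset and step. *)
let map_observation f env =
  { reset = (fun () -> f (env.reset ()));
    step = (fun a -> let o, r, t = env.step a in (f o, r, t)) }

(* Apply [f] to every reward; the inner environment is left untouched. *)
let map_reward f env =
  { env with step = (fun a -> let o, r, t = env.step a in (o, f r, t)) }

(* Stack two wrappers: each one wraps the result of the previous one. *)
let wrapped =
  counter ()
  |> map_observation float_of_int
  |> map_reward (fun r -> r *. 0.01)
```

Because each wrapper returns an environment, the order of the pipeline is the order in which the layers are added, with the last one outermost.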
map_observation
Transform observations from reset and step:
open Fehu
(* Normalize observations to [0, 1]; normalize_fn is a placeholder for your own function *)
let env = Env.map_observation
  ~observation_space:(Space.Box.create
    ~low:[| 0.0; 0.0; 0.0; 0.0 |]
    ~high:[| 1.0; 1.0; 1.0; 1.0 |])
  ~f:(fun obs info ->
    (* obs is a float32 tensor, transform it *)
    let normalized = normalize_fn obs in
    normalized, info)
  env
The function f receives both the observation and the info dictionary,
returning both. This allows wrappers to pass metadata through info.
map_action
Transform actions before they reach the inner environment:
(* Remap discrete actions *)
let env = Env.map_action
  ~action_space:(Space.Discrete.create 3)
  ~f:(fun act ->
    (* Map from 3-action to 2-action space *)
    let i = Space.Discrete.to_int act in
    Space.Discrete.of_int (if i >= 2 then 1 else i))
  env
map_reward
Transform rewards after each step:
(* Scale rewards *)
let env = Env.map_reward
  ~f:(fun ~reward ~info -> reward *. 0.01, info)
  env
clip_action
Clamp continuous actions to the action space bounds. The wrapper relaxes the action space to accept any float values, then clips before forwarding:
(* Works with Box action spaces *)
let env = Env.clip_action env
clip_observation
Clamp observations to specified bounds:
let env = Env.clip_observation
  ~low:[| -1.0; -1.0 |]
  ~high:[| 1.0; 1.0 |]
  env
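Both clipping wrappers come down to an element-wise clamp. The sketch below shows that arithmetic on plain float arrays, which stand in for the tensors Fehu operates on. The clip function is a hypothetical helper written for this illustration and is not part of the Fehu API.

```ocaml
(* Clamp each component of [values] into the interval [low.(i), high.(i)].
   Plain float arrays are an assumption made for this sketch. *)
let clip ~low ~high values =
  let n = Array.length values in
  if Array.length low <> n || Array.length high <> n then
    invalid_arg "clip: dimension mismatch";
  Array.mapi (fun i x -> Float.min high.(i) (Float.max low.(i) x)) values

(* The first component exceeds the upper bound; the second is in range. *)
let clipped = clip ~low:[| -1.0; -1.0 |] ~high:[| 1.0; 1.0 |] [| 2.5; -0.3 |]
```

Here clipped is [| 1.0; -0.3 |]: only the out-of-range component changes.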
time_limit
Enforce a maximum episode length. When the limit is reached, the step's
truncated flag is set to true:
let env = Env.time_limit ~max_episode_steps:200 env
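A time limit amounts to a step counter that is cleared on reset. The stdlib-only sketch below assumes a toy step record and a pair of reset and step functions in place of Env.t. The time_limit function defined here is a hypothetical model, and setting truncated even when the step is also terminal is an assumption of this sketch rather than documented Fehu behavior.

```ocaml
(* Toy step result; hypothetical, not Fehu's step record. *)
type 'obs step = {
  observation : 'obs;
  reward : float;
  terminated : bool;
  truncated : bool;
}

(* Wrap a (reset, step) pair so that the step which reaches
   [max_episode_steps] reports truncated = true. *)
let time_limit ~max_episode_steps (reset, step) =
  if max_episode_steps <= 0 then
    invalid_arg "time_limit: max_episode_steps must be positive";
  let elapsed = ref 0 in
  let reset' () =
    elapsed := 0;
    reset ()
  in
  let step' action =
    let s = step action in
    incr elapsed;
    { s with truncated = s.truncated || !elapsed >= max_episode_steps }
  in
  (reset', step')

(* An environment that never ends on its own, limited to 3 steps. *)
let reset, step =
  time_limit ~max_episode_steps:3
    ( (fun () -> 0),
      fun () ->
        { observation = 0; reward = 1.0; terminated = false; truncated = false } )
```

Keeping truncated separate from terminated lets downstream code tell a cut-off episode from one that reached a true terminal state.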
Custom Wrappers with Env.wrap
For transformations that need full control over reset and step, use Env.wrap.
The wrapper shares the inner environment's lifecycle (RNG, closed flag, reset
flag):
open Fehu
(* A wrapper that tracks episode reward *)
let with_episode_reward env =
  let episode_reward = ref 0.0 in
  Env.wrap
    ~observation_space:(Env.observation_space env)
    ~action_space:(Env.action_space env)
    ~reset:(fun inner ?options () ->
      episode_reward := 0.0;
      Env.reset inner ?options ())
    ~step:(fun inner action ->
      let s = Env.step inner action in
      episode_reward := !episode_reward +. s.reward;
      let info =
        if s.terminated || s.truncated then
          Info.set "episode_reward"
            (Info.float !episode_reward) s.info
        else s.info
      in
      { s with info })
    env
Rendering
Environments support optional rendering via render modes. Pass
~render_mode at creation time:
let env = Fehu_envs.Grid_world.make
  ~render_mode:`Ansi ()
let _obs, _info = Env.reset env ()
let () =
  match Env.render env with
  | Some (Text s) -> print_endline s
  | _ -> ()
Render Rollout
Render.rollout runs a policy and feeds rendered frames to a sink function:
open Fehu
(* Collect rendered frames from a policy rollout *)
let frames = ref [] in
Render.rollout env
  ~policy:(fun _obs -> Space.sample (Env.action_space env))
  ~steps:100
  ~sink:(fun img -> frames := img :: !frames)
  ()
Recording with on_render
Render.on_render wraps an environment so that every frame after reset and
step is passed to a sink:
(* save_frame is a placeholder for your own frame handler *)
let env = Render.on_render
  ~sink:(fun img -> save_frame img)
  env
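The sink pattern can be sketched without Fehu. The model below assumes a toy environment record with a render function that returns an optional frame. The on_render function defined here is a hypothetical stand-in that only illustrates the behavior described above: one sink call after reset and one after each step, whenever a frame is available.

```ocaml
(* Toy environment with rendering; hypothetical, not Env.t. *)
type ('obs, 'frame) env = {
  reset : unit -> 'obs;
  step : unit -> 'obs;
  render : unit -> 'frame option;
}

(* Wrap [env] so that each reset and step forwards the rendered
   frame, if there is one, to [sink]. *)
let on_render ~sink env =
  let emit () = Option.iter sink (env.render ()) in
  { env with
    reset = (fun () -> let obs = env.reset () in emit (); obs);
    step = (fun () -> let obs = env.step () in emit (); obs) }

(* Collect text frames from a counter environment, newest first. *)
let frames : string list ref = ref []

let env =
  let t = ref 0 in
  on_render
    ~sink:(fun frame -> frames := frame :: !frames)
    { reset = (fun () -> t := 0; !t);
      step = (fun () -> incr t; !t);
      render = (fun () -> Some (Printf.sprintf "t=%d" !t)) }
```

One reset followed by two steps leaves three frames in frames, so a recorder built this way needs no changes to the training loop.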
Vectorized Environments
Vec_env runs multiple environment instances with batched inputs and outputs.
Terminated or truncated episodes are automatically reset.
open Fehu
let () = Nx.Rng.run ~seed:42 @@ fun () ->
  (* Create 4 parallel environments *)
  let envs = List.init 4 (fun _ -> Fehu_envs.Cartpole.make ()) in
  let vec = Vec_env.create envs in
  let n = Vec_env.num_envs vec in (* 4 *)
  (* Reset all environments *)
  let _observations, _infos = Vec_env.reset vec () in
  (* Step all environments with an array of actions *)
  let actions = Array.init n (fun _ -> Space.Discrete.of_int 0) in
  let _s = Vec_env.step vec actions in
  (* _s.observations, _s.rewards, _s.terminated, _s.truncated *)
  (* Clean up *)
  Vec_env.close vec
All environments must have structurally identical observation and action spaces
(checked via Space.equal_spec). On terminal steps, the original terminal
observation is stored in the step info under "final_observation" as a packed
Value.t, and the terminal info under "final_info".
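The auto-reset behavior is easier to follow with a small model. The stdlib-only sketch below uses a toy environment whose observation is its step counter. The env record, the vec_step record, and the step_all function are hypothetical names for this illustration, and an option array stands in for the "final_observation" entry that Vec_env stores in the step info.

```ocaml
(* Toy environment: the observation is a step counter and the episode
   terminates once the counter reaches [horizon]. *)
type env = { mutable t : int; horizon : int }

let reset e =
  e.t <- 0;
  e.t

let step e =
  e.t <- e.t + 1;
  (e.t, e.t >= e.horizon)

(* Batched result; hypothetical, not Vec_env's step record. *)
type vec_step = {
  observations : int array;               (* post-reset where an episode ended *)
  terminated : bool array;
  final_observations : int option array;  (* terminal observation, if any *)
}

(* Step every environment; reset the ones that finished and keep their
   terminal observation on the side. *)
let step_all envs =
  let results =
    Array.map
      (fun e ->
        let obs, terminated = step e in
        if terminated then (reset e, true, Some obs) else (obs, false, None))
      envs
  in
  { observations = Array.map (fun (o, _, _) -> o) results;
    terminated = Array.map (fun (_, d, _) -> d) results;
    final_observations = Array.map (fun (_, _, f) -> f) results }
```

Code that bootstraps a value estimate from the last observation of an episode should read the stored terminal observation, not the batched observation, since the latter already belongs to the next episode.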
Next Steps
- Getting Started -- installation, environments, spaces, step loop
- Collection and Evaluation -- trajectory collection, replay buffers, GAE, evaluation