Module Fehu.Env
Reinforcement learning environments.
An environment defines an interactive loop: the agent observes, acts, and receives a reward. The environment enforces a lifecycle: reset must be called before step, and a terminated or truncated episode requires another reset.
Step results
type 'obs step = {
  observation : 'obs;  (* The observation after the action. *)
  reward : float;  (* Scalar reward for the transition. *)
  terminated : bool;  (* true when the episode ends naturally. *)
  truncated : bool;  (* true when the episode is cut short. *)
  info : Info.t;  (* Auxiliary metadata. *)
}
The type for step results.
val step_result :
observation:'obs ->
?reward:float ->
?terminated:bool ->
?truncated:bool ->
?info:Info.t ->
unit ->
'obs step
step_result ~observation () constructs a step result. reward defaults to 0., terminated and truncated default to false, and info defaults to Info.empty.
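The defaults described above can be sketched with OCaml optional arguments. This is a self-contained illustration, not Fehu's source: the Info module below is a stand-in (an association list) assumed only so the snippet compiles on its own.

```ocaml
(* Stand-in for Fehu's Info module; the real Info.t is defined by the
   library and is assumed here only for illustration. *)
module Info = struct
  type t = (string * string) list
  let empty : t = []
end

(* Mirrors the documented record shape. *)
type 'obs step = {
  observation : 'obs;
  reward : float;
  terminated : bool;
  truncated : bool;
  info : Info.t;
}

(* Optional arguments carry the documented defaults; the trailing unit
   lets the caller omit any of them. *)
let step_result ~observation ?(reward = 0.) ?(terminated = false)
    ?(truncated = false) ?(info = Info.empty) () =
  { observation; reward; terminated; truncated; info }

(* Only the observation is required. *)
let s0 = step_result ~observation:3 ()

(* Override just the fields that differ from the defaults. *)
let s1 = step_result ~observation:3 ~reward:1.5 ~terminated:true ()
```

With the real library, the equivalent call would presumably be Env.step_result ~observation (), per the signature above.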
Render modes
Rendering modes supported by environments.
val render_mode_to_string : render_mode -> string
render_mode_to_string m is the string representation of m.
Environments
val create :
?id:string ->
observation_space:'obs Space.t ->
action_space:'act Space.t ->
?render_mode:render_mode ->
?render_modes:string list ->
reset:(('obs, 'act, 'render) t -> ?options:Info.t -> unit -> 'obs * Info.t) ->
step:(('obs, 'act, 'render) t -> 'act -> 'obs step) ->
?render:(unit -> 'render option) ->
?close:(unit -> unit) ->
unit ->
('obs, 'act, 'render) t
create ~observation_space ~action_space ~reset ~step () makes a new environment.
reset and step receive the environment handle as first argument. Random keys for stochastic behavior are drawn from the implicit RNG scope.
render_modes lists the supported render mode strings. When render_mode is provided, it must appear in render_modes.
Raises Invalid_argument if render_mode is not in render_modes.
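The render mode check can be pictured as a membership test on the string form of the mode. The sketch below is an assumption about the shape of that check, not the library's code: the render_mode constructors and their string names are hypothetical, since the page does not list them, and check_render_mode is a name introduced only for this example.

```ocaml
(* Hypothetical constructors; the real render_mode type is defined by Fehu. *)
type render_mode = Human | Rgb_array

(* Hypothetical string names for the constructors above. *)
let render_mode_to_string = function
  | Human -> "human"
  | Rgb_array -> "rgb_array"

(* Illustrative helper: accept a missing mode, otherwise require that its
   string form appears in render_modes. *)
let check_render_mode ?render_mode ~render_modes () =
  match render_mode with
  | None -> ()
  | Some m ->
      let s = render_mode_to_string m in
      if not (List.mem s render_modes) then
        invalid_arg (Printf.sprintf "render mode %S is not in render_modes" s)
```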
val wrap :
?id:string ->
observation_space:'obs2 Space.t ->
action_space:'act2 Space.t ->
?render_mode:render_mode ->
reset:
(('obs1, 'act1, 'render) t -> ?options:Info.t -> unit -> 'obs2 * Info.t) ->
step:(('obs1, 'act1, 'render) t -> 'act2 -> 'obs2 step) ->
?render:(('obs1, 'act1, 'render) t -> 'render option) ->
?close:(('obs1, 'act1, 'render) t -> unit) ->
('obs1, 'act1, 'render) t ->
('obs2, 'act2, 'render) t
wrap ~observation_space ~action_space ~reset ~step inner builds a new environment that wraps inner. The wrapper shares inner's lifecycle state (RNG, closed flag, reset flag). All guards (closed, needs-reset, space validation) are enforced by reset/step, so wrappers get them automatically.
The render type is preserved from inner. render_mode defaults to inner's.
Accessors
val id : ('obs, 'act, 'render) t -> string option
id env is the environment's identifier, if any.
observation_space env is the space of valid observations.
action_space env is the space of valid actions.
val render_mode : ('obs, 'act, 'render) t -> render_mode option
render_mode env is the render mode chosen at construction, if any.
Lifecycle
val closed : ('obs, 'act, 'render) t -> bool
closed env is true iff the environment has been closed.
reset env () resets the environment to an initial state.
Raises Invalid_argument if env is closed, or if the reset function produces an observation outside observation_space.
step env action advances the environment by one timestep.
Raises Invalid_argument if env is closed, if no reset has been called since the last terminal step, if action is outside action_space, or if the step function produces an observation outside observation_space.
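The closed and needs-reset guards amount to a small state machine. The following is a toy model of that lifecycle under stated assumptions, not Fehu's implementation: the lifecycle record, guarded_reset, and guarded_step are names invented for this sketch, and space validation is omitted.

```ocaml
(* Toy lifecycle state: a fresh environment needs a reset before stepping. *)
type lifecycle = { mutable closed : bool; mutable needs_reset : bool }

let make () = { closed = false; needs_reset = true }

(* reset is allowed at any time unless the environment is closed. *)
let guarded_reset st =
  if st.closed then invalid_arg "reset: environment is closed";
  st.needs_reset <- false

(* step requires an open environment and a reset since the last terminal
   step; a terminated or truncated result arms the needs-reset flag. *)
let guarded_step st ~terminated ~truncated =
  if st.closed then invalid_arg "step: environment is closed";
  if st.needs_reset then invalid_arg "step: reset required";
  if terminated || truncated then st.needs_reset <- true

(* Helper for demonstrating which calls are rejected. *)
let raises f = try f (); false with Invalid_argument _ -> true
```

In this model, stepping before the first reset, stepping after a terminal step, and any call after closing are all rejected with Invalid_argument.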
val render : ('obs, 'act, 'render) t -> 'render option
render env produces a visualization of the current state.
Raises Invalid_argument if env is closed.
val close : ('obs, 'act, 'render) t -> unit
close env releases resources held by the environment. Subsequent calls are no-ops.
Wrappers
val map_action :
action_space:'act2 Space.t ->
f:('act2 -> 'act1) ->
('obs, 'act1, 'render) t ->
('obs, 'act2, 'render) t
map_action ~action_space ~f env transforms actions before passing them to the inner environment.
val map_reward :
f:(reward:float -> info:Info.t -> float * Info.t) ->
('obs, 'act, 'render) t ->
('obs, 'act, 'render) t
map_reward ~f env transforms rewards after each step.
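As an illustration of a function with the shape expected by ~f, the sketch below clips rewards to [-1, 1] and passes the info value through unchanged. The name clip_reward is hypothetical; the function is polymorphic in info so the snippet compiles without Fehu.

```ocaml
(* A reward transform matching reward:float -> info:_ -> float * _.
   Clamps the reward to [-1, 1] and returns info untouched. *)
let clip_reward ~reward ~info =
  (Float.max (-1.) (Float.min 1. reward), info)
```

Given the signature above, it could presumably be applied as Env.map_reward ~f:clip_reward env.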
Clipping
val clip_action :
('obs, Space.Box.element, 'render) t ->
('obs, Space.Box.element, 'render) t
clip_action env clamps continuous actions to the bounds of the inner environment's Space.spec.Box action space. The wrapper exposes a relaxed space that accepts any float values, then clips before forwarding.
val clip_observation :
low:float array ->
high:float array ->
(Space.Box.element, 'act, 'render) t ->
(Space.Box.element, 'act, 'render) t
clip_observation ~low ~high env clamps observations to [low; high]. The wrapper's observation space is the intersection of the provided bounds and the inner space's bounds.
Raises Invalid_argument if low and high differ in length or do not match the inner space's dimensionality.
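The clamping and bound intersection can be illustrated on plain float arrays. This is a sketch under assumptions: Space.Box.element is replaced by float array for self-containment, and check_bounds, intersect, and clamp are hypothetical helper names, not part of Fehu.

```ocaml
(* Reject bounds whose lengths disagree with each other or with the
   dimensionality of the inner space. *)
let check_bounds ~low ~high ~dim =
  if Array.length low <> Array.length high then
    invalid_arg "low and high differ in length";
  if Array.length low <> dim then
    invalid_arg "bounds do not match the inner space's dimensionality"

(* Intersection of two boxes: the tighter bound wins on each side. *)
let intersect ~low ~high ~inner_low ~inner_high =
  (Array.map2 Float.max low inner_low, Array.map2 Float.min high inner_high)

(* Elementwise clamp of an observation to [low; high]. *)
let clamp ~low ~high obs =
  Array.mapi (fun i x -> Float.max low.(i) (Float.min high.(i) x)) obs
```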
Limits
time_limit ~max_episode_steps env enforces a maximum episode length. When the limit is reached the step's truncated flag is set to true. The counter resets on reset.
Raises Invalid_argument if max_episode_steps <= 0.
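The counter behaviour described for time_limit can be modelled in a few lines. This is a minimal sketch, not the library's implementation: the limiter record and the on_reset / on_step helpers are hypothetical names, and only the truncated flag is modelled.

```ocaml
(* Step counter attached to a wrapped environment. *)
type limiter = { max_episode_steps : int; mutable elapsed : int }

(* Non-positive limits are rejected, as documented. *)
let make_limiter ~max_episode_steps =
  if max_episode_steps <= 0 then
    invalid_arg "max_episode_steps must be positive";
  { max_episode_steps; elapsed = 0 }

(* The counter starts over on every reset. *)
let on_reset l = l.elapsed <- 0

(* Count the step and return the truncated flag to report: the inner flag,
   or true once the limit is reached. *)
let on_step l ~truncated =
  l.elapsed <- l.elapsed + 1;
  truncated || l.elapsed >= l.max_episode_steps
```

With a limit of 2, the first step in this model reports false and the second reports true; after on_reset the count starts again from zero.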