ᚠ fehu
Reinforcement learning for OCaml
Fehu is a reinforcement learning environment toolkit for OCaml. It provides type-safe environments, composable wrappers, trajectory collection, replay buffers, GAE computation, policy evaluation, and vectorized environments.
Fehu follows the Gymnasium interface pattern: environments expose reset and
step with typed observation and action spaces. Wrappers compose freely.
Collection and evaluation utilities handle the plumbing between environments
and training loops.
Features
- Type-safe environments: observation and action spaces are encoded in the type system
- Rich space types: Discrete, Box, Multi_binary, Multi_discrete, Tuple, Dict, Sequence, Text
- Composable wrappers: map_observation, map_action, map_reward, clip_action, clip_observation, time_limit
- Trajectory collection: rollout and episode collection in structure-of-arrays form
- Replay buffers: fixed-capacity circular buffer with uniform random sampling
- GAE: generalized advantage estimation with proper terminated/truncated handling
- Policy evaluation: run a policy over episodes and get mean/std reward statistics
- Vectorized environments: run multiple environments with batched step and auto-reset
- Built-in environments: CartPole, MountainCar, GridWorld, RandomWalk
Quick Start
Create an environment, run a random agent, and evaluate:
open Fehu
let () = Nx.Rng.run ~seed:42 @@ fun () ->
let env = Fehu_envs.Cartpole.make () in
(* Run one episode *)
let _obs, _info = Env.reset env () in
let done_ = ref false in
let total_reward = ref 0.0 in
while not !done_ do
let act = Space.sample (Env.action_space env) in
let s = Env.step env act in
total_reward := !total_reward +. s.reward;
done_ := s.terminated || s.truncated
done;
(* Evaluate over 10 episodes *)
let _stats = Eval.run env
~policy:(fun _obs -> Space.sample (Env.action_space env))
~n_episodes:10 ()
in ()
Next Steps
- Getting Started -- installation, environments, spaces, step loop
- Environments and Wrappers -- custom environments, wrappers, rendering, vectorized environments
- Collection and Evaluation -- trajectory collection, replay buffers, GAE, evaluation