fehu

Gymnasium's ease. Stable Baselines' algorithms. OCaml's type safety.


why fehu?

type-safe environments

Environment APIs catch shape mismatches at compile time. Never debug observation space errors at runtime.

functional RL

Agents are immutable. Experience replay is just data. Everything composes naturally.

built on rune

Policy gradients and Q-learning with automatic differentiation. Neural networks from Kaun.

standard benchmarks

CartPole, MountainCar, and more. Compare directly with Gymnasium and Stable Baselines3.


show me the code

Python (Gymnasium)

import gymnasium as gym

# Create environment
env = gym.make('CartPole-v1')
obs, info = env.reset()

# Run episode
done = False
while not done:
    action = env.action_space.sample()
    obs, reward, term, trunc, info = \
        env.step(action)
    done = term or trunc

OCaml (Fehu)

open Fehu

(* Create environment *)
let rng = Rune.Rng.key 42
let env = Fehu_envs.Cartpole.make ~rng ()

(* Run episode *)
let () =
  let _obs, _info = Env.reset env () in
  let done_ = ref false in
  while not !done_ do
    let action = Space.sample (Env.action_space env) in
    let t = Env.step env action in
    done_ := t.terminated || t.truncated
  done
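
The step result is a record whose terminated and truncated fields mirror Gymnasium's term/trunc pair, so episode-end handling translates one-to-one.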

train an agent

open Fehu
open Kaun

(* Create policy network *)
let policy = Layer.sequential [
  Layer.linear ~in_features:4 ~out_features:128 ();
  Layer.relu ();
  Layer.linear ~in_features:128 ~out_features:2 ();
]

(* Train with REINFORCE *)
let agent = Fehu_algorithms.Reinforce.create
  ~policy_network:policy
  ~n_actions:2
  ~rng
  { learning_rate = 0.001; gamma = 0.99; ... }

let agent = Fehu_algorithms.Reinforce.learn
  agent ~env ~total_timesteps:100_000
  ~callback:(fun ~iteration ~metrics ->
    Printf.printf "Episode %d: Return = %.2f\n"
      iteration metrics.episode_return; true)
  ()
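
One note on the callback: it returns true here, which we take to mean "keep training" (the same convention Stable Baselines3 uses, where returning false stops early). Check the Fehu documentation for the exact contract.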

what's implemented

Fehu is in active development. Here's what works today:

environments

  • ✓ CartPole-v1
  • ✓ MountainCar-v0
  • ✓ GridWorld (custom)
  • ✓ RandomWalk (custom)

algorithms

  • ✓ REINFORCE (policy gradient)
  • ✓ DQN (deep Q-network)
  • ✓ Experience replay buffers
  • ⏳ PPO, A2C (coming soon)

design principles

Type-safe environments. Observation and action spaces are strongly typed. The compiler catches dimension mismatches before you run your training loop.
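
As a rough illustration of what that buys you (hypothetical types only, not Fehu's actual signatures), parameterizing an environment over its observation and action types turns a wrong action into a compile error rather than a runtime crash:

(* Hypothetical sketch, not the real Fehu interface. *)
type ('obs, 'act) env = {
  reset : unit -> 'obs;
  step : 'act -> 'obs * float * bool;  (* next obs, reward, done *)
}

(* A CartPole-shaped stub: observations are 4 floats, actions are ints. *)
let stub : (float array, int) env = {
  reset = (fun () -> [| 0.; 0.; 0.; 0. |]);
  step = (fun _action -> ([| 0.; 0.; 0.; 0. |], 1.0, false));
}

(* stub.step 0.5 would be rejected by the compiler; stub.step 0 is fine. *)
let _obs, _reward, _done = stub.step 0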

Functional agents. Agents are immutable values. Policy updates return new agents. Experience replay is just an array of transitions.
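
A minimal sketch of that shape (again hypothetical types, not Fehu's actual definitions): a transition is a plain record, a replay buffer is an array of them, and an update returns a new agent rather than mutating the old one.

(* Hypothetical sketch, not Fehu's actual types. *)
type transition = {
  obs : float array;
  action : int;
  reward : float;
  next_obs : float array;
  terminated : bool;
}

(* Experience replay is just data. *)
type replay_buffer = transition array

(* Agents are immutable values; an update returns a fresh value. *)
type agent = { steps : int }

let after_episode (a : agent) (episode : transition list) : agent =
  { steps = a.steps + List.length episode }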

Leverage the Raven ecosystem. Use Kaun for neural networks, Rune for autodiff, and Nx for tensor operations. Fehu is thin glue code over powerful primitives.


get started

Fehu isn't released yet. For now, check out the documentation to learn more.

When it's ready:

# Install
opam install fehu

(* Train CartPole *)
open Fehu

let rng = Rune.Rng.key 42
let env = Fehu_envs.Cartpole.make ~rng ()

(* Create and train your agent *)
let agent = train_cartpole env ~episodes:500