fehu ᚠ
Gymnasium's ease. Stable Baselines' algorithms. OCaml's type safety.
why fehu?
type-safe environments
Environment APIs catch shape mismatches at compile time. Never debug observation space errors at runtime.
functional RL
Agents are immutable. Experience replay is just data. Everything composes naturally.
built on rune
Policy gradients and Q-learning with automatic differentiation. Neural networks from Kaun.
standard benchmarks
CartPole, MountainCar, and more. Compare directly with Gymnasium and Stable Baselines3.
show me the code
Python (Gymnasium)
import gymnasium as gym

# Create environment
env = gym.make('CartPole-v1')
obs, info = env.reset()

# Run episode
done = False
while not done:
    action = env.action_space.sample()
    obs, reward, term, trunc, info = env.step(action)
    done = term or trunc
FEHU
open Fehu

(* Create environment *)
let rng = Rune.Rng.key 42 in
let env = Fehu_envs.Cartpole.make ~rng () in

(* Run episode *)
let _obs, _info = Env.reset env () in
let done_ = ref false in
while not !done_ do
  let action = Space.sample (Env.action_space env) in
  let t = Env.step env action in
  done_ := t.terminated || t.truncated
done
train an agent
open Fehu
open Kaun

(* Create policy network *)
let policy =
  Layer.sequential [
    Layer.linear ~in_features:4 ~out_features:128 ();
    Layer.relu ();
    Layer.linear ~in_features:128 ~out_features:2 ();
  ]

(* Train with REINFORCE *)
let agent =
  Fehu_algorithms.Reinforce.create ~policy_network:policy ~n_actions:2 ~rng
    { learning_rate = 0.001; gamma = 0.99; ... }

let agent =
  Fehu_algorithms.Reinforce.learn agent ~env ~total_timesteps:100_000
    ~callback:(fun ~iteration ~metrics ->
      Printf.printf "Episode %d: Return = %.2f\n" iteration metrics.episode_return;
      true)
    ()
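REINFORCE weights each action's log-probability by the discounted return that follows it, which is what gamma controls above. The sketch below shows that return computation in plain OCaml; it is illustrative only, and discounted_returns is not part of Fehu's API (the Reinforce module handles this internally):

(* Discounted returns: returns.(t) = rewards.(t) + gamma * returns.(t+1).
   Illustrative sketch; not Fehu's API. *)
let discounted_returns ~gamma rewards =
  let n = Array.length rewards in
  let returns = Array.make n 0.0 in
  let acc = ref 0.0 in
  for t = n - 1 downto 0 do
    acc := rewards.(t) +. (gamma *. !acc);
    returns.(t) <- !acc
  done;
  returns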
what's implemented
Fehu is in active development. Here's what works today:
environments
- ✓ CartPole-v1
- ✓ MountainCar-v0
- ✓ GridWorld (custom)
- ✓ RandomWalk (custom)
algorithms
- ✓ REINFORCE (policy gradient)
- ✓ DQN (deep Q-network; target computation sketched after this list)
- ✓ Experience replay buffers
- ⏳ PPO, A2C (coming soon)
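For the DQN entry above, the training target for a stored transition is its reward plus the discounted maximum Q-value of the next observation, or just the reward when the episode terminated. The sketch below shows that arithmetic on plain float arrays; it is illustrative only and independent of how Fehu's DQN computes its targets with Kaun networks:

(* TD target for one transition: reward + gamma * max_a Q(next_obs, a),
   or just reward when the episode terminated.
   Illustrative sketch; not Fehu's API. *)
let td_target ~gamma ~reward ~terminated ~next_q_values =
  if terminated then reward
  else reward +. (gamma *. Array.fold_left Float.max neg_infinity next_q_values)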
design principles
Type-safe environments. Observation and action spaces are strongly typed. The compiler catches dimension mismatches before you run your training loop.
Functional agents. Agents are immutable values. Policy updates return new agents. Experience replay is just an array of transitions.
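To make "just an array of transitions" concrete, here is a minimal sketch of a transition record and minibatch sampling; the field and function names are illustrative assumptions, not Fehu's actual replay-buffer API:

(* Illustrative sketch only; names are assumptions, not Fehu's API. *)
type transition = {
  obs : float array;
  action : int;
  reward : float;
  next_obs : float array;
  terminated : bool;
}

(* A replay buffer is just data: sampling a minibatch is a function from
   an array of transitions to a smaller array. *)
let sample_batch (buffer : transition array) ~batch_size =
  Array.init batch_size (fun _ -> buffer.(Random.int (Array.length buffer)))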
Leverage the Raven ecosystem. Use Kaun for neural networks, Rune for autodiff, and Nx for tensor operations. Fehu is thin glue code over powerful primitives.
get started
Fehu isn't released yet. For now, check out the documentation to learn more.
When it's ready:
# Install
opam install fehu

(* Train CartPole *)
open Fehu

let rng = Rune.Rng.key 42
let env = Fehu_envs.Cartpole.make ~rng ()

(* Create and train your agent *)
let agent = train_cartpole env ~episodes:500