Module Fehu.Eval

Policy evaluation.

Runs a deterministic or stochastic policy over multiple episodes and reports summary statistics.

Types

type stats = {
  mean_reward : float;  (* Mean total reward across episodes. *)
  std_reward : float;  (* Standard deviation of total rewards. *)
  mean_length : float;  (* Mean episode length in steps. *)
  n_episodes : int;  (* Number of episodes evaluated. *)
}

The type for evaluation statistics.
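To show how the four fields relate, here is a minimal sketch that builds a stats value from per-episode totals. It assumes the population standard deviation (dividing by the number of episodes); this page does not say whether Fehu uses that form or the sample form. The helper summarize is hypothetical and not part of Fehu.Eval.

```ocaml
(* Local copy of the record so the sketch compiles on its own. *)
type stats = {
  mean_reward : float;
  std_reward : float;
  mean_length : float;
  n_episodes : int;
}

(* Hypothetical helper: [episodes] holds one (total_reward, length) pair
   per episode. Assumes a population standard deviation. *)
let summarize (episodes : (float * int) list) : stats =
  let n = List.length episodes in
  if n = 0 then invalid_arg "summarize: no episodes";
  let nf = float_of_int n in
  let mean_reward =
    List.fold_left (fun acc (r, _) -> acc +. r) 0.0 episodes /. nf
  in
  (* Mean squared deviation of the episode returns from their mean *)
  let variance =
    List.fold_left
      (fun acc (r, _) -> acc +. ((r -. mean_reward) ** 2.0))
      0.0 episodes
    /. nf
  in
  let mean_length =
    List.fold_left (fun acc (_, len) -> acc +. float_of_int len) 0.0 episodes
    /. nf
  in
  { mean_reward; std_reward = sqrt variance; mean_length; n_episodes = n }
```

For example, two episodes with returns 1.0 and 3.0 and lengths 10 and 20 give a mean reward of 2.0, a standard deviation of 1.0 and a mean length of 15.0.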

Running

val run : ('obs, 'act, 'render) Env.t -> policy:('obs -> 'act) -> ?n_episodes:int -> ?max_steps:int -> unit -> stats

run env ~policy () evaluates policy for n_episodes episodes (default 10), each capped at max_steps steps (default 1000), and returns their summary stats. The environment is reset between episodes.
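The loop that run describes can be sketched as below. This is not Fehu's implementation: toy_env is a hypothetical, simplified stand-in for Env.t whose step returns (next observation, reward, episode over), and run_sketch is a hypothetical name. The sketch assumes an episode ends when step reports it is over or when max_steps is reached, and it uses the population standard deviation.

```ocaml
(* Hypothetical environment interface; NOT Fehu's Env.t. *)
type ('obs, 'act) toy_env = {
  reset : unit -> 'obs;
  step : 'act -> 'obs * float * bool; (* next obs, reward, episode over *)
}

type stats = {
  mean_reward : float;
  std_reward : float;
  mean_length : float;
  n_episodes : int;
}

(* Hypothetical evaluation loop mirroring the defaults documented for run. *)
let run_sketch env ~policy ?(n_episodes = 10) ?(max_steps = 1000) () =
  if n_episodes <= 0 then invalid_arg "run_sketch: n_episodes must be > 0";
  let returns = Array.make n_episodes 0.0 in
  let lengths = Array.make n_episodes 0 in
  for ep = 0 to n_episodes - 1 do
    (* Each episode starts from a fresh reset *)
    let obs = ref (env.reset ()) in
    let total = ref 0.0 in
    let steps = ref 0 in
    let finished = ref false in
    while (not !finished) && !steps < max_steps do
      let next_obs, reward, over = env.step (policy !obs) in
      obs := next_obs;
      total := !total +. reward;
      incr steps;
      finished := over
    done;
    returns.(ep) <- !total;
    lengths.(ep) <- !steps
  done;
  let nf = float_of_int n_episodes in
  let mean_reward = Array.fold_left ( +. ) 0.0 returns /. nf in
  let variance =
    Array.fold_left (fun acc r -> acc +. ((r -. mean_reward) ** 2.0)) 0.0 returns
    /. nf
  in
  let mean_length = float_of_int (Array.fold_left ( + ) 0 lengths) /. nf in
  { mean_reward; std_reward = sqrt variance; mean_length; n_episodes }

(* Toy environment: reward 1.0 per step, episode ends after [horizon] steps. *)
let make_counter_env horizon =
  let t = ref 0 in
  {
    reset = (fun () -> t := 0; 0);
    step = (fun () -> incr t; (!t, 1.0, !t >= horizon));
  }
```

With make_counter_env 5 and a constant policy, every episode lasts 5 steps and returns 5.0, so std_reward is 0.0; passing ~max_steps:3 truncates each episode to 3 steps.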