Module Row.Agg

Row-wise aggregations using vectorized operations.

These functions compute aggregations horizontally across columns for each row, similar to pandas' axis=1 operations or Polars' horizontal functions. For common reductions such as sums, means, and extrema, prefer them over Row.map_list, which is considerably slower.

Null handling:

  • Float columns: NaN values are treated as nulls
  • Integer columns: Int32.min_int and Int64.min_int are treated as nulls
  • String/Bool columns: None values are treated as nulls
  • When skipna=true (default): nulls are excluded from computations
  • When skipna=false: any null in a row produces a null result
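
The two skipna modes listed above can be illustrated with a small reference model. This is only a sketch using the OCaml standard library, not the library's vectorized implementation: the helper name row_sum is hypothetical, a row is modeled as a plain float array, and the result for a row made up entirely of nulls (0.0 here) is an assumption that the text above does not specify.

```ocaml
(* Reference model only: a row is a plain float array and NaN marks a null.
   [row_sum] is a hypothetical helper, not part of Row.Agg. *)
let row_sum ?(skipna = true) (row : float array) : float =
  if skipna then
    (* Skip nulls: NaN cells contribute nothing to the total. *)
    Array.fold_left
      (fun acc x -> if Float.is_nan x then acc else acc +. x)
      0.0 row
  else
    (* Propagate nulls: NaN +. x is NaN, so a single null makes the sum null. *)
    Array.fold_left ( +. ) 0.0 row

let () =
  let row = [| 1.0; Float.nan; 3.0 |] in
  assert (row_sum row = 4.0);
  assert (Float.is_nan (row_sum ~skipna:false row))
```

For float columns the skipna=false case falls out of ordinary NaN arithmetic; integer columns would need an explicit check against the sentinel value instead.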

Performance: These functions use vectorized Nx operations internally, which are significantly faster than row-by-row iteration for large datasets.

val sum : ?skipna:bool -> t -> names:string list -> Col.t

sum ?skipna df ~names computes row-wise sum across specified columns.

Uses vectorized Nx operations for efficiency. All specified columns must be numeric (float or integer types).

  • parameter skipna

    If true (default), skip null values. If false, any null in a row makes the entire row sum null.

  • raises Invalid_argument

    if any column is not numeric or doesn't exist.

Example:

  let df = create [("a", Col.float64 [|1.; 2.; 3.|]);
                  ("b", Col.float64 [|4.; 5.; 6.|])] in
  let sums = Row.Agg.sum df ~names:["a"; "b"] in
  (* Result: Col.float64 [|5.; 7.; 9.|] *)

val mean : ?skipna:bool -> t -> names:string list -> Col.t

mean ?skipna df ~names computes row-wise mean across specified columns.

The result is always a float64 column regardless of input types. When skipna=true, the divisor is the count of non-null values per row.

  • parameter skipna

    If true (default), exclude nulls from mean calculation.

  • raises Invalid_argument

    if any column is not numeric or doesn't exist.
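
The per-row divisor described above can be sketched on a single row as follows. This is a reference model under stated assumptions rather than the library's code: row_mean is a hypothetical helper, the row is a plain float array with NaN as the null marker, and returning NaN for a row with no non-null values is an assumption.

```ocaml
(* Reference model only: [row_mean] is a hypothetical helper, not Row.Agg.mean. *)
let row_mean ?(skipna = true) (row : float array) : float =
  if not skipna then
    (* Nulls propagate: any NaN in the row makes the total, and the mean, NaN. *)
    Array.fold_left ( +. ) 0.0 row /. float_of_int (Array.length row)
  else begin
    let total = ref 0.0 and count = ref 0 in
    Array.iter
      (fun x ->
        if not (Float.is_nan x) then begin
          total := !total +. x;
          incr count
        end)
      row;
    (* The divisor is the number of non-null cells, not the row width. *)
    if !count = 0 then Float.nan else !total /. float_of_int !count
  end

let () =
  (* Two non-null cells, so the divisor is 2 rather than 3. *)
  assert (row_mean [| 2.0; Float.nan; 4.0 |] = 3.0);
  assert (Float.is_nan (row_mean ~skipna:false [| 2.0; Float.nan; 4.0 |]))
```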

val min : ?skipna:bool -> t -> names:string list -> Col.t

min ?skipna df ~names computes row-wise minimum across specified columns.

The result column preserves the most precise numeric type among inputs (e.g., if any input is float64, result is float64).

  • parameter skipna

    If true (default), ignore nulls when finding minimum.

  • raises Invalid_argument

    if any column is not numeric or doesn't exist.
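
Integer columns show why null handling matters for minima: the null sentinel Int32.min_int is the smallest representable value, so a naive minimum would always select it. The sketch below models one row of int32 values. The helper row_min_int32 is hypothetical, and returning the sentinel as the null result is an assumption about how a null output would be encoded.

```ocaml
(* Reference model only: [row_min_int32] is a hypothetical helper.
   Int32.min_int is the null sentinel, following the null-handling rules. *)
let row_min_int32 ?(skipna = true) (row : int32 array) : int32 =
  let is_null x = Int32.equal x Int32.min_int in
  if (not skipna) && Array.exists is_null row then
    (* skipna=false: any null in the row yields a null result. *)
    Int32.min_int
  else
    (* Start from "null" and replace it with the first non-null value seen. *)
    Array.fold_left
      (fun acc x ->
        if is_null x then acc
        else if is_null acc then x
        else if Int32.compare x acc < 0 then x
        else acc)
      Int32.min_int row

let () =
  let row = [| 5l; Int32.min_int; 2l |] in
  assert (Int32.equal (row_min_int32 row) 2l);
  assert (Int32.equal (row_min_int32 ~skipna:false row) Int32.min_int)
```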

val max : ?skipna:bool -> t -> names:string list -> Col.t

max ?skipna df ~names computes row-wise maximum across specified columns.

The result column preserves the most precise numeric type among inputs.

  • parameter skipna

    If true (default), ignore nulls when finding maximum.

  • raises Invalid_argument

    if any column is not numeric or doesn't exist.

val dot : t -> names:string list -> weights:float array -> Col.t

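dot df ~names ~weights computes the row-wise weighted sum (dot product) across the specified columns.

For each row, the value from each column in names is multiplied by the weight at the same position in weights, and the products are summed. Equivalent to pandas' df[cols].dot(weights).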

  • parameter weights

    Must have the same length as names.

  • raises Invalid_argument

    if lengths don't match or columns aren't numeric.
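
The weighted sum and its length check can be modeled on a single row as below. This is a sketch using only the standard library: row_dot is a hypothetical helper, the row holds the values taken from the named columns in order, and the error message is illustrative rather than the one the library raises.

```ocaml
(* Reference model only: [row_dot] is a hypothetical helper, not Row.Agg.dot. *)
let row_dot ~(weights : float array) (row : float array) : float =
  (* Mirror the documented precondition: one weight per column. *)
  if Array.length row <> Array.length weights then
    invalid_arg "row_dot: weights and row must have the same length";
  let acc = ref 0.0 in
  (* Multiply each value by the weight at the same position and accumulate. *)
  Array.iteri (fun i x -> acc := !acc +. (x *. weights.(i))) row;
  !acc

let () =
  (* 1.0 *. 0.5 +. 2.0 *. 0.25 +. 3.0 *. 0.25 = 1.75 *)
  assert (row_dot ~weights:[| 0.5; 0.25; 0.25 |] [| 1.0; 2.0; 3.0 |] = 1.75)
```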

Example:

  let portfolio_weights = [|0.6; 0.3; 0.1|] in
  let weighted_returns = Row.Agg.dot df
    ~names:["stock_a"; "stock_b"; "stock_c"]
    ~weights:portfolio_weights in
  (* Result: each row holds 0.6 * stock_a + 0.3 * stock_b + 0.1 * stock_c *)

val all : t -> names:string list -> Col.t

all df ~names returns a boolean column that is true for each row in which every value across the specified columns is true.

All specified columns must be boolean type. Null values are treated as false for the purpose of the "all" operation.

  • raises Invalid_argument

    if any column is not boolean or doesn't exist.

val any : t -> names:string list -> Col.t

any df ~names returns a boolean column that is true for each row in which at least one value across the specified columns is true.

All specified columns must be boolean type. Null values are treated as false for the purpose of the "any" operation.

  • raises Invalid_argument

    if any column is not boolean or doesn't exist.
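
The null-as-false rule shared by all and any can be sketched on a single row. This is a reference model, not the library's implementation: row_all and row_any are hypothetical helpers, and a row is modeled as a bool option array in which None is a null.

```ocaml
(* Reference model only: [row_all] and [row_any] are hypothetical helpers. *)

(* A null (None) counts as false, so it makes "all" fail for the row. *)
let row_all (row : bool option array) : bool =
  Array.for_all (fun cell -> Option.value cell ~default:false) row

(* A null (None) counts as false, so it never makes "any" succeed. *)
let row_any (row : bool option array) : bool =
  Array.exists (fun cell -> Option.value cell ~default:false) row

let () =
  let row = [| Some true; None; Some true |] in
  assert (not (row_all row));
  assert (row_any row);
  assert (not (row_any [| None; Some false |]))
```

Under this rule a null can only turn an "all" result false and can never turn an "any" result true.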