Module Agg.Float

Float aggregations: these work on any numeric column (integer or float types).

Values are coerced to float for computation. Every function in this module accepts int32, int64, float32, or float64 columns and returns a float result (min and max return float option).

  • raises Invalid_argument

    if the column is not numeric.
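The coercion rule above can be sketched as follows. The column representation is not documented here, so the variant type column, its constructors, and the function to_float_opts are hypothetical; nulls are modelled as None.

```ocaml
(* Hypothetical column representation (assumption, not the module's real type).
   None stands for a null cell. OCaml has no float32 scalar, so float32
   cells are stored as float here. *)
type column =
  | Int32_col of int32 option array
  | Int64_col of int64 option array
  | Float32_col of float option array
  | Float64_col of float option array
  | String_col of string option array

(* Coerce any numeric column to optional floats, keeping nulls in place.
   Non-numeric columns raise Invalid_argument, as documented above. *)
let to_float_opts (col : column) : float option array =
  match col with
  | Int32_col a -> Array.map (Option.map Int32.to_float) a
  | Int64_col a -> Array.map (Option.map Int64.to_float) a
  | Float32_col a | Float64_col a -> a
  | String_col _ -> invalid_arg "Agg.Float: column is not numeric"
```

Coercing once up front lets every aggregation below be written against a single float option array shape.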

val sum : t -> string -> float

sum df name returns the sum of column name as a float.

Works on any numeric column type (int32, int64, float32, float64). Null values are excluded from the sum calculation.

Time complexity: O(n) where n is the number of rows.

  • raises Invalid_argument

    if the column is not numeric or does not exist.
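A minimal sketch of the null-skipping sum, assuming the column has already been coerced to a float option array (None for null). This is an illustration, not the module's implementation; in this sketch an empty or all-null column yields 0.0, a case the documentation does not specify.

```ocaml
(* Single O(n) pass: add each non-null value, skip nulls (None). *)
let sum (col : float option array) : float =
  Array.fold_left
    (fun acc v -> match v with Some x -> acc +. x | None -> acc)
    0.0 col
```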

val mean : t -> string -> float

mean df name returns the arithmetic mean.

Computes the sum divided by the count of non-null values. Returns NaN if all values are null or the column is empty.

Time complexity: O(n) where n is the number of rows.
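The sum-over-count rule can be sketched in one pass, again assuming a float option array with None for null. In OCaml, 0.0 /. 0.0 evaluates to nan, which produces the documented NaN result for empty or all-null input without a special case.

```ocaml
(* Accumulate the sum and the count of non-null values together,
   then divide. With count = 0 this is 0.0 /. 0.0 = nan. *)
let mean (col : float option array) : float =
  let total, count =
    Array.fold_left
      (fun (s, n) v -> match v with Some x -> (s +. x, n + 1) | None -> (s, n))
      (0.0, 0) col
  in
  total /. float_of_int count
```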

val std : t -> string -> float

std df name returns the population standard deviation.

Computes standard deviation over non-null values, dividing by n. Returns NaN if no non-null values exist.

Time complexity: O(n) - requires two passes over the data.
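The two passes can be sketched like this, assuming a float option array input with None for null (a hypothetical stand-in for the real t and column name): the first pass computes the mean, the second sums squared deviations from it, and the result divides by n rather than n - 1.

```ocaml
(* Two-pass population standard deviation over the non-null values. *)
let std (col : float option array) : float =
  (* Drop nulls first. *)
  let xs = Array.of_list (List.filter_map Fun.id (Array.to_list col)) in
  let n = float_of_int (Array.length xs) in
  (* Pass 1: mean (nan when there are no values). *)
  let mu = Array.fold_left ( +. ) 0.0 xs /. n in
  (* Pass 2: sum of squared deviations from the mean. *)
  let sq = Array.fold_left (fun acc x -> acc +. (x -. mu) *. (x -. mu)) 0.0 xs in
  (* Population statistic: divide by n; 0.0 /. 0.0 gives nan for empty input. *)
  sqrt (sq /. n)
```

A second pass avoids the cancellation error of the one-pass "mean of squares minus square of mean" formula.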

val var : t -> string -> float

var df name returns the population variance.

Computes variance over non-null values, dividing by n. The standard deviation is the square root of this value.

Time complexity: O(n) - requires two passes over the data.
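Written out, with x_1, ..., x_n the non-null values, the population variance and its relation to std are:

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\operatorname{var} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2, \qquad
\operatorname{std} = \sqrt{\operatorname{var}}
```

For example, the values 2, 4, 4, 4, 5, 5, 7, 9 give \mu = 40/8 = 5, squared deviations summing to 9+1+1+1+0+0+4+16 = 32, so var = 32/8 = 4 and std = 2.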

val min : t -> string -> float option

min df name returns Some of the minimum value, or None if the column is empty or all its values are null.

Null values are ignored during comparison.

Time complexity: O(n) where n is the number of rows.
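A minimal sketch of the null-skipping minimum, assuming a float option array with None for null; the name col_min is hypothetical. The accumulator stays None until the first non-null value, which yields None for empty or all-null input.

```ocaml
(* Fold over the column, ignoring nulls; None means "no value seen yet". *)
let col_min (col : float option array) : float option =
  Array.fold_left
    (fun acc v ->
      match acc, v with
      | _, None -> acc
      | None, Some x -> Some x
      | Some m, Some x -> Some (if x < m then x else m))
    None col
```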

val max : t -> string -> float option

max df name returns Some of the maximum value, or None if the column is empty or all its values are null.

Null values are ignored during comparison.

Time complexity: O(n) where n is the number of rows.
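The maximum can be sketched by filtering nulls first and folding with Float.max over what remains, assuming a float option array input; col_max is a hypothetical name. An empty filtered list maps directly to None.

```ocaml
(* Drop nulls, then fold; an empty result means empty or all-null input. *)
let col_max (col : float option array) : float option =
  match List.filter_map Fun.id (Array.to_list col) with
  | [] -> None
  | x :: rest -> Some (List.fold_left Float.max x rest)
```

Callers typically match on the result, for example `match col_max c with Some m -> m | None -> default`, rather than treating a missing maximum as a number.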

val median : t -> string -> float

median df name returns the median (50th percentile).

Null values are excluded before sorting. When an even number of non-null values remains, returns the average of the two middle values.

Time complexity: O(n log n) due to sorting requirement.
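A sketch of the median rule, assuming a float option array with None for null. The documentation does not say what an empty or all-null column returns; this sketch returns nan, which is an assumption.

```ocaml
(* Drop nulls, sort (O(n log n)), then pick the middle element,
   or average the two middle elements for an even count. *)
let median (col : float option array) : float =
  let xs = Array.of_list (List.filter_map Fun.id (Array.to_list col)) in
  Array.sort Float.compare xs;
  let n = Array.length xs in
  if n = 0 then nan (* assumption: undefined case returns nan *)
  else if n mod 2 = 1 then xs.(n / 2)
  else (xs.(n / 2 - 1) +. xs.(n / 2)) /. 2.0
```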

val quantile : t -> string -> q:float -> float

quantile df name ~q returns the q-th quantile where 0 <= q <= 1.

Uses linear interpolation between data points. q=0.5 gives the median, q=0.25 gives the first quartile, etc.

  • parameter q

    Quantile level between 0.0 and 1.0 inclusive.

  • raises Invalid_argument

    if q is outside [0, 1].

Time complexity: O(n log n) due to sorting requirement.
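The interpolation can be sketched as below, assuming a float option array input with None for null. The exact position rule is not stated above; this sketch assumes the common "linear" convention (NumPy's default), where the q-th quantile of n sorted values sits at position h = q * (n - 1) and is interpolated between its two neighbours. Returning nan for empty input is also an assumption.

```ocaml
(* Linear-interpolation quantile over the sorted non-null values. *)
let quantile (col : float option array) ~(q : float) : float =
  (* Also rejects nan, since nan fails both comparisons. *)
  if not (q >= 0.0 && q <= 1.0) then invalid_arg "quantile: q must be in [0, 1]";
  let xs = Array.of_list (List.filter_map Fun.id (Array.to_list col)) in
  Array.sort Float.compare xs;
  let n = Array.length xs in
  if n = 0 then nan (* assumption: undefined case returns nan *)
  else begin
    let h = q *. float_of_int (n - 1) in
    let lo = int_of_float (floor h) in
    let hi = min (lo + 1) (n - 1) in
    let frac = h -. floor h in
    (* Move frac of the way from xs.(lo) towards xs.(hi). *)
    xs.(lo) +. frac *. (xs.(hi) -. xs.(lo))
  end
```

Under this convention q = 0.5 reproduces the median rule above: for an even count, h falls exactly halfway between the two middle values.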