Module Agg.String

String aggregations: these functions work on string columns only.

Null values (None in string option arrays) are skipped by every function in this module. They never take part in comparisons, concatenation, or counting.

  • raises Invalid_argument

    if the named column is not a string column.
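The type check behind the Invalid_argument contract can be pictured with a small sketch. The `column` variant and the `string_data` helper below are hypothetical illustrations, not this library's types or functions. They only show the pattern of extracting string data and raising `Invalid_argument` (via the standard `invalid_arg`) for any other column type.

```ocaml
(* Hypothetical column type: the library's real column representation is
   not shown in this module, so this variant is only an illustration. *)
type column =
  | String_col of string option array
  | Int_col of int option array

(* Return the string data of a column, or raise [Invalid_argument]
   when the column holds some other element type. *)
let string_data ~name = function
  | String_col data -> data
  | Int_col _ ->
      invalid_arg (Printf.sprintf "column %S is not a string column" name)

let () =
  match string_data ~name:"age" (Int_col [| Some 31; None |]) with
  | _ -> print_endline "unexpected: got string data"
  (* prints: column "age" is not a string column *)
  | exception Invalid_argument msg -> print_endline msg
```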

val min : t -> string -> string option

min df name returns the lexicographically smallest non-null string in column name, or None if the column is empty.

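Uses OCaml's standard string comparison, which orders strings byte by byte. The ordering is therefore not locale-aware: for example, every uppercase ASCII letter sorts before every lowercase one. Null values are excluded from the comparison.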
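The semantics described above can be sketched over a plain `string option array`, the representation this module mentions for nulls. `min_string` is a hypothetical helper written for illustration, not the library's implementation. It skips `None` entries and compares the rest with `String.compare`, so the byte ordering is visible in the example.

```ocaml
(* Sketch of [min] on a [string option array] (assumed representation).
   [None] entries are skipped; the remaining strings are compared with
   [String.compare], which orders strings byte by byte. *)
let min_string (col : string option array) : string option =
  Array.fold_left
    (fun acc v ->
      match acc, v with
      | _, None -> acc                     (* skip nulls *)
      | None, Some s -> Some s             (* first non-null value seen *)
      | Some best, Some s ->
          if String.compare s best < 0 then Some s else acc)
    None col

let () =
  (* 'Z' is byte 90 and 'a' is byte 97, so "Zebra" sorts before "apple". *)
  assert (min_string [| Some "apple"; None; Some "Zebra" |] = Some "Zebra");
  (* In this sketch, a column with no non-null values yields [None]. *)
  assert (min_string [| None; None |] = None)
```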

Time complexity: O(n * m) where n is the number of rows and m is the average string length.

val max : t -> string -> string option

max df name returns the lexicographically largest non-null string in column name, or None if the column is empty.

Uses the same byte-wise OCaml string comparison as min. Null values are excluded from the comparison.
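A matching sketch for max, again assuming a `string option array` column, drops nulls first and then folds with `String.compare`. `max_string` is a hypothetical name used only here. The example highlights two consequences of byte-wise ordering: a proper prefix sorts before the longer string, and lowercase letters outrank uppercase ones.

```ocaml
(* Sketch of [max] on a [string option array] (assumed representation).
   Nulls are removed with [List.filter_map Fun.id], then the largest
   remaining string is found with [String.compare]. *)
let max_string (col : string option array) : string option =
  match List.filter_map Fun.id (Array.to_list col) with
  | [] -> None
  | first :: rest ->
      Some
        (List.fold_left
           (fun best s -> if String.compare s best > 0 then s else best)
           first rest)

let () =
  (* A proper prefix sorts before the longer string: "ab" < "abc". *)
  assert (max_string [| Some "ab"; Some "abc"; None |] = Some "abc");
  (* Lowercase letters have larger byte values than uppercase ones. *)
  assert (max_string [| Some "Zebra"; Some "apple" |] = Some "apple")
```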

Time complexity: O(n * m) where n is the number of rows and m is the average string length.

val concat : t -> string -> ?sep:string -> unit -> string

concat df name ?sep () concatenates all non-null strings in column name, inserting sep between consecutive strings.

  • parameter sep

    Separator inserted between consecutive strings (default: the empty string).

Null values are skipped during concatenation, so no separator is emitted for them. If all values are null, concat returns the empty string.
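The concatenation behaviour above, including the all-null case, can be sketched with the standard `String.concat`. `concat_strings` is a hypothetical helper over an assumed `string option array` column; it is not this module's `concat`. Because nulls are filtered out before joining, no stray separators appear where nulls used to be.

```ocaml
(* Sketch of [concat] on a [string option array] (assumed representation).
   Nulls are dropped first, then [String.concat] joins the remaining
   strings with [sep]; an empty list joins to "". *)
let concat_strings ?(sep = "") (col : string option array) : string =
  col |> Array.to_list |> List.filter_map Fun.id |> String.concat sep

let () =
  let col = [| Some "a"; None; Some "b"; Some "c" |] in
  assert (concat_strings col = "abc");
  assert (concat_strings ~sep:", " col = "a, b, c");
  assert (concat_strings ~sep:"-" [| None; None |] = "")
```

In this sketch `sep` precedes the positional column argument, so it can be omitted without a trailing `()`. The real signature places `?sep` after the positional arguments, which is why it ends in `unit`: OCaml only omits an optional argument once a positional argument that follows it has been applied.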

Time complexity: O(n * m) where n is the number of rows and m is the average string length.

val unique : t -> string -> string array

unique df name returns an array of the distinct non-null values in column name.

The order of unique values is not guaranteed. Null values are excluded from the result.
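One way to obtain distinct values with an unspecified order is to use a hash table as a set, sketched below. `unique_strings` is a hypothetical helper over an assumed `string option array` column, and the hash-table approach is an illustrative choice, not necessarily how this module works. Since the output order follows `Hashtbl.fold`, callers that need a stable order should sort the result.

```ocaml
(* Sketch of [unique] using a hash table as a set (an assumed approach).
   The output order follows [Hashtbl.fold], which is unspecified. *)
let unique_strings (col : string option array) : string array =
  let seen = Hashtbl.create 16 in
  Array.iter
    (function
      | Some s -> Hashtbl.replace seen s ()  (* record each distinct value once *)
      | None -> ())                          (* nulls are excluded *)
    col;
  Hashtbl.fold (fun s () acc -> s :: acc) seen [] |> Array.of_list

let () =
  let u = unique_strings [| Some "b"; None; Some "a"; Some "b" |] in
  (* Sort before comparing because the order is unspecified. *)
  Array.sort String.compare u;
  assert (u = [| "a"; "b" |])
```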

Time complexity: O(n * m) where n is the number of rows and m is the average string length.

val nunique : t -> string -> int

nunique df name returns the number of distinct non-null values in column name.

Null values are not counted towards the unique count.
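Counting distinct non-null values reduces to the size of the same hash-table set. `nunique_strings` is a hypothetical helper on an assumed `string option array` column, shown only to make the "nulls are not counted" rule concrete.

```ocaml
(* Sketch of [nunique]: insert every non-null value into a hash table
   used as a set, then report how many distinct keys it holds. *)
let nunique_strings (col : string option array) : int =
  let seen = Hashtbl.create 16 in
  Array.iter (Option.iter (fun s -> Hashtbl.replace seen s ())) col;
  Hashtbl.length seen

let () =
  (* Two distinct values ("x", "y"); the two [None]s are not counted. *)
  assert (nunique_strings [| Some "x"; None; Some "y"; Some "x"; None |] = 2);
  assert (nunique_strings [| None |] = 0)
```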

Time complexity: O(n * m) where n is the number of rows and m is the average string length.

val mode : t -> string -> string option

mode df name returns the most frequent non-null value in column name, or None if the column is empty.

If multiple values are tied for most frequent, returns one of them (the choice is implementation-dependent). Null values are excluded.
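A frequency-count sketch shows where the implementation-dependent tie-breaking comes from. `mode_string` is a hypothetical helper over an assumed `string option array` column, not this module's `mode`. It tallies counts in a hash table and keeps the highest count seen; among tied values, whichever one `Hashtbl.fold` visits first wins, and that visiting order is unspecified.

```ocaml
(* Sketch of [mode]: count non-null occurrences in a hash table, then keep
   the entry with the highest count. With [>=], the first entry visited by
   [Hashtbl.fold] wins a tie, so the winner among equals is unspecified. *)
let mode_string (col : string option array) : string option =
  let counts = Hashtbl.create 16 in
  Array.iter
    (Option.iter (fun s ->
         let c = Option.value (Hashtbl.find_opt counts s) ~default:0 in
         Hashtbl.replace counts s (c + 1)))
    col;
  Hashtbl.fold
    (fun s c best ->
      match best with
      | Some (_, best_c) when best_c >= c -> best
      | _ -> Some (s, c))
    counts None
  |> Option.map fst

let () =
  assert (mode_string [| Some "a"; Some "b"; None; Some "b" |] = Some "b");
  assert (mode_string [| None; None |] = None);
  (* With a tie, either value may be returned. *)
  let m = mode_string [| Some "x"; Some "y" |] in
  assert (m = Some "x" || m = Some "y")
```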

Time complexity: O(n * m) where n is the number of rows and m is the average string length.