Module Talon.Row

Row-wise computations using an applicative interface.

The Row module provides a declarative way to express computations over dataframe rows. Rather than imperatively iterating through rows, you compose row-wise operations that are executed efficiently in batch.

The applicative interface allows combining values from multiple columns with type safety. Operations are lazy and only executed when the dataframe is processed with functions like filter_by, map, or with_column.

Performance: Row computations compile to efficient loops that process all rows in a single pass. Use with_columns_map to compute multiple columns simultaneously for better cache locality.

Applicative Interface

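The applicative pattern combines computations that are independent of one another: none of them can inspect another's result to decide what to compute next. This restriction is what makes the interface more compositional than a monadic one, and it maps naturally to columnar data processing.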
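To make the pattern concrete, here is a minimal toy model. It is not Talon's implementation: a row computation is assumed to be a plain function from a row index to a value, and the names Toy, column, and run are hypothetical. The sketch shows that return and apply are enough to combine two independent column reads.

```ocaml
(* Toy model of an applicative row computation (an illustration only,
   not Talon's actual representation). *)
module Toy = struct
  (* A computation yields a value for a given row index. *)
  type 'a row = int -> 'a

  let return (x : 'a) : 'a row = fun _ -> x
  let apply (f : ('a -> 'b) row) (x : 'a row) : 'b row = fun i -> (f i) (x i)

  (* map2 derived from return and apply. *)
  let map2 x y ~f = apply (apply (return f) x) y

  (* Hypothetical accessor: read a value from an in-memory array. *)
  let column (data : 'a array) : 'a row = fun i -> data.(i)

  (* Hypothetical runner: evaluate the computation once per row. *)
  let run ~(n : int) (c : 'a row) : 'a list = List.init n c
end

let widths = [| 2.0; 3.0; 4.0 |]
let heights = [| 10.0; 20.0; 30.0 |]

(* The two column reads are independent; map2 only combines their results. *)
let areas =
  Toy.(run ~n:3 (map2 (column widths) (column heights) ~f:( *. )))
```

In this model nothing is computed until run is called, which mirrors the lazy behaviour described for the real module.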

val return : 'a -> 'a row

return x creates a constant computation returning x for each row.

Example:

  let always_true = Row.return true in
  let filtered = filter_by df always_true (* no-op filter *)

Time complexity: O(1) construction, O(n) when executed over n rows.

val apply : ('a -> 'b) row -> 'a row -> 'b row

apply f x applies, for each row, the function produced by f to the value produced by x.

This is the fundamental applicative operation. Most users will prefer the map and map2 convenience functions.

Example:

  let add_one = Row.return (fun x -> Int32.add x 1l) in
  let values = Row.int32 "age" in
  let incremented = Row.apply add_one values

val map : 'a row -> f:('a -> 'b) -> 'b row

map x ~f maps a function over a computation.

This is the most common way to transform column values. The function f is applied to each row's value from the computation x.

Example:

  let ages = Row.int32 "age" in
  let is_adult = Row.map ages ~f:(fun age -> age >= 18l)

val map2 : 'a row -> 'b row -> f:('a -> 'b -> 'c) -> 'c row

map2 x y ~f combines two computations with a binary function.

Applies f to corresponding values from both computations. This is efficient for combining columns element-wise.

Example:

  let first_name = Row.string "first_name" in
  let last_name = Row.string "last_name" in
  let full_name = Row.map2 first_name last_name ~f:(fun f l -> f ^ " " ^ l)

val map3 : 'a row -> 'b row -> 'c row -> f:('a -> 'b -> 'c -> 'd) -> 'd row

map3 x y z ~f combines three computations with a ternary function.

Useful for operations involving three columns, such as computing weighted averages or three-way comparisons.
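As an illustration of the weighted-average case, the sketch below uses a toy stand-in for the row type (a function from row index to value) rather than Talon itself. The toy map3, the weights, and the data are all assumptions; the function passed as ~f is the part that would carry over to Row.map3.

```ocaml
(* Toy stand-in for a row computation: row index -> value.
   This is an assumption for illustration, not Talon's representation. *)
type 'a row = int -> 'a

let map3 x y z ~f = fun i -> f (x i) (y i) (z i)

(* Weighted average of three scores; the weights are made up for the example. *)
let weighted_average homework midterm final =
  (0.25 *. homework) +. (0.25 *. midterm) +. (0.5 *. final)

let homework = [| 80.0; 60.0 |]
let midterm = [| 70.0; 80.0 |]
let final = [| 90.0; 100.0 |]

(* Combine the three toy columns row by row. *)
let grade : float row =
  map3 (Array.get homework) (Array.get midterm) (Array.get final)
    ~f:weighted_average

let grades = List.init 2 grade
```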

val both : 'a row -> 'b row -> ('a * 'b) row

both x y pairs two computations, creating tuples.

Equivalent to map2 x y ~f:(fun a b -> (a, b)) but more explicit about the intent to pair values.

Example:

  let coords = Row.both (Row.float64 "x") (Row.float64 "y") in
  let distances = Row.map coords ~f:(fun (x, y) -> sqrt (x*.x +. y*.y))

Column Accessors

These functions extract values from named columns with type safety. Each accessor verifies the column exists and has the expected type at runtime.

val float32 : string -> float row

float32 name extracts float32 values from the named column.

  • raises Not_found if the column doesn't exist.
  • raises Invalid_argument if the column is not of type float32.

val float64 : string -> float row

float64 name extracts float64 values from the named column.

  • raises Not_found if the column doesn't exist.
  • raises Invalid_argument if the column is not of type float64.

val int32 : string -> int32 row

int32 name extracts int32 values from the named column.

  • raises Not_found if the column doesn't exist.
  • raises Invalid_argument if the column is not of type int32.

val int64 : string -> int64 row

int64 name extracts int64 values from the named column.

  • raises Not_found if the column doesn't exist.
  • raises Invalid_argument if the column is not of type int64.

val string : string -> string row

string name extracts string values from the named column.

Null values are converted to empty strings, because the return type is not an option.

  • raises Not_found if the column doesn't exist.
  • raises Invalid_argument if the column is not of type string.

val bool : string -> bool row

bool name extracts boolean values from the named column.

Null values are converted to false, because the return type is not an option.

  • raises Not_found if the column doesn't exist.
  • raises Invalid_argument if the column is not of type boolean.

val number : string -> float row

number name extracts numeric values from the named column, coercing all numeric types (int32, int64, float32, float64) to float.

This is convenient for generic numeric operations where the distinction between integer and float types doesn't matter. Null values become NaN.

  • raises Not_found if the column doesn't exist.
  • raises Invalid_argument if the column is not of a numeric type.
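Because nulls surface as NaN, an arithmetic reduction over values read with number yields NaN as soon as one input is missing. The sketch below is plain OCaml over a list standing in for one row's values; sum_skipping_nan is a hypothetical helper and not part of Talon.

```ocaml
(* One row's values as they might arrive from several numeric columns,
   with a null in the middle column already converted to NaN. *)
let row_values = [ 1.5; Float.nan; 2.5 ]

(* A plain sum propagates NaN. *)
let naive_sum = List.fold_left ( +. ) 0. row_values

(* Hypothetical helper: ignore NaN entries while summing. *)
let sum_skipping_nan values =
  List.fold_left
    (fun acc v -> if Float.is_nan v then acc else acc +. v)
    0. values

let safe_sum = sum_skipping_nan row_values
```

A helper of this shape could be passed as ~f to map_list. It cannot tell a null from a genuine NaN; the option-based accessors keep that distinction.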

val numbers : string list -> float row list

numbers names creates a list of number accessors for the given column names.

Equivalent to List.map number names but more concise for common use cases like row-wise aggregations across multiple columns.

Example:

  let score_cols = ["math"; "science"; "english"] in
  let scores = Row.numbers score_cols in
  let total = Row.map_list scores ~f:(List.fold_left (+.) 0.)

Row Information

val index : int row

index returns the current row index (0-based).

Useful for creating row numbers, conditional logic based on position, or debugging row-wise computations.

Example:

  let with_row_num =
    with_column df "row_num" Nx.int32 (Row.map Row.index ~f:Int32.of_int)

val sequence : 'a row list -> 'a list row

sequence xs transforms a list of computations into a computation of a list.

Standard applicative operation for collecting values from multiple columns. This is the fundamental operation for working with dynamic lists of columns.

Example:

  let numeric_cols = Cols.numeric df in
  let values = List.map Row.number numeric_cols in
  let all_values = Row.sequence values in
  let row_sums = Row.map all_values ~f:(List.fold_left (+.) 0.)

val all : 'a row list -> 'a list row

all xs is an alias for sequence xs.

More readable name when the intent is to collect all values from a list of computations.
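The following toy model suggests why sequence (and therefore all) counts as a standard applicative operation: it can be written with nothing but return and map2. The row type here is assumed to be a function from row index to value, which is an illustration and not how Talon represents computations.

```ocaml
(* Toy model: a row computation is a function from row index to value. *)
type 'a row = int -> 'a

let return (x : 'a) : 'a row = fun _ -> x

let map2 (x : 'a row) (y : 'b row) ~(f : 'a -> 'b -> 'c) : 'c row =
 fun i -> f (x i) (y i)

(* sequence: turn a list of computations into a computation of a list,
   consing each computation's value onto the values of the rest. *)
let sequence (xs : 'a row list) : 'a list row =
  List.fold_right
    (fun x acc -> map2 x acc ~f:(fun v vs -> v :: vs))
    xs (return [])

let q1 = [| 1.0; 2.0 |]
let q2 = [| 10.0; 20.0 |]
let q3 = [| 100.0; 200.0 |]

let all_values = sequence [ Array.get q1; Array.get q2; Array.get q3 ]

(* Each row yields its values in the same order as the input list. *)
let first_row = all_values 0
let second_row = all_values 1
```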

val map_list : 'a row list -> f:('a list -> 'b) -> 'b row

map_list xs ~f sequences computations then maps f over the resulting list.

Equivalent to map (sequence xs) ~f but more convenient. This is the standard pattern for applying reductions across multiple columns.

Example:

  let score_computations = Row.numbers ["math"; "science"; "english"] in
  let averages = Row.map_list score_computations ~f:(fun scores ->
    List.fold_left (+.) 0. scores /. float (List.length scores))

val fold_list : 'a row list -> init:'b -> f:('b -> 'a -> 'b) -> 'b row

fold_list xs ~init ~f folds over a list of computations without creating an intermediate list.

More memory-efficient than map_list for reductions, especially when processing many columns. The fold happens during row iteration rather than creating intermediate lists.
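The difference between the two approaches can be sketched in a toy model where a row computation is assumed to be a function from row index to value. The definitions of map_list and fold_list below are illustrative stand-ins, not Talon's code: the first builds a list per row before reducing it, the second threads an accumulator through the columns directly.

```ocaml
(* Toy model (an assumption for illustration): row index -> value. *)
type 'a row = int -> 'a

(* map_list style: materialize the row's values, then reduce the list. *)
let map_list (xs : 'a row list) ~(f : 'a list -> 'b) : 'b row =
 fun i -> f (List.map (fun x -> x i) xs)

(* fold_list style: feed each value straight into the accumulator. *)
let fold_list (xs : 'a row list) ~(init : 'b) ~(f : 'b -> 'a -> 'b) : 'b row =
 fun i -> List.fold_left (fun acc x -> f acc (x i)) init xs

let q1 = [| 1.0; 2.0 |]
let q2 = [| 3.0; 4.0 |]
let quarters = [ Array.get q1; Array.get q2 ]

let via_list = map_list quarters ~f:(List.fold_left ( +. ) 0.)
let via_fold = fold_list quarters ~init:0. ~f:( +. )

(* Both formulations produce the same per-row totals. *)
let totals_via_list = List.init 2 via_list
let totals_via_fold = List.init 2 via_fold
```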

Example:

  let score_computations = Row.numbers ["q1"; "q2"; "q3"; "q4"] in
  let total_scores = Row.fold_list score_computations ~init:0. ~f:(+.)

val float32s : string list -> float row list

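float32s names creates float32 accessors for all column names. This and the builders below avoid writing List.map float32 names and its equivalents for the other types.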

These functions are particularly useful when you need type-specific accessors for multiple columns of the same type.

val float64s : string list -> float row list

float64s names creates float64 accessors for all column names.

val int32s : string list -> int32 row list

int32s names creates int32 accessors for all column names.

val int64s : string list -> int64 row list

int64s names creates int64 accessors for all column names.

val bools : string list -> bool row list

bools names creates boolean accessors for all column names.

val strings : string list -> string row list

strings names creates string accessors for all column names.

Option-based accessors

These accessors return None for null values instead of using placeholder tensor values. Use these when you need to distinguish genuine values from missing data.
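A small sketch of why the distinction matters, in plain OCaml. The function revenue is a hypothetical example of what one might pass as ~f to map2 over two option accessors, and the list of pairs stands in for three rows of data; neither comes from Talon.

```ocaml
(* The result is missing whenever either input is missing. *)
let revenue (price : float option) (quantity : int32 option) : float option =
  match price, quantity with
  | Some p, Some q -> Some (p *. Int32.to_float q)
  | _ -> None

(* Stand-ins for three rows of (price, quantity); the second price is null
   and the third quantity is a genuine zero. *)
let rows = [ (Some 2.5, Some 4l); (None, Some 1l); (Some 1.0, Some 0l) ]

let revenues = List.map (fun (p, q) -> revenue p q) rows
```

The second row stays None while the third row is a real Some 0.0, a difference that a placeholder value would hide.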

val float32_opt : string -> float option row

float32_opt name extracts float32 values as options from the named column.

Returns None for null values (as indicated by the mask). Use this instead of float32 when you need to distinguish null values from valid data.

  • raises Not_found if the column doesn't exist.
  • raises Invalid_argument if the column is not of type float32.

val float64_opt : string -> float option row

float64_opt name extracts float64 values as options from the named column.

Returns None for null values (as indicated by the mask).

  • raises Not_found if the column doesn't exist.
  • raises Invalid_argument if the column is not of type float64.

val int32_opt : string -> int32 option row

int32_opt name extracts int32 values as options from the named column.

Returns None for null values (as indicated by the mask).

  • raises Not_found if the column doesn't exist.
  • raises Invalid_argument if the column is not of type int32.

val int64_opt : string -> int64 option row

int64_opt name extracts int64 values as options from the named column.

Returns None for null values (as indicated by the mask).

  • raises Not_found if the column doesn't exist.
  • raises Invalid_argument if the column is not of type int64.

val string_opt : string -> string option row

string_opt name extracts string values as options from the named column.

Returns None for null values. Use this instead of string when you need to distinguish null strings from empty strings.

  • raises Not_found if the column doesn't exist.
  • raises Invalid_argument if the column is not of type string.

val bool_opt : string -> bool option row

bool_opt name extracts boolean values as options from the named column.

Returns None for null values. Use this instead of bool when you need to distinguish null values from false.

  • raises Not_found if the column doesn't exist.
  • raises Invalid_argument if the column is not of type boolean.

val float32s_opt : string list -> float option row list

float32s_opt names creates float32 option accessors for all column names.

val float64s_opt : string list -> float option row list

float64s_opt names creates float64 option accessors for all column names.

val int32s_opt : string list -> int32 option row list

int32s_opt names creates int32 option accessors for all column names.

val int64s_opt : string list -> int64 option row list

int64s_opt names creates int64 option accessors for all column names.

val bools_opt : string list -> bool option row list

bools_opt names creates bool option accessors for all column names.

val strings_opt : string list -> string option row list

strings_opt names creates string option accessors for all column names.

Row-wise Aggregations

Efficient horizontal aggregations across columns within each row. These operations are vectorized using Nx operations for performance.

module Agg : sig ... end

Row-wise aggregations using vectorized operations.