Module Talon.Row
Row-wise computations using an applicative interface.
The Row module provides a declarative way to express computations over dataframe rows. Rather than imperatively iterating through rows, you compose row-wise operations that are executed efficiently in batch.
The applicative interface allows combining values from multiple columns with type safety. Operations are lazy and only executed when the dataframe is processed with functions like filter_by, map, or with_column.
Performance: Row computations compile to efficient loops that process all rows in a single pass. Use with_columns_map to compute multiple columns simultaneously for better cache locality.
Applicative Interface
The applicative pattern allows combining independent computations. This is more compositional than monadic interfaces and maps naturally to columnar data processing.
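To make the pattern concrete, the following is a toy model of an applicative row type. It is a sketch for illustration only and not Talon's implementation: it assumes a computation can be modelled as a function from a row index to a value, and the names ToyRow, column, and run are hypothetical.

```ocaml
(* Toy model only: a row computation is a function from a row index to a
   value. ToyRow, column and run are hypothetical names, not part of Talon. *)
module ToyRow = struct
  type 'a t = int -> 'a

  let return x : 'a t = fun _ -> x
  let apply (f : ('a -> 'b) t) (x : 'a t) : 'b t = fun i -> (f i) (x i)

  (* map and map2 are derived from return and apply. *)
  let map x ~f = apply (return f) x
  let map2 x y ~f = apply (map x ~f) y

  (* Read values from an in-memory array standing in for a column. *)
  let column (data : 'a array) : 'a t = fun i -> data.(i)

  (* Execute a computation over n rows in a single pass. *)
  let run n (c : 'a t) = List.init n c
end

let first = ToyRow.column [| "Ada"; "Alan" |]
let last = ToyRow.column [| "Lovelace"; "Turing" |]

(* Building the computation does no work; run executes it. *)
let full = ToyRow.map2 first last ~f:(fun f l -> f ^ " " ^ l)
let names = ToyRow.run 2 full

(* Prints "Ada Lovelace" then "Alan Turing". *)
let () = List.iter print_endline names
```

Because the two column reads do not depend on each other, they can be combined without either one inspecting the other's result, which is what distinguishes the applicative style from a monadic one.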
val return : 'a -> 'a row
return x creates a constant computation returning x for each row.
Example:
let always_true = Row.return true in
let filtered = filter_by df always_true (* no-op filter *)
Time complexity: O(1) construction, O(n) when executed over n rows.
apply f x applies a function computation to a value computation.
This is the fundamental applicative operation. Most users will prefer the map and map2 convenience functions.
Example:
let add_one = Row.return (fun x -> x + 1) in
let values = Row.int32 "age" in
let incremented = Row.apply add_one values
map x ~f maps a function over a computation.
This is the most common way to transform column values. The function f is applied to each row's value from the computation x.
Example:
let ages = Row.int32 "age" in
let is_adult = Row.map ages ~f:(fun age -> age >= 18l)
map2 x y ~f combines two computations with a binary function.
Applies f to corresponding values from both computations. This is efficient for combining columns element-wise.
Example:
let first_name = Row.string "first_name" in
let last_name = Row.string "last_name" in
let full_name = Row.map2 first_name last_name ~f:(fun f l -> f ^ " " ^ l)
map3 x y z ~f combines three computations with a ternary function.
Useful for operations involving three columns, such as computing weighted averages or three-way comparisons.
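As an illustration of the weighted-average case, the sketch below defines the ternary function on its own so it can be checked without a dataframe. The column names and weights are hypothetical, and the Row.map3 call in the comment simply follows the map3 x y z ~f shape described above.

```ocaml
(* Pure ternary function; the weights are made up for illustration. *)
let weighted hw midterm final = (0.2 *. hw) +. (0.3 *. midterm) +. (0.5 *. final)

(* With Talon this would presumably be passed as ~f, for example:
   Row.map3 (Row.float64 "hw") (Row.float64 "midterm") (Row.float64 "final")
     ~f:weighted
   The column names "hw", "midterm" and "final" are hypothetical. *)
let sample = weighted 80. 70. 90.
```

Keeping the function separate from the column accessors makes it easy to unit-test the arithmetic before wiring it into a row computation.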
both x y pairs two computations, creating tuples.
Equivalent to map2 x y ~f:(fun a b -> (a, b)) but more explicit about the intent to pair values.
Example:
let coords = Row.both (Row.float64 "x") (Row.float64 "y") in
let distances = Row.map coords ~f:(fun (x, y) -> sqrt (x*.x +. y*.y))
Column Accessors
These functions extract values from named columns with type safety. Each accessor verifies the column exists and has the expected type at runtime.
val float32 : string -> float row
float32 name extracts float32 values from the column named name.
val float64 : string -> float row
float64 name extracts float64 values from the column named name.
val int32 : string -> int32 row
int32 name extracts int32 values from the column named name.
val int64 : string -> int64 row
int64 name extracts int64 values from the column named name.
val string : string -> string row
string name extracts string values from the column named name.
Null values are converted to empty strings for compatibility with the non-option return type.
val bool : string -> bool row
bool name extracts boolean values from the column named name.
Null values are converted to false for compatibility with the non-option return type.
val number : string -> float row
number name extracts numeric values from the column named name, coercing all numeric types (int32/int64/float32/float64) to float.
This is convenient for generic numeric operations where the exact integer vs float distinction doesn't matter. Null values become NaN.
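Since nulls surface as NaN here, any arithmetic over the extracted values inherits NaN's propagation rules. The sketch below uses only the OCaml standard library on made-up values to show the effect and one way to skip such entries; it does not call Talon.

```ocaml
(* Values as a number accessor might yield them across three columns;
   nan stands in for a null. The data is made up. *)
let row_values = [ 90.; nan; 70. ]

(* A plain sum propagates NaN. *)
let naive_total = List.fold_left ( +. ) 0. row_values

(* Skip NaN entries explicitly when nulls should be ignored. *)
let total_ignoring_nulls =
  List.fold_left
    (fun acc v -> if Float.is_nan v then acc else acc +. v)
    0. row_values
```

Note that this filter cannot tell a null apart from a NaN that was genuinely stored in a float column; when that distinction matters, the option-based accessors are the safer choice.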
val numbers : string list -> float row list
numbers names creates a list of number accessors for the given column names.
Equivalent to List.map number names but more concise for common use cases like row-wise aggregations across multiple columns.
Example:
let score_cols = ["math"; "science"; "english"] in
let scores = Row.numbers score_cols in
let total = Row.map_list scores ~f:(List.fold_left (+.) 0.)
Row Information
val index : int row
index returns the current row index (0-based).
Useful for creating row numbers, conditional logic based on position, or debugging row-wise computations.
Example:
let with_row_num =
  with_column df "row_num" Nx.int32 (Row.map Row.index ~f:Int32.of_int)
sequence xs transforms a list of computations into a computation of a list.
Standard applicative operation for collecting values from multiple columns. This is the fundamental operation for working with dynamic lists of columns.
Example:
let numeric_cols = Cols.numeric df in
let values = List.map Row.number numeric_cols in
let all_values = Row.sequence values in
let row_sums = Row.map all_values ~f:(List.fold_left (+.) 0.)
all xs is an alias for sequence xs.
More readable name when the intent is to collect all values from a list of computations.
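The shape of sequence can be sketched with a toy model in which a computation is just a function from a row index to a value. This is a hypothetical stand-in for illustration, not Talon's representation; toy_row and toy_sequence are made-up names.

```ocaml
(* Toy model only: a computation is a function from a row index to a value. *)
type 'a toy_row = int -> 'a

(* Turn a list of computations into one computation producing a list. *)
let toy_sequence (xs : 'a toy_row list) : 'a list toy_row =
 fun i -> List.map (fun c -> c i) xs

(* Two in-memory arrays standing in for columns. *)
let math : float toy_row = Array.get [| 90.; 60. |]
let science : float toy_row = Array.get [| 80.; 70. |]

(* Row sums: evaluate every column at row i, then reduce the list. *)
let row_sum i = List.fold_left ( +. ) 0. (toy_sequence [ math; science ] i)
let sums = List.init 2 row_sum
```

In this model each row materialises one intermediate list before the reduction runs.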
map_list xs ~f sequences computations then maps f over the resulting list.
Equivalent to map (sequence xs) ~f but more convenient. This is the standard pattern for applying reductions across multiple columns.
Example:
let score_computations = Row.numbers ["math"; "science"; "english"] in
let averages = Row.map_list score_computations ~f:(fun scores ->
  List.fold_left (+.) 0. scores /. float (List.length scores))
fold_list xs ~init ~f folds over a list of computations without creating an intermediate list.
More memory-efficient than map_list for reductions, especially when processing many columns. The fold happens during row iteration rather than creating intermediate lists.
Example:
let score_computations = Row.numbers ["q1"; "q2"; "q3"; "q4"] in
let total_scores = Row.fold_list score_computations ~init:0. ~f:(+.)
val float32s : string list -> float row list
Convenience builders to avoid writing List.map float32 names etc.
These functions are particularly useful when you need type-specific accessors for multiple columns of the same type.
val float64s : string list -> float row list
float64s names creates float64 accessors for all column names.
val int32s : string list -> int32 row list
int32s names creates int32 accessors for all column names.
val int64s : string list -> int64 row list
int64s names creates int64 accessors for all column names.
val bools : string list -> bool row list
bools names creates boolean accessors for all column names.
val strings : string list -> string row list
strings names creates string accessors for all column names.
Option-based accessors
These accessors return None for null values instead of using placeholder tensor values. Use these when you need to distinguish genuine values from missing data.
val float32_opt : string -> float option row
float32_opt name extracts float32 values as options from the column named name.
Returns None for null values (as indicated by the mask). Use this instead of float32 when you need to distinguish null values from valid data.
val float64_opt : string -> float option row
float64_opt name extracts float64 values as options from the column named name.
Returns None for null values (as indicated by the mask).
val int32_opt : string -> int32 option row
int32_opt name extracts int32 values as options from the column named name.
Returns None for null values (as indicated by the mask).
val int64_opt : string -> int64 option row
int64_opt name extracts int64 values as options from the column named name.
Returns None for null values (as indicated by the mask).
val string_opt : string -> string option row
string_opt name extracts string values as options from the column named name.
Returns None for null values. Use this instead of string when you need to distinguish null strings from empty strings.
val bool_opt : string -> bool option row
bool_opt name extracts boolean values as options from the column named name.
Returns None for null values. Use this instead of bool when you need to distinguish null values from false.
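The difference between the two accessor styles shows up once the values are consumed. The sketch below works on made-up option values using only the OCaml standard library; it does not call Talon, and describe is a hypothetical helper.

```ocaml
(* Values as an option accessor such as bool_opt might yield them;
   the data is made up for illustration. *)
let consent = [ Some true; None; Some false ]

(* Three-way classification that a plain bool accessor could not express,
   since it would report the null as false. *)
let describe = function
  | Some true -> "yes"
  | Some false -> "no"
  | None -> "unknown"

let labels = List.map describe consent

(* Collapsing with a default reproduces the non-option behaviour. *)
let collapsed = List.map (Option.value ~default:false) consent
```

A function like describe is the kind of argument one would expect to hand to map over an option-valued computation.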
val float32s_opt : string list -> float option row list
float32s_opt names creates float32 option accessors for all column names.
val float64s_opt : string list -> float option row list
float64s_opt names creates float64 option accessors for all column names.
val int32s_opt : string list -> int32 option row list
int32s_opt names creates int32 option accessors for all column names.
val int64s_opt : string list -> int64 option row list
int64s_opt names creates int64 option accessors for all column names.
val bools_opt : string list -> bool option row list
bools_opt names creates bool option accessors for all column names.
val strings_opt : string list -> string option row list
strings_opt names creates string option accessors for all column names.
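When several option accessors are collected for one row, a reduction has to decide what to do with the missing entries. A minimal sketch, assuming the values arrive as a float option list (for instance after sequencing float64s_opt accessors); mean_present is a hypothetical helper and the numbers are made up.

```ocaml
(* One row's values across three columns; None marks a null. *)
let row = [ Some 4.0; None; Some 2.0 ]

(* Mean of the present values, or None when every value is null. *)
let mean_present (xs : float option list) : float option =
  match List.filter_map Fun.id xs with
  | [] -> None
  | present ->
      Some (List.fold_left ( +. ) 0. present /. float_of_int (List.length present))

let result = mean_present row
```

Returning an option from the reduction keeps an all-null row distinguishable from a row whose mean happens to be zero.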
Row-wise Aggregations
Efficient horizontal aggregations across columns within each row. These operations are vectorized using Nx operations for performance.
module Agg : sig ... end
Row-wise aggregations using vectorized operations.