Operations Reference
Every operation in sowilo, organized by category. All functions operate on Rune float32 tensors with values in [0, 1] unless otherwise noted.
Type Conversion and Preprocessing
to_float
Converts a tensor to float32 and scales to [0, 1] by dividing by 255.
let img = Nx_io.load_image "photo.png" |> Sowilo.to_float
(* uint8 [0, 255] -> float32 [0.0, 1.0] *)
to_uint8
Scales from [0, 1] to [0, 255] and casts to uint8. Values are clipped to [0, 1] before scaling.
let result = Sowilo.to_uint8 processed_img
(* float32 [0.0, 1.0] -> uint8 [0, 255] *)
normalize
Per-channel normalization: (img - mean) / std. The mean and std lists
must match the channel dimension length.
(* ImageNet normalization *)
let normalized =
Sowilo.normalize
~mean:[0.485; 0.456; 0.406]
~std:[0.229; 0.224; 0.225]
img
Raises Invalid_argument if mean or std length does not match the
number of channels.
threshold
Binary thresholding: returns 1.0 where the image exceeds the threshold, 0.0 elsewhere.
(* Pixels > 0.5 become 1.0, rest become 0.0 *)
let binary = Sowilo.threshold 0.5 gray_img
Color Space Conversion and Adjustment
to_grayscale
Converts RGB to single-channel grayscale using ITU-R BT.601 weights:
0.299 * R + 0.587 * G + 0.114 * B. Input must have C >= 3. Output has
C = 1.
let gray = Sowilo.to_grayscale rgb_img
rgb_to_hsv / hsv_to_rgb
Convert between RGB and HSV color spaces. H is in [0, 1] (normalized from [0, 360]), S and V are in [0, 1].
let hsv = Sowilo.rgb_to_hsv img
(* ... manipulate hue, saturation, value channels ... *)
let rgb = Sowilo.hsv_to_rgb hsv
adjust_brightness
Scales pixel values by a factor and clips to [0, 1].
let brighter = Sowilo.adjust_brightness 1.3 img (* 30% brighter *)
let darker = Sowilo.adjust_brightness 0.7 img (* 30% darker *)
adjust_contrast
Adjusts contrast around the per-channel mean. Factor 0 produces solid gray, 1 is the original image.
let high_contrast = Sowilo.adjust_contrast 1.5 img
let low_contrast = Sowilo.adjust_contrast 0.5 img
adjust_saturation
Adjusts color saturation via HSV. Factor 0 produces grayscale, 1 is the original image.
let vivid = Sowilo.adjust_saturation 1.5 img
let muted = Sowilo.adjust_saturation 0.5 img
adjust_hue
Rotates hue by a delta in [-0.5, 0.5], corresponding to a full rotation of the hue circle.
let warm = Sowilo.adjust_hue 0.05 img
let cool = Sowilo.adjust_hue (-0.05) img
adjust_gamma
Applies gamma correction: img ** gamma. Values less than 1.0 brighten,
greater than 1.0 darken.
let brightened = Sowilo.adjust_gamma 0.5 img
let darkened = Sowilo.adjust_gamma 2.0 img
invert
Inverts the image: 1.0 - img.
let negative = Sowilo.invert img
Geometric Transforms
resize
Resizes to target dimensions. Defaults to bilinear interpolation. Casts to float32 internally for bilinear mode.
let small = Sowilo.resize ~height:224 ~width:224 img
let nearest = Sowilo.resize ~interpolation:Nearest ~height:64 ~width:64 img
Raises Invalid_argument if height or width is not positive.
crop
Extracts a rectangular region starting at (y, x) with the given dimensions.
let region = Sowilo.crop ~y:50 ~x:100 ~height:200 ~width:300 img
Raises Invalid_argument if the region exceeds image bounds.
center_crop
Crops a centered rectangle of the given size.
let centered = Sowilo.center_crop ~height:200 ~width:200 img
Raises Invalid_argument if the crop size exceeds image dimensions.
hflip / vflip
Flip horizontally (left to right) or vertically (top to bottom).
let mirrored = Sowilo.hflip img
let upside_down = Sowilo.vflip img
rotate90
Rotates by k * 90 degrees counter-clockwise. k defaults to 1. Negative values rotate clockwise.
let rotated_90 = Sowilo.rotate90 img (* 90 CCW *)
let rotated_180 = Sowilo.rotate90 ~k:2 img (* 180 *)
let rotated_cw = Sowilo.rotate90 ~k:(-1) img (* 90 CW *)
pad
Zero-pads the spatial dimensions. The tuple specifies (top, bottom, left,
right) padding. An optional ~value parameter sets the fill value (defaults
to 0.0).
let padded = Sowilo.pad (10, 10, 20, 20) img
let white_padded = Sowilo.pad ~value:1.0 (5, 5, 5, 5) img
Spatial Filtering
gaussian_blur
Isotropic Gaussian blur using separable convolution. sigma is required.
ksize defaults to 2 * ceil(3 * sigma) + 1, capturing 99.7% of the
distribution.
let blurred = Sowilo.gaussian_blur ~sigma:1.0 img
let blurred_5x5 = Sowilo.gaussian_blur ~sigma:1.5 ~ksize:5 img
Raises Invalid_argument if ksize is even or not positive.
box_blur
Applies a ksize x ksize averaging filter.
let averaged = Sowilo.box_blur ~ksize:3 img
let smooth = Sowilo.box_blur ~ksize:7 img
Raises Invalid_argument if ksize is not positive.
median_blur
Applies a median filter. Not differentiable: uses sort internally, gradient is zero almost everywhere.
let denoised = Sowilo.median_blur ~ksize:3 img
Raises Invalid_argument if ksize is not a positive odd integer.
filter2d
Applies a custom 2D convolution kernel of shape [kH; kW]. Applied
independently to each channel with Same padding.
(* Sharpen kernel *)
let kernel = Nx.create Nx.Float32 [| 3; 3 |]
[| 0.; -1.; 0.; -1.; 5.; -1.; 0.; -1.; 0. |]
let sharpened = Sowilo.filter2d kernel img
unsharp_mask
Sharpens by subtracting a Gaussian blur:
img + amount * (img - gaussian_blur ~sigma img). amount defaults to 1.0.
let sharp = Sowilo.unsharp_mask ~sigma:1.0 img
let very_sharp = Sowilo.unsharp_mask ~sigma:1.0 ~amount:2.0 img
Morphological Operations
structuring_element
Creates a structuring element of the given shape and size. The size is a pair of positive odd integers (height, width).
Three shapes are available:
Rect-- full rectangleCross-- cross-shaped elementEllipse-- elliptical element
let rect = Sowilo.structuring_element Rect (5, 5)
let cross = Sowilo.structuring_element Cross (3, 3)
let ellipse = Sowilo.structuring_element Ellipse (7, 7)
Raises Invalid_argument if height or width is not positive or not odd.
erode / dilate
Erosion replaces each pixel with the minimum over the kernel-shaped neighborhood. Dilation replaces with the maximum.
let kernel = Sowilo.structuring_element Rect (5, 5) in
let eroded = Sowilo.erode ~kernel img
let dilated = Sowilo.dilate ~kernel img
opening / closing
Opening (erode then dilate) removes small bright regions. Closing (dilate then erode) fills small dark regions.
let kernel = Sowilo.structuring_element Rect (5, 5) in
let opened = Sowilo.opening ~kernel binary_img
let closed = Sowilo.closing ~kernel binary_img
morphological_gradient
The difference between dilation and erosion: dilate - erode. Highlights
edges.
let kernel = Sowilo.structuring_element Rect (3, 3) in
let edges = Sowilo.morphological_gradient ~kernel img
Edge Detection
All edge detection operations require grayscale input (C = 1).
sobel
Computes Sobel gradients. Returns a (gx, gy) tuple where gx is the
horizontal gradient and gy is the vertical gradient. ksize defaults
to 3.
let gx, gy = Sowilo.sobel gray in
let gx5, gy5 = Sowilo.sobel ~ksize:5 gray in
(* Compute gradient magnitude *)
let magnitude =
Nx.sqrt (Nx.add (Nx.mul gx gx) (Nx.mul gy gy))
scharr
Computes Scharr gradients, which are more rotationally accurate than Sobel.
Returns a (gx, gy) tuple.
let gx, gy = Sowilo.scharr gray
laplacian
Computes the Laplacian (sum of second spatial derivatives). ksize defaults
to 3.
let lap = Sowilo.laplacian gray
let lap5 = Sowilo.laplacian ~ksize:5 gray
canny
Canny edge detector. Returns 1.0 for edge pixels, 0.0 otherwise. low and
high are hysteresis thresholds (in [0, 1] since images are float32 in
[0, 1]). sigma controls the initial Gaussian blur and defaults to 1.4.
Not differentiable: uses non-maximum suppression and hysteresis thresholding.
let edges = Sowilo.canny ~low:0.2 ~high:0.6 gray
let tight = Sowilo.canny ~low:0.3 ~high:0.7 ~sigma:1.0 gray
Differentiability Summary
Most operations are differentiable because they are built from standard Rune tensor operations. The two exceptions are:
| Operation | Differentiable | Reason |
|---|---|---|
median_blur |
No | Uses sort; gradient is zero almost everywhere |
canny |
No | Uses non-maximum suppression and hysteresis thresholding |
All other operations (filters, color transforms, geometric transforms,
morphology, threshold, sobel, scharr, laplacian) support Rune.grad.