Module Npy

Read numpy .npy files.

A portable OCaml library for parsing the numpy binary format. Data is returned as flat Bigarray.Array1 values in C layout; use shape to interpret the dimensions.

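Supported dtypes

Int8 (numpy dtype |i1), Uint8 (|u1), Float32 (<f4, little-endian) and Float64 (<f8, little-endian).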

Example

  let data = In_channel.with_open_bin "embeddings.npy" In_channel.input_all in
  match Npy.of_string data with
  | Error msg -> failwith msg
  | Ok t ->
    let shape = Npy.shape t in
    Printf.printf "shape: %s\n"
      (String.concat "x" (Array.to_list (Array.map string_of_int shape)));
    match Npy.data_float32 t with
    | Some ba -> Printf.printf "first value: %f\n" (Bigarray.Array1.get ba 0)
    | None -> Printf.printf "not a float32 array\n"

Types

type _ dtype =
  | Int8 : int Npy.dtype
  | Uint8 : int Npy.dtype
  | Float32 : float Npy.dtype
  | Float64 : float Npy.dtype

Element types supported by this library.
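
Because dtype is a GADT, a function that matches on it can return a value whose type depends on the constructor. The sketch below illustrates this on a local copy of the type so that it compiles without the library; descr, item_size and zero are hypothetical helpers and are not part of Npy. The descriptor strings are the numpy dtype names listed under Data access.

```ocaml
(* Local copy of the dtype GADT, declared here only so the sketch is
   self-contained. *)
type _ dtype =
  | Int8 : int dtype
  | Uint8 : int dtype
  | Float32 : float dtype
  | Float64 : float dtype

(* Hypothetical helper: the numpy descriptor string for each dtype. *)
let descr : type a. a dtype -> string = function
  | Int8 -> "|i1"
  | Uint8 -> "|u1"
  | Float32 -> "<f4"
  | Float64 -> "<f8"

(* Hypothetical helper: size of one element in bytes. *)
let item_size : type a. a dtype -> int = function
  | Int8 | Uint8 -> 1
  | Float32 -> 4
  | Float64 -> 8

(* The result type is refined by the constructor: int for the integer
   dtypes, float for the floating-point ones. *)
let zero : type a. a dtype -> a = function
  | Int8 -> 0
  | Uint8 -> 0
  | Float32 -> 0.0
  | Float64 -> 0.0
```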

type t

A parsed .npy file.

Parsing

val of_string : string -> (Npy.t, string) Stdlib.result

Parse a .npy file from its complete contents as a string. Supports format versions 1.0 and 2.0. Returns Error msg if the magic bytes are wrong, the header is malformed, or the dtype is unsupported.
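
For orientation, a .npy file begins with the six magic bytes \x93NUMPY, followed by a major and a minor version byte and then a little-endian header length (2 bytes in version 1.0, 4 bytes in version 2.0). The following is a minimal sketch of checking that preamble; npy_preamble is a hypothetical helper written for illustration, and it is not claimed to be how of_string is implemented or which error messages it returns.

```ocaml
(* Hypothetical helper, not part of Npy: inspect the fixed-size preamble
   of a .npy file held in a string. Returns (major, minor, header_len). *)
let npy_preamble (s : string) : (int * int * int, string) result =
  let magic = "\x93NUMPY" in
  let byte i = Char.code s.[i] in
  if String.length s < 10 then Error "too short"
  else if String.sub s 0 6 <> magic then Error "bad magic bytes"
  else
    match byte 6, byte 7 with
    | 1, 0 ->
      (* Version 1.0: 2-byte little-endian header length at offset 8. *)
      Ok (1, 0, byte 8 lor (byte 9 lsl 8))
    | 2, 0 ->
      (* Version 2.0: 4-byte little-endian header length at offset 8. *)
      if String.length s < 12 then Error "too short"
      else
        Ok (2, 0,
            byte 8 lor (byte 9 lsl 8)
            lor (byte 10 lsl 16) lor (byte 11 lsl 24))
    | major, minor ->
      Error (Printf.sprintf "unsupported version %d.%d" major minor)
```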

Metadata

val shape : Npy.t -> int array

The shape of the array. For example, a 10x8 matrix returns [|10; 8|]. A scalar returns [||].
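
The number of elements implied by a shape is the product of its dimensions, which gives 1 for a scalar with shape [||]. A small sketch follows; num_elements is a hypothetical helper, not a function of this module, and the assumption that the flat array returned by an accessor has exactly this length is not stated on this page.

```ocaml
(* Hypothetical helper: element count implied by a shape.
   The empty shape (a scalar) yields 1, the empty product. *)
let num_elements (shape : int array) : int =
  Array.fold_left ( * ) 1 shape
```

For a 10x8 matrix this gives 80, which can be compared against Bigarray.Array1.dim as a sanity check.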

val fortran_order : Npy.t -> bool

Whether the data is stored in Fortran (column-major) order. Most numpy files use C order (false).
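
The two orders differ only in how a (row, col) pair maps to a position in the flat data. The sketch below shows both mappings for the 2D case; flat_index_2d is a hypothetical helper. Whether this library reorders Fortran-ordered data before returning it is not stated here, so the column-major branch applies only under the assumption that the elements are left in the order they were stored.

```ocaml
(* Hypothetical helper: flat position of element (row, col) in a
   rows x cols matrix, for either storage order. *)
let flat_index_2d ~fortran_order ~rows ~cols row col =
  if fortran_order then
    (* Column-major: consecutive elements of one column are adjacent. *)
    row + col * rows
  else
    (* Row-major (C order): consecutive elements of one row are adjacent. *)
    row * cols + col
```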

Data access

Each accessor returns Some bigarray if the dtype matches, or None otherwise. The returned Bigarray.Array1 is a flat (1-dimensional) view of the data in row-major order; use shape to interpret the dimensions.
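
When the shape has exactly two dimensions and the data is in C order, the standard Bigarray reshaping functions can present the flat array as a matrix without copying. This is a sketch using only Stdlib's Bigarray; as_matrix is a hypothetical helper and not part of this module.

```ocaml
(* Hypothetical helper: view a flat C-layout array as a 2D matrix when
   the shape is two-dimensional. The view shares storage with [ba].
   Bigarray.reshape_2 raises Invalid_argument if rows * cols differs
   from the length of [ba]. *)
let as_matrix ba (shape : int array) =
  match shape with
  | [| rows; cols |] ->
    Some (Bigarray.reshape_2 (Bigarray.genarray_of_array1 ba) rows cols)
  | _ -> None
```

The resulting Bigarray.Array2 can then be indexed with Bigarray.Array2.get m row col.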

For multi-dimensional indexing, compute the flat index as row * cols + col for a 2D array of shape [|rows; cols|], or as i * (d1 * d2) + j * d2 + k for a 3D array of shape [|d0; d1; d2|].
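
The same rule extends to any number of dimensions by folding over the axes. A minimal sketch follows; flat_index is a hypothetical helper, not part of this module, and it assumes row-major data.

```ocaml
(* Hypothetical helper: row-major flat index for an index vector [idx]
   into an array of the given [shape]. *)
let flat_index (shape : int array) (idx : int array) : int =
  if Array.length shape <> Array.length idx then
    invalid_arg "flat_index: rank mismatch";
  let acc = ref 0 in
  Array.iteri
    (fun axis i ->
      if i < 0 || i >= shape.(axis) then
        invalid_arg "flat_index: index out of bounds";
      (* Horner-style accumulation: scale by this axis, then add. *)
      acc := !acc * shape.(axis) + i)
    idx;
  !acc
```

For shape [|10; 8|] and index [|2; 3|] this computes 2 * 8 + 3 = 19, matching the 2D formula.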

Access data as signed 8-bit integers (numpy dtype |i1).

Access data as unsigned 8-bit integers (numpy dtype |u1).

Access data as 32-bit floats (numpy dtype <f4, little-endian).

Access data as 64-bit floats (numpy dtype <f8, little-endian).