Module Std.Hash
module type Full = sig ... endmodule type S = sig ... endmodule F
(Hash : Ppx_hash_lib.Std.Hash.S) :
Ppx_hash_lib.Std.Hash.Full
with type hash_value = Hash.hash_value
and type state = Hash.state
and type seed = Hash.seedThe code of ppx_hash is agnostic to the choice of hash algorithm that is used. However, it is not currently possible to mix various choices of hash algorithms in a given code base.
We experimented with: (a) custom hash algorithms implemented in OCaml and (b) in C; (c) OCaml's internal hash function (which is a custom version of Murmur3, implemented in C); (d) siphash, a modern hash function implemented in C.
Our findings were as follows:
- Implementing our own custom hash algorithms in OCaml and C yielded very little performance improvement over the (c) proposal, without providing the benefit of being a peer-reviewed, widely used hash function.
- Siphash (a modern hash function with an internal state of 32 bytes) has a worse performance profile than (a,b,c) above (hashing takes more time). Since its internal state is bigger than an OCaml immediate value, one must either manage allocation of such state explicitly, or paying the cost of allocation each time a hash is computed. While being a supposedly good hash function (with good hash quality), this quality was not translated in measurable improvements in our macro benchmarks. (Also, based on the data available at the time of writing, it's unclear that other hash algorithms in this class would be more than marginally faster.)
- By contrast, using the internal combinators of OCaml hash function means that we do not allocate (the internal state of this hash function is 32 bit) and have the same quality and performance as Hashtbl.hash.
Hence, we are here making the choice of using this Internalhash (that is, Murmur3, the OCaml hash algorithm as of 4.03) as our hash algorithm. It means that the state of the hash function does not need to be preallocated, and makes for simpler use in hash tables and other structures.
include Ppx_hash_lib.Std.Hash.Full
with type state = Base_internalhash_types.state
and type seed = Base_internalhash_types.seed
and type hash_value = Base_internalhash_types.hash_value
type state = Base_internalhash_types.statestate is the internal hash-state used by the hash function.
val fold_int :
Ppx_hash_lib.Std.Hash.state ->
int ->
Ppx_hash_lib.Std.Hash.state @@ portablefold_<T> state v incorporates a value v of type <T> into the hash-state, returning a modified hash-state. Implementations of the fold_<T> functions may mutate the state argument in place, and return a reference to it. Implementations of the fold_<T> functions should not allocate.
val fold_int64 :
Ppx_hash_lib.Std.Hash.state ->
int64 ->
Ppx_hash_lib.Std.Hash.state @@ portableval fold_float :
Ppx_hash_lib.Std.Hash.state ->
float ->
Ppx_hash_lib.Std.Hash.state @@ portableval fold_string :
Ppx_hash_lib.Std.Hash.state ->
string ->
Ppx_hash_lib.Std.Hash.state @@ portabletype seed = Base_internalhash_types.seedseed is the type used to seed the initial hash-state.
val alloc : unit -> Ppx_hash_lib.Std.Hash.state @@ portablealloc () returns a fresh uninitialized hash-state. May allocate.
val reset :
?seed:Ppx_hash_lib.Std.Hash.seed ->
Ppx_hash_lib.Std.Hash.state ->
Ppx_hash_lib.Std.Hash.state @@ portablereset ?seed state initializes/resets a hash-state with the given seed, or else a default-seed. Argument state may be mutated. Should not allocate.
type hash_value = Base_internalhash_types.hash_valuehash_value The type of hash values, returned by get_hash_value.
val get_hash_value :
Ppx_hash_lib.Std.Hash.state ->
Ppx_hash_lib.Std.Hash.hash_value @@ portableget_hash_value extracts a hash-value from the hash-state.
module For_tests : sig ... endtype ('a : any) folder =
Ppx_hash_lib.Std.Hash.state ->
'a ->
Ppx_hash_lib.Std.Hash.stateval create :
?seed:Ppx_hash_lib.Std.Hash.seed ->
unit ->
Ppx_hash_lib.Std.Hash.state @@ portablecreate ?seed () is a convenience. Equivalent to reset ?seed (alloc ()).
val of_fold :
(Ppx_hash_lib.Std.Hash.state -> 'a -> Ppx_hash_lib.Std.Hash.state) ->
'a ->
Ppx_hash_lib.Std.Hash.hash_value @@ portableof_fold fold constructs a standard hash function from an existing fold function.
module Builtin : sig ... endval run :
?seed:Ppx_hash_lib.Std.Hash.seed ->
'a Ppx_hash_lib.Std.Hash.folder ->
'a ->
Ppx_hash_lib.Std.Hash.hash_value @@ portablerun ?seed folder x runs folder on x in a newly allocated hash-state, initialized using optional seed or a default-seed.
The following identity exists: run [%hash_fold: T] == [%hash: T]
run can be used if we wish to run a hash-folder with a non-default seed.