estimate-cardinality ( seq k -- n )


Vocabulary
math.cardinality

Inputs
seqa sequence
ka number


Outputs
na number


Word description
Estimates the number of unique elements in seq.

The number k controls how many bits of hash to use, creating 2^k buckets.

Definition


:: estimate-cardinality ( seq k -- n )
k 2^ :> num_buckets num_buckets 0 <array> :> max_zeros seq [
hashcode >fixnum :> h h num_buckets 1 - bitand
:> bucket h k neg shift :> bucket_hash bucket max_zeros
[ bucket_hash trailing-zeros max ] change-nth
] each max_zeros [ mean 2 swap ^ ] [ length * ] bi 0.79402 *
;