%SQL.HLL
Class %SQL.HLL Extends %RegisteredObject
ObjectScript API for building HyperLogLog estimates of the number of unique elements (cardinality) in a group of data.
The estimates are kept in containers called sketches. Each sketch is identified by the id property of its %SQL.HLL instance.
Let's assume you have one million pieces of data and want to know how many of those pieces are unique:
- Use %New to instantiate a new HLL object:
  set hll = ##class(%SQL.HLL).%New()
- Feed one million pieces of data into the sketch with update:
  for i=1:1:1000000 { do hll.update(i) }
- Get an estimate of the cardinality by calling estimate:
  write hll.estimate()
  996537
Notes: At InterSystems, we test this class using MurmurHash with a seed of hll.#SEED:
$zcrc(yourdata,9,2059198193) or $zcrc(yourdata,9,hll.#SEED)
The underlying library uses 64 bits of this 128-bit hash.
Estimate partitioning: pass an existing sketch to %New to initialize the new object's state
from the standard serialized form (optionally Base64 encoded).
If your data is distributed across many processes, use get to retrieve each process's
sketch and merge to combine the sketches into a single estimate.
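The partitioning workflow described above can be sketched as follows. This is a minimal illustration that uses only the methods documented on this page. The variable names and the data ranges are made up for the example, and both partial sketches are built in one process for brevity, whereas in practice each would normally be built in its own process and only the serialized form would be passed along.

```objectscript
 // Build two partial sketches (normally done in separate processes)
 set part1 = ##class(%SQL.HLL).%New()
 set part2 = ##class(%SQL.HLL).%New()
 for i=1:1:600000 { do part1.update(i) }
 for i=400001:1:1000000 { do part2.update(i) }

 // Serialize the second sketch; Base64 encoded by default because ENCODE = 1
 set serialized = part2.get()

 // Rebuild a sketch object from the serialized form, then merge it into part1
 set restored = ##class(%SQL.HLL).%New(serialized)
 set sc = part1.merge(restored, .err)
 if $system.Status.IsError(sc) {
     write "merge failed: ", $get(err), !
 } else {
     write part1.estimate(), !
 }
```

The values 400001 through 600000 are fed to both partial sketches on purpose, so the combined estimate can be compared against the one million distinct values that were actually supplied.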
Parameters
ENCODE
Parameter ENCODE = 1;
Whether to Base64 encode/decode by default during get and %New.
SEED
Parameter SEED = 2059198193;
Murmur hash seed to use for $zcrc(,9,)
%MODULENAME
Parameter %MODULENAME [ Internal ] = 15;
Properties
id
Property id As %Integer [ Internal, ReadOnly ];
Internal identifier of allocated memory for this HLL sketch's representation as managed by the callout library
type
Property type As %String [ Calculated, ReadOnly ];
Whether the estimator is currently sparse or dense
precision
Property precision As %Integer [ Calculated, ReadOnly ];
Precision of the estimator
libIndex
Property libIndex As %Integer [ Internal, MultiDimensional, Private ];
Index of $zf(-4) addresses
Methods
getFunctionID
Method getFunctionID(function As %String) As %Integer [ Internal ]
getLibraryID
ClassMethod getLibraryID() As %Integer [ Internal ]
%OnNew
Method %OnNew(sketch As %Binary = "", decode As %Boolean = {..#ENCODE}, Output err As %String = "") As %Status
Allocates the memory and sets id for a new sketch. If you pass the sketch parameter, the new sketch is initialized from the serialized sketch you passed in.
updateHash
Method updateHash(hash As %Binary) As %Integer [ Language = cpp ]
Updates this sketch with the user-supplied hash value.
Use $zcrc(yourdata,9,2059198193) or $zcrc(yourdata,9,hll.#SEED) to get the hash.
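As a rough illustration of feeding precomputed hashes into a sketch, the snippet below hashes each value with $zcrc in mode 9, as described above, and passes the result to updateHash. The sample values and variable names are hypothetical, and the seed is read through the class-level parameter syntax instead of being hard-coded.

```objectscript
 set hll = ##class(%SQL.HLL).%New()
 // Read the documented SEED parameter from the class
 set seed = ##class(%SQL.HLL).#SEED

 // Hash each value outside the API, then hand the hash to the sketch
 for value = "apple","banana","apple","cherry" {
     set hash = $zcrc(value, 9, seed)
     do hll.updateHash(hash)
 }
 write hll.estimate(), !
```

Hashing outside the API may be useful when the hash is already computed for another purpose. Otherwise update performs the same hashing internally.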
update
Method update(stringdata As %Binary) As %Integer [ Language = cpp ]
Updates this sketch with the $zcrc(,9,) hash of stringdata. The hashing is done inside the API.
merge
Method merge(other As %SQL.HLL, Output err As %String = "") As %Status
Merges the supplied sketch object into the current one. This merges the cardinality estimates.
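A minimal sketch of merging two live sketch objects and checking the returned status is shown below. The key prefix "user" and the ranges are invented for the example. Assuming standard HyperLogLog merge semantics, the merged sketch estimates the number of distinct values across both inputs, so values present in both sketches are not counted twice.

```objectscript
 set a = ##class(%SQL.HLL).%New()
 set b = ##class(%SQL.HLL).%New()
 // The two inputs overlap on user501 through user1000
 for i=1:1:1000 { do a.update("user"_i) }
 for i=501:1:1500 { do b.update("user"_i) }

 // Merge b into a and inspect the returned %Status
 set sc = a.merge(b, .err)
 if $system.Status.IsError(sc) {
     write "merge failed: ", $get(err), !
 } else {
     write "estimate after merge: ", a.estimate(), !
 }
```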
estimate
Method estimate(Output err As %String = "") As %Integer
Returns the current unique value estimate (cardinality) for this sketch.
get
Method get(encode As %Boolean = {..#ENCODE}, Output err As %String = "") As %Binary
Returns the serialized form of the current sketch so that multiple sketches can be merged, for example sketches obtained from different processes.
releaseSketch
Method releaseSketch(Output err As %String = "") As %Status [ Internal ]
Frees the memory associated with this sketch. After this method has been called, subsequent calls for this sketch will yield an error. This method is called implicitly by the object destructor.
info
Method info(Output type As %String, Output precision As %String, Output err As %String) As %Status [ Internal ]
Helper method to retrieve metadata for the current sketch.
typeGet
Method typeGet() As %String [ Internal, ServerOnly = 1 ]
precisionGet
Method precisionGet() As %Integer [ Internal, ServerOnly = 1 ]
%OnClose
Method %OnClose() As %Status
version
ClassMethod version() As %Integer
Returns the version of the underlying callout library.