Adds an element to the whitelist.
Creates a whitelist that accepts at least everything accepted by this whitelist and everything accepted by other.
Returns true if s is possibly contained in the whitelist and false if it definitely is not.
An ApproximateWhitelist is a basic Bloom filter intended for holding natural-language vocabularies. It deals with String values natively and can be trained from a sequence or from an RDD of any element type T, as long as there is an implicit conversion in scope from T to String.
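The behavior described above can be illustrated with a minimal sketch of an immutable Bloom filter over strings. This is not the library's implementation: the name SketchWhitelist, its methods (add, union, maybeContains, train), the bits and hashes parameters, and the use of seeded MurmurHash3 are all assumptions made for illustration.

```scala
import scala.collection.immutable.BitSet
import scala.util.hashing.MurmurHash3

// Hypothetical sketch, not the library's code: an immutable Bloom filter
// with `bits` bit positions and `hashes` seeded hash functions.
final case class SketchWhitelist(bits: Int, hashes: Int, set: BitSet = BitSet.empty) {
  // One bit position per seed, each reduced to the range [0, bits).
  private def positions(s: String): Seq[Int] =
    (0 until hashes).map(seed => Math.floorMod(MurmurHash3.stringHash(s, seed), bits))

  // Returns a new filter in which every bit position of s is set.
  def add(s: String): SketchWhitelist = copy(set = positions(s).foldLeft(set)(_ + _))

  // Accepts everything either operand accepts; both must share bits and hashes.
  def union(other: SketchWhitelist): SketchWhitelist = {
    require(bits == other.bits && hashes == other.hashes, "incompatible filters")
    copy(set = set | other.set)
  }

  // true means "possibly present"; false means "definitely absent".
  def maybeContains(s: String): Boolean = positions(s).forall(set.contains)
}

object SketchWhitelist {
  // Trains from a sequence of any T with an implicit conversion to String.
  def train[T](xs: Seq[T], bits: Int, hashes: Int)(implicit toStr: T => String): SketchWhitelist =
    xs.foldLeft(SketchWhitelist(bits, hashes))((wl, x) => wl.add(toStr(x)))
}

val fruit = SketchWhitelist.train(Seq("apple", "banana"), bits = 1024, hashes = 3)
val berries = SketchWhitelist(1024, 3).add("cherry")
val both = fruit.union(berries)
val hit = both.maybeContains("cherry")                  // added strings are never rejected
val empty = SketchWhitelist(1024, 3).maybeContains("x") // no bits set, so definitely absent
```

Because a union is just a bitwise OR of the two bit sets, the result can accept strings that neither operand was trained on, which is why the combined whitelist is described as accepting a superset.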
Known limitation: while this filter uses several hashes, some of these will exhibit unusually high collision rates when hashing strings that are permutations of one another. If you see an unexpectedly high false-positive rate on a given vocabulary, this might be worth investigating. The choice of hash functions is subject to change in a future release.
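One way such collisions can arise is a hash that ignores character order. The sketch below is only an illustration of that effect: sumHash and polyHash are hypothetical functions chosen for contrast, and nothing here is taken from the hashes this filter actually uses.

```scala
// Hypothetical order-insensitive hash: it sums the character codes, so every
// permutation of the same characters maps to the same value.
def sumHash(s: String): Int = s.foldLeft(0)(_ + _)

// Order-sensitive polynomial hash, shown for contrast.
def polyHash(s: String): Int = s.foldLeft(0)((h, c) => 31 * h + c)

val anagrams = Seq("listen", "silent", "enlist")

// All three anagrams collapse to a single value under sumHash.
val sumDistinct = anagrams.map(sumHash).distinct.size

// polyHash depends on character order, so "listen" and "silent" differ.
val polyDiffers = polyHash("listen") != polyHash("silent")
```

A vocabulary rich in anagrams would set the same bits repeatedly under a hash like sumHash, which reduces the effective number of independent hashes for those strings.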