A feature extraction function for data objects
A map from feature indexes to numbers of categories. Feature indexes that have no entry in the map are assumed to be numeric, not categorical. Defaults to the category info from the Extractor, if the feature extraction function is of that type; otherwise defaults to empty, i.e. all features numeric.
The size of synthetic (margin-sampled) data to be constructed. Defaults to the size of the input data.
The number of decision trees to train in the Random Forest. Defaults to 10.
Maximum decision tree depth. Defaults to 5.
Maximum histogramming bins to use for numeric data. Defaults to 5.
The number of clusters to use when clustering leaf-id vectors. Defaults to an automatic estimation of a "good" number of clusters.
Maximum clustering refinement iterations to compute. Defaults to 25.
Halt clustering if the clustering metric cost changes by less than this value. Defaults to 0.
Halt clustering if the clustering metric cost changes by less than this fraction of its value at the previous iteration. Defaults to 0.0001.
If the data is larger than this size, use a random sample of this size instead. Defaults to 1000.
Use this number of threads to accelerate clustering. Defaults to 1.
A seed to use for RNG. Defaults to using a randomized seed value.
A map from feature indexes to numbers of categories. Feature indexes that have no entry in the map are assumed to be numeric, not categorical. Defaults to the category info from the Extractor, if the feature extraction function is of that type; otherwise defaults to empty, i.e. all features numeric.
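As a rough illustration of how such a category-info map can be read, the sketch below assumes the map has type `Map[Int, Int]` (feature index to category count). The object name, the helper `describe`, and the example values are hypothetical and not part of the documented API.

```scala
object CategoryInfoSketch {
  // Hypothetical example: feature 1 has 3 categories, feature 3 has 2.
  // Features 0 and 2 have no entry, so they are treated as numeric.
  val categoryInfo: Map[Int, Int] = Map(1 -> 3, 3 -> 2)

  // Describe a feature index according to the rule stated in the text:
  // an entry in the map means categorical, no entry means numeric.
  def describe(featureIndex: Int): String =
    categoryInfo.get(featureIndex) match {
      case Some(n) => s"categorical($n)"
      case None    => "numeric"
    }
}
```

An empty map (`Map.empty[Int, Int]`) therefore corresponds to the "all numeric features" default.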
Halt clustering if the clustering metric cost changes by less than this value. Defaults to 0.
Halt clustering if the clustering metric cost changes by less than this fraction of its value at the previous iteration. Defaults to 0.0001.
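The two halting thresholds can be pictured with a small sketch. It assumes that the absolute and fractional tests are combined with a logical OR and that both use a strict "less than" comparison; neither detail is confirmed by this documentation. The helper `shouldHalt` and its parameter names are hypothetical.

```scala
object HaltingSketch {
  // Hypothetical helper: decide whether refinement should stop, given the
  // metric cost of the previous iteration and of the current one.
  def shouldHalt(prevCost: Double, cost: Double, eps: Double, fractionEps: Double): Boolean = {
    val delta = math.abs(prevCost - cost)
    // Absolute test: the change is smaller than eps (never fires when eps is 0).
    val absHalt = delta < eps
    // Fractional test: the change relative to the previous cost is below fractionEps.
    val fracHalt = prevCost != 0.0 && (delta / math.abs(prevCost)) < fractionEps
    absHalt || fracHalt
  }
}
```

With the documented defaults (0 and 0.0001), only the fractional test can trigger in this sketch, so clustering would stop once the cost changes by less than 0.01% between iterations, or when the iteration limit is reached.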
The number of clusters to use when clustering leaf-id vectors. Defaults to an automatic estimation of a "good" number of clusters.
Maximum clustering refinement iterations to compute. Defaults to 25.
If the data is larger than this size, use a random sample of this size instead. Defaults to 1000.
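A minimal sketch of this sampling rule, assuming the data fits in an in-memory `Seq` and that sampling is uniform without replacement. The documentation does not say how the sample is actually drawn, and `sampleIfLarger` is a hypothetical name.

```scala
import scala.util.Random

object SampleSketch {
  // Hypothetical helper: return the data unchanged when it already fits within
  // sampleSize, otherwise a uniform random sample (without replacement) of that size.
  def sampleIfLarger[T](data: Seq[T], sampleSize: Int, rng: Random): Seq[T] =
    if (data.length <= sampleSize) data
    else rng.shuffle(data).take(sampleSize)
}
```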
Use this number of threads to accelerate clustering. Defaults to 1.
A feature extraction function for data objects
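For illustration, a feature extraction function can be as simple as a Scala function from the data type to `Seq[Double]`. The `Reading` type and its fields below are hypothetical, as is the choice to store a categorical value as its category index converted to a `Double`.

```scala
object ExtractorSketch {
  // Hypothetical data type, used only for illustration.
  case class Reading(temperature: Double, humidity: Double, sensorType: Int)

  // A feature extraction function: maps one data object to a Seq[Double].
  // The categorical field is encoded as a category index stored as a Double.
  val extractor: Reading => Seq[Double] =
    r => Seq(r.temperature, r.humidity, r.sensorType.toDouble)
}
```

Under these assumptions, feature index 2 would be the one listed in the category-info map, while indexes 0 and 1 stay numeric.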
Maximum histogramming bins to use for numeric data. Defaults to 5.
Maximum decision tree depth. Defaults to 5.
The number of decision trees to train in the Random Forest. Defaults to 10.
Train a Random Forest clustering model from input data
The input data objects to cluster
An RF clustering model of the input data
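Training relies on synthetic, margin-sampled data alongside the real input. One common way to build such data, sketched below, is to draw every feature independently from the values observed for that feature. This is an assumption about what "margin-sampled" means here, and `marginSample` is a hypothetical helper rather than a documented method.

```scala
import scala.util.Random

object MarginSampleSketch {
  // Hypothetical helper: build n synthetic rows by drawing each feature
  // independently from the observed values of that feature (its marginal).
  def marginSample(data: IndexedSeq[Seq[Double]], n: Int, rng: Random): Seq[Seq[Double]] = {
    require(data.nonEmpty, "need at least one real row")
    val dim = data.head.length
    Seq.fill(n) {
      // Each coordinate comes from a (generally different) random real row,
      // which keeps per-feature distributions but breaks correlations.
      Seq.tabulate(dim)(j => data(rng.nextInt(data.length))(j))
    }
  }
}
```

A forest trained to tell real rows from such synthetic rows ends up with leaves that reflect the structure of the real data, which is what makes leaf-id vectors usable for clustering.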
A seed to use for RNG. Defaults to using a randomized seed value.
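The practical effect of fixing the seed can be sketched with `scala.util.Random`: the same seed reproduces the same sequence of draws, which is what makes a training run repeatable. The helper `draws` is hypothetical, and the sketch does not claim which RNG the trainer uses internally.

```scala
import scala.util.Random

object SeedSketch {
  // Hypothetical helper: produce n pseudo-random integers from a fixed seed.
  def draws(seed: Long, n: Int): Seq[Int] = {
    val rng = new Random(seed)
    Seq.fill(n)(rng.nextInt(1000))
  }
}
```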
Set a new category info map
New category-info map to use
Copy of this instance with new category info
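Each setter is documented as returning a copy of the instance rather than modifying it. A minimal sketch of that pattern, using a hypothetical `Params` case class whose field and method names are illustrative only:

```scala
object SetterSketch {
  // Hypothetical stand-in for the trainer's parameters; names are illustrative.
  case class Params(
      numTrees: Int = 10,
      maxDepth: Int = 5,
      categoryInfo: Map[Int, Int] = Map.empty) {
    // Each setter returns a modified copy and leaves the receiver untouched.
    def setNumTrees(n: Int): Params = copy(numTrees = n)
    def setCategoryInfo(info: Map[Int, Int]): Params = copy(categoryInfo = info)
  }
}
```

Because every call yields a new value, setters of this style can be chained, for example `SetterSketch.Params().setNumTrees(50).setCategoryInfo(Map(0 -> 4))`, and a shared base configuration is never changed by accident.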
Set a new clustering epsilon halting threshold
New epsilon halting threshold
Copy of this instance with new clustering epsilon threshold
Set a new clustering fractional epsilon halting threshold
New fractional epsilon value
Copy of this instance with new fractional epsilon threshold
Set a new target number of clusters
New target number of clusters. Zero selects automatic determination.
Copy of this instance with new target number of clusters
Set a new maximum number of clustering refinement iterations
New maximum number of refinement iterations
Copy of this instance with new maximum number of iterations
Set a new clustering sample size
New clustering sample size
Copy of this instance with new sample size
Set a new number of clustering threads
New number of threads to use
Copy of this instance with new number of threads
Set a new feature extraction function for input objects
The feature extraction function
Copy of this instance with new extractor
Set a new Random Forest maximum numeric binning value
New maximum numeric binning value
Copy of this instance with new maximum binning value
Set a new Random Forest maximum tree depth
New maximum decision tree depth
Copy of this instance with new maximum decision tree depth
Set a new number of Random Forest trees to train for the model
New number of trees to use for the RF
Copy of this instance with new Random Forest size
Set a new RNG seed
New RNG seed to use
Copy of this instance with new RNG seed
Set a new synthetic data sample size
New synthetic data size to use
Copy of this instance with new synthetic data size
The size of synthetic (margin-sampled) data to be constructed. Defaults to the size of the input data.
An object for training a Random Forest clustering model on unsupervised data.
Data is required to have a mapping into a feature space of type Seq[Double].
A feature extraction function for data objects
A map from feature indexes to numbers of categories. Feature indexes that have no entry in the map are assumed to be numeric, not categorical. Defaults to the category info from the Extractor, if the feature extraction function is of that type; otherwise defaults to empty, i.e. all features numeric.
The size of synthetic (margin-sampled) data to be constructed. Defaults to the size of the input data.
The number of decision trees to train in the Random Forest. Defaults to 10.
Maximum decision tree depth. Defaults to 5.
Maximum histogramming bins to use for numeric data. Defaults to 5.
The number of clusters to use when clustering leaf-id vectors. Defaults to an automatic estimation of a "good" number of clusters.
Maximum clustering refinement iterations to compute. Defaults to 25.
Halt clustering if the clustering metric cost changes by less than this value. Defaults to 0.
Halt clustering if the clustering metric cost changes by less than this fraction of its value at the previous iteration. Defaults to 0.0001.
If the data is larger than this size, use a random sample of this size instead. Defaults to 1000.
Use this number of threads to accelerate clustering. Defaults to 1.
A seed to use for RNG. Defaults to using a randomized seed value.