Package io.radanalytics.silex.cluster

package cluster

Type Members

  1. class ClusteringRandomForestModel extends Serializable

    Enhance Spark RandomForestModel objects with methods for Random Forest Clustering

  2. class ClusteringTreeModel extends Serializable

    Enhance a Spark DecisionTreeModel object with methods for Random Forest clustering

  3. case class KMedoids[T](metric: (T, T) ⇒ Double, k: Int, maxIterations: Int, epsilon: Double, fractionEpsilon: Double, sampleSize: Int, numThreads: Int, seed: Long) extends Serializable with Logging with Product

    An object for training a K-Medoid clustering model on Seq or RDD data.

    Data must have a distance metric defined on it. Unlike K-Means clustering, K-Medoids does not require an algebra over data elements (such as addition or averaging).

    metric

    The distance metric imposed on data elements

    k

    The number of clusters to use. If k is zero, the clustering will attempt to identify a number of clusters that is "good" w.r.t. Minimum Description Length.

    maxIterations

    The maximum number of model refinement iterations to run

    epsilon

    The epsilon threshold to use. Must be >= 0. If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) <= epsilon (Lower cost is better).

    fractionEpsilon

    The fractionEpsilon threshold to use. Must be >= 0. If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) / c0 <= fractionEpsilon (Lower cost is better).

    sampleSize

    The target size of the random sample. Must be > 0.

    numThreads

    The number of threads to use while clustering

    seed

    The random seed to use for RNG. Training runs that start from the same random seed produce the same clustering. By default, training runs vary randomly.

  4. class KMedoidsModel[T] extends Serializable

    Represents a K-Medoids clustering model

  5. case class RandomForestCluster[T](extractor: (T) ⇒ Seq[Double], categoryInfo: Map[Int, Int], syntheticSS: Int, rfNumTrees: Int, rfMaxDepth: Int, rfMaxBins: Int, clusterK: Int, clusterMaxIter: Int, clusterEps: Double, clusterFractionEps: Double, clusterSS: Int, clusterThreads: Int, seed: Long) extends Serializable with Logging with Product

    An object for training a Random Forest clustering model on unsupervised data.

    Data is required to have a mapping into a feature space of type Seq[Double].

    extractor

    A feature extraction function for data objects

    categoryInfo

    A map from feature indexes to numbers of categories. Feature indexes that do not have an entry in the map are assumed to be numeric, not categorical. Defaults to the category info from Extractor, if the feature extraction function is an Extractor. Otherwise defaults to empty, i.e. all features are numeric.

    syntheticSS

    The size of synthetic (margin-sampled) data to be constructed. Defaults to the size of the input data.

    rfNumTrees

    The number of decision trees to train in the Random Forest. Defaults to 10.

    rfMaxDepth

    Maximum decision tree depth. Defaults to 5.

    rfMaxBins

    Maximum number of histogram bins to use for numeric features. Defaults to 5.

    clusterK

    The number of clusters to use when clustering leaf-id vectors. Defaults to an automatic estimation of a "good" number of clusters.

    clusterMaxIter

    Maximum clustering refinement iterations to compute. Defaults to 25.

    clusterEps

    Halt clustering if the clustering metric cost changes by less than this value. Defaults to 0.

    clusterFractionEps

    Halt clustering if the clustering metric cost changes by less than this fraction of its value at the previous iteration. Defaults to 0.0001.

    clusterSS

    If the data is larger than this size, cluster a random sample of this size instead. Defaults to 1000.

    clusterThreads

    Use this number of threads to accelerate clustering. Defaults to 1.

    seed

    A seed to use for RNG. Defaults to using a randomized seed value.

  6. class RandomForestClusterModel[T] extends Serializable

    Represents a Random Forest clustering model of some data objects
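
The KMedoids documentation describes training purely in terms of a metric on data elements plus two halting thresholds, epsilon and fractionEpsilon. As an illustration only, here is a minimal single-threaded K-Medoids sketch over an in-memory Seq[T], assuming random initialization and a simple assign-then-update refinement loop. It is not the silex implementation, and KMedoidsSketch, cost, converged, medoidOf, and train are hypothetical names. It shows that only the metric is ever called (no sums or averages of elements), and it applies the documented halting rules.

```scala
import scala.util.Random

// Hypothetical sketch of K-Medoids over an in-memory Seq[T]; not silex's code.
// Only a distance metric on T is used: no sums or averages of elements.
object KMedoidsSketch {

  // Model cost: total distance from each element to its nearest medoid.
  def cost[T](data: Seq[T], medoids: Seq[T], metric: (T, T) => Double): Double =
    data.map(x => medoids.map(m => metric(x, m)).min).sum

  // Halting rule as documented for KMedoids (lower cost is better):
  // stop when (c0 - c1) <= epsilon or (c0 - c1) / c0 <= fractionEpsilon.
  def converged(c0: Double, c1: Double, epsilon: Double, fractionEpsilon: Double): Boolean = {
    val delta = c0 - c1
    delta <= epsilon || (c0 > 0.0 && delta / c0 <= fractionEpsilon)
  }

  // The medoid of a cluster is the member with the smallest total distance to the others.
  def medoidOf[T](cluster: Seq[T], metric: (T, T) => Double): T =
    cluster.minBy(c => cluster.map(x => metric(c, x)).sum)

  // Alternating refinement (an assumption of this sketch): assign elements to
  // the nearest medoid, then re-pick each cluster's medoid, until the halting
  // rule fires or maxIterations is reached.
  def train[T](data: Seq[T], k: Int, metric: (T, T) => Double,
               maxIterations: Int, epsilon: Double, fractionEpsilon: Double,
               seed: Long): Seq[T] = {
    require(k > 0 && k <= data.size, "this sketch requires 0 < k <= data.size")
    val rng = new Random(seed)
    // Initialize with k distinct positions drawn at random from the data.
    var medoids: Seq[T] = rng.shuffle(data.indices.toList).take(k).map(i => data(i))
    var c0 = cost(data, medoids, metric)
    var iteration = 0
    var done = false
    while (!done && iteration < maxIterations) {
      val current = medoids
      // Group elements by the index of their nearest medoid.
      val clusters: Map[Int, Seq[T]] =
        data.groupBy(x => current.indices.minBy(i => metric(x, current(i))))
      // Keep a medoid unchanged if its cluster came out empty (e.g. duplicate values).
      val next: Seq[T] =
        current.indices.map(i => medoidOf(clusters.getOrElse(i, Seq(current(i))), metric))
      val c1 = cost(data, next, metric)
      done = converged(c0, c1, epsilon, fractionEpsilon)
      medoids = next
      c0 = c1
      iteration += 1
    }
    medoids
  }
}
```

For example, clustering the 1-D points Seq(1.0, 2.0, 3.0, 10.0, 11.0, 12.0) with the metric (a, b) => math.abs(a - b), k = 2, epsilon = 0, and fractionEpsilon = 0.0001 settles on the medoids 2.0 and 11.0. Because the new medoid of each cluster minimizes that cluster's total distance, the cost never increases between iterations, so c0 - c1 is non-negative and the documented thresholds behave as stopping tolerances.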

Value Members

  1. object ClusteringRandomForestModel extends Serializable

  2. object ClusteringTreeModel extends Serializable

    Class definitions for ClusteringTreeModel methods

  3. object KMedoids extends Logging with Serializable

    Utilities used by K-Medoids clustering

  4. object KMedoidsModel extends Serializable

    Utility functions for KMedoidsModel

  5. object RandomForestCluster extends Serializable

    Factory functions and implicits for RandomForestCluster

  6. object RandomForestClusterModel extends Serializable

    Factory functions and implicits for RandomForestClusterModel

  7. package infra
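
RandomForestCluster maps each data object into a Seq[Double] feature space through extractor, marks categorical features through categoryInfo, builds margin-sampled synthetic data of size syntheticSS, and clusters leaf-id vectors. The sketch below illustrates those ingredients in isolation; it does not call the silex API, and Visit, RandomForestClusterSketch, marginSample, and leafDistance are hypothetical names. Margin sampling is taken here to mean drawing each feature independently from its observed values, which is the usual way to build synthetic data for Random Forest clustering; training the forest itself is omitted. The category-code convention follows Spark MLlib's categoricalFeaturesInfo, and the Hamming-style leafDistance is only one plausible metric for leaf-id vectors, not one these docs specify.

```scala
import scala.util.Random

// Hypothetical record type, used only to illustrate `extractor` and `categoryInfo`.
case class Visit(durationSec: Double, pages: Double, deviceType: Int)

object RandomForestClusterSketch {

  // Maps each record into the Seq[Double] feature space; index 2 holds a category code.
  val extractor: Visit => Seq[Double] =
    v => Seq(v.durationSec, v.pages, v.deviceType.toDouble)

  // Feature 2 is categorical with 3 categories (codes 0, 1, 2, following Spark
  // MLlib's categoricalFeaturesInfo convention); features 0 and 1 have no entry,
  // so they are treated as numeric.
  val categoryInfo: Map[Int, Int] = Map(2 -> 3)

  // Margin sampling: each synthetic row draws every feature independently from
  // that feature's observed values, keeping each marginal distribution but
  // discarding the dependence between features.
  def marginSample(rows: Seq[Seq[Double]], n: Int, rng: Random): Seq[Seq[Double]] = {
    require(rows.nonEmpty, "need at least one input row")
    val columns: IndexedSeq[IndexedSeq[Double]] =
      rows.head.indices.map(j => rows.map(row => row(j)).toIndexedSeq)
    Seq.fill(n)(columns.map(col => col(rng.nextInt(col.size))))
  }

  // One plausible distance between leaf-id vectors (an assumption here):
  // the number of trees in which two points fall into different leaves.
  def leafDistance(a: Seq[Int], b: Seq[Int]): Double =
    a.zip(b).count { case (x, y) => x != y }.toDouble
}
```

With this extractor, Visit(30.0, 2.0, 1) maps to Seq(30.0, 2.0, 1.0). Passing rows.size as n mirrors the documented syntheticSS default of "the size of the input data".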
