io.radanalytics.silex.cluster

RandomForestCluster

Related Docs: object RandomForestCluster | package cluster

case class RandomForestCluster[T](extractor: (T) ⇒ Seq[Double], categoryInfo: Map[Int, Int], syntheticSS: Int, rfNumTrees: Int, rfMaxDepth: Int, rfMaxBins: Int, clusterK: Int, clusterMaxIter: Int, clusterEps: Double, clusterFractionEps: Double, clusterSS: Int, clusterThreads: Int, seed: Long) extends Serializable with Logging with Product

An object for training a Random Forest clustering model on unsupervised data.

Data is required to have a mapping into a feature space of type Seq[Double].
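
As a minimal sketch of such a mapping, the following defines a hypothetical `Customer` record, an extractor of type `Customer => Seq[Double]`, and a category-info map in the shape described for `categoryInfo`. The record type, its fields, and the category encoding are assumptions made for illustration; none of them are part of silex.

```scala
// Hypothetical data type used only for illustration; not part of silex.
final case class Customer(age: Int, income: Double, region: Int)

object ExtractorSketch {
  // Assumed encoding: `region` is a category id in the range 0 until NumRegions.
  val NumRegions: Int = 4

  // Feature extraction function: maps each data object into Seq[Double].
  val extractor: Customer => Seq[Double] =
    c => Seq(c.age.toDouble, c.income, c.region.toDouble)

  // Feature index 2 (region) is categorical with NumRegions categories.
  // Indexes 0 and 1 have no entry, so they would be treated as numeric.
  val categoryInfo: Map[Int, Int] = Map(2 -> NumRegions)
}
```

Here `ExtractorSketch.extractor` has the type expected by the `extractor` parameter, and `ExtractorSketch.categoryInfo` has the type expected by `categoryInfo`.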

extractor

A feature extraction function for data objects

categoryInfo

A map from feature indexes into numbers of categories. Feature indexes that do not have an entry in the map are assumed to be numeric, not categorical. Defaults to category-info from Extractor, if the feature extraction function is of this type. Otherwise defaults to empty, i.e. all numeric features.

syntheticSS

The size of synthetic (margin-sampled) data to be constructed. Defaults to the size of the input data.

rfNumTrees

The number of decision trees to train in the Random Forest. Defaults to 10.

rfMaxDepth

Maximum decision tree depth. Defaults to 5.

rfMaxBins

Maximum histogramming bins to use for numeric data. Defaults to 5.

clusterK

The number of clusters to use when clustering leaf-id vectors. Defaults to an automatic estimation of a "good" number of clusters.

clusterMaxIter

Maximum clustering refinement iterations to compute. Defaults to 25.

clusterEps

Halt clustering if the clustering metric-cost changes by less than this value. Defaults to 0.

clusterFractionEps

Halt clustering if the clustering metric-cost changes by less than this fraction of its value in the previous iteration. Defaults to 0.0001.

clusterSS

If the input data is larger than this value, use a random sample of this size for clustering. Defaults to 1000.

clusterThreads

Use this number of threads to accelerate clustering. Defaults to 1.

seed

A seed to use for RNG. Defaults to using a randomized seed value.
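
The `syntheticSS` parameter refers to synthetic, margin-sampled data. One common reading of margin sampling is that each feature of a synthetic row is drawn independently from the values observed for that feature, which preserves every feature's marginal distribution while breaking correlations between features. The sketch below illustrates that reading on in-memory data; the object and method names are hypothetical, and this is not the silex implementation.

```scala
import scala.util.Random

object MarginSampleSketch {
  /** Draw `n` synthetic rows by sampling every feature independently
    * from the values observed for that feature in `data`.
    * Assumes all rows in `data` have the same length.
    */
  def marginSample(
      data: IndexedSeq[Seq[Double]],
      n: Int,
      seed: Long): IndexedSeq[Seq[Double]] = {
    require(data.nonEmpty, "data must be non-empty")
    val rng = new Random(seed)
    val numFeatures = data.head.length
    IndexedSeq.fill(n) {
      Seq.tabulate(numFeatures) { j =>
        // Take the value of feature j from a randomly chosen input row.
        data(rng.nextInt(data.length))(j)
      }
    }
  }
}
```

Under this reading, the documented default for `syntheticSS` (the size of the input data) would correspond to calling `marginSample` with `n = data.length`.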

Linear Supertypes
Product, Equals, Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new RandomForestCluster(extractor: (T) ⇒ Seq[Double], categoryInfo: Map[Int, Int], syntheticSS: Int, rfNumTrees: Int, rfMaxDepth: Int, rfMaxBins: Int, clusterK: Int, clusterMaxIter: Int, clusterEps: Double, clusterFractionEps: Double, clusterSS: Int, clusterThreads: Int, seed: Long)

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. val categoryInfo: Map[Int, Int]

    A map from feature indexes into numbers of categories. Feature indexes that do not have an entry in the map are assumed to be numeric, not categorical. Defaults to category-info from Extractor, if the feature extraction function is of this type. Otherwise defaults to empty, i.e. all numeric features.

  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. val clusterEps: Double

    Halt clustering if the clustering metric-cost changes by less than this value. Defaults to 0.

  8. val clusterFractionEps: Double

    Halt clustering if the clustering metric-cost changes by less than this fraction of its value in the previous iteration. Defaults to 0.0001.

  9. val clusterK: Int

    The number of clusters to use when clustering leaf-id vectors. Defaults to an automatic estimation of a "good" number of clusters.

  10. val clusterMaxIter: Int

    Maximum clustering refinement iterations to compute. Defaults to 25.

  11. val clusterSS: Int

    If the input data is larger than this value, use a random sample of this size for clustering. Defaults to 1000.

  12. val clusterThreads: Int

    Use this number of threads to accelerate clustering. Defaults to 1.

  13. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. val extractor: (T) ⇒ Seq[Double]

    A feature extraction function for data objects

  15. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  17. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  18. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  19. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  20. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  21. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  22. def logger: Logger

    Definition Classes
    Logging
  23. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  24. final def notify(): Unit

    Definition Classes
    AnyRef
  25. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  26. val rfMaxBins: Int

    Maximum histogramming bins to use for numeric data. Defaults to 5.

  27. val rfMaxDepth: Int

    Maximum decision tree depth. Defaults to 5.

  28. val rfNumTrees: Int

    The number of decision trees to train in the Random Forest. Defaults to 10.

  29. def run(data: RDD[T]): RandomForestClusterModel[T]

    Train a Random Forest clustering model from input data

    data

    The input data objects to cluster

    returns

    An RF clustering model of the input data

  30. val seed: Long

    A seed to use for RNG. Defaults to using a randomized seed value.

  31. def setCategoryInfo(categoryInfoNew: Map[Int, Int]): RandomForestCluster[T]

    Set a new category info map

    categoryInfoNew

    New category-info map to use

    returns

    Copy of this instance with new category info

  32. def setClusterEps(clusterEpsNew: Double): RandomForestCluster[T]

    Set a new clustering epsilon halting threshold

    clusterEpsNew

    New epsilon halting threshold

    returns

    Copy of this instance with new clustering epsilon threshold

  33. def setClusterFractionEps(clusterFractionEpsNew: Double): RandomForestCluster[T]

    Set a new clustering fractional epsilon halting threshold

    clusterFractionEpsNew

    New fractional epsilon value

    returns

    Copy of this instance with new fractional epsilon threshold

  34. def setClusterK(clusterKNew: Int): RandomForestCluster[T]

    Set a new target number of clusters

    clusterKNew

    New target number of clusters. Zero selects automatic determination.

    returns

    Copy of this instance with the new target number of clusters

  35. def setClusterMaxIter(clusterMaxIterNew: Int): RandomForestCluster[T]

    Set a new maximum number of clustering refinement iterations

    clusterMaxIterNew

    New maximum number of refinement iterations

    returns

    Copy of this instance with the new maximum number of iterations

  36. def setClusterSS(clusterSSNew: Int): RandomForestCluster[T]

    Set a new clustering sample size

    clusterSSNew

    New clustering sample size

    returns

    Copy of this instance with new sample size

  37. def setClusterThreads(clusterThreadsNew: Int): RandomForestCluster[T]

    Set a new clustering number of threads

    clusterThreadsNew

    New number of process threads to use

    returns

    Copy of this instance with new threading number

  38. def setExtractor(extractorNew: (T) ⇒ Seq[Double]): RandomForestCluster[T]

    Set a new feature extraction function for input objects

    extractorNew

    The feature extraction function

    returns

    Copy of this instance with new extractor

  39. def setRfMaxBins(rfMaxBinsNew: Int): RandomForestCluster[T]

    Set a new Random Forest maximum numeric binning value

    rfMaxBinsNew

    New maximum numeric binning value

    returns

    Copy of this instance with new maximum binning value

  40. def setRfMaxDepth(rfMaxDepthNew: Int): RandomForestCluster[T]

    Set a new Random Forest maximum tree depth

    rfMaxDepthNew

    New maximum decision tree depth

    returns

    Copy of this instance with new maximum decision tree depth

  41. def setRfNumTrees(rfNumTreesNew: Int): RandomForestCluster[T]

    Set a new number of Random Forest trees to train for the model

    rfNumTreesNew

    New number of trees to use for the RF

    returns

    Copy of this instance with new Random Forest size

  42. def setSeed(seedNew: Long): RandomForestCluster[T]

    Set a new RNG seed

    seedNew

    New RNG seed to use

    returns

    Copy of this instance with new RNG seed

  43. def setSyntheticSS(syntheticSSNew: Int): RandomForestCluster[T]

    Set a new synthetic data sample size

    syntheticSSNew

    New synthetic data size to use

    returns

    Copy of this instance with new synthetic data size

  44. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  45. val syntheticSS: Int

    The size of synthetic (margin-sampled) data to be constructed. Defaults to the size of the input data.

  46. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  47. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  48. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
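
Each `set*` method listed above is documented as returning a copy of the instance rather than modifying it in place. The sketch below illustrates that copy-on-set pattern with a hypothetical, much smaller stand-in class; `ParamsSketch` and `SetterSketch` are not part of silex, and the `seed` default of `0L` is an assumption chosen only to keep the example deterministic.

```scala
// Hypothetical stand-in for a parameter object; not the silex class.
final case class ParamsSketch(
    rfNumTrees: Int = 10, // documented default for rfNumTrees
    clusterK: Int = 0,    // zero is documented as automatic determination
    seed: Long = 0L       // assumed placeholder; the documented default is randomized
) {
  // Each setter returns a modified copy and leaves `this` untouched.
  def setRfNumTrees(rfNumTreesNew: Int): ParamsSketch = copy(rfNumTrees = rfNumTreesNew)
  def setClusterK(clusterKNew: Int): ParamsSketch = copy(clusterK = clusterKNew)
  def setSeed(seedNew: Long): ParamsSketch = copy(seed = seedNew)
}

object SetterSketch {
  val base: ParamsSketch = ParamsSketch()
  // Setters chain because each call returns a new instance.
  val tuned: ParamsSketch = base.setRfNumTrees(50).setClusterK(3).setSeed(42L)
}
```

Because every setter returns a new value, `base` keeps its original settings after `tuned` is built, so a shared configuration can be specialized safely in several places.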
