com.redhat.et.silex.cluster

RandomForestCluster

case class RandomForestCluster[T](extractor: (T) ⇒ Seq[Double], categoryInfo: Map[Int, Int], syntheticSS: Int, rfNumTrees: Int, rfMaxDepth: Int, rfMaxBins: Int, clusterK: Int, clusterMaxIter: Int, clusterEps: Double, clusterFractionEps: Double, clusterSS: Int, clusterThreads: Int, seed: Long) extends Serializable with Logging with Product

An object for training a Random Forest clustering model on unsupervised data.

Data is required to have a mapping into a feature space of type Seq[Double].

extractor

A feature extraction function that maps each data object to a feature vector of type Seq[Double].
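Here is a minimal sketch of such a function. It assumes a hypothetical record type `Flower` and a hypothetical encoding map `speciesCode`, neither of which is part of this library. The extractor turns each object into a fixed-length `Seq[Double]` and encodes the categorical field as a small integer code.

```scala
// Hypothetical record type and encoding, used only to illustrate an extractor.
object ExtractorSketch {
  final case class Flower(sepalLength: Double, petalLength: Double, species: String)

  // Assumed encoding of the categorical `species` field as 0.0, 1.0, 2.0.
  val speciesCode: Map[String, Int] =
    Map("setosa" -> 0, "versicolor" -> 1, "virginica" -> 2)

  // A function of type (T) => Seq[Double] with T = Flower:
  // features 0 and 1 are numeric, feature 2 is a category code.
  val extractor: Flower => Seq[Double] =
    f => Seq(f.sepalLength, f.petalLength, speciesCode(f.species).toDouble)

  def main(args: Array[String]): Unit =
    println(extractor(Flower(5.1, 1.4, "setosa")))
}
```

Under this encoding, feature 2 would need a categoryInfo entry of 2 -> 3 so that it is treated as categorical rather than numeric.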

categoryInfo

A map from feature indexes to numbers of categories. Feature indexes with no entry in the map are treated as numeric, not categorical. Defaults to the category info of the Extractor if the feature extraction function is an Extractor; otherwise defaults to an empty map, i.e. all features are numeric.
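The sketch below shows how such a map is read. It assumes categorical values follow the Spark MLlib convention, where a feature with k categories takes the values 0.0 through k - 1. The helpers `isCategorical` and `isValidValue` are hypothetical and are not members of this class.

```scala
object CategoryInfoSketch {
  // Example map: feature 2 has 3 categories; features without an entry are numeric.
  val categoryInfo: Map[Int, Int] = Map(2 -> 3)

  def isCategorical(featureIndex: Int): Boolean =
    categoryInfo.contains(featureIndex)

  // Assumption (Spark MLlib convention): a feature with k categories must take
  // one of the values 0.0, 1.0, ..., (k - 1).0. Numeric features accept any value.
  def isValidValue(featureIndex: Int, value: Double): Boolean =
    categoryInfo.get(featureIndex) match {
      case Some(k) => value >= 0.0 && value < k && value == math.floor(value)
      case None    => true
    }

  def main(args: Array[String]): Unit = {
    val features = Seq(5.1, 1.4, 0.0)
    for (i <- features.indices) {
      val kind = if (isCategorical(i)) "categorical" else "numeric"
      println(s"feature $i: $kind, valid = ${isValidValue(i, features(i))}")
    }
  }
}
```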

syntheticSS

The size of synthetic (margin-sampled) data to be constructed. Defaults to the size of the input data.
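The following sketches one plausible reading of margin sampling. It assumes that each feature is drawn independently from its empirical marginal distribution, which is the usual way to build a synthetic contrast class for unsupervised Random Forests. The helper `marginSample` is hypothetical and works on local collections rather than an RDD.

```scala
import scala.util.Random

object MarginSampleSketch {
  // Assumed meaning of "margin-sampled": each column of a synthetic row is drawn
  // independently from the same column of a randomly chosen real row. This keeps
  // every feature's marginal distribution but breaks dependence between features.
  def marginSample(data: IndexedSeq[Seq[Double]], n: Int, seed: Long): IndexedSeq[Seq[Double]] = {
    require(data.nonEmpty, "need at least one input row")
    val rng = new Random(seed)
    val width = data.head.length
    IndexedSeq.fill(n) {
      Seq.tabulate(width)(j => data(rng.nextInt(data.length))(j))
    }
  }

  def main(args: Array[String]): Unit = {
    val data = IndexedSeq(Seq(1.0, 10.0), Seq(2.0, 20.0), Seq(3.0, 30.0))
    // Synthetic size equal to the input size, matching the documented default.
    marginSample(data, data.length, seed = 42L).foreach(println)
  }
}
```

Because every column is resampled independently, the synthetic rows share the real data's per-feature distributions but not its correlations. If the forest is then trained to separate real rows from synthetic ones, it has to pick up on that dependence structure.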

rfNumTrees

The number of decision trees to train in the Random Forest. Defaults to 10.

rfMaxDepth

Maximum decision tree depth. Defaults to 5.

rfMaxBins

Maximum number of histogram bins to use for numeric features. Defaults to 5.
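To illustrate what a bin limit does, here is a simplified equal-frequency binning sketch. It picks at most `maxBins - 1` split thresholds from quantiles of the sorted values. This is an assumption made for illustration: Spark MLlib, for example, derives candidate splits from approximate quantiles of a sample, and this page does not document the exact procedure used here.

```scala
object BinningSketch {
  // Simplified equal-frequency binning (an assumption, not this library's code):
  // pick at most (maxBins - 1) thresholds at evenly spaced positions in the
  // sorted values, so a numeric feature is split into at most maxBins bins.
  def splitThresholds(values: Seq[Double], maxBins: Int): Vector[Double] = {
    require(values.nonEmpty, "need at least one value")
    require(maxBins >= 2, "need at least two bins")
    val sorted = values.sorted.toVector
    val n = sorted.length
    (1 until maxBins)
      .map(i => sorted(math.min(i * n / maxBins, n - 1)))
      .distinct // repeated values collapse, giving fewer (never more) bins
      .toVector
  }

  def main(args: Array[String]): Unit = {
    val xs = (1 to 20).map(_.toDouble)
    println(splitThresholds(xs, 5))
  }
}
```

In a scheme like this, the documented default of 5 bins allows at most 4 thresholds per numeric feature. This keeps training cheap at the cost of coarser splits.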

clusterK

The number of clusters to use when clustering leaf-id vectors. Defaults to an automatic estimation of a "good" number of clusters.
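A leaf-id vector records, for each tree in the forest, which leaf a data point falls into. The sketch below builds such vectors from a hypothetical toy forest of two stumps. It compares them using an assumed dissimilarity: the fraction of trees in which two points land in different leaves. This page does not specify the metric that is actually used for clustering.

```scala
object LeafIdSketch {
  // Stand-in for a trained decision tree: maps a feature vector to a leaf id.
  type Tree = Seq[Double] => Int

  // Hypothetical toy forest of two depth-1 trees (stumps), for illustration only.
  val forest: Seq[Tree] = Seq(
    x => if (x(0) < 2.5) 1 else 2,
    x => if (x(1) < 1.0) 1 else 2
  )

  // One leaf id per tree, in forest order.
  def leafIds(x: Seq[Double]): Vector[Int] = forest.map(tree => tree(x)).toVector

  // Assumed dissimilarity: fraction of trees whose leaf ids differ.
  def leafDistance(a: Vector[Int], b: Vector[Int]): Double = {
    require(a.length == b.length && a.nonEmpty, "vectors must be non-empty and equal length")
    a.zip(b).count { case (i, j) => i != j }.toDouble / a.length
  }

  def main(args: Array[String]): Unit = {
    val p = leafIds(Seq(1.0, 0.5))
    val q = leafIds(Seq(3.0, 0.5))
    println(s"$p $q distance = ${leafDistance(p, q)}")
  }
}
```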

clusterMaxIter

Maximum clustering refinement iterations to compute. Defaults to 25.

clusterEps

Halt clustering if the clustering metric cost changes by less than this value. Defaults to 0.

clusterFractionEps

Halt clustering if the clustering metric cost changes by less than this fraction of its value at the previous iteration. Defaults to 0.0001.
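Here is a minimal sketch of the two halting tests. It assumes both are "less than" comparisons on the change in clustering cost between consecutive iterations, and that either test alone stops refinement. `shouldHalt` is a hypothetical helper, not a member of this class.

```scala
object HaltingSketch {
  // Assumed semantics: halt when the change in clustering cost between two
  // consecutive iterations is below clusterEps (absolute) or below
  // clusterFractionEps relative to the previous cost.
  def shouldHalt(prevCost: Double, cost: Double, eps: Double, fractionEps: Double): Boolean = {
    val delta = math.abs(prevCost - cost)
    val absoluteStop = delta < eps
    val fractionalStop = prevCost != 0.0 && delta / math.abs(prevCost) < fractionEps
    absoluteStop || fractionalStop
  }

  def main(args: Array[String]): Unit = {
    // Documented defaults: clusterEps = 0, clusterFractionEps = 0.0001.
    println(shouldHalt(100.0, 99.995, 0.0, 0.0001)) // relative change about 5e-5
    println(shouldHalt(100.0, 90.0, 0.0, 0.0001))   // relative change 0.1
  }
}
```

Under this reading, the documented default clusterEps = 0 can never trigger, because a change is never below zero. With default settings, only the fractional test (clusterFractionEps = 0.0001) would stop refinement before clusterMaxIter is reached.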

clusterSS

If the input data is larger than this, cluster a random sample of this size. Defaults to 1000.
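This is a sketch of the sampling rule. It assumes a uniform random sample drawn without replacement and a caller-supplied seed. `clusterSample` is a hypothetical helper on local collections; the real class operates on an RDD.

```scala
import scala.util.Random

object ClusterSampleSketch {
  // Assumed rule: if there are more rows than clusterSS, keep a uniform random
  // sample of clusterSS rows (without replacement); otherwise keep every row.
  def clusterSample[A](rows: IndexedSeq[A], clusterSS: Int, seed: Long): IndexedSeq[A] =
    if (rows.length <= clusterSS) rows
    else new Random(seed).shuffle(rows).take(clusterSS)

  def main(args: Array[String]): Unit = {
    val rows = (1 to 5000).toVector
    val sample = clusterSample(rows, 1000, seed = 7L) // 1000 is the documented default
    println(s"${sample.length} of ${rows.length} rows kept")
  }
}
```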

clusterThreads

Use this number of threads to accelerate clustering. Defaults to 1.

seed

A seed to use for RNG. Defaults to using a randomized seed value.

Linear Supertypes
Product, Equals, Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new RandomForestCluster(extractor: (T) ⇒ Seq[Double], categoryInfo: Map[Int, Int], syntheticSS: Int, rfNumTrees: Int, rfMaxDepth: Int, rfMaxBins: Int, clusterK: Int, clusterMaxIter: Int, clusterEps: Double, clusterFractionEps: Double, clusterSS: Int, clusterThreads: Int, seed: Long)

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. val categoryInfo: Map[Int, Int]

    A map from feature indexes to numbers of categories. Feature indexes with no entry in the map are treated as numeric, not categorical. Defaults to the category info of the Extractor if the feature extraction function is an Extractor; otherwise defaults to an empty map, i.e. all features are numeric.

  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. val clusterEps: Double

    Halt clustering if the clustering metric cost changes by less than this value. Defaults to 0.

  10. val clusterFractionEps: Double

    Halt clustering if the clustering metric cost changes by less than this fraction of its value at the previous iteration. Defaults to 0.0001.

  11. val clusterK: Int

    The number of clusters to use when clustering leaf-id vectors. Defaults to an automatic estimation of a "good" number of clusters.

  12. val clusterMaxIter: Int

    Maximum clustering refinement iterations to compute. Defaults to 25.

  13. val clusterSS: Int

    If the input data is larger than this, cluster a random sample of this size. Defaults to 1000.

  14. val clusterThreads: Int

    Use this number of threads to accelerate clustering. Defaults to 1.

  15. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  16. val extractor: (T) ⇒ Seq[Double]

    A feature extraction function that maps each data object to a feature vector of type Seq[Double].

  17. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  18. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  19. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  20. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  21. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  22. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  23. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  24. def logger: Logger

    Definition Classes
    Logging
  25. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  26. final def notify(): Unit

    Definition Classes
    AnyRef
  27. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  28. val rfMaxBins: Int

    Maximum number of histogram bins to use for numeric features. Defaults to 5.

  29. val rfMaxDepth: Int

    Maximum decision tree depth. Defaults to 5.

  30. val rfNumTrees: Int

    The number of decision trees to train in the Random Forest. Defaults to 10.

  31. def run(data: RDD[T]): RandomForestClusterModel[T]

    Train a Random Forest clustering model from input data

    data

    The input data objects to cluster

    returns

    An RF clustering model of the input data

  32. val seed: Long

    A seed to use for RNG. Defaults to using a randomized seed value.

  33. def setCategoryInfo(categoryInfoNew: Map[Int, Int]): RandomForestCluster[T]

    Set a new category info map

    categoryInfoNew

    New category-info map to use

    returns

    Copy of this instance with new category info

  34. def setClusterEps(clusterEpsNew: Double): RandomForestCluster[T]

    Set a new clustering epsilon halting threshold

    clusterEpsNew

    New epsilon halting threshold

    returns

    Copy of this instance with new clustering epsilon threshold

  35. def setClusterFractionEps(clusterFractionEpsNew: Double): RandomForestCluster[T]

    Set a new clustering fractional epsilon halting threshold

    clusterFractionEpsNew

    New fractional epsilon value

    returns

    Copy of this instance with new fractional epsilon threshold

  36. def setClusterK(clusterKNew: Int): RandomForestCluster[T]

    Set a new target number of clusters

    clusterKNew

    New target number of clusters. Zero selects automatic determination.

    returns

    Copy of this instance with new target number of clusters

  37. def setClusterMaxIter(clusterMaxIterNew: Int): RandomForestCluster[T]

    Set a new maximum number of clustering refinement iterations

    clusterMaxIterNew

    New maximum number of refinement iterations

    returns

    Copy of this instance with new maximum iteration count

  38. def setClusterSS(clusterSSNew: Int): RandomForestCluster[T]

    Set a new clustering sample size

    clusterSSNew

    New clustering sample size

    returns

    Copy of this instance with new sample size

  39. def setClusterThreads(clusterThreadsNew: Int): RandomForestCluster[T]

    Set a new number of clustering threads

    clusterThreadsNew

    New number of threads to use

    returns

    Copy of this instance with new number of threads

  40. def setExtractor(extractorNew: (T) ⇒ Seq[Double]): RandomForestCluster[T]

    Set a new feature extraction function for input objects

    extractorNew

    The feature extraction function

    returns

    Copy of this instance with new extractor

  41. def setRfMaxBins(rfMaxBinsNew: Int): RandomForestCluster[T]

    Set a new Random Forest maximum numeric binning value

    rfMaxBinsNew

    New maximum numeric binning value

    returns

    Copy of this instance with new maximum binning value

  42. def setRfMaxDepth(rfMaxDepthNew: Int): RandomForestCluster[T]

    Set a new Random Forest maximum tree depth

    rfMaxDepthNew

    New maximum decision tree depth

    returns

    Copy of this instance with new maximum decision tree depth

  43. def setRfNumTrees(rfNumTreesNew: Int): RandomForestCluster[T]

    Set a new number of Random Forest trees to train for the model

    rfNumTreesNew

    New number of trees to use for the RF

    returns

    Copy of this instance with new Random Forest size

  44. def setSeed(seedNew: Long): RandomForestCluster[T]

    Set a new RNG seed

    seedNew

    New RNG seed to use

    returns

    Copy of this instance with new RNG seed

  45. def setSyntheticSS(syntheticSSNew: Int): RandomForestCluster[T]

    Set a new synthetic data sample size

    syntheticSSNew

    New synthetic data size to use

    returns

    Copy of this instance with new synthetic data size

  46. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  47. val syntheticSS: Int

    The size of synthetic (margin-sampled) data to be constructed. Defaults to the size of the input data.

  48. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  49. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  50. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
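Each of the set* methods above returns a modified copy rather than mutating the instance. This is the usual style for a Scala case class and is probably built on `copy`, although this page does not say so. The trimmed-down `Params` class below is hypothetical and mirrors only three of the setters to show the style. With the real class, given an existing instance `rf`, the analogous chain would be `rf.setRfNumTrees(50).setClusterK(8).run(data)`.

```scala
object CopySetterSketch {
  // Hypothetical parameter holder mirroring the copy-on-set style.
  // Defaults follow the documented values (10 trees, depth 5); clusterK = 0
  // stands for automatic selection, following the setClusterK description.
  final case class Params(rfNumTrees: Int = 10, rfMaxDepth: Int = 5, clusterK: Int = 0) {
    def setRfNumTrees(n: Int): Params = copy(rfNumTrees = n)
    def setRfMaxDepth(d: Int): Params = copy(rfMaxDepth = d)
    def setClusterK(k: Int): Params = copy(clusterK = k)
  }

  def main(args: Array[String]): Unit = {
    val base = Params()
    val tuned = base.setRfNumTrees(50).setClusterK(8)
    println(s"base = $base, tuned = $tuned") // base is left unchanged
  }
}
```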
