RandomForestCluster

Instance Constructors

new RandomForestCluster(extractor: (T) ⇒ Seq[Double], categoryInfo: Map[Int, Int], syntheticSS: Int, rfNumTrees: Int, rfMaxDepth: Int, rfMaxBins: Int, clusterK: Int, clusterMaxIter: Int, clusterEps: Double, clusterFractionEps: Double, clusterSS: Int, clusterThreads: Int, seed: Long)

extractor
A feature extraction function for data objects
categoryInfo
A map from feature indexes to numbers of categories. Feature indexes that do not have an entry in the map are assumed to be numeric, not categorical. Defaults to the category-info from Extractor, if the feature extraction function is of this type. Otherwise defaults to empty, i.e. all numeric features.
syntheticSS
The size of synthetic (margin-sampled) data to be constructed. Defaults to the size of the input data.
rfNumTrees
The number of decision trees to train in the Random Forest. Defaults to 10.
rfMaxDepth
Maximum decision tree depth. Defaults to 5.
rfMaxBins
Maximum histogramming bins to use for numeric data. Defaults to 5.
clusterK
The number of clusters to use when clustering leaf-id vectors. Defaults to an automatic estimation of a "good" number of clusters.
clusterMaxIter
Maximum clustering refinement iterations to compute. Defaults to 25.
clusterEps
Halt clustering if the clustering metric-cost changes by less than this value. Defaults to 0.
clusterFractionEps
Halt clustering if the clustering metric-cost changes by less than this fraction of its value in the previous iteration. Defaults to 0.0001.
clusterSS
If the data is larger than this value, use a random sample of this size. Defaults to 1000.
clusterThreads
Use this number of threads to accelerate clustering. Defaults to 1.
seed
A seed to use for RNG. Defaults to using a randomized seed value.

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
val categoryInfo: Map[Int, Int]

A map from feature indexes to numbers of categories. Feature indexes that do not have an entry in the map are assumed to be numeric, not categorical. Defaults to the category-info from Extractor, if the feature extraction function is of this type. Otherwise defaults to empty, i.e. all numeric features.
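The lookup rule described for categoryInfo can be illustrated with a minimal sketch. The names CategoryInfoSketch, FeatureKind and featureKind are hypothetical helpers introduced here for illustration; they are not part of the documented API.

```scala
object CategoryInfoSketch {
  // Hypothetical types for illustration only; not part of the documented API.
  sealed trait FeatureKind
  case object Numeric extends FeatureKind
  final case class Categorical(numCategories: Int) extends FeatureKind

  // A feature index with an entry in the map is categorical with that many
  // categories; an index without an entry is treated as numeric.
  def featureKind(categoryInfo: Map[Int, Int], index: Int): FeatureKind =
    categoryInfo.get(index) match {
      case Some(n) => Categorical(n)
      case None    => Numeric
    }

  // Example: features 1 and 4 are categorical, all others are numeric.
  val categoryInfo: Map[Int, Int] = Map(1 -> 3, 4 -> 2)
  val kinds: Seq[FeatureKind] = (0 until 5).map(featureKind(categoryInfo, _))
}
```

An empty map therefore marks every feature as numeric, which matches the documented fallback default.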
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
val clusterEps: Double

Halt clustering if the clustering metric-cost changes by less than this value. Defaults to 0.
val clusterFractionEps: Double

Halt clustering if the clustering metric-cost changes by less than this fraction of its value in the previous iteration. Defaults to 0.0001.
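One way the two halting thresholds might interact is sketched below. This is an assumption for illustration, not the library's implementation: the function name shouldHalt, the strict less-than comparisons, and the choice to halt when either threshold is met are all hypothetical.

```scala
object HaltingSketch {
  // Hypothetical halting check; the comparison operators and the "either
  // threshold" rule are assumptions, not taken from the library source.
  def shouldHalt(prevCost: Double, cost: Double,
                 eps: Double, fractionEps: Double): Boolean = {
    val delta = math.abs(prevCost - cost)
    // Absolute threshold: with eps = 0 this can never be satisfied,
    // so only the fractional test is active.
    val absoluteHit = delta < eps
    // Fractional threshold: change relative to the previous iteration's cost.
    val fractionalHit =
      prevCost != 0.0 && delta / math.abs(prevCost) < fractionEps
    absoluteHit || fractionalHit
  }
}
```

Under these assumptions, the documented defaults (clusterEps of 0, clusterFractionEps of 0.0001) would leave only the fractional test in effect.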
val clusterK: Int

The number of clusters to use when clustering leaf-id vectors. Defaults to an automatic estimation of a "good" number of clusters.
val clusterMaxIter: Int

Maximum clustering refinement iterations to compute. Defaults to 25.
val clusterSS: Int

If the data is larger than this value, use a random sample of this size. Defaults to 1000.
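The sampling rule can be sketched on a local collection as follows. The helper sampleIfLarger is hypothetical, and it works on an in-memory Seq rather than an RDD; how the library actually draws its sample is not stated here.

```scala
import scala.util.Random

object SampleSketch {
  // Hypothetical helper: return the data unchanged when it fits within the
  // sample size, otherwise a uniform random sample without replacement.
  def sampleIfLarger[A](data: Seq[A], sampleSize: Int, rng: Random): Seq[A] =
    if (data.length <= sampleSize) data
    else rng.shuffle(data).take(sampleSize)
}
```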
val clusterThreads: Int

Use this number of threads to accelerate clustering. Defaults to 1.
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
val extractor: (T) ⇒ Seq[Double]

A feature extraction function for data objects.
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def logDebug(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logger: Logger

Definition Classes
Logging
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
val rfMaxBins: Int

Maximum histogramming bins to use for numeric data. Defaults to 5.
val rfMaxDepth: Int

Maximum decision tree depth. Defaults to 5.
val rfNumTrees: Int

The number of decision trees to train in the Random Forest. Defaults to 10.
def run(data: RDD[T]): RandomForestClusterModel[T]

Train a Random Forest clustering model from input data
data
The input data objects to cluster
returns
An RF clustering model of the input data
val seed: Long

A seed to use for RNG. Defaults to using a randomized seed value.
def setCategoryInfo(categoryInfoNew: Map[Int, Int]): RandomForestCluster[T]

Set a new category info map
categoryInfoNew
New category-info map to use
returns
Copy of this instance with new category info
def setClusterEps(clusterEpsNew: Double): RandomForestCluster[T]

Set a new clustering epsilon halting threshold
clusterEpsNew
New epsilon halting threshold
returns
Copy of this instance with new clustering epsilon threshold
def setClusterFractionEps(clusterFractionEpsNew: Double): RandomForestCluster[T]

Set a new clustering fractional epsilon halting threshold
clusterFractionEpsNew
New fractional epsilon value
returns
Copy of this instance with new fractional epsilon threshold
def setClusterK(clusterKNew: Int): RandomForestCluster[T]

Set a new target number of clusters
clusterKNew
New target number of clusters. Zero selects automatic determination.
returns
Copy of this instance with new target number of clusters
def setClusterMaxIter(clusterMaxIterNew: Int): RandomForestCluster[T]

Set a new maximum number of clustering refinement iterations
clusterMaxIterNew
New maximum number of refinement iterations
returns
Copy of this instance with new maximum number of iterations
def setClusterSS(clusterSSNew: Int): RandomForestCluster[T]

Set a new clustering sample size
clusterSSNew
New clustering sample size
returns
Copy of this instance with new sample size
def setClusterThreads(clusterThreadsNew: Int): RandomForestCluster[T]

Set a new number of clustering threads
clusterThreadsNew
New number of threads to use for clustering
returns
Copy of this instance with new number of threads
def setExtractor(extractorNew: (T) ⇒ Seq[Double]): RandomForestCluster[T]

Set a new feature extraction function for input objects
extractorNew
The feature extraction function
returns
Copy of this instance with new extractor
def setRfMaxBins(rfMaxBinsNew: Int): RandomForestCluster[T]

Set a new Random Forest maximum numeric binning value
rfMaxBinsNew
New maximum numeric binning value
returns
Copy of this instance with new maximum binning value
def setRfMaxDepth(rfMaxDepthNew: Int): RandomForestCluster[T]

Set a new Random Forest maximum tree depth
rfMaxDepthNew
New maximum decision tree depth
returns
Copy of this instance with new maximum decision tree depth
def setRfNumTrees(rfNumTreesNew: Int): RandomForestCluster[T]

Set a new number of Random Forest trees to train for the model
rfNumTreesNew
New number of trees to use for the RF
returns
Copy of this instance with new Random Forest size
def setSeed(seedNew: Long): RandomForestCluster[T]

Set a new RNG seed
seedNew
New RNG seed to use
returns
Copy of this instance with new RNG seed
def setSyntheticSS(syntheticSSNew: Int): RandomForestCluster[T]

Set a new synthetic data sample size
syntheticSSNew
New synthetic data size to use
returns
Copy of this instance with new synthetic data size
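Each setter above returns a copy rather than mutating the instance, so calls can be chained while the original stays unchanged. The pattern can be sketched with a hypothetical stand-in class, Params, that carries only two of the documented parameters; it assumes a case-class style copy and is not the library's own code.

```scala
object CopyOnSetSketch {
  // Hypothetical stand-in for illustration; it mirrors two documented
  // parameters and their documented defaults, nothing more.
  final case class Params(rfNumTrees: Int = 10, rfMaxDepth: Int = 5) {
    // Each setter returns a modified copy and leaves `this` untouched.
    def setRfNumTrees(rfNumTreesNew: Int): Params =
      copy(rfNumTrees = rfNumTreesNew)
    def setRfMaxDepth(rfMaxDepthNew: Int): Params =
      copy(rfMaxDepth = rfMaxDepthNew)
  }

  val base: Params = Params()
  // Chained setters build a new configuration; `base` keeps its values.
  val tuned: Params = base.setRfNumTrees(50).setRfMaxDepth(8)
}
```

A consequence of this design is that a shared base configuration can be reused safely to derive several variants.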
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
val syntheticSS: Int

The size of synthetic (margin-sampled) data to be constructed. Defaults to the size of the input data.
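A minimal sketch of what margin sampling could look like, assuming "margin-sampled" means that each feature of a synthetic row is drawn independently from that feature's observed values. The helper marginSample is hypothetical and works on local collections, not RDDs; the library's actual procedure may differ.

```scala
import scala.util.Random

object MarginSampleSketch {
  // Hypothetical helper: build `n` synthetic rows. Feature j of each row is
  // copied from a randomly chosen real row, independently for every feature,
  // which keeps per-feature marginals but breaks correlations between features.
  def marginSample(data: IndexedSeq[Seq[Double]], n: Int,
                   rng: Random): Seq[Seq[Double]] = {
    require(data.nonEmpty, "need at least one input row")
    val numFeatures = data.head.length
    Seq.fill(n) {
      Seq.tabulate(numFeatures) { j => data(rng.nextInt(data.length))(j) }
    }
  }
}
```

Passing the input size as n corresponds to the documented default for syntheticSS.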
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
