com.redhat.et.silex.cluster

KMedoids

case class KMedoids[T](metric: (T, T) ⇒ Double, k: Int, maxIterations: Int, epsilon: Double, fractionEpsilon: Double, sampleSize: Int, numThreads: Int, seed: Long) extends Serializable with Logging with Product

An object for training a K-Medoid clustering model on Seq or RDD data.

The data must have a metric (distance) function defined on it. Unlike K-Means clustering, K-Medoids does not require an algebra over the data elements.
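To illustrate why a metric alone is enough, here is a minimal sketch that does not use this library. It defines an edit-distance metric on strings, which have no meaningful mean, and picks a medoid by minimizing total distance. The names `MetricSketch`, `editDistance`, and `medoid` are hypothetical and introduced only for this example.

```scala
object MetricSketch {
  // Levenshtein edit distance: a metric on strings, which have no natural "average".
  def editDistance(a: String, b: String): Double = {
    // prev(j) = distance between the first (i - 1) chars of a and the first j chars of b
    var prev = Array.tabulate(b.length + 1)(_.toDouble)
    for (i <- 1 to a.length) {
      val cur = new Array[Double](b.length + 1)
      cur(0) = i.toDouble
      for (j <- 1 to b.length) {
        val subst = prev(j - 1) + (if (a(i - 1) == b(j - 1)) 0.0 else 1.0)
        cur(j) = math.min(subst, math.min(prev(j) + 1.0, cur(j - 1) + 1.0))
      }
      prev = cur
    }
    prev(b.length)
  }

  // A medoid is the data element with the smallest total distance to all elements.
  // Only the metric is needed; no addition or scaling of elements is performed.
  def medoid[T](data: Seq[T], metric: (T, T) => Double): T =
    data.minBy(x => data.map(y => metric(x, y)).sum)
}
```

A function with the shape of `editDistance` has the `(T, T) ⇒ Double` type expected by the `metric` parameter, with `T` being `String`.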

metric

The distance metric imposed on data elements

k

The number of clusters to use. If k is zero, the clustering will attempt to identify a number of clusters that is "good" w.r.t. Minimum Description Length.

maxIterations

The maximum number of model refinement iterations to run

epsilon

The epsilon threshold to use. Must be >= 0.

If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) <= epsilon (Lower cost is better).

fractionEpsilon

The fractionEpsilon threshold to use. Must be >= 0.

If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) / c0 <= fractionEpsilon (Lower cost is better).
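The two halting thresholds can be sketched as a single predicate. This is an illustration under stated assumptions, not the library's code: the documentation does not say how the two tests are combined, so the logical OR below, the zero-cost guard, and the names `HaltingSketch` and `shouldHalt` are all assumptions.

```scala
object HaltingSketch {
  // c0 = cost of the previous model, c1 = cost of the current model (lower is better).
  // Assumption: refinement halts when EITHER threshold is met.
  def shouldHalt(c0: Double, c1: Double, epsilon: Double, fractionEpsilon: Double): Boolean = {
    require(epsilon >= 0.0 && fractionEpsilon >= 0.0, "thresholds must be >= 0")
    val improvement = c0 - c1
    val absoluteHalt = improvement <= epsilon
    // Assumption: skip the fractional test when c0 is 0 to avoid dividing by zero.
    val fractionalHalt = c0 > 0.0 && (improvement / c0) <= fractionEpsilon
    absoluteHalt || fractionalHalt
  }
}
```

With both thresholds set to 0, this predicate halts only when an iteration fails to lower the cost.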

sampleSize

The target size of the random sample. Must be > 0.

numThreads

The number of threads to use while clustering

seed

The random seed to use for RNG.

Training runs that start from the same random seed produce the same result. By default, results vary randomly from run to run.
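The interaction between `sampleSize` and `seed` can be pictured with a small seeded-sampling sketch. How the library actually draws its sample is not documented here, so the shuffle-and-take strategy and the names `SeedSketch` and `sample` are assumptions made for illustration only.

```scala
import scala.util.Random

object SeedSketch {
  // Draw up to sampleSize elements using an RNG constructed from the given seed.
  // Assumption: a simple shuffle-and-take; the library may sample differently.
  def sample[T](data: Seq[T], sampleSize: Int, seed: Long): Seq[T] = {
    require(sampleSize > 0, "sampleSize must be > 0")
    new Random(seed).shuffle(data).take(sampleSize)
  }
}
```

Calling `SeedSketch.sample` twice with the same data and seed yields the same sample, which is the property that makes seeded training runs repeatable.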

Linear Supertypes
Product, Equals, Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new KMedoids(metric: (T, T) ⇒ Double, k: Int, maxIterations: Int, epsilon: Double, fractionEpsilon: Double, sampleSize: Int, numThreads: Int, seed: Long)

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. val epsilon: Double

    The epsilon threshold to use. Must be >= 0.

    If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) <= epsilon (Lower cost is better).

  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. val fractionEpsilon: Double

    The fractionEpsilon threshold to use. Must be >= 0.

    If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) / c0 <= fractionEpsilon (Lower cost is better).

  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. val k: Int

    The number of clusters to use. If k is zero, the clustering will attempt to identify a number of clusters that is "good" w.r.t. Minimum Description Length.

  15. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  16. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  17. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  18. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  19. def logger: Logger

    Definition Classes
    Logging
  20. val maxIterations: Int

    The maximum number of model refinement iterations to run

  21. val metric: (T, T) ⇒ Double

    The distance metric imposed on data elements

  22. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  23. final def notify(): Unit

    Definition Classes
    AnyRef
  24. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  25. val numThreads: Int

    The number of threads to use while clustering

  26. def run(data: Seq[T]): KMedoidsModel[T]

    Perform a K-Medoid clustering model training run on some input data

    data

    The input data to train the clustering model on.

    returns

    A KMedoidsModel object representing the clustering model.

  27. def run(data: RDD[T]): KMedoidsModel[T]

    Perform a K-Medoid clustering model training run on some input data

    data

    The input data to train the clustering model on.

    returns

    A KMedoidsModel object representing the clustering model.

  28. val sampleSize: Int

    The target size of the random sample. Must be > 0.

  29. val seed: Long

    The random seed to use for RNG.

    Training runs that start from the same random seed produce the same result. By default, results vary randomly from run to run.

  30. def setEpsilon(epsilon_: Double): KMedoids[T]

    Set epsilon halting threshold for clustering cost improvement between refinements.

    If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) <= epsilon (Lower cost is better).

    epsilon_

    The epsilon threshold to use. Must be >= 0.

    returns

    Copy of this instance, with updated value of epsilon

  31. def setFractionEpsilon(fractionEpsilon_: Double): KMedoids[T]

    Set fractionEpsilon threshold for clustering cost improvement between refinements.

    If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) / c0 <= fractionEpsilon (Lower cost is better).

    fractionEpsilon_

    The fractionEpsilon threshold to use. Must be >= 0.

    returns

    Copy of this instance, with updated fractionEpsilon setting

  32. def setK(k_: Int): KMedoids[T]

    Set the number of clusters to train

    k_

    The number of clusters. Must be >= 0. If k is zero, the clustering will attempt to identify a number of clusters that is "good" w.r.t. Minimum Description Length.

    returns

    Copy of this instance with new value for k

  33. def setMaxIterations(maxIterations_: Int): KMedoids[T]

    Set the maximum number of iterations to allow before halting cluster refinement.

    maxIterations_

    The maximum number of refinement iterations. Must be > 0.

    returns

    Copy of this instance, with updated value for maxIterations

  34. def setMetric(metric_: (T, T) ⇒ Double): KMedoids[T]

    Set the distance metric to use over data elements

    metric_

    The distance metric

    returns

    Copy of this instance with new metric

  35. def setNumThreads(numThreads_: Int): KMedoids[T]

    Set the number of threads to use for clustering runs

    numThreads_

    The number of threads to use while clustering. Must be > 0.

    returns

    Copy of this instance with updated value of numThreads

  36. def setSampleSize(sampleSize_: Int): KMedoids[T]

    Set the size of the random sample to take from input data to use for clustering.

    sampleSize_

    The target size of the random sample. Must be > 0.

    returns

    Copy of this instance, with updated value of sampleSize

  37. def setSeed(seed_: Long): KMedoids[T]

    Set the random number generation (RNG) seed.

    Training runs that start from the same random seed produce the same result. By default, results vary randomly from run to run.

    seed_

    The random seed to use for RNG

    returns

    Copy of this instance, with updated random seed

  38. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  39. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  40. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  41. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
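Each `set*` method listed above returns a copy of the instance rather than mutating it, so calls can be chained and the original configuration stays unchanged. The sketch below shows that pattern on a hypothetical stand-in class, `TrainerSettings`, which is not part of this library. Its fields, parameter names, and validation checks are assumptions that mirror the documented constraints.

```scala
// Hypothetical stand-in used only to illustrate copy-returning setters.
case class TrainerSettings(k: Int, maxIterations: Int, seed: Long) {
  // Returns a new instance with k replaced; this instance is left untouched.
  def setK(newK: Int): TrainerSettings = {
    require(newK >= 0, "k must be >= 0")
    copy(k = newK)
  }

  // Returns a new instance with maxIterations replaced.
  def setMaxIterations(newMax: Int): TrainerSettings = {
    require(newMax > 0, "maxIterations must be > 0")
    copy(maxIterations = newMax)
  }

  // Returns a new instance with the RNG seed replaced.
  def setSeed(newSeed: Long): TrainerSettings = copy(seed = newSeed)
}
```

For example, `TrainerSettings(2, 25, 1L).setK(5).setMaxIterations(50)` produces a new value with `k = 5` and `maxIterations = 50`, while the starting value keeps `k = 2`.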
