io.radanalytics.silex.cluster

KMedoids

Related Docs: object KMedoids | package cluster

case class KMedoids[T](metric: (T, T) ⇒ Double, k: Int, maxIterations: Int, epsilon: Double, fractionEpsilon: Double, sampleSize: Int, numThreads: Int, seed: Long) extends Serializable with Logging with Product

An object for training a K-Medoid clustering model on Seq or RDD data.

The data must have a metric function defined on it. Unlike K-Means clustering, K-Medoids does not require an algebra over the data elements.
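To illustrate why a metric alone is enough, here is a minimal sketch that finds the medoid of a set of strings under edit distance. Strings have no meaningful mean, but one of them can still be chosen as the element with the lowest total distance to the rest. The names MedoidSketch, editDistance and medoid are hypothetical and are not part of silex; the code does not call the KMedoids class.

```scala
object MedoidSketch {
  // Levenshtein edit distance: a metric on strings, which have no natural mean.
  def editDistance(a: String, b: String): Double = {
    // row(j) holds the distance between the current prefix of a and b.take(j)
    var row = Array.tabulate(b.length + 1)(j => j)
    for (i <- 1 to a.length) {
      val next = new Array[Int](b.length + 1)
      next(0) = i
      for (j <- 1 to b.length) {
        val substitution = row(j - 1) + (if (a(i - 1) == b(j - 1)) 0 else 1)
        next(j) = math.min(math.min(row(j) + 1, next(j - 1) + 1), substitution)
      }
      row = next
    }
    row(b.length).toDouble
  }

  // The medoid is the data element with the smallest total distance to all
  // elements. Only the metric is used; no sums or averages of elements.
  def medoid[T](metric: (T, T) => Double)(data: Seq[T]): T =
    data.minBy(x => data.map(y => metric(x, y)).sum)
}
```

A function with the same shape as editDistance, (T, T) ⇒ Double, is what the metric parameter expects.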

metric

The distance metric imposed on data elements

k

The number of clusters to use. If k is zero, the clustering will attempt to identify a number of clusters that is "good" w.r.t. Minimum Description Length.

maxIterations

The maximum number of model refinement iterations to run

epsilon

The epsilon threshold to use. Must be >= 0. If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) <= epsilon (Lower cost is better).

fractionEpsilon

The fractionEpsilon threshold to use. Must be >= 0. If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) / c0 <= fractionEpsilon (Lower cost is better).

sampleSize

The target size of the random sample taken from the input data for clustering. Must be > 0.

numThreads

The number of threads to use while clustering

seed

The random seed to use for RNG. Training runs that start from the same random seed produce the same results. By default, training runs will vary randomly.
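The two halting thresholds described above can be sketched as plain predicates. This is only an illustration of the documented inequalities; HaltingSketch and its method names are hypothetical, and how the library combines the two tests (or handles c0 = 0) is not stated here and is not assumed.

```scala
object HaltingSketch {
  // Absolute test: halt when the cost improvement (c0 - c1) is at most epsilon.
  def haltsOnEpsilon(c0: Double, c1: Double, epsilon: Double): Boolean =
    (c0 - c1) <= epsilon

  // Relative test: halt when the improvement, as a fraction of the previous
  // cost c0, is at most fractionEpsilon. This sketch does not guard c0 == 0.
  def haltsOnFractionEpsilon(c0: Double, c1: Double, fractionEpsilon: Double): Boolean =
    (c0 - c1) / c0 <= fractionEpsilon
}
```

For example, with c0 = 200.0 and c1 = 199.0 the improvement is 1.0, so an epsilon of 0.5 does not halt refinement, while a fractionEpsilon of 0.01 does, because 1.0 / 200.0 = 0.005.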

Linear Supertypes
Product, Equals, Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new KMedoids(metric: (T, T) ⇒ Double, k: Int, maxIterations: Int, epsilon: Double, fractionEpsilon: Double, sampleSize: Int, numThreads: Int, seed: Long)

    metric

    The distance metric imposed on data elements

    k

    The number of clusters to use. If k is zero, the clustering will attempt to identify a number of clusters that is "good" w.r.t. Minimum Description Length.

    maxIterations

    The maximum number of model refinement iterations to run

    epsilon

    The epsilon threshold to use. Must be >= 0. If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) <= epsilon (Lower cost is better).

    fractionEpsilon

    The fractionEpsilon threshold to use. Must be >= 0. If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) / c0 <= fractionEpsilon (Lower cost is better).

    sampleSize

    The target size of the random sample. Must be > 0.

    numThreads

    The number of threads to use while clustering

    seed

    The random seed to use for RNG. Training runs that start from the same random seed produce the same results. By default, training runs will vary randomly.
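The interaction between sampleSize and seed can be pictured with a small sketch of seeded sampling: the same seed yields the same sample, so everything computed from that sample is repeatable. This is an assumption-laden illustration using scala.util.Random, not the sampling code silex actually uses; SeededSampleSketch and sample are hypothetical names.

```scala
import scala.util.Random

object SeededSampleSketch {
  // Draw up to sampleSize elements from data using an explicitly seeded RNG.
  // Two calls with the same data, sampleSize and seed return the same sample.
  def sample[T](data: Seq[T], sampleSize: Int, seed: Long): Seq[T] = {
    require(sampleSize > 0, s"sampleSize must be > 0, got $sampleSize")
    new Random(seed).shuffle(data).take(sampleSize)
  }
}
```

If the input holds fewer than sampleSize elements, this sketch simply returns all of them in shuffled order.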

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. val epsilon: Double

    The epsilon threshold to use. Must be >= 0. If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) <= epsilon (Lower cost is better).

  7. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. val fractionEpsilon: Double

    The fractionEpsilon threshold to use. Must be >= 0. If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) / c0 <= fractionEpsilon (Lower cost is better).

  10. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  11. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  12. val k: Int

    The number of clusters to use. If k is zero, the clustering will attempt to identify a number of clusters that is "good" w.r.t. Minimum Description Length.

  13. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  14. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  15. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  16. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  17. def logger: Logger

    Definition Classes
    Logging
  18. val maxIterations: Int

    The maximum number of model refinement iterations to run

  19. val metric: (T, T) ⇒ Double

    The distance metric imposed on data elements

  20. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  21. final def notify(): Unit

    Definition Classes
    AnyRef
  22. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  23. val numThreads: Int

    The number of threads to use while clustering

  24. def run(data: Seq[T]): KMedoidsModel[T]

    Perform a K-Medoid clustering model training run on some input data

    data

    The input data to train the clustering model on.

    returns

    A KMedoidsModel object representing the clustering model.

  25. def run(data: RDD[T]): KMedoidsModel[T]

    Perform a K-Medoid clustering model training run on some input data

    data

    The input data to train the clustering model on.

    returns

    A KMedoidsModel object representing the clustering model.

  26. val sampleSize: Int

    The target size of the random sample. Must be > 0.

  27. val seed: Long

    The random seed to use for RNG. Training runs that start from the same random seed produce the same results. By default, training runs will vary randomly.

  28. def setEpsilon(epsilon_: Double): KMedoids[T]

    Set epsilon halting threshold for clustering cost improvement between refinements.

    If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) <= epsilon (Lower cost is better).

    epsilon_

    The epsilon threshold to use. Must be >= 0.

    returns

    Copy of this instance, with updated value of epsilon

  29. def setFractionEpsilon(fractionEpsilon_: Double): KMedoids[T]

    Set fractionEpsilon threshold for clustering cost improvement between refinements.

    If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) / c0 <= fractionEpsilon (Lower cost is better).

    fractionEpsilon_

    The fractionEpsilon threshold to use. Must be >= 0.

    returns

    Copy of this instance, with updated fractionEpsilon setting

  30. def setK(k_: Int): KMedoids[T]

    Set the number of clusters to train

    k_

    The number of clusters. Must be >= 0. If k is zero, the clustering will attempt to identify a number of clusters that is "good" w.r.t. Minimum Description Length.

    returns

    Copy of this instance with new value for k

  31. def setMaxIterations(maxIterations_: Int): KMedoids[T]

    Set the maximum number of iterations to allow before halting cluster refinement.

    maxIterations_

    The maximum number of refinement iterations. Must be > 0.

    returns

    Copy of this instance, with updated value for maxIterations

  32. def setMetric(metric_: (T, T) ⇒ Double): KMedoids[T]

    Set the distance metric to use over data elements

    metric_

    The distance metric

    returns

    Copy of this instance with new metric

  33. def setNumThreads(numThreads_: Int): KMedoids[T]

    Set the number of threads to use for clustering runs

    numThreads_

    The number of threads to use while clustering. Must be > 0.

    returns

    Copy of this instance with updated value of numThreads

  34. def setSampleSize(sampleSize_: Int): KMedoids[T]

    Set the size of the random sample to take from input data to use for clustering.

    sampleSize_

    The target size of the random sample. Must be > 0.

    returns

    Copy of this instance, with updated value of sampleSize

  35. def setSeed(seed_: Long): KMedoids[T]

    Set the random number generation (RNG) seed.

    Training runs that start from the same random seed produce the same results. By default, training runs will vary randomly.

    seed_

    The random seed to use for RNG

    returns

    Copy of this instance, with updated random seed

  36. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  37. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  38. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  39. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
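
The set* members listed above each return a copy of the instance rather than mutating it. The sketch below shows that pattern on a hypothetical stand-in case class, Params, which is not the silex KMedoids class. The parameter names and the use of require to enforce the documented bounds are assumptions made for illustration.

```scala
object SetterSketch {
  // Hypothetical stand-in for an immutable parameter holder.
  final case class Params(k: Int, maxIterations: Int, seed: Long) {
    // Each setter validates its argument, then returns an updated copy.
    def setK(newK: Int): Params = {
      require(newK >= 0, s"k must be >= 0, got $newK")
      copy(k = newK)
    }

    def setMaxIterations(newMax: Int): Params = {
      require(newMax > 0, s"maxIterations must be > 0, got $newMax")
      copy(maxIterations = newMax)
    }

    def setSeed(newSeed: Long): Params = copy(seed = newSeed)
  }
}
```

Because every call returns a new value, setters can be chained, and the original instance keeps its settings, for example `val tuned = base.setK(5).setSeed(42L)` leaves `base` unchanged.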
