io.radanalytics.silex.sample.split

SplitSampleRDDFunctions

class SplitSampleRDDFunctions[T] extends Serializable

Enhances RDDs with methods for split sampling, that is, randomly dividing one RDD into several output RDDs.

T

The row type of the RDD

// import conversions to enhance RDDs with split sampling
import io.radanalytics.silex.sample.split.implicits._
// obtain a sequence of 5 RDDs randomly split from RDD 'data', where each element
// has probability 1/5 of being assigned to each output.
val splits = data.splitSample(5)
// randomly split data so that the second output has twice the probability of receiving
// a data element as the first, and the third output has three times the probability.
val splitsW = data.weightedSplitSample(Seq(1.0, 2.0, 3.0))
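
To make the equal-probability rule concrete, here is a minimal local sketch of what `splitSample(n, seed = ...)` is documented to do, written against plain Scala collections instead of Spark so it runs without a cluster. A `Seq[Seq[T]]` stands in for an RDD's partitions. The per-partition seed mixing (`seed ^ partitionId`) and the names `LocalSplitSketch`, `partitionSeed`, and `splitSampleLocal` are hypothetical; silex's actual implementation may differ.

```scala
import scala.util.Random

// Hypothetical local analog of splitSample: each inner Seq plays the role of one partition.
object LocalSplitSketch {
  // Assumption: the per-partition seed is the user seed XOR the partition index.
  // The real library only documents that the seed is modified deterministically by partition id.
  def partitionSeed(seed: Long, partitionId: Int): Long = seed ^ partitionId.toLong

  // Assign every row to one of n outputs with equal probability 1/n.
  def splitSampleLocal[T](partitions: Seq[Seq[T]], n: Int, seed: Long): Seq[Seq[T]] = {
    require(n > 0, "n must be positive")
    // Tag each row with a uniformly random output index, using one RNG per partition.
    val tagged: Seq[(Int, T)] = partitions.zipWithIndex.flatMap { case (rows, pid) =>
      val rng = new Random(partitionSeed(seed, pid))
      rows.map(row => (rng.nextInt(n), row))
    }
    // Output j collects exactly the rows tagged with j, so outputs are disjoint.
    (0 until n).map(j => tagged.filter(_._1 == j).map(_._2))
  }
}
```

With the same seed and the same partition layout the sketch reproduces the same assignment, and every input row lands in exactly one output.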

Linear Supertypes

Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new SplitSampleRDDFunctions(self: RDD[T])(implicit arg0: ClassTag[T])
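
The constructor takes the RDD to wrap plus an implicit `ClassTag[T]`, and the `implicits._` import in the class-level example is what makes `splitSample` and `weightedSplitSample` callable directly on an RDD. The sketch below illustrates that general Scala "enrich my library" pattern on a plain `Seq` instead of an RDD; `SeqSplitFunctions`, `seqsplit.implicits`, and `toSeqSplitFunctions` are hypothetical names, and the silex conversion may be defined differently.

```scala
import scala.language.implicitConversions
import scala.reflect.ClassTag
import scala.util.Random

// Hypothetical wrapper mirroring the constructor shape above:
// the wrapped collection plus an implicit ClassTag for the element type.
class SeqSplitFunctions[T](self: Seq[T])(implicit ct: ClassTag[T]) extends Serializable {
  // Equal-probability split into n outputs, materialized as arrays
  // (building an Array[T] is what requires the ClassTag).
  def splitSample(n: Int, seed: Long): Seq[Array[T]] = {
    val rng = new Random(seed)
    val tags = self.map(_ => rng.nextInt(n))
    (0 until n).map(j => self.zip(tags).collect { case (row, t) if t == j => row }.toArray)
  }
}

object seqsplit {
  object implicits {
    // After `import seqsplit.implicits._`, any Seq[T] can call splitSample directly.
    implicit def toSeqSplitFunctions[T: ClassTag](s: Seq[T]): SeqSplitFunctions[T] =
      new SeqSplitFunctions(s)
  }
}
```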

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
  10. def hashCode(): Int
    Definition Classes
    AnyRef → Any
  11. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  12. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. final def notify(): Unit
    Definition Classes
    AnyRef
  14. final def notifyAll(): Unit
    Definition Classes
    AnyRef
  15. def splitSample(n: Int, persist: StorageLevel = defaultSL, seed: Long = scala.util.Random.nextLong): Seq[RDD[T]]

    Split an RDD into n random subsets, where each row is assigned to an output with equal probability 1/n.

    n

    The number of output RDDs to split into

    persist

    The storage level to use for persisting the intermediate result.

    seed

    A random seed to use for sampling. The seed is modified deterministically by partition ID.

  16. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  17. def toString(): String
    Definition Classes
    AnyRef → Any
  18. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  20. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  21. def weightedSplitSample(weights: Seq[Double], persist: StorageLevel = defaultSL, seed: Long = scala.util.Random.nextLong): Seq[RDD[T]]

    Split an RDD into weighted random subsets, where each row is assigned to output j with probability proportional to the j-th weight.

    weights

    A sequence of weights giving the relative probabilities of sampling into the corresponding output RDDs. Weights are normalized to sum to 1, and each weight must be strictly greater than 0.

    persist

    The storage level to use for persisting the intermediate result.

    seed

    A random seed to use for sampling. The seed is modified deterministically by partition ID.
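
The weighting rule for weightedSplitSample can be sketched locally on plain Scala collections: weights are validated (each strictly positive), normalized to sum to 1, and turned into cumulative boundaries; each row draws a uniform number in [0, 1) and goes to the first output whose boundary exceeds it. This inverse-CDF lookup is a standard way to sample a categorical distribution and is assumed here for illustration only; the names `WeightedSplitSketch`, `cumulative`, `pickOutput`, and `weightedSplitLocal` are hypothetical, and silex may implement the assignment differently.

```scala
import scala.util.Random

// Hypothetical local analog of weightedSplitSample (no Spark, single RNG).
object WeightedSplitSketch {
  // Validate weights, normalize them to sum to 1, and return cumulative boundaries.
  def cumulative(weights: Seq[Double]): Vector[Double] = {
    require(weights.nonEmpty, "weights must be non-empty")
    require(weights.forall(_ > 0.0), "weights must be strictly > 0")
    val total = weights.sum
    weights.map(_ / total).scanLeft(0.0)(_ + _).tail.toVector
  }

  // Index of the first boundary strictly greater than u; falls back to the last
  // output in case rounding leaves the final boundary slightly below 1.0.
  def pickOutput(cum: Vector[Double], u: Double): Int = {
    val j = cum.indexWhere(u < _)
    if (j >= 0) j else cum.length - 1
  }

  // Assign each row to output j with probability weights(j) / weights.sum.
  def weightedSplitLocal[T](rows: Seq[T], weights: Seq[Double], seed: Long): Seq[Seq[T]] = {
    val cum = cumulative(weights)
    val rng = new Random(seed)
    val tagged = rows.map(row => (pickOutput(cum, rng.nextDouble()), row))
    cum.indices.map(j => tagged.filter(_._1 == j).map(_._2))
  }
}
```

Because the weights are normalized, `Seq(1.0, 2.0, 3.0)` and `Seq(0.5, 1.0, 1.5)` describe the same output probabilities (1/6, 1/3, 1/2).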
