io.radanalytics.silex.sample.split
Split an RDD into n random subsets, where each row is assigned to an output with equal probability 1/n.
The number of output RDDs to split into.
The storage level to use for persisting the intermediate result.
A random seed to use for sampling. It is modified deterministically by partition id, so each partition samples with its own seed.
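A minimal sketch of the uniform split described above, in plain Scala without Spark. It assumes an in-memory `Seq[Seq[T]]` standing in for an RDD's partitions, and it assumes the per-partition seed is derived as `seed + partitionId`; the library's actual derivation may differ. The `persist` parameter is not modeled here. In Spark, the tagged intermediate collection is presumably what gets persisted, since it is scanned once per output.

```scala
import scala.util.Random

// Hypothetical in-memory stand-in for an RDD: a sequence of partitions.
object UniformSplitSketch {
  /** Assign every row to one of `n` outputs with probability 1/n each.
    * Assumption: the per-partition seed is `seed + partitionId`.
    */
  def splitSample[T](partitions: Seq[Seq[T]], n: Int, seed: Long): Seq[Seq[T]] = {
    require(n > 0, "n must be positive")
    // Tag each row with a uniformly drawn output index in [0, n).
    val tagged: Seq[(Int, T)] = partitions.zipWithIndex.flatMap { case (rows, pid) =>
      val rng = new Random(seed + pid) // deterministic, partition-specific stream
      rows.map(row => (rng.nextInt(n), row))
    }
    // One output per index: keep only the rows tagged with that index.
    (0 until n).map(j => tagged.collect { case (k, row) if k == j => row })
  }
}
```

Every row lands in exactly one output, so the outputs together contain the whole input, and the same seed with the same partitioning reproduces the same split.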
Split an RDD into weighted random subsets, where each row is assigned to output j with probability proportional to the j-th weight.
A sequence of weights that determine the relative probabilities of sampling into the corresponding RDD outputs. Weights are normalized so that they sum to 1. Each weight must be strictly greater than 0.
The storage level to use for persisting the intermediate result.
A random seed to use for sampling. It is modified deterministically by partition id, so each partition samples with its own seed.
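A minimal sketch of the weighted split, again in plain Scala with an in-memory `Seq[Seq[T]]` standing in for the RDD's partitions. It normalizes the weights, builds their cumulative distribution, and maps a uniform draw u in [0, 1) to the first output whose cumulative bound exceeds u, so output j is chosen with probability weights(j) / weights.sum. The `seed + partitionId` derivation is an assumption, and `persist` is not modeled.

```scala
import scala.util.Random

object WeightedSplitSketch {
  /** Assign each row to output j with probability weights(j) / weights.sum.
    * Assumption: the per-partition seed is `seed + partitionId`.
    */
  def weightedSplit[T](partitions: Seq[Seq[T]], weights: Seq[Double], seed: Long): Seq[Seq[T]] = {
    require(weights.nonEmpty, "at least one weight is required")
    require(weights.forall(_ > 0.0), "weights must be strictly > 0")
    // Normalize so the weights sum to 1, then build the cumulative distribution.
    val total = weights.sum
    val cumulative: Vector[Double] = weights.map(_ / total).scanLeft(0.0)(_ + _).tail.toVector
    // Pick the first output whose cumulative bound exceeds u in [0, 1).
    def pick(u: Double): Int = {
      val j = cumulative.indexWhere(u < _)
      if (j >= 0) j else cumulative.length - 1 // guard against rounding at the top end
    }
    val tagged: Seq[(Int, T)] = partitions.zipWithIndex.flatMap { case (rows, pid) =>
      val rng = new Random(seed + pid) // deterministic, partition-specific stream
      rows.map(row => (pick(rng.nextDouble()), row))
    }
    weights.indices.map(j => tagged.collect { case (k, row) if k == j => row })
  }
}
```

For example, weights `Seq(1.0, 3.0)` normalize to 0.25 and 0.75, so roughly a quarter of the rows go to the first output and three quarters to the second.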
Enhances RDDs with methods for split sampling.
The row type of the RDD
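"Enhancing" a type with extra methods is commonly done in Scala with an implicit class that wraps the original value. The sketch below shows that pattern on a plain `Seq[T]` rather than an RDD, with `T` playing the role of the row type. The names `SplitSampleImplicits` and `SplitSampleSeqOps` are hypothetical and are not the library's API.

```scala
import scala.util.Random

// Hypothetical example of the enrichment pattern: an implicit class that adds
// a split-sampling method to an existing collection type without modifying it.
object SplitSampleImplicits {
  implicit class SplitSampleSeqOps[T](self: Seq[T]) {
    /** Split `self` into `n` random subsets; each row lands in output j with probability 1/n. */
    def splitSample(n: Int, seed: Long = Random.nextLong()): Seq[Seq[T]] = {
      require(n > 0, "n must be positive")
      val rng = new Random(seed)
      // Tag rows with a random output index, then group by filtering per index.
      val tagged = self.map(row => (rng.nextInt(n), row))
      (0 until n).map(j => tagged.collect { case (k, row) if k == j => row })
    }
  }
}
```

After `import SplitSampleImplicits._`, any `Seq` gains `splitSample`, e.g. `(1 to 20).toList.splitSample(4, seed = 1L)`.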