dataset keys and dataset find … support the -sample
option.
The sample option expects a sample size as an argument. If the sample
size is greater than zero then a sample output will be taken. On
dataset keys the sample is taken after any filter is supplied.
Likewise for dataset find the sample is taken after the query results
have been calculated. Is the sample size is greater then the results
returned then the who results set is return without random sampling.
If sample size is less than result set then a random sampling of the
results is taken.
If you are doing Machine Leaning type of sampling (e.g. calculating a
test and training set) then normally you create a test key list like
this dataset -sample="$N" keys
where $N
holds the test sample size.
After keylist is generated you can then create a training set by
excluding the keys associated with the sample.