19 Working with an SGE scheduler

If you are working on a cluster with an SGE (Sun Grid Engine) scheduler, you can use the sge_submit() function to submit jobs to it. The function generates the required R script and shell script, then submits the job to the SGE scheduler.

Arguments:

The first argument is an R expression, enclosed in curly braces, that will be evaluated in the job. In the example below, the expression trains a LightGBM model on the dataset dat using 10-fold cross-validation.
obj_names: a character vector of the names of objects to be exported to the job. These objects must be available in the current R session.
packages: a character vector of packages to be loaded in the job.
n_threads: the number of threads to be used in the job.
sge_out: the directory where the SGE output files will be written.
h_rt: the maximum (hard) wall-clock runtime for the job, in hh:mm:ss format. This corresponds to SGE's h_rt resource limit.
system_command: an optional shell command to run before the R script. In this example, it loads the R environment module.

dat <- read("some_data.csv")
resultsdir <- "./results"
logdir <- "./sge_logs"
mod <- sge_submit(
    {
        train_cv(
            dat,
            alg = "lightgbm",
            alg.params = list(num_leaves = 16, learning.rate = 0.01),
            outer.resampling = setup.resample(
                resampler = "kfold", n.resamples = 10, seed = 2023
            ),
            outdir = file.path(resultsdir, "mod_LightGBM16")
        )
    },
    obj_names = c("dat", "resultsdir"),
    packages = "rtemis",
    n_threads = 12,
    sge_out = logdir,
    h_rt = "10:00:00",
    system_command = "module load r"
)
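To make the steps described above concrete, here is a minimal base-R sketch of what this kind of submission involves: exporting the named objects, writing an R script that loads packages and objects and evaluates the expression, writing a shell script with SGE "#$" directives, and optionally calling qsub. This is not the rtemis implementation. write_sge_job() is a hypothetical helper, and the parallel environment name "smp" is an assumption: PE names are site-specific (qconf -spl lists the ones available on your cluster). By default it only writes the files, so it can be tried without a cluster.

```r
# Hypothetical helper, NOT part of rtemis: a base-R sketch of the steps
# sge_submit() is described as performing. Set submit = TRUE to call qsub.
write_sge_job <- function(expr,
                          obj_names = character(),
                          packages = character(),
                          n_threads = 1,
                          sge_out = tempdir(),
                          h_rt = "01:00:00",
                          system_command = NULL,
                          job_dir = tempfile("sge_job_"),
                          envir = parent.frame(),
                          submit = FALSE) {
    force(envir)              # the caller's environment, where obj_names live
    expr <- substitute(expr)  # capture the expression unevaluated
    dir.create(job_dir, recursive = TRUE, showWarnings = FALSE)
    dir.create(sge_out, recursive = TRUE, showWarnings = FALSE)

    # 1. Export the named objects from the calling environment
    obj_file <- file.path(job_dir, "objects.RData")
    save(list = obj_names, file = obj_file, envir = envir)

    # 2. R script: load packages and objects, then evaluate the expression.
    #    The expression is expected to write its own outputs (e.g. via outdir).
    r_script <- file.path(job_dir, "job.R")
    writeLines(c(
        sprintf("library(%s)", packages),
        sprintf("load(%s)", deparse(obj_file)),
        deparse(expr, width.cutoff = 500L)
    ), r_script)

    # 3. Shell script with SGE directives ("#$" lines are read by qsub)
    sge_out <- normalizePath(sge_out)
    sh_script <- file.path(job_dir, "job.sh")
    writeLines(c(
        "#!/bin/bash",
        "#$ -S /bin/bash",
        "#$ -cwd",
        sprintf("#$ -pe smp %d", as.integer(n_threads)),  # assumed PE name "smp"
        sprintf("#$ -l h_rt=%s", h_rt),
        sprintf("#$ -o %s", sge_out),
        sprintf("#$ -e %s", sge_out),
        system_command,  # e.g. "module load r"; dropped from c() if NULL
        sprintf("Rscript %s", shQuote(r_script))
    ), sh_script)

    # 4. Submit only when requested; requires qsub on the PATH
    if (submit) system2("qsub", shQuote(sh_script))
    invisible(sh_script)
}

# Dry run: write the files without submitting, then inspect the shell script
dat <- data.frame(x = 1:10, y = 2 * (1:10))
sh <- write_sge_job(
    { print(coef(lm(y ~ x, data = dat))) },
    obj_names = "dat",
    n_threads = 4,
    h_rt = "00:10:00",
    system_command = "module load r"
)
cat(readLines(sh), sep = "\n")
```

Capturing the expression with substitute() and deparsing it is what lets the curly-brace block be shipped to a fresh R process; any object it references must therefore be listed in obj_names, just as dat and resultsdir are in the sge_submit() example.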