2 rtemis in 60 seconds
2.1 Load rtemis
library(rtemis)
2.2 Regression
For regression, the outcome must be continuous.
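# Simulate 500 cases x 50 features with a linear outcome plus noise
x <- rnormmat(500, 50, seed = 2019)
w <- rnorm(50)
y <- x %*% w + rnorm(500)
dat <- data.frame(x, y)
# Create 10 stratified train/test subsamples
res <- resample(dat)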
05-20-25 07:06:21 Input contains more than one columns; will stratify on last :resample
.:Resampling Parameters
n.resamples: 10
resampler: strat.sub
stratify.var: y
train.p: 0.75
strat.n.bins: 4
05-20-25 07:06:21 Created 10 stratified subsamples :resample
dat.train <- dat[res$Subsample_1, ]
dat.test <- dat[-res$Subsample_1, ]
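The resampling parameters above describe stratified subsampling (strat.sub) with train.p = 0.75 and strat.n.bins = 4. The base R sketch below illustrates the general idea. It assumes a continuous outcome is cut into quantile bins and each bin is sampled at rate train.p. strat_subsample() is a hypothetical helper, not the rtemis implementation. resample() may bin and round differently, so subsample sizes can differ from what rtemis produces.

```r
# Hypothetical helper (not part of rtemis): stratified random subsampling.
# Assumption: continuous outcomes are binned by quantiles; factor outcomes
# are stratified by class.
strat_subsample <- function(y, n.resamples = 10, train.p = 0.75,
                            n.bins = 4, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  bins <- if (is.factor(y)) {
    y
  } else {
    breaks <- unique(quantile(y, probs = seq(0, 1, length.out = n.bins + 1)))
    cut(y, breaks = breaks, include.lowest = TRUE)
  }
  # Case indices grouped by stratum
  idx_by_bin <- split(seq_along(y), bins)
  resamples <- lapply(seq_len(n.resamples), function(i) {
    # Sample the same proportion of cases from every stratum
    picked <- lapply(idx_by_bin, function(idx) {
      idx[sample.int(length(idx), round(train.p * length(idx)))]
    })
    sort(unlist(picked, use.names = FALSE))
  })
  names(resamples) <- paste0("Subsample_", seq_len(n.resamples))
  resamples
}

# Usage: training indices for the first subsample; the rest form the test set
y <- rnorm(500)
res <- strat_subsample(y, seed = 2019)
train_idx <- res$Subsample_1
test_idx <- setdiff(seq_along(y), train_idx)
```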
2.2.1 Check Data
check_data(x)
x: A data.table with 500 rows and 50 columns
Data types
* 50 numeric features
* 0 integer features
* 0 factors
* 0 character features
* 0 date features
Issues
* 0 constant features
* 0 duplicate cases
* 0 missing values
Recommendations
* Everything looks good
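check_data() reports data types and three common issues: constant features, duplicate cases, and missing values. A rough base R equivalent of those counts might look like the sketch below. quick_check() is a hypothetical helper. Treating a column as constant when it has at most one unique non-missing value is an assumption about how such checks are usually defined.

```r
# Hypothetical helper (not part of rtemis) that counts column types and
# common data issues. Assumes "constant" = at most one unique non-NA value.
quick_check <- function(df) {
  df <- as.data.frame(df)
  count_cols <- function(test) sum(vapply(df, test, logical(1)))
  c(
    numeric = count_cols(function(col) is.numeric(col) && !is.integer(col)),
    integer = count_cols(is.integer),
    factor = count_cols(is.factor),
    character = count_cols(is.character),
    date = count_cols(function(col) inherits(col, "Date")),
    constant = count_cols(function(col) length(unique(na.omit(col))) <= 1),
    duplicate_cases = sum(duplicated(df)),
    missing_values = sum(is.na(df))
  )
}

# Usage on a small data frame with one of each issue
df <- data.frame(a = c(1, 1, NA), b = c("x", "x", "x"),
                 f = factor(c("u", "u", "v")), stringsAsFactors = FALSE)
quick_check(df)
```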
2.2.2 Single Model
mod <- s_GLM(dat.train, dat.test)
05-20-25 07:06:21 Hello, egenn :s_GLM
.:Regression Input Summary
Training features: 374 x 50
Training outcome: 374 x 1
Testing features: 126 x 50
Testing outcome: 126 x 1
05-20-25 07:06:21 Training GLM... :s_GLM
.:GLM Regression Training Summary
MSE = 1.02
RMSE = 1.01
MAE = 0.81
r = 0.99 (p = 1.3e-310)
R sq = 0.98
.:GLM Regression Testing Summary
MSE = 0.98
RMSE = 0.99
MAE = 0.76
r = 0.99 (p = 2.7e-105)
R sq = 0.98
05-20-25 07:06:21 Completed in 3.2e-04 minutes (Real: 0.02; User: 0.02; System: 1e-03) :s_GLM
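The training and testing summaries report MSE, RMSE, MAE, Pearson's r, and R sq. The base R sketch below fits a Gaussian glm() on data simulated the same way as above, with a plain rnorm() matrix standing in for rnormmat(). It then computes those metrics on held-out cases. It assumes s_GLM() behaves like an ordinary Gaussian GLM here and that R sq is 1 - SSE/SST. The split is simple random, so the numbers will not match the rtemis output exactly.

```r
# Base R sketch; assumes a Gaussian GLM and R sq = 1 - SSE/SST
set.seed(2019)
n <- 500; p <- 50
x <- matrix(rnorm(n * p), nrow = n)      # stand-in for rnormmat()
w <- rnorm(p)
y <- drop(x %*% w + rnorm(n))
dat <- data.frame(x, y)

train <- sample.int(n, round(0.75 * n))  # simple random 75/25 split
fit <- glm(y ~ ., data = dat[train, ])   # gaussian family by default
pred <- predict(fit, newdata = dat[-train, ])
obs <- dat$y[-train]

reg_metrics <- function(obs, pred) {
  err <- obs - pred
  c(MSE = mean(err^2),
    RMSE = sqrt(mean(err^2)),
    MAE = mean(abs(err)),
    r = cor(obs, pred),
    Rsq = 1 - sum(err^2) / sum((obs - mean(obs))^2))
}
round(reg_metrics(obs, pred), 2)
```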
2.2.3 Cross-validated Model
mod <- train_cv(dat)
05-20-25 07:06:21 Hello, egenn :train_cv
.:Regression Input Summary
Training features: 500 x 50
Training outcome: 500 x 1
05-20-25 07:06:21 Training Ranger Random Forest on 10 stratified subsamples... :train_cv
05-20-25 07:06:21 Outer resampling plan set to sequential :resLearn
.:Cross-validated Ranger
Mean MSE of 10 stratified subsamples: 27.48
Mean MSE reduction: 44.11%
05-20-25 07:06:24 Completed in 0.04 minutes (Real: 2.26; User: 12.21; System: 0.23) :train_cv
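train_cv() repeats training and testing over the 10 stratified subsamples and averages the test-set error. The loop can be sketched in base R as follows. This is an illustration under stated assumptions, not the rtemis implementation:

- lm() stands in for Ranger.
- The subsamples are plain random rather than stratified.
- "MSE reduction" is assumed to be the drop in test MSE relative to predicting the training-set mean.

```r
# Hypothetical helper (not part of rtemis): repeated random subsampling CV.
# Assumptions: lm() replaces Ranger; "reduction" = 1 - MSE / MSE of a
# mean-only baseline fitted on the same training cases.
cv_subsamples <- function(dat, n.resamples = 10, train.p = 0.75, seed = 2019) {
  set.seed(seed)
  n <- nrow(dat)
  per_resample <- vapply(seq_len(n.resamples), function(i) {
    train <- sample.int(n, round(train.p * n))
    fit <- lm(y ~ ., data = dat[train, ])
    obs <- dat$y[-train]
    mse <- mean((obs - predict(fit, newdata = dat[-train, ]))^2)
    mse_null <- mean((obs - mean(dat$y[train]))^2)  # mean-only baseline
    c(MSE = mse, reduction = 1 - mse / mse_null)
  }, c(MSE = 0, reduction = 0))
  # Average each metric across resamples (columns are resamples)
  rowMeans(per_resample)
}

set.seed(2019)
x <- matrix(rnorm(500 * 10), nrow = 500)
dat <- data.frame(x, y = drop(x %*% rnorm(10) + rnorm(500)))
cv_subsamples(dat)
```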
Use the describe() method to get a summary in plain English:
mod$describe()
Regression was performed using Ranger Random Forest. Model generalizability was assessed using 10 stratified subsamples. The mean R-squared across all testing set resamples was 0.44.
mod$plot()
2.3 Classification
For classification, the outcome must be a factor. For binary classification, the first level should be the “positive” class.
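In base R, factor() orders levels alphabetically unless told otherwise, so it is worth checking which class comes first. The snippet below uses made-up labels to show two standard ways to control the order. One lists the levels explicitly in factor(); the other moves one level to the front with relevel().

```r
# Illustrative labels only
y <- factor(c("R", "M", "M", "R", "M"))
levels(y)                               # alphabetical by default: "M" "R"

y2 <- factor(y, levels = c("R", "M"))   # explicit order: "R" is now first
levels(y2)                              # "R" "M"

y3 <- relevel(y2, ref = "M")            # move "M" back to the first position
levels(y3)                              # "M" "R"
```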
2.3.1 Check Data
data(Sonar, package = 'mlbench')
check_data(Sonar)
Sonar: A data.table with 208 rows and 61 columns
Data types
* 60 numeric features
* 0 integer features
* 1 factor, which is not ordered
* 0 character features
* 0 date features
Issues
* 0 constant features
* 0 duplicate cases
* 0 missing values
Recommendations
* Everything looks good
res <- resample(Sonar)
05-20-25 07:06:24 Input contains more than one columns; will stratify on last :resample
.:Resampling Parameters
n.resamples: 10
resampler: strat.sub
stratify.var: y
train.p: 0.75
strat.n.bins: 4
05-20-25 07:06:24 Using max n bins possible = 2 :strat.sub
05-20-25 07:06:24 Created 10 stratified subsamples :resample
sonar.train <- Sonar[res$Subsample_1, ]
sonar.test <- Sonar[-res$Subsample_1, ]
2.3.2 Single Model
mod <- s_Ranger(sonar.train, sonar.test)
05-20-25 07:06:24 Hello, egenn :s_Ranger
05-20-25 07:06:24 Imbalanced classes: using Inverse Frequency Weighting :prepare_data
.:Classification Input Summary
Training features: 155 x 60
Training outcome: 155 x 1
Testing features: 53 x 60
Testing outcome: 53 x 1
.:Parameters
n.trees: 1000
mtry: NULL
05-20-25 07:06:24 Training Random Forest (ranger) Classification with 1000 trees... :s_Ranger
.:Ranger Classification Training Summary
Estimated
Reference M R
M 83 0
R 0 72
Overall
Sensitivity 1.0000
Specificity 1.0000
Balanced Accuracy 1.0000
PPV 1.0000
NPV 1.0000
F1 1.0000
Accuracy 1.0000
AUC 1.0000
Brier Score 0.0176
Positive Class: M
.:Ranger Classification Testing Summary
Estimated
Reference M R
M 25 3
R 11 14
Overall
Sensitivity 0.8929
Specificity 0.5600
Balanced Accuracy 0.7264
PPV 0.6944
NPV 0.8235
F1 0.7812
Accuracy 0.7358
AUC 0.8643
Brier Score 0.1652
Positive Class: M
05-20-25 07:06:24 Completed in 1.2e-03 minutes (Real: 0.07; User: 0.16; System: 0.03) :s_Ranger
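Except for AUC and Brier score, which need the predicted probabilities, the testing metrics above follow from the 2 x 2 confusion matrix using standard definitions, with M as the positive class. As a worked check, the base R snippet below recomputes them from the reported counts (25, 3, 11, 14).

```r
# Testing confusion matrix from the output above:
# rows = reference, columns = estimated; "M" is the positive class
cm <- matrix(c(25, 11, 3, 14), nrow = 2,
             dimnames = list(Reference = c("M", "R"), Estimated = c("M", "R")))
TP <- cm["M", "M"]; FN <- cm["M", "R"]
FP <- cm["R", "M"]; TN <- cm["R", "R"]

sens <- TP / (TP + FN)   # 25 / 28
spec <- TN / (TN + FP)   # 14 / 25
metrics <- c(
  Sensitivity = sens,
  Specificity = spec,
  `Balanced Accuracy` = (sens + spec) / 2,
  PPV = TP / (TP + FP),                 # 25 / 36
  NPV = TN / (TN + FN),                 # 14 / 17
  F1 = 2 * TP / (2 * TP + FP + FN),     # 50 / 64
  Accuracy = (TP + TN) / sum(cm)        # 39 / 53
)
round(metrics, 4)
```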
2.3.3 Cross-validated Model
mod <- train_cv(Sonar)
05-20-25 07:06:24 Hello, egenn :train_cv
.:Classification Input Summary
Training features: 208 x 60
Training outcome: 208 x 1
05-20-25 07:06:24 Training Ranger Random Forest on 10 stratified subsamples... :train_cv
05-20-25 07:06:24 Outer resampling plan set to sequential :resLearn
.:Cross-validated Ranger
Mean Balanced Accuracy of 10 stratified subsamples: 0.83
05-20-25 07:06:24 Completed in 0.01 minutes (Real: 0.70; User: 1.63; System: 0.15) :train_cv
mod$describe()
Classification was performed using Ranger Random Forest. Model generalizability was assessed using 10 stratified subsamples. The mean Balanced Accuracy across all testing set resamples was 0.83.
mod$plot()
mod$plotROC()
mod$plotPR()
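plotROC() and plotPR() draw ROC and precision-recall curves for the classifier. The base R sketch below makes concrete what an ROC curve and its AUC summarize. It computes ROC points from predicted probabilities of the positive class, and the AUC via the rank-sum (Mann-Whitney) identity. roc_points() and auc_rank() are hypothetical helpers, the example data are made up, and rtemis may compute these quantities differently.

```r
# Hypothetical helpers (not part of rtemis)
roc_points <- function(prob, truth, positive) {
  pos <- truth == positive
  thresholds <- sort(unique(prob), decreasing = TRUE)
  # At each threshold, call "positive" every case with prob >= threshold
  tpr <- vapply(thresholds, function(t) mean(prob[pos] >= t), numeric(1))
  fpr <- vapply(thresholds, function(t) mean(prob[!pos] >= t), numeric(1))
  data.frame(threshold = c(Inf, thresholds), fpr = c(0, fpr), tpr = c(0, tpr))
}

# AUC = probability that a random positive case scores higher than a random
# negative case (ties count 1/2), computed from ranks
auc_rank <- function(prob, truth, positive) {
  pos <- truth == positive
  r <- rank(prob)
  n_pos <- sum(pos); n_neg <- sum(!pos)
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

truth <- factor(c("M", "M", "R", "M", "R", "R"), levels = c("M", "R"))
prob_M <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)   # predicted P(class = "M")
roc <- roc_points(prob_M, truth, positive = "M")
auc_rank(prob_M, truth, positive = "M")      # 8/9, about 0.889
```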