Survminer Drawing Survival Curves Using Ggplot2 R Package Version 0.4.3

\([![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3635430.svg)](https://doi.org/10.5281/zenodo.3635430)\)

https://doi.org/10.5281/zenodo.3635430

https://osf.io/3tjfk/

Histopathology Research Template 🔬

Introduction

State the marker of interest, the study objectives, and hypotheses (Knijn, Simmer, and Nagtegaal 2015) .¹

Materials & Methods

Describe Materials and Methods as highlighted in (Knijn, Simmer, and Nagtegaal 2015) .²

Describe patient characteristics, and inclusion and exclusion criteria
Describe treatment details
Describe the type of material used
Specify how expression of the biomarker was assessed
Describe the number of independent (blinded) scorers and how they scored
State the method of case selection, study design, origin of the cases, and time frame
Describe the end of the follow-up period and median follow-up time
Define all clinical endpoints examined
Specify all applied statistical methods
Describe how interactions with other clinical/pathological factors were analyzed

Statistical Methods

Generate Fake Data

Codes for generating fake data.⁵

Generate Fake Data

This code generates a fake histopathological data. Some sources for fake data generation here⁶ , here⁷ , here⁸ , here⁹ , here¹⁰ , here¹¹ , here¹² , here¹³ , and here¹⁴ .

Use this code to generate fake clinicopathologic data

                                                      source(file =                    here::                    here("R",                    "gc_fake_data.R"))

                                  wakefield::                    table_heat(x =                    fakedata,                    palette =                    "Set1",                    flip =                    TRUE,                    print =                    TRUE)

Import Data

Codes for importing data. ¹⁵

Read the data

                                                      library(readxl)                  mydata <-                    readxl::                    read_excel(here::                    here("data",                    "mydata.xlsx"))                                      # View(mydata) # Use to view data after importing

Add code for import multiple data purrr reduce

Study Population

Report General Features

Codes for reporting general features. ¹⁶

Dataframe Report

                                                            # Dataframe report                                        mydata                      %>%                                            dplyr::                      select(-                      contains("Date"))                      %>%                                            report::                      report(.)

                The data contains 250 observations of the following variables:   - ID: 250 entries: 001, n = 1; 002, n = 1; 003, n = 1 and 247 others (0 missing)   - Name: 249 entries: Aceyn, n = 1; Adalaide, n = 1; Adidas, n = 1 and 246 others (1 missing)   - Sex: 2 entries: Male, n = 127; Female, n = 122 (1 missing)   - Age: Mean = 49.54, SD = 14.16, Median = , MAD = 17.79, range: [25, 73], Skewness = 0.00, Kurtosis = -1.15, 1 missing   - Race: 7 entries: White, n = 158; Hispanic, n = 38; Black, n = 30 and 4 others (1 missing)   - PreinvasiveComponent: 2 entries: Absent, n = 203; Present, n = 46 (1 missing)   - LVI: 2 entries: Absent, n = 147; Present, n = 102 (1 missing)   - PNI: 2 entries: Absent, n = 171; Present, n = 78 (1 missing)   - Death: 2 levels: FALSE (n = 83, 33.20%); TRUE (n = 166, 66.40%) and missing (n = 1, 0.40%)   - Group: 2 entries: Treatment, n = 131; Control, n = 118 (1 missing)   - Grade: 3 entries: 3, n = 109; 1, n = 78; 2, n = 62 (1 missing)   - TStage: 4 entries: 4, n = 118; 3, n = 65; 2, n = 43 and 1 other (0 missing)   - AntiX_intensity: Mean = 2.39, SD = 0.66, Median = , MAD = 1.48, range: [1, 3], Skewness = -0.63, Kurtosis = -0.65, 1 missing   - AntiY_intensity: Mean = 2.02, SD = 0.80, Median = , MAD = 1.48, range: [1, 3], Skewness = -0.03, Kurtosis = -1.42, 1 missing   - LymphNodeMetastasis: 2 entries: Absent, n = 144; Present, n = 105 (1 missing)   - Valid: 2 levels: FALSE (n = 116, 46.40%); TRUE (n = 133, 53.20%) and missing (n = 1, 0.40%)   - Smoker: 2 levels: FALSE (n = 130, 52.00%); TRUE (n = 119, 47.60%) and missing (n = 1, 0.40%)   - Grade_Level: 3 entries: high, n = 109; low, n = 77; moderate, n = 63 (1 missing)   - DeathTime: 2 entries: Within1Year, n = 149; MoreThan1Year, n = 101 (0 missing)

                                      mydata                      %>%                                            explore::                      describe_tbl()

                250 observations with 21 variables 18 variables containing missings (NA) 0 variables with no variance

Ethics and IRB

Always Respect Patient Privacy

Always Respect Patient Privacy
- Health Information Privacy¹⁷
- Kişisel Verilerin Korunması¹⁸

Define Variable Types

Codes for defining variable types.¹⁹

print column names as vector

              c("ID", "Name", "Sex", "Age", "Race", "PreinvasiveComponent",  "LVI", "PNI", "LastFollowUpDate", "Death", "Group", "Grade",  "TStage", "AntiX_intensity", "AntiY_intensity", "LymphNodeMetastasis",  "Valid", "Smoker", "Grade_Level", "SurgeryDate", "DeathTime")

Find Key Columns

Find ID and key columns to exclude from analysis

                  vctrs::vec_assert()  dplyr::all_equal()  arsenal::compare()  visdat::vis_compare()

See the code as function in R/find_key.R.

                                          keycolumns <-                        mydata                        %>%                                                                        sapply(.,                        FUN =                        dataMaid::isKey)                        %>%                                                tibble::                        as_tibble()                        %>%                                                                                                                                            dplyr::                        select(which(.[1, ]                        ==                                                                        TRUE))                        %>%                                                                        names()                      keycolumns

                  [1] "ID"   "Name"

Variable Types

Get variable types

                                      mydata                      %>%                                            dplyr::                      select(-keycolumns)                      %>%                                            inspectdf::                      inspect_types()

                # A tibble: 4 x 4   type             cnt  pcnt col_name     <chr>          <int> <dbl> <list>     1 character         11  57.9 <chr [11]> 2 logical            3  15.8 <chr [3]>  3 numeric            3  15.8 <chr [3]>  4 POSIXct POSIXt     2  10.5 <chr [2]>

                                      mydata                      %>%                                            dplyr::                      select(-keycolumns,                      -                      contains("Date"))                      %>%                                            describer::                      describe()                      %>%                                                                                                                                knitr::                      kable(format =                      "markdown")

.column_name	.column_class	.column_type	.count_elements	.mean_value	.sd_value	.q0_value	.q25_value	.q50_value	.q75_value	.q100_value
Sex	character	character	250	NA	NA	Female	NA	NA	NA	Male
Age	numeric	double	250	49.538153	14.1595015	25	37	49	61	73
Race	character	character	250	NA	NA	Asian	NA	NA	NA	White
PreinvasiveComponent	character	character	250	NA	NA	Absent	NA	NA	NA	Present
LVI	character	character	250	NA	NA	Absent	NA	NA	NA	Present
PNI	character	character	250	NA	NA	Absent	NA	NA	NA	Present
Death	logical	logical	250	NA	NA	FALSE	NA	NA	NA	TRUE
Group	character	character	250	NA	NA	Control	NA	NA	NA	Treatment
Grade	character	character	250	NA	NA	1	NA	NA	NA	3
TStage	character	character	250	NA	NA	1	NA	NA	NA	4
AntiX_intensity	numeric	double	250	2.389558	0.6636071	1	2	2	3	3
AntiY_intensity	numeric	double	250	2.016064	0.7980211	1	1	2	3	3
LymphNodeMetastasis	character	character	250	NA	NA	Absent	NA	NA	NA	Present
Valid	logical	logical	250	NA	NA	FALSE	NA	NA	NA	TRUE
Smoker	logical	logical	250	NA	NA	FALSE	NA	NA	NA	TRUE
Grade_Level	character	character	250	NA	NA	high	NA	NA	NA	moderate
DeathTime	character	character	250	NA	NA	MoreThan1Year	NA	NA	NA	Within1Year

Plot variable types

                                      mydata                      %>%                                            dplyr::                      select(-keycolumns)                      %>%                                            inspectdf::                      inspect_types()                      %>%                                            inspectdf::                      show_plot()

                                                            # https://github.com/ropensci/visdat                                                              # http://visdat.njtierney.com/articles/using_visdat.html                                                              # https://cran.r-project.org/web/packages/visdat/index.html                                                              # http://visdat.njtierney.com/                                                                                  # visdat::vis_guess(mydata)                                                            visdat::                      vis_dat(mydata)

                                      mydata                      %>%                                            explore::                      explore_tbl()

Define Variable Types

Find `character` variables

                                          characterVariables <-                        mydata                        %>%                                                dplyr::                        select(-keycolumns)                        %>%                                                inspectdf::                        inspect_types()                        %>%                                                                                                                                            dplyr::                        filter(type                        ==                                                  "character")                        %>%                                                dplyr::                        select(col_name)                        %>%                                                dplyr::                        pull()                        %>%                                                                                                                                                                    unlist()                                            characterVariables

                                      [1] "Sex"                  "Race"                 "PreinvasiveComponent"  [4] "LVI"                  "PNI"                  "Group"                 [7] "Grade"                "TStage"               "LymphNodeMetastasis"  [10] "Grade_Level"          "DeathTime"

Find `categorical` variables

                                          categoricalVariables <-                        mydata                        %>%                                                dplyr::                        select(-keycolumns,                        -                        contains("Date"))                        %>%                                                                                                                                            describer::                        describe()                        %>%                                                janitor::                        clean_names()                        %>%                                                dplyr::                        filter(column_type                        ==                                                                                                                                              "factor")                        %>%                                                dplyr::                        select(column_name)                        %>%                                                dplyr::                        pull()                                            categoricalVariables

                  character(0)

Find `continious` variables

                                          continiousVariables <-                        mydata                        %>%                                                dplyr::                        select(-keycolumns,                        -                        contains("Date"))                        %>%                                                                                                                                            describer::                        describe()                        %>%                                                janitor::                        clean_names()                        %>%                                                dplyr::                        filter(column_type                        ==                                                                                                                                              "numeric"                        |                                                column_type                        ==                                                  "double")                        %>%                                                dplyr::                        select(column_name)                        %>%                                                dplyr::                        pull()                                            continiousVariables

                  [1] "Age"             "AntiX_intensity" "AntiY_intensity"

Find `numeric` variables

                                          numericVariables <-                        mydata                        %>%                                                dplyr::                        select(-keycolumns)                        %>%                                                inspectdf::                        inspect_types()                        %>%                                                                                                                                            dplyr::                        filter(type                        ==                                                  "numeric")                        %>%                                                dplyr::                        select(col_name)                        %>%                                                dplyr::                        pull()                        %>%                                                                                                                                                                    unlist()                                            numericVariables

                  [1] "Age"             "AntiX_intensity" "AntiY_intensity"

Find `integer` variables

                                          integerVariables <-                        mydata                        %>%                                                dplyr::                        select(-keycolumns)                        %>%                                                inspectdf::                        inspect_types()                        %>%                                                                                                                                            dplyr::                        filter(type                        ==                                                  "integer")                        %>%                                                dplyr::                        select(col_name)                        %>%                                                dplyr::                        pull()                        %>%                                                                                                                                                                    unlist()                                            integerVariables

                  NULL

Find `list` variables

                                          listVariables <-                        mydata                        %>%                                                dplyr::                        select(-keycolumns)                        %>%                                                inspectdf::                        inspect_types()                        %>%                                                                                                                                            dplyr::                        filter(type                        ==                                                  "list")                        %>%                                                dplyr::                        select(col_name)                        %>%                                                dplyr::                        pull()                        %>%                                                                                                                                                                    unlist()                      listVariables

                  NULL

Find `date` variables

                                          is_date <-                                                function(x)                        inherits(x,                        c("POSIXct",                        "POSIXt"))                                            dateVariables <-                                                names(which(sapply(mydata,                        FUN =                        is_date)                        ==                                                                        TRUE))                      dateVariables

                  [1] "LastFollowUpDate" "SurgeryDate"

Overview the Data

Codes for overviewing the data. ²⁰

View Data

                                      reactable::                      reactable(data =                      mydata,                      sortable =                      TRUE,                      resizable =                      TRUE,                      filterable =                      TRUE,                                                              searchable =                      TRUE,                      pagination =                      TRUE,                      paginationType =                      "numbers",                      showPageSizeOptions =                      TRUE,                                                              highlight =                      TRUE,                      striped =                      TRUE,                      outlined =                      TRUE,                      compact =                      TRUE,                      wrap =                      FALSE,                                                              showSortIcon =                      TRUE,                      showSortable =                      TRUE)

Overview / Exploratory Data Analysis (EDA)

Summary of Data via summarytools 📦

                                      summarytools::                      view(summarytools::                      dfSummary(mydata                      %>%                                            dplyr::                      select(-keycolumns)))

                                                            if                      (!                      dir.exists(here::                      here("out"))) {                                          dir.create(here::                      here("out"))                    }                                        summarytools::                      view(x =                      summarytools::                      dfSummary(mydata                      %>%                                            dplyr::                      select(-keycolumns)),                                                              file =                      here::                      here("out",                      "mydata_summary.html"))

Summary via dataMaid 📦

                                                            if                      (!                      dir.exists(here::                      here("out"))) {                                          dir.create(here::                      here("out"))                    }                                        dataMaid::                      makeDataReport(data =                      mydata,                      file =                      here::                      here("out",                      "dataMaid_mydata.Rmd"),                                                              replace =                      TRUE,                      openResult =                      FALSE,                      render =                      FALSE,                      quiet =                      TRUE)

Summary via explore 📦

                                                            if                      (!                      dir.exists(here::                      here("out"))) {                                          dir.create(here::                      here("out"))                    }                                        mydata                      %>%                                            dplyr::                      select(-dateVariables)                      %>%                                            explore::                      report(output_file =                      "mydata_report.html",                                                              output_dir =                      here::                      here("out"))

Glimpse of Data

                                      dplyr::                      glimpse(mydata                      %>%                                            dplyr::                      select(-keycolumns,                      -dateVariables))

                Observations: 250 Variables: 17 $ Sex                  <chr> "Female", "Female", "Female", "Female", "Male", … $ Age                  <dbl> 30, 32, 53, 57, 47, 58, 59, 54, 35, 27, 53, 55, … $ Race                 <chr> "White", "White", "White", "Hispanic", "White", … $ PreinvasiveComponent <chr> "Absent", "Absent", "Absent", "Absent", "Absent"… $ LVI                  <chr> "Present", "Absent", "Absent", "Present", "Absen… $ PNI                  <chr> "Absent", "Absent", "Absent", "Present", "Absent… $ Death                <lgl> FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRU… $ Group                <chr> "Control", "Control", "Control", "Control", "Con… $ Grade                <chr> "1", "1", "2", "1", "2", "2", "3", "1", "2", "1"… $ TStage               <chr> "4", "4", "3", "3", "1", "3", "3", "3", "4", "4"… $ AntiX_intensity      <dbl> 2, 2, 2, 2, 3, 1, 1, 3, 2, 3, 2, 3, 1, 3, 1, 2, … $ AntiY_intensity      <dbl> 2, 2, 2, 3, 2, 1, 2, 3, 3, 1, 1, 2, 1, 3, 1, 2, … $ LymphNodeMetastasis  <chr> "Present", "Absent", "Present", "Present", "Pres… $ Valid                <lgl> TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRU… $ Smoker               <lgl> TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TR… $ Grade_Level          <chr> "moderate", "moderate", "high", "low", "high", "… $ DeathTime            <chr> "Within1Year", "Within1Year", "Within1Year", "Wi…

                                      mydata                      %>%                                            explore::                      describe()

                # A tibble: 21 x 8    variable             type     na na_pct unique   min  mean   max    <chr>                <chr> <int>  <dbl>  <int> <dbl> <dbl> <dbl>  1 ID                   chr       0    0      250    NA NA       NA  2 Name                 chr       1    0.4    250    NA NA       NA  3 Sex                  chr       1    0.4      3    NA NA       NA  4 Age                  dbl       1    0.4     50    25 49.5     73  5 Race                 chr       1    0.4      8    NA NA       NA  6 PreinvasiveComponent chr       1    0.4      3    NA NA       NA  7 LVI                  chr       1    0.4      3    NA NA       NA  8 PNI                  chr       1    0.4      3    NA NA       NA  9 LastFollowUpDate     dat       1    0.4     13    NA NA       NA 10 Death                lgl       1    0.4      3     0  0.67     1 # … with 11 more rows

Explore

Control Data

Control Data if matching expectations

                                      visdat::                      vis_expect(data =                      mydata,                      expectation =                      ~.x                      ==                                                                  -1,                      show_perc =                      TRUE)                                        visdat::                      vis_expect(mydata,                      ~.x                      >=                                                                  25)

See missing values

                                      visdat::                      vis_miss(airquality,                      cluster =                      TRUE)

                                      visdat::                      vis_miss(airquality,                      sort_miss =                      TRUE)

                $variables                Variable   q qNA  pNA qZero pZero qBlank pBlank qInf pInf 1                Smoker 250   1 0.4%   130   52%      0      -    0    - 2                 Valid 250   1 0.4%   116 46.4%      0      -    0    - 3                 Death 250   1 0.4%    83 33.2%      0      -    0    - 4                   Sex 250   1 0.4%     0     -      0      -    0    - 5  PreinvasiveComponent 250   1 0.4%     0     -      0      -    0    - 6                   LVI 250   1 0.4%     0     -      0      -    0    - 7                   PNI 250   1 0.4%     0     -      0      -    0    - 8                 Group 250   1 0.4%     0     -      0      -    0    - 9   LymphNodeMetastasis 250   1 0.4%     0     -      0      -    0    - 10                Grade 250   1 0.4%     0     -      0      -    0    - 11      AntiX_intensity 250   1 0.4%     0     -      0      -    0    - 12      AntiY_intensity 250   1 0.4%     0     -      0      -    0    - 13          Grade_Level 250   1 0.4%     0     -      0      -    0    - 14                 Race 250   1 0.4%     0     -      0      -    0    - 15     LastFollowUpDate 250   1 0.4%     0     -      0      -    0    - 16                  Age 250   1 0.4%     0     -      0      -    0    - 17          SurgeryDate 250   1 0.4%     0     -      0      -    0    - 18                 Name 250   1 0.4%     0     -      0      -    0    - 19            DeathTime 250   0    -     0     -      0      -    0    - 20               TStage 250   0    -     0     -      0      -    0    - 21                   ID 250   0    -     0     -      0      -    0    -    qDistinct      type anomalous_percent 1          3   Logical             52.4% 2          3   Logical             46.8% 3          3   Logical             33.6% 4          3 Character              0.4% 5          3 Character              0.4% 6          3 Character              0.4% 7          3 Character              0.4% 8          3 Character              0.4% 9          3 Character              0.4% 10         4 Character              0.4% 11         4   Numeric              0.4% 12         4   Numeric              0.4% 13         4 Character              0.4% 14         8 Character              0.4% 15        13 Timestamp              0.4% 16        50   Numeric              0.4% 17       233 Timestamp              0.4% 18       250 Character              0.4% 19         2 Character                 - 20         4 Character                 - 21       250 Character                 -  $problem_variables  [1] Variable          q                 qNA               pNA                [5] qZero             pZero             qBlank            pBlank             [9] qInf              pInf              qDistinct         type              [13] anomalous_percent problems          <0 rows> (or 0-length row.names)

                                      xray::                      distributions(mydata)

                ================================================================================

                [1] "Ignoring variable LastFollowUpDate: Unsupported type for visualization."

                [1] "Ignoring variable SurgeryDate: Unsupported type for visualization."

                                  Variable p_1 p_10 p_25 p_50 p_75 p_90 p_99 1 AntiX_intensity   1  1.8    2    2    3    3    3 2 AntiY_intensity   1    1    1    2    3    3    3 3             Age  25 30.8   37   49   61   70   73

Explore Data

Summary of Data via DataExplorer 📦

                                      DataExplorer::                      plot_str(mydata)

                                      DataExplorer::                      plot_str(mydata,                      type =                      "r")

                                      DataExplorer::                      introduce(mydata)

                # A tibble: 1 x 9    rows columns discrete_columns continuous_colu… all_missing_col…   <int>   <int>            <int>            <int>            <int> 1   250      21               18                3                0 # … with 4 more variables: total_missing_values <int>, complete_rows <int>, #   total_observations <int>, memory_usage <dbl>

                                      DataExplorer::                      plot_intro(mydata)

                                      DataExplorer::                      plot_missing(mydata)

Drop columns

                                      mydata2 <-                      DataExplorer::                      drop_columns(mydata,                      "TStage")

                                      DataExplorer::                      plot_bar(mydata)

                                      DataExplorer::                      plot_bar(mydata,                      with =                      "Death")

                                      DataExplorer::                      plot_histogram(mydata)

Statistical Analysis

Learn these tests as highlighted in (Schmidt et al. 2017). ²¹

Results

Write results as described in (Knijn, Simmer, and Nagtegaal 2015) ²²

Describe the number of patients included in the analysis and reason for dropout
Report patient/disease characteristics (including the biomarker of interest) with the number of missing values
Describe the interaction of the biomarker of interest with established prognostic variables
Include at least 90 % of initial cases included in univariate and multivariate analyses
Report the estimated effect (relative risk/odds ratio, confidence interval, and p value) in univariate analysis
Report the estimated effect (hazard rate/odds ratio, confidence interval, and p value) in multivariate analysis
Report the estimated effects (hazard ratio/odds ratio, confidence interval, and p value) of other prognostic factors included in multivariate analysis

Data Dictionary

Codes for generating data dictionary.²³

Clean and Recode Data

Codes for clean and recode data.²⁴

questionr::irec()

questionr::iorder()

questionr::icut()

iris %>% mutate(sumVar = rowSums(.[1:4]))

iris %>% mutate(sumVar = rowSums(select(., contains("Sepal")))) %>% head

iris %>% mutate(sumVar = select(., contains("Sepal")) %>% rowSums()) %>% head

iRenameColumn.R

iSelectColumn.R

              <= 22 Low >= 23 & <= 41 Average  >=42 High

Impute Missing Data

transform

min -max

skewness

log

binning

optimal binning

standardize

data transformation report

inspectdf

Descriptive Statistics

Codes for Descriptive Statistics.²⁶

Table One

Report Data properties via report 📦

                                      mydata                      %>%                                            dplyr::                      select(-dplyr::                      contains("Date"))                      %>%                                            report::                      report()

                The data contains 250 observations of the following variables:   - ID: 250 entries: 001, n = 1; 002, n = 1; 003, n = 1 and 247 others (0 missing)   - Name: 249 entries: Aceyn, n = 1; Adalaide, n = 1; Adidas, n = 1 and 246 others (1 missing)   - Sex: 2 entries: Male, n = 127; Female, n = 122 (1 missing)   - Age: Mean = 49.54, SD = 14.16, Median = , MAD = 17.79, range: [25, 73], Skewness = 0.00, Kurtosis = -1.15, 1 missing   - Race: 7 entries: White, n = 158; Hispanic, n = 38; Black, n = 30 and 4 others (1 missing)   - PreinvasiveComponent: 2 entries: Absent, n = 203; Present, n = 46 (1 missing)   - LVI: 2 entries: Absent, n = 147; Present, n = 102 (1 missing)   - PNI: 2 entries: Absent, n = 171; Present, n = 78 (1 missing)   - Death: 2 levels: FALSE (n = 83, 33.20%); TRUE (n = 166, 66.40%) and missing (n = 1, 0.40%)   - Group: 2 entries: Treatment, n = 131; Control, n = 118 (1 missing)   - Grade: 3 entries: 3, n = 109; 1, n = 78; 2, n = 62 (1 missing)   - TStage: 4 entries: 4, n = 118; 3, n = 65; 2, n = 43 and 1 other (0 missing)   - AntiX_intensity: Mean = 2.39, SD = 0.66, Median = , MAD = 1.48, range: [1, 3], Skewness = -0.63, Kurtosis = -0.65, 1 missing   - AntiY_intensity: Mean = 2.02, SD = 0.80, Median = , MAD = 1.48, range: [1, 3], Skewness = -0.03, Kurtosis = -1.42, 1 missing   - LymphNodeMetastasis: 2 entries: Absent, n = 144; Present, n = 105 (1 missing)   - Valid: 2 levels: FALSE (n = 116, 46.40%); TRUE (n = 133, 53.20%) and missing (n = 1, 0.40%)   - Smoker: 2 levels: FALSE (n = 130, 52.00%); TRUE (n = 119, 47.60%) and missing (n = 1, 0.40%)   - Grade_Level: 3 entries: high, n = 109; low, n = 77; moderate, n = 63 (1 missing)   - DeathTime: 2 entries: Within1Year, n = 149; MoreThan1Year, n = 101 (0 missing)

Table 1 via arsenal 📦

                                                            # cat(names(mydata), sep = " + \n")                                                              library(arsenal)                    tab1 <-                      arsenal::                      tableby(                                          ~                                            Sex                      +                                                                                    Age                      +                                                                                    Race                      +                                                                                    PreinvasiveComponent                      +                                                                                    LVI                      +                                                                                    PNI                      +                                                                                    Death                      +                                                                                    Group                      +                                                                                    Grade                      +                                                                                    TStage                      +                                                                                                          # `Anti-X-intensity` +                                                                                                          # `Anti-Y-intensity` +                                                                                    LymphNodeMetastasis                      +                                                                                    Valid                      +                                                                                    Smoker                      +                                                                                    Grade_Level                                          ,                                          data =                      mydata                                        )                                          summary(tab1)

	Overall (N=250)
Sex
N-Miss	1
Female	122 (49.0%)
Male	127 (51.0%)
Age
N-Miss	1
Mean (SD)	49.538 (14.160)
Range	25.000 - 73.000
Race
N-Miss	1
Asian	15 (6.0%)
Bi-Racial	5 (2.0%)
Black	30 (12.0%)
Hispanic	38 (15.3%)
Native	2 (0.8%)
Other	1 (0.4%)
White	158 (63.5%)
PreinvasiveComponent
N-Miss	1
Absent	203 (81.5%)
Present	46 (18.5%)
LVI
N-Miss	1
Absent	147 (59.0%)
Present	102 (41.0%)
PNI
N-Miss	1
Absent	171 (68.7%)
Present	78 (31.3%)
Death
N-Miss	1
FALSE	83 (33.3%)
TRUE	166 (66.7%)
Group
N-Miss	1
Control	118 (47.4%)
Treatment	131 (52.6%)
Grade
N-Miss	1
1	78 (31.3%)
2	62 (24.9%)
3	109 (43.8%)
TStage
1	24 (9.6%)
2	43 (17.2%)
3	65 (26.0%)
4	118 (47.2%)
LymphNodeMetastasis
N-Miss	1
Absent	144 (57.8%)
Present	105 (42.2%)
Valid
N-Miss	1
FALSE	116 (46.6%)
TRUE	133 (53.4%)
Smoker
N-Miss	1
FALSE	130 (52.2%)
TRUE	119 (47.8%)
Grade_Level
N-Miss	1
high	109 (43.8%)
low	77 (30.9%)
moderate	63 (25.3%)

Table 1 via tableone 📦

                                                            library(tableone)                    mydata                      %>%                                            dplyr::                      select(-keycolumns,                      -dateVariables)                      %>%                                            tableone::                      CreateTableOne(data =                      .)

                                  Overall         n                                    250           Sex = Male (%)                       127 (51.0)    Age (mean (SD))                    49.54 (14.16)   Race (%)                                              Asian                              15 ( 6.0)       Bi-Racial                           5 ( 2.0)       Black                              30 (12.0)       Hispanic                           38 (15.3)       Native                              2 ( 0.8)       Other                               1 ( 0.4)       White                             158 (63.5)    PreinvasiveComponent = Present (%)    46 (18.5)    LVI = Present (%)                    102 (41.0)    PNI = Present (%)                     78 (31.3)    Death = TRUE (%)                     166 (66.7)    Group = Treatment (%)                131 (52.6)    Grade (%)                                             1                                  78 (31.3)       2                                  62 (24.9)       3                                 109 (43.8)    TStage (%)                                            1                                  24 ( 9.6)       2                                  43 (17.2)       3                                  65 (26.0)       4                                 118 (47.2)    AntiX_intensity (mean (SD))         2.39 (0.66)    AntiY_intensity (mean (SD))         2.02 (0.80)    LymphNodeMetastasis = Present (%)    105 (42.2)    Valid = TRUE (%)                     133 (53.4)    Smoker = TRUE (%)                    119 (47.8)    Grade_Level (%)                                       high                              109 (43.8)       low                                77 (30.9)       moderate                           63 (25.3)    DeathTime = Within1Year (%)          149 (59.6)

Descriptive Statistics of Continuous Variables

                                      mydata                      %>%                                            dplyr::                      select(continiousVariables, numericVariables, integerVariables)                      %>%                                                                                                                                summarytools::                      descr(.,                      style =                      "rmarkdown")

                                                            print(summarytools::                      descr(mydata),                      method =                      "render",                      table.classes =                      "st-small")

                                      mydata                      %>%                                            summarytools::                      descr(.,                      stats =                      "common",                      transpose =                      TRUE,                      headings =                      FALSE)

                                      mydata                      %>%                                            summarytools::                      descr(stats =                      "common")                      %>%                                            summarytools::                      tb()

                                      mydata$Sex                      %>%                                            summarytools::                      freq(cumul =                      FALSE,                      report.nas =                      FALSE)                      %>%                                            summarytools::                      tb()

                                      mydata                      %>%                                            explore::                      describe()                      %>%                                            dplyr::                      filter(unique                      <                                                                  5)

                # A tibble: 15 x 8    variable             type     na na_pct unique   min  mean   max    <chr>                <chr> <int>  <dbl>  <int> <dbl> <dbl> <dbl>  1 Sex                  chr       1    0.4      3    NA NA       NA  2 PreinvasiveComponent chr       1    0.4      3    NA NA       NA  3 LVI                  chr       1    0.4      3    NA NA       NA  4 PNI                  chr       1    0.4      3    NA NA       NA  5 Death                lgl       1    0.4      3     0  0.67     1  6 Group                chr       1    0.4      3    NA NA       NA  7 Grade                chr       1    0.4      4    NA NA       NA  8 TStage               chr       0    0        4    NA NA       NA  9 AntiX_intensity      dbl       1    0.4      4     1  2.39     3 10 AntiY_intensity      dbl       1    0.4      4     1  2.02     3 11 LymphNodeMetastasis  chr       1    0.4      3    NA NA       NA 12 Valid                lgl       1    0.4      3     0  0.53     1 13 Smoker               lgl       1    0.4      3     0  0.48     1 14 Grade_Level          chr       1    0.4      4    NA NA       NA 15 DeathTime            chr       0    0        2    NA NA       NA

                                      mydata                      %>%                                            explore::                      describe()                      %>%                                            dplyr::                      filter(na                      >                                                                  0)

                # A tibble: 18 x 8    variable             type     na na_pct unique   min  mean   max    <chr>                <chr> <int>  <dbl>  <int> <dbl> <dbl> <dbl>  1 Name                 chr       1    0.4    250    NA NA       NA  2 Sex                  chr       1    0.4      3    NA NA       NA  3 Age                  dbl       1    0.4     50    25 49.5     73  4 Race                 chr       1    0.4      8    NA NA       NA  5 PreinvasiveComponent chr       1    0.4      3    NA NA       NA  6 LVI                  chr       1    0.4      3    NA NA       NA  7 PNI                  chr       1    0.4      3    NA NA       NA  8 LastFollowUpDate     dat       1    0.4     13    NA NA       NA  9 Death                lgl       1    0.4      3     0  0.67     1 10 Group                chr       1    0.4      3    NA NA       NA 11 Grade                chr       1    0.4      4    NA NA       NA 12 AntiX_intensity      dbl       1    0.4      4     1  2.39     3 13 AntiY_intensity      dbl       1    0.4      4     1  2.02     3 14 LymphNodeMetastasis  chr       1    0.4      3    NA NA       NA 15 Valid                lgl       1    0.4      3     0  0.53     1 16 Smoker               lgl       1    0.4      3     0  0.48     1 17 Grade_Level          chr       1    0.4      4    NA NA       NA 18 SurgeryDate          dat       1    0.4    233    NA NA       NA

                                      mydata                      %>%                                            explore::                      describe()

                # A tibble: 21 x 8    variable             type     na na_pct unique   min  mean   max    <chr>                <chr> <int>  <dbl>  <int> <dbl> <dbl> <dbl>  1 ID                   chr       0    0      250    NA NA       NA  2 Name                 chr       1    0.4    250    NA NA       NA  3 Sex                  chr       1    0.4      3    NA NA       NA  4 Age                  dbl       1    0.4     50    25 49.5     73  5 Race                 chr       1    0.4      8    NA NA       NA  6 PreinvasiveComponent chr       1    0.4      3    NA NA       NA  7 LVI                  chr       1    0.4      3    NA NA       NA  8 PNI                  chr       1    0.4      3    NA NA       NA  9 LastFollowUpDate     dat       1    0.4     13    NA NA       NA 10 Death                lgl       1    0.4      3     0  0.67     1 # … with 11 more rows

Categorical Variables

Use R/gc_desc_cat.R to generate gc_desc_cat.Rmd containing descriptive statistics for categorical variables

                                                            source(here::                      here("R",                      "gc_desc_cat.R"))

Descriptive Statistics Sex

                                          mydata                        %>%                                                janitor::                        tabyl(Sex)                        %>%                                                janitor::                        adorn_pct_formatting(rounding =                        "half up",                                                                    digits =                        1)                        %>%                                                knitr::                        kable()

Sex	n	percent	valid_percent
Female	122	48.8%	49.0%
Male	127	50.8%	51.0%
NA	1	0.4%	-

Descriptive Statistics Race

                                          mydata                        %>%                                                janitor::                        tabyl(Race)                        %>%                                                janitor::                        adorn_pct_formatting(rounding =                        "half up",                                                                    digits =                        1)                        %>%                                                knitr::                        kable()

Race	n	percent	valid_percent
Asian	15	6.0%	6.0%
Bi-Racial	5	2.0%	2.0%
Black	30	12.0%	12.0%
Hispanic	38	15.2%	15.3%
Native	2	0.8%	0.8%
Other	1	0.4%	0.4%
White	158	63.2%	63.5%
NA	1	0.4%	-

Descriptive Statistics PreinvasiveComponent

                                          mydata                        %>%                                                janitor::                        tabyl(PreinvasiveComponent)                        %>%                                                janitor::                        adorn_pct_formatting(rounding =                        "half up",                                                                    digits =                        1)                        %>%                                                knitr::                        kable()

PreinvasiveComponent	n	percent	valid_percent
Absent	203	81.2%	81.5%
Present	46	18.4%	18.5%
NA	1	0.4%	-

Descriptive Statistics LVI

                                          mydata                        %>%                                                janitor::                        tabyl(LVI)                        %>%                                                janitor::                        adorn_pct_formatting(rounding =                        "half up",                                                                    digits =                        1)                        %>%                                                knitr::                        kable()

LVI	n	percent	valid_percent
Absent	147	58.8%	59.0%
Present	102	40.8%	41.0%
NA	1	0.4%	-

Descriptive Statistics PNI

                                          mydata                        %>%                                                janitor::                        tabyl(PNI)                        %>%                                                janitor::                        adorn_pct_formatting(rounding =                        "half up",                                                                    digits =                        1)                        %>%                                                knitr::                        kable()

PNI	n	percent	valid_percent
Absent	171	68.4%	68.7%
Present	78	31.2%	31.3%
NA	1	0.4%	-

Descriptive Statistics Group

                                          mydata                        %>%                                                janitor::                        tabyl(Group)                        %>%                                                janitor::                        adorn_pct_formatting(rounding =                        "half up",                                                                    digits =                        1)                        %>%                                                knitr::                        kable()

Group	n	percent	valid_percent
Control	118	47.2%	47.4%
Treatment	131	52.4%	52.6%
NA	1	0.4%	-

Descriptive Statistics Grade

                                          mydata                        %>%                                                janitor::                        tabyl(Grade)                        %>%                                                janitor::                        adorn_pct_formatting(rounding =                        "half up",                                                                    digits =                        1)                        %>%                                                knitr::                        kable()

Grade	n	percent	valid_percent
1	78	31.2%	31.3%
2	62	24.8%	24.9%
3	109	43.6%	43.8%
NA	1	0.4%	-

Descriptive Statistics TStage

                                          mydata                        %>%                                                janitor::                        tabyl(TStage)                        %>%                                                janitor::                        adorn_pct_formatting(rounding =                        "half up",                                                                    digits =                        1)                        %>%                                                knitr::                        kable()

TStage	n	percent
1	24	9.6%
2	43	17.2%
3	65	26.0%
4	118	47.2%

Descriptive Statistics Grade_Level

                                          mydata                        %>%                                                janitor::                        tabyl(Grade_Level)                        %>%                                                janitor::                        adorn_pct_formatting(rounding =                        "half up",                                                                    digits =                        1)                        %>%                                                knitr::                        kable()

Grade_Level	n	percent	valid_percent
high	109	43.6%	43.8%
low	77	30.8%	30.9%
moderate	63	25.2%	25.3%
NA	1	0.4%	-

Descriptive Statistics DeathTime

                                          mydata                        %>%                                                janitor::                        tabyl(DeathTime)                        %>%                                                janitor::                        adorn_pct_formatting(rounding =                        "half up",                                                                    digits =                        1)                        %>%                                                knitr::                        kable()

DeathTime	n	percent
MoreThan1Year	101	40.4%
Within1Year	149	59.6%

                                          race_stats <-                        summarytools::                        freq(mydata$Race)                                              print(race_stats,                        report.nas =                        FALSE,                        totals =                        FALSE,                        display.type =                        FALSE,                        Variable.label =                        "Race Group")

                                          mydata                        %>%                                                explore::                        describe(PreinvasiveComponent)

                  variable = PreinvasiveComponent type     = character na       = 1 of 250 (0.4%) unique   = 3  Absent  = 203 (81.2%)  Present = 46 (18.4%)  NA      = 1 (0.4%)

                                                                  ## Frequency or custom tables for categorical variables                                            SmartEDA::                        ExpCTable(mydata,                        Target =                        NULL,                        margin =                        1,                        clim =                        10,                        nlim =                        5,                        round =                        2,                                                                    bin =                        NULL,                        per =                        T)

                                      Variable         Valid Frequency Percent CumPercent 1                   Sex        Female       122    48.8       48.8 2                   Sex          Male       127    50.8       99.6 3                   Sex            NA         1     0.4      100.0 4                   Sex         TOTAL       250      NA         NA 5                  Race         Asian        15     6.0        6.0 6                  Race     Bi-Racial         5     2.0        8.0 7                  Race         Black        30    12.0       20.0 8                  Race      Hispanic        38    15.2       35.2 9                  Race            NA         1     0.4       35.6 10                 Race        Native         2     0.8       36.4 11                 Race         Other         1     0.4       36.8 12                 Race         White       158    63.2      100.0 13                 Race         TOTAL       250      NA         NA 14 PreinvasiveComponent        Absent       203    81.2       81.2 15 PreinvasiveComponent            NA         1     0.4       81.6 16 PreinvasiveComponent       Present        46    18.4      100.0 17 PreinvasiveComponent         TOTAL       250      NA         NA 18                  LVI        Absent       147    58.8       58.8 19                  LVI            NA         1     0.4       59.2 20                  LVI       Present       102    40.8      100.0 21                  LVI         TOTAL       250      NA         NA 22                  PNI        Absent       171    68.4       68.4 23                  PNI            NA         1     0.4       68.8 24                  PNI       Present        78    31.2      100.0 25                  PNI         TOTAL       250      NA         NA 26                Group       Control       118    47.2       47.2 27                Group            NA         1     0.4       47.6 28                Group     Treatment       131    52.4      100.0 29                Group         TOTAL       250      NA         NA 30                Grade             1        78    31.2       31.2 31                Grade             2        62    24.8       56.0 32                Grade             3       109    43.6       99.6 33                Grade            NA         1     0.4      100.0 34                Grade         TOTAL       250      NA         NA 35               TStage             1        24     9.6        9.6 36               TStage             2        43    17.2       26.8 37               TStage             3        65    26.0       52.8 38               TStage             4       118    47.2      100.0 39               TStage         TOTAL       250      NA         NA 40  LymphNodeMetastasis        Absent       144    57.6       57.6 41  LymphNodeMetastasis            NA         1     0.4       58.0 42  LymphNodeMetastasis       Present       105    42.0      100.0 43  LymphNodeMetastasis         TOTAL       250      NA         NA 44          Grade_Level          high       109    43.6       43.6 45          Grade_Level           low        77    30.8       74.4 46          Grade_Level      moderate        63    25.2       99.6 47          Grade_Level            NA         1     0.4      100.0 48          Grade_Level         TOTAL       250      NA         NA 49            DeathTime MoreThan1Year       101    40.4       40.4 50            DeathTime   Within1Year       149    59.6      100.0 51            DeathTime         TOTAL       250      NA         NA 52      AntiX_intensity             1        25    10.0       10.0 53      AntiX_intensity             2       102    40.8       50.8 54      AntiX_intensity             3       122    48.8       99.6 55      AntiX_intensity            NA         1     0.4      100.0 56      AntiX_intensity         TOTAL       250      NA         NA 57      AntiY_intensity             1        77    30.8       30.8 58      AntiY_intensity             2        91    36.4       67.2 59      AntiY_intensity             3        81    32.4       99.6 60      AntiY_intensity            NA         1     0.4      100.0 61      AntiY_intensity         TOTAL       250      NA         NA

                                          inspectdf::                        inspect_cat(mydata)

                  # A tibble: 16 x 5    col_name               cnt common      common_pcnt levels                <chr>                <int> <chr>             <dbl> <named list>        1 Death                    3 TRUE               66.4 <tibble [3 × 3]>    2 DeathTime                2 Within1Year        59.6 <tibble [2 × 3]>    3 Grade                    4 3                  43.6 <tibble [4 × 3]>    4 Grade_Level              4 high               43.6 <tibble [4 × 3]>    5 Group                    3 Treatment          52.4 <tibble [3 × 3]>    6 ID                     250 001                 0.4 <tibble [250 × 3]>  7 LVI                      3 Absent             58.8 <tibble [3 × 3]>    8 LymphNodeMetastasis      3 Absent             57.6 <tibble [3 × 3]>    9 Name                   250 Aceyn               0.4 <tibble [250 × 3]> 10 PNI                      3 Absent             68.4 <tibble [3 × 3]>   11 PreinvasiveComponent     3 Absent             81.2 <tibble [3 × 3]>   12 Race                     8 White              63.2 <tibble [8 × 3]>   13 Sex                      3 Male               50.8 <tibble [3 × 3]>   14 Smoker                   3 FALSE              52   <tibble [3 × 3]>   15 TStage                   4 4                  47.2 <tibble [4 × 3]>   16 Valid                    3 TRUE               53.2 <tibble [3 × 3]>

                                          inspectdf::                        inspect_cat(mydata)$levels$Group

                  # A tibble: 3 x 3   value      prop   cnt   <chr>     <dbl> <int> 1 Treatment 0.524   131 2 Control   0.472   118 3 <NA>      0.004     1

Split-Group Stats Categorical

                                                                  library(summarytools)                                            grouped_freqs <-                                                stby(data =                        mydata$Smoker,                        INDICES =                        mydata$Sex,                        FUN =                        freq,                        cumul =                        FALSE,                                                                    report.nas =                        FALSE)                                            grouped_freqs                        %>%                                                                        tb(order =                        2)

Grouped Categorical

                                          summarytools::                        stby(list(x =                        mydata$LVI,                        y =                        mydata$LymphNodeMetastasis), mydata$PNI,                                                                    summarytools::ctable)

                                                                  with(mydata, summarytools::                        stby(list(x =                        LVI,                        y =                        LymphNodeMetastasis), PNI, summarytools::ctable))

                                          mydata                        %>%                                                dplyr::                        select(characterVariables)                        %>%                                                dplyr::                        select(PreinvasiveComponent,                                                                    PNI, LVI)                        %>%                                                reactable::                        reactable(data =                        .,                        groupBy =                        c("PreinvasiveComponent",                                                                    "PNI"),                        columns =                        list(LVI =                        reactable::                        colDef(aggregate =                        "count")))

Continious Variables

                                                            source(here::                      here("R",                      "gc_desc_cont.R"))

Descriptive Statistics Age

                                      mydata                      %>%                                            jmv::                      descriptives(data =                      .,                      vars =                      "Age",                      hist =                      TRUE,                      dens =                      TRUE,                      box =                      TRUE,                                                              violin =                      TRUE,                      dot =                      TRUE,                      mode =                      TRUE,                      sd =                      TRUE,                      variance =                      TRUE,                      skew =                      TRUE,                                                              kurt =                      TRUE,                      quart =                      TRUE)

                                  DESCRIPTIVES   Descriptives                         ──────────────────────────────────                            Age         ──────────────────────────────────     N                          249       Missing                      1       Mean                      49.5       Median                    49.0       Mode                      72.0       Standard deviation        14.2       Variance                   200       Minimum                   25.0       Maximum                   73.0       Skewness               0.00389       Std. error skewness      0.154       Kurtosis                 -1.15       Std. error kurtosis      0.307       25th percentile           37.0       50th percentile           49.0       75th percentile           61.0     ──────────────────────────────────

Descriptive Statistics AntiX_intensity

                                      mydata                      %>%                                            jmv::                      descriptives(data =                      .,                      vars =                      "AntiX_intensity",                      hist =                      TRUE,                      dens =                      TRUE,                                                              box =                      TRUE,                      violin =                      TRUE,                      dot =                      TRUE,                      mode =                      TRUE,                      sd =                      TRUE,                      variance =                      TRUE,                                                              skew =                      TRUE,                      kurt =                      TRUE,                      quart =                      TRUE)

                                  DESCRIPTIVES   Descriptives                                 ──────────────────────────────────────────                            AntiX_intensity     ──────────────────────────────────────────     N                                  249       Missing                              1       Mean                              2.39       Median                            2.00       Mode                              3.00       Standard deviation               0.664       Variance                         0.440       Minimum                           1.00       Maximum                           3.00       Skewness                        -0.631       Std. error skewness              0.154       Kurtosis                        -0.640       Std. error kurtosis              0.307       25th percentile                   2.00       50th percentile                   2.00       75th percentile                   3.00     ──────────────────────────────────────────

Descriptive Statistics AntiY_intensity

                                      mydata                      %>%                                            jmv::                      descriptives(data =                      .,                      vars =                      "AntiY_intensity",                      hist =                      TRUE,                      dens =                      TRUE,                                                              box =                      TRUE,                      violin =                      TRUE,                      dot =                      TRUE,                      mode =                      TRUE,                      sd =                      TRUE,                      variance =                      TRUE,                                                              skew =                      TRUE,                      kurt =                      TRUE,                      quart =                      TRUE)

                                  DESCRIPTIVES   Descriptives                                 ──────────────────────────────────────────                            AntiY_intensity     ──────────────────────────────────────────     N                                  249       Missing                              1       Mean                              2.02       Median                            2.00       Mode                              2.00       Standard deviation               0.798       Variance                         0.637       Minimum                           1.00       Maximum                           3.00       Skewness                       -0.0289       Std. error skewness              0.154       Kurtosis                         -1.43       Std. error kurtosis              0.307       25th percentile                   1.00       50th percentile                   2.00       75th percentile                   3.00     ──────────────────────────────────────────

                                      tab <-                      tableone::                      CreateTableOne(data =                      mydata)                                          # ?print.ContTable                                        tab$ContTable

                                  Overall         n                           250             Age (mean (SD))             49.54 (14.16)   AntiX_intensity (mean (SD))  2.39 (0.66)    AntiY_intensity (mean (SD))  2.02 (0.80)

                                                            print(tab$ContTable,                      nonnormal =                      c("Anti-X-intensity"))

                                  Overall         n                           250             Age (mean (SD))             49.54 (14.16)   AntiX_intensity (mean (SD))  2.39 (0.66)    AntiY_intensity (mean (SD))  2.02 (0.80)

                                      mydata                      %>%                                            explore::                      describe(Age)

                variable = Age type     = double na       = 1 of 250 (0.4%) unique   = 50 min|max  = 25 | 73 q05|q95  = 28 | 72 q25|q75  = 37 | 61 median   = 49 mean     = 49.53815

                                      mydata                      %>%                                            dplyr::                      select(continiousVariables)                      %>%                                            SmartEDA::                      ExpNumStat(data =                      .,                                                              by =                      "A",                      gp =                      NULL,                      Qnt =                      seq(0,                      1,                      0.1),                      MesofShape =                      2,                      Outlier =                      TRUE,                      round =                      2)

                                      inspectdf::                      inspect_num(mydata,                      breaks =                      10)

                # A tibble: 3 x 10   col_name        min    q1 median  mean    q3   max     sd pcnt_na hist           <chr>         <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>   <dbl> <named list> 1 Age              25    37     49 49.5     61    73 14.2       0.4 <tibble [12… 2 AntiX_intens…     1     2      2  2.39     3     3  0.664     0.4 <tibble [12… 3 AntiY_intens…     1     1      2  2.02     3     3  0.798     0.4 <tibble [12…

                                      inspectdf::                      inspect_num(mydata)$hist$Age

                # A tibble: 27 x 2    value        prop    <chr>       <dbl>  1 [-Inf, 24) 0       2 [24, 26)   0.0201  3 [26, 28)   0.0281  4 [28, 30)   0.0361  5 [30, 32)   0.0361  6 [32, 34)   0.0602  7 [34, 36)   0.0482  8 [36, 38)   0.0241  9 [38, 40)   0.0161 10 [40, 42)   0.0602 # … with 17 more rows

                                      inspectdf::                      inspect_num(mydata,                      breaks =                      10)                      %>%                                            inspectdf::                      show_plot()

Split-Group Stats Continious

                                          grouped_descr <-                        summarytools::                        stby(data =                        mydata,                        INDICES =                        mydata$Sex,                        FUN =                        summarytools::descr,                                                                    stats =                        "common")                                              # grouped_descr %>% summarytools::tb(order = 2)                                            grouped_descr                        %>%                                                summarytools::                        tb()

Grouped Continious

                                          summarytools::                        stby(data =                        mydata,                        INDICES =                        mydata$PreinvasiveComponent,                        FUN =                        summarytools::descr,                                                                    stats =                        c("mean",                        "sd",                        "min",                        "med",                        "max"),                        transpose =                        TRUE)

                                                                  with(mydata, summarytools::                        stby(Age, PreinvasiveComponent, summarytools::descr),                                                                    stats =                        c("mean",                        "sd",                        "min",                        "med",                        "max"),                        transpose =                        TRUE)

                                          mydata                        %>%                                                                        group_by(PreinvasiveComponent)                        %>%                                                summarytools::                        descr(stats =                        "fivenum")

                                                                  ## Summary statistics by – category                                            SmartEDA::                        ExpNumStat(mydata,                        by =                        "GA",                        gp =                        "PreinvasiveComponent",                        Qnt =                        seq(0,                                                                    1,                        0.1),                        MesofShape =                        2,                        Outlier =                        TRUE,                        round =                        2)

                                      Vname                        Group  TN nNeg nZero nPos NegInf PosInf NA_Value 1   Age     PreinvasiveComponent:All 250    0     0  249      0      0        1 2   Age  PreinvasiveComponent:Absent 203    0     0  203      0      0        0 3   Age PreinvasiveComponent:Present  46    0     0   45      0      0        1 4   Age      PreinvasiveComponent:NA   0    0     0    0      0      0        0   Per_of_Missing   sum min  max  mean median    SD   CV  IQR Skewness Kurtosis 1           0.40 12335  25   73 49.54     49 14.16 0.29 24.0     0.00    -1.16 2           0.00 10117  25   73 49.84     51 14.34 0.29 23.5    -0.02    -1.20 3           2.17  2170  25   72 48.22     49 13.55 0.28 22.0     0.08    -0.98 4            NaN     0 Inf -Inf   NaN     NA    NA   NA   NA      NaN      NaN   0%  10%  20%  30%  40% 50%  60%  70% 80%  90% 100% LB.25% UB.75% nOutliers 1 25 30.8 34.0 40.4 45.0  49 54.0 59.0  64 70.0   73   1.00  97.00         0 2 25 31.0 34.0 40.6 45.0  51 54.0 59.0  65 70.8   73   2.25  96.25         0 3 25 30.8 34.8 40.2 43.6  49 51.8 56.8  59 68.6   72   3.00  91.00         0 4 NA   NA   NA   NA   NA  NA   NA   NA  NA   NA   NA     NA     NA         0

Cross Tables

Codes for cross tables.²⁷

                                                      # dependent <- c('dependent1', 'dependent2' )                                                                          # explanatory <- c('explanatory1', 'explanatory2' )                                                      dependent <-                      "PreinvasiveComponent"                                                      explanatory <-                                        c("Sex",                    "Age",                    "Grade",                    "TStage")

Change column = TRUE argument to get row or column percentages.

                                                      source(here::                    here("R",                    "gc_table_cross.R"))

Cross Table PreinvasiveComponent

                                  mydata                    %>%                                                                                                summary_factorlist(dependent =                    'PreinvasiveComponent',                                                        explanatory =                    explanatory,                                      # column = TRUE,                                                        total_col =                    TRUE,                                      p =                    TRUE,                                      add_dependent_label =                    TRUE,                                      na_include=                    FALSE                                                        # catTest = catTestfisher                                                        ) ->                    table                                    knitr::                    kable(table,                    row.names =                    FALSE,                    align =                    c('l',                    'l',                    'r',                    'r',                    'r'))

Dependent: PreinvasiveComponent		Absent	Present	Total	p
Sex	Female	104 (51.2)	17 (37.8)	121 (48.8)	0.102
	Male	99 (48.8)	28 (62.2)	127 (51.2)
Age	Mean (SD)	49.8 (14.3)	48.2 (13.6)	49.5 (14.2)	0.492
Grade	1	68 (33.7)	9 (19.6)	77 (31.0)	0.100
	2	46 (22.8)	16 (34.8)	62 (25.0)
	3	88 (43.6)	21 (45.7)	109 (44.0)
TStage	1	18 (8.9)	6 (13.0)	24 (9.6)	0.117
	2	38 (18.7)	4 (8.7)	42 (16.9)
	3	48 (23.6)	17 (37.0)	65 (26.1)
	4	99 (48.8)	19 (41.3)	118 (47.4)

                                                      library(DT)                                      datatable(mtcars,                    rownames =                    FALSE,                    filter=                    "top",                    options =                    list(pageLength =                    5,                    scrollX=T) )

chi-square posthoc pairwise

rmngb

RVAideMemoire

Plots

Codes for generating Plots.²⁸

Categorical Variables

Plots

Continious Variables

Interactive graphics

R allows to build any type of interactive graphic. My favourite library is plotly that will turn any of your ggplot2 graphic interactive in one supplementary line of code. Try to hover points, to select a zone, to click on the legend.

                                                            library(ggplot2)                                          library(plotly)                                          library(gapminder)                                        p <-                      gapminder                      %>%                                                                  filter(year                      ==                                                                  1977)                      %>%                                                                  ggplot(aes(gdpPercap, lifeExp,                      size =                      pop,                                                              color =                      continent))                      +                                                                  geom_point()                      +                                                                  scale_x_log10()                      +                                                                  theme_bw()                                                              ggplotly(p)

                                      scales::                      show_col(colours(),                      cex_label =                      0.35)

                                      embedgist <-                      gistr::                      gist("https://gist.github.com/sbalci/834ebc154c0ffcb7d5899c42dd3ab75e")                      %>%                                                                                                                                gistr::                      embed()

Alluvial

                                                            # https://stackoverflow.com/questions/43053375/weighted-sankey-alluvial-diagram-for-visualizing-discrete-and-continuous-panel/48133004                                                                                  library(tidyr)                                          library(dplyr)                                          library(alluvial)                                          library(ggplot2)                                          library(forcats)                                                              set.seed(42)                    individual <-                                            rep(LETTERS[1                      :                      10],                      each =                      2)                    timeperiod <-                                            paste0("time_",                      rep(1                      :                      2,                      10))                    discretechoice <-                                            factor(paste0("choice_",                      sample(letters[1                      :                      3],                      20,                      replace =                      T)))                    continuouschoice <-                                            ceiling(runif(20,                      0,                      100))                    d <-                                            data.frame(individual, timeperiod, discretechoice, continuouschoice)

                                                            # stacked bar diagram of discrete choice by individual                                        g <-                                            ggplot(data =                      d,                      aes(timeperiod,                      fill =                      fct_rev(discretechoice)))                    g                      +                                                                  geom_bar(position =                      "stack")                      +                                                                  guides(fill =                      guide_legend(title =                      NULL))

                                                            # alluvial diagram of discrete choice by individual                                        d_alluvial <-                      d                      %>%                                                                  select(individual, timeperiod, discretechoice)                      %>%                                                                  spread(timeperiod,                                                              discretechoice)                      %>%                                                                  group_by(time_1, time_2)                      %>%                                                                  summarize(count =                      n())                      %>%                                                                  ungroup()

                Error in UseMethod("ungroup"): no applicable method for 'ungroup' applied to an object of class "list"

                                                            alluvial(select(d_alluvial,                      -count),                      freq =                      d_alluvial$count)

                Error in log_select(.data, .fun = dplyr::select, .funname = "select", : object 'd_alluvial' not found

                                                            # stacked bar diagram of discrete choice, weighting by continuous choice                                        g                      +                                                                  geom_bar(position =                      "stack",                      aes(weight =                      continuouschoice))

                                                            library(ggalluvial)                                          ggplot(data =                      d,                      aes(x =                      timeperiod,                      stratum =                      discretechoice,                      alluvium =                      individual,                                                              y =                      continuouschoice))                      +                                                                  geom_stratum(aes(fill =                      discretechoice))                      +                                                                  geom_flow()

                                      CD44changes <-                      mydata                      %>%                                            dplyr::                      select(TumorCD44, TomurcukCD44, PeritumoralTomurcukGr4)                      %>%                                                                                                                                dplyr::                      filter(complete.cases(.))                      %>%                                            dplyr::                      group_by(TumorCD44, TomurcukCD44,                                                              PeritumoralTomurcukGr4)                      %>%                                            dplyr::                      tally()

                Error: Can't subset columns that don't exist. [31mx[39m The column `TumorCD44` doesn't exist.

                                                            library(ggalluvial)                                                              ggplot(data =                      CD44changes,                      aes(axis1 =                      TumorCD44,                      axis2 =                      TomurcukCD44,                      y =                      n))                      +                                                                                                                                                      scale_x_discrete(limits =                      c("TumorCD44",                      "TomurcukCD44"),                      expand =                      c(0.1,                      0.05))                      +                                                                                                                                                      xlab("Tumor Tomurcuk")                      +                                                                  geom_alluvium(aes(fill =                      PeritumoralTomurcukGr4,                      colour =                      PeritumoralTomurcukGr4))                      +                                                                                                                                                      geom_stratum(alpha =                      0.5)                      +                                                                  geom_text(stat =                      "stratum",                      infer.label =                      TRUE)                      +                                                                                                                                                      # geom_text(stat = 'alluvium', infer.label = TRUE) +                                                              theme_minimal()                      +                                                                  ggtitle("Changes in CD44")

                Error in ggplot(data = CD44changes, aes(axis1 = TumorCD44, axis2 = TomurcukCD44, : object 'CD44changes' not found

Codes for generating paired tests.²⁹

Codes for generating hypothesis tests.³⁰

Hypothesis Tests

Tests of Normality

jamovi

                                  mytable <-                    jmv::                    ttestIS(formula =                    HindexCTLA4                    ~                                        PeritumoralTomurcukGr4,                    data =                    mydata,                                                        vars =                    HindexCTLA4,                    students =                    FALSE,                    mann =                    TRUE,                    norm =                    TRUE,                    meanDiff =                    TRUE,                                                        desc =                    TRUE,                    plots =                    TRUE)

              Error: Argument 'vars' contains 'HindexCTLA4' which is not present in the dataset

                                                      cat("<pre class='jamovitable'>")

                ```r print(jtable(mytable$ttest)) ```  ``` Error in lapply(X = X, FUN = FUN, ...): object 'mytable' not found ```  ```r cat("

                              </pre>     ## Categorical        ### Chi-Square Cramer Association Predictive Power             ## Continious     https://stat.ethz.ch/R-manual/R-devel/library/stats/html/t.test.html  t.test(mtcars$mpg ~ mtcars$am) %>%    report::report()   report(t.test(iris$Sepal.Length, iris$Petal.Length))    ## Odds         # Frequently Used Statistical Tests By Pathologists  Frequently Used Statistical Tests^[Statistical Literacy Among Academic Pathologists: A Survey Study to Gauge Knowledge of Frequently Used Statistical Tests Among Trainees and Faculty. Archives of Pathology & Laboratory Medicine: February 2017, Vol. 141, No. 2, pp. 279-287. https://doi.org/10.5858/arpa.2016-0200-OA] by [@Schmidt2017]  - Student t test - Regression/ANOVA - Chi-square test - Mann-Whitney test (rank sum) - Fisher exact test - Survival analysis     - Kaplan-Meier/log-rank     - Cox regression - Multiple comparison adjustment     - Tukey     - Bonferroni     - Newman-Keuls - Kappa Statistic - ROC analysis - Logistic regression - Spearman rank correlation - Kruskal-Wallis test - Pearson correlation statistic - Normality test - McNemar test  # Consider Adding: - https://cran.r-project.org/web/packages/sm/sm.pdf - https://cran.r-project.org/web/packages/Rfit/Rfit.pdf   \newpage \blandscape     **Codes for ROC**.^[See [`childRmd/_16ROC.Rmd`](https://github.com/sbalci/histopathology-template/blob/master/childRmd/_16ROC.Rmd) file for other codes]   # ROC                                            **Codes for Decision Tree**.^[See [`childRmd/_17decisionTree.Rmd`](https://github.com/sbalci/histopathology-template/blob/master/childRmd/_17decisionTree.Rmd)]   # Decision Tree        **Explore**   ```r explore::explore(mydata)

Survival Analysis

Codes for Survival Analysis ³¹

Survival analysis with strata, clusters, frailties and competing risks in in Finalfit

https://www.datasurg.net/2019/09/12/survival-analysis-with-strata-clusters-frailties-and-competing-risks-in-in-finalfit/

Intracranial WHO grade I meningioma: a competing risk analysis of progression and disease-specific survival

https://link.springer.com/article/10.1007/s00701-019-04096-9

Calculate survival time

                                  mydata$int <-                    lubridate::                    interval(lubridate::                    ymd(mydata$SurgeryDate), lubridate::                    ymd(mydata$LastFollowUpDate))                  mydata$OverallTime <-                    lubridate::                    time_length(mydata$int,                    "month")                  mydata$OverallTime <-                                        round(mydata$OverallTime,                    digits =                    1)

recode death status outcome as numbers for survival analysis

                                                      ## Recoding mydata$Death into mydata$Outcome                                    mydata$Outcome <-                    forcats::                    fct_recode(as.character(mydata$Death),                    `                    1                    `                    =                      "TRUE",                    `                    0                    `                    =                      "FALSE")                  mydata$Outcome <-                                        as.numeric(as.character(mydata$Outcome))

it is always a good practice to double-check after recoding ³²

                                                      table(mydata$Death, mydata$Outcome)

                              0   1   FALSE  83   0   TRUE    0 166

Kaplan-Meier

                                                            library(survival)                                          # data(lung) km <- with(lung, Surv(time, status))                                        km <-                                            with(mydata,                      Surv(OverallTime, Outcome))                                          head(km,                      80)

                                  [1]  4.5+  7.8   7.1   7.9  10.6   6.9+  8.4+ 11.0   3.5   7.6   8.4   6.0  [13]   NA   9.5  11.2  11.7   9.2   7.6?  4.1   4.7   9.7+  8.3+  6.0+  5.5+ [25]  6.4  11.4   3.8+ 10.2   3.0   6.4  11.3   6.5+  9.7   6.7   3.3+ 11.2+ [37]  7.8   7.0   6.3  10.2   7.0  11.2   9.7+  6.8   3.1   3.6   7.8   9.5+ [49]  6.0  10.4+ 11.2+  3.3+  7.4   9.2+  9.9  11.2+ 10.0   5.4   9.5   5.4  [61]  5.9   8.4   4.1   9.2   7.3+  6.6   7.0+  8.6+  4.0   4.1  10.7   4.7  [73]  6.9   6.6   5.3   8.0   9.3   8.4+  8.6+  8.8

Kaplan-Meier Plot Log-Rank Test

                                                            # Drawing Survival Curves Using ggplot2                                                              # https://rpkgs.datanovia.com/survminer/reference/ggsurvplot.html                                        dependentKM <-                        "Surv(OverallTime, Outcome)"                                        explanatoryKM <-                        "LVI"                                                            mydata                      %>%                                                                                    finalfit::                      surv_plot(.data =                      .,                                          dependent =                      dependentKM,                                          explanatory =                      explanatoryKM,                                          xlab=                      'Time (months)',                                          pval=                      TRUE,                                          legend =                      'none',                                          break.time.by =                      12,                                          xlim =                      c(0,60)                                          # legend.labs = c('a','b')                                                              )

                                                            # Drawing Survival Curves Using ggplot2                                                              # https://rpkgs.datanovia.com/survminer/reference/ggsurvplot.html                                                            mydata                      %>%                                                                                    finalfit::                      surv_plot(.data =                      .,                                          dependent =                      "Surv(OverallTime, Outcome)",                                          explanatory =                      "LVI",                                          xlab=                      'Time (months)',                                          pval=                      TRUE,                                          legend =                      'none',                                          break.time.by =                      12,                                          xlim =                      c(0,60)                                          # legend.labs = c('a','b')                                                              )

Univariate Cox-Regression

                                                            library(finalfit)                                          library(survival)                    explanatoryUni <-                        "LVI"                                        dependentUni <-                        "Surv(OverallTime, Outcome)"                                                            tUni <-                      mydata                      %>%                                            finalfit::                      finalfit(dependentUni, explanatoryUni)                                        knitr::                      kable(tUni[,                      1                      :                      4],                      row.names =                      FALSE,                      align =                      c("l",                      "l",                      "r",                      "r",                      "r",                      "r"))

Dependent: Surv(OverallTime, Outcome)		all	HR (univariable)
LVI	Absent	147 (100.0)	-
	Present	102 (100.0)	1.59 (1.15-2.20, p=0.005)

                                      tUni_df <-                      tibble::                      as_tibble(tUni,                      .name_repair =                      "minimal")                      %>%                                            janitor::                      clean_names()                                        tUni_df_descr <-                                            paste0("When ", tUni_df$dependent_surv_overall_time_outcome[1],                                                              " is ", tUni_df$x[2],                      ", there is ", tUni_df$hr_univariable[2],                      " times risk than ",                                                              "when ", tUni_df$dependent_surv_overall_time_outcome[1],                      " is ", tUni_df$x[1],                                                              ".")

When LVI is Present, there is 1.59 (1.15-2.20, p=0.005) times risk than when LVI is Absent.

1-3-5-yr survival

                                                            summary(km_fit,                      times =                      c(12,                      36,                      60))

                Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)  4 observations deleted due to missingness                  LVI=Absent   time n.risk n.event survival std.err lower 95% CI upper 95% CI    12     75      52    0.617  0.0421        0.539        0.705    36     19      35    0.252  0.0452        0.177        0.358                  LVI=Present   time n.risk n.event survival std.err lower 95% CI upper 95% CI    12     23      49    0.383  0.0566       0.2870        0.512    36      4      12    0.134  0.0488       0.0657        0.274

                                      km_fit_summary <-                                            summary(km_fit,                      times =                      c(12,                      36,                      60))                                        km_fit_df <-                                            as.data.frame(km_fit_summary[c("strata",                      "time",                      "n.risk",                      "n.event",                                                              "surv",                      "std.err",                      "lower",                      "upper")])

                                      km_fit_definition <-                      km_fit_df                      %>%                                            dplyr::                      mutate(description =                      glue::                      glue("When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI]."))                      %>%                                                                                                                                dplyr::                      select(description)                      %>%                                            dplyr::                      pull()

When LVI=Absent, 12 month survival is 62% [54%-70.5%, 95% CI]., When LVI=Absent, 36 month survival is 25% [18%-35.8%, 95% CI]., When LVI=Present, 12 month survival is 38% [29%-51.2%, 95% CI]., When LVI=Present, 36 month survival is 13% [7%-27.4%, 95% CI].

                                                            source(here::                      here("R",                      "gc_survival.R"))

Survival Analysis LVI

Kaplan-Meier Plot Log-Rank Test

                                                            library(survival)                                          library(survminer)                                          library(finalfit)                                        mydata                      %>%                                                                                    finalfit::                      surv_plot('Surv(OverallTime, Outcome)',                      'LVI',                                                              xlab=                      'Time (months)',                      pval=                      TRUE,                      legend =                      'none',                                          break.time.by =                      12,                      xlim =                      c(0,60)                                                              # legend.labs = c('a','b')                                                            )

Univariate Cox-Regression

                                      explanatoryUni <-                        "LVI"                                        dependentUni <-                        "Surv(OverallTime, Outcome)"                                        tUni <-                      mydata                      %>%                                                                  finalfit(dependentUni, explanatoryUni,                      metrics =                      TRUE)                                        knitr::                      kable(tUni[,                      1                      :                      4],                      row.names =                      FALSE,                      align =                      c("l",                      "l",                      "r",                      "r",                      "r",                      "r"))

                Error in tUni[, 1:4]: incorrect number of dimensions

Univariate Cox-Regression Summary

                                      tUni_df <-                      tibble::                      as_tibble(tUni,                      .name_repair =                      "minimal")                      %>%                                            janitor::                      clean_names(dat =                      .,                                                              case =                      "snake")                                                            n_level <-                                            dim(tUni_df)[1]                                        tUni_df_descr <-                                            function(n) {                                          paste0("When ", tUni_df$dependent_surv_overall_time_outcome[1],                      " is ", tUni_df$x[n                      +                                                                                                                                                      1],                      ", there is ", tUni_df$hr_univariable[n                      +                                                                  1],                      " times risk than ",                      "when ",                                                              tUni_df$dependent_surv_overall_time_outcome[1],                      " is ", tUni_df$x[1],                      ".")                                                            }                                                                                results5 <-                      purrr::                      map(.x =                      c(2                      :n_level                      -                                                                  1),                      .f =                      tUni_df_descr)                                                              print(unlist(results5))

                [1] "When  is c(\"Absent\", \"Present\"), there is  times risk than when  is c(\"LVI\", \"\")."

Median Survival

                                      km_fit <-                                            survfit(Surv(OverallTime, Outcome)                      ~                                            LVI,                      data =                      mydata)                    km_fit

                Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)     4 observations deleted due to missingness                n events median 0.95LCL 0.95UCL LVI=Absent  144    100   22.0    14.3    31.0 LVI=Present 102     64   10.5     9.9    13.8

                                      km_fit_median_df <-                                            summary(km_fit)                    km_fit_median_df <-                                            as.data.frame(km_fit_median_df$table)                      %>%                                            janitor::                      clean_names(dat =                      .,                                                              case =                      "snake")                      %>%                                            tibble::                      rownames_to_column(.data =                      .,                      var =                      "LVI")                                                                                km_fit_median_definition <-                      km_fit_median_df                      %>%                                            dplyr::                      mutate(description =                      glue::                      glue("When, LVI, {LVI}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months."))                      %>%                                                                                                                                dplyr::                      mutate(description =                      gsub(pattern =                      "thefactor=",                      replacement =                      " is ",                                                              x =                      description))                      %>%                                            dplyr::                      select(description)                      %>%                                            dplyr::                      pull()                                        km_fit_median_definition

                When, LVI, LVI=Absent, median survival is 22 [14.3 - 31, 95% CI] months. When, LVI, LVI=Present, median survival is 10.5 [9.9 - 13.8, 95% CI] months.

1-3-5-yr survival

                                                            summary(km_fit,                      times =                      c(12,                      36,                      60))

                Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)  4 observations deleted due to missingness                  LVI=Absent   time n.risk n.event survival std.err lower 95% CI upper 95% CI    12     75      52    0.617  0.0421        0.539        0.705    36     19      35    0.252  0.0452        0.177        0.358                  LVI=Present   time n.risk n.event survival std.err lower 95% CI upper 95% CI    12     23      49    0.383  0.0566       0.2870        0.512    36      4      12    0.134  0.0488       0.0657        0.274

                                      km_fit_summary <-                                            summary(km_fit,                      times =                      c(12,                      36,                      60))                                        km_fit_df <-                                            as.data.frame(km_fit_summary[c("strata",                      "time",                      "n.risk",                      "n.event",                                                              "surv",                      "std.err",                      "lower",                      "upper")])                                        km_fit_df

                                  strata time n.risk n.event      surv    std.err      lower     upper 1  LVI=Absent   12     75      52 0.6165782 0.04211739 0.53931696 0.7049078 2  LVI=Absent   36     19      35 0.2520087 0.04515881 0.17737163 0.3580528 3 LVI=Present   12     23      49 0.3833784 0.05662684 0.28701265 0.5120993 4 LVI=Present   36      4      12 0.1340646 0.04881983 0.06566707 0.2737036

                                      km_fit_definition <-                      km_fit_df                      %>%                                            dplyr::                      mutate(description =                      glue::                      glue("When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI]."))                      %>%                                                                                                                                dplyr::                      select(description)                      %>%                                            dplyr::                      pull()                                        km_fit_definition

                When LVI=Absent, 12 month survival is 62% [54%-70.5%, 95% CI]. When LVI=Absent, 36 month survival is 25% [18%-35.8%, 95% CI]. When LVI=Present, 12 month survival is 38% [29%-51.2%, 95% CI]. When LVI=Present, 36 month survival is 13% [7%-27.4%, 95% CI].

                                  records n.max n.start events   *rmean *se(rmean) median 0.95LCL LVI=Absent      144   144     144    100 24.71341   1.571856   22.0    14.3 LVI=Present     102   102     102     64 17.48672   1.904576   10.5     9.9             0.95UCL LVI=Absent     31.0 LVI=Present    13.8

                                      km_fit_median_df <-                                            summary(km_fit)                    results1html <-                                            as.data.frame(km_fit_median_df$table)                      %>%                                            janitor::                      clean_names(dat =                      .,                                                              case =                      "snake")                      %>%                                            tibble::                      rownames_to_column(.data =                      .,                      var =                      "LVI")                                        results1html[,                      1] <-                                            gsub(pattern =                      "thefactor=",                      replacement =                      "",                      x =                      results1html[,                                                              1])                                        knitr::                      kable(results1html,                      row.names =                      FALSE,                      align =                      c("l",                      rep("r",                      9)),                      format =                      "html",                                                              digits =                      1)

LVI	records	n_max	n_start	events	rmean	se_rmean	median	x0_95lcl	x0_95ucl
LVI=Absent	144	144	144	100	24.7	1.6	22.0	14.3	31.0
LVI=Present	102	102	102	64	17.5	1.9	10.5	9.9	13.8

Pairwise Comparisons

Pairwise comparison

                                      dependentKM <-                        "Surv(OverallTime, Outcome)"                                        explanatoryKM <-                        "TStage"                                                            mydata                      %>%                                                                                    finalfit::                      surv_plot(.data =                      .,                                          dependent =                      dependentKM,                                          explanatory =                      explanatoryKM,                                          xlab=                      'Time (months)',                                          pval=                      TRUE,                                          legend =                      'none',                                          break.time.by =                      12,                                          xlim =                      c(0,60)                                          # legend.labs = c('a','b')                                                              )

                Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)     4 observations deleted due to missingness                n events median 0.95LCL 0.95UCL LVI=Absent  144    100   22.0    14.3    31.0 LVI=Present 102     64   10.5     9.9    13.8

                                                            print(km_fit,                                                              scale=                      1,                                          digits =                      max(options()$digits                      -                                                                  4,3),                                          print.rmean=                      getOption("survfit.print.rmean"),                                          rmean =                      getOption('survfit.rmean'),                                          print.median=                      getOption("survfit.print.median"),                                          median =                      getOption('survfit.median')                                                                                  )

                Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)     4 observations deleted due to missingness                n events median 0.95LCL 0.95UCL LVI=Absent  144    100   22.0    14.3    31.0 LVI=Present 102     64   10.5     9.9    13.8

Multivariate Analysis Survival

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") dependent = 'mort_5yr' colon_s %>% hr_plot(dependent, explanatory)

                                                            library(survival)                                          library(survminer)                                          library(finalfit)                    mb_followup                      %>%                                                                                    finalfit::                      surv_plot('Surv(OverallTime, Outcome)',                      'Operation',                                                              xlab=                      'Time (months)',                      pval=                      TRUE,                      legend =                      'none',                                          # pval.coord                                                              break.time.by =                      12,                      xlim =                      c(0,60),                      ylim =                      c(0.8,                      1)                                                              # legend.labs = c('a','b')                                                            )

Univariate Cox-Regression

                                      explanatoryUni <-                        "Operation"                                        dependentUni <-                        "Surv(OverallTime, Outcome)"                                        tUni <-                      mb_followup                      %>%                                                                  finalfit(dependentUni, explanatoryUni)                                        knitr::                      kable(tUni[,                      1                      :                      4],                      row.names =                      FALSE,                      align =                      c("l",                      "l",                      "r",                      "r",                      "r",                      "r"))

Univariate Cox-Regression Summary

                                      tUni_df <-                      tibble::                      as_tibble(tUni,                      .name_repair =                      "minimal")                      %>%                                            janitor::                      clean_names(dat =                      .,                                                              case =                      "snake")                                                            n_level <-                                            dim(tUni_df)[1]                                        tUni_df_descr <-                                            function(n) {                                          paste0("When ", tUni_df$dependent_surv_overall_time_outcome[1],                      " is ", tUni_df$x[n                      +                                                                                                                                                      1],                      ", there is ", tUni_df$hr_univariable[n                      +                                                                  1],                      " times risk than ",                      "when ",                                                              tUni_df$dependent_surv_overall_time_outcome[1],                      " is ", tUni_df$x[1],                      ".")                                                            }                                                                                results5 <-                      purrr::                      map(.x =                      c(2                      :n_level                      -                                                                  1),                      .f =                      tUni_df_descr)                                                              print(unlist(results5))

Median Survival

                                      km_fit <-                                            survfit(Surv(OverallTime, Outcome)                      ~                                            Operation,                      data =                      mb_followup)                                                              # km_fit                                                                                  # summary(km_fit)                                                            km_fit_median_df <-                                            summary(km_fit)                    km_fit_median_df <-                                            as.data.frame(km_fit_median_df$table)                      %>%                                            janitor::                      clean_names(dat =                      .,                                                              case =                      "snake")                      %>%                                            tibble::                      rownames_to_column(.data =                      .,                      var =                      "Derece")                                        km_fit_median_df                                                              # km_fit_median_df %>% knitr::kable(format = 'latex') %>%                                                              # kableExtra::kable_styling(latex_options='scale_down')                                                            km_fit_median_definition <-                      km_fit_median_df                      %>%                                            dplyr::                      mutate(description =                      glue::                      glue("When, Derece, {Derece}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months."))                      %>%                                                                                                                                dplyr::                      mutate(description =                      gsub(pattern =                      "thefactor=",                      replacement =                      " is ",                                                              x =                      description))                      %>%                                            dplyr::                      select(description)                      %>%                                            dplyr::                      pull()                                                              # km_fit_median_definition

1-3-5-yr survival

                                                            summary(km_fit,                      times =                      c(12,                      36,                      60))                                        km_fit_summary <-                                            summary(km_fit,                      times =                      c(12,                      36,                      60))                                        km_fit_df <-                                            as.data.frame(km_fit_summary[c("strata",                      "time",                      "n.risk",                      "n.event",                                                              "surv",                      "std.err",                      "lower",                      "upper")])                                        km_fit_df                      %>%                                            knitr::                      kable(format =                      "latex")                      %>%                                            kableExtra::                      kable_styling(latex_options =                      "scale_down")                                                                                                                        km_fit_definition <-                      km_fit_df                      %>%                                            dplyr::                      mutate(description =                      glue::                      glue("When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI]."))                      %>%                                                                                                                                dplyr::                      select(description)                      %>%                                            dplyr::                      pull()                                        km_fit_definition

Pairwise Comparisons

                                      survminer::                      pairwise_survdiff(formula =                      Surv(OverallTime, Outcome)                      ~                                            Operation,                      data =                      mb_followup,                                                              p.adjust.method =                      "BH")

                                                            library(gt)                                          library(gtsummary)                                                              library(survival)                    fit1 <-                                            survfit(Surv(ttdeath, death)                      ~                                            trt, trial)                    tbl_strata_ex1 <-                                            tbl_survival(fit1,                      times =                      c(12,                      24),                      label =                      "{time} Months")                                        fit2 <-                                            survfit(Surv(ttdeath, death)                      ~                                                                  1, trial)                    tbl_nostrata_ex2 <-                                            tbl_survival(fit2,                      probs =                      c(0.1,                      0.2,                      0.5),                      header_estimate =                      "**Months**")

Interactive Survival Analysis

Codes for generating Survival Analysis.³³

Codes for generating Shiny Survival Analysis.³⁴

Correlation

Correlation Analysis

Codes for generating correlation analysis.³⁵

https://stat.ethz.ch/R-manual/R-patched/library/stats/html/cor.test.html

                                  https:                    //neuropsychology.github.io/psycho.R/                    2018                    /                    05                    /                    20                    /correlation.html                                    devtools::                    install_github("neuropsychology/psycho.R")                    # Install the newest version                                                                          remove.packages("psycho")                  renv::                    install("neuropsychology/psycho.R@0.4.0")                                      # devtools::install_github("neuropsychology/psycho.R@0.4.0")                                                                          library(psycho)                                      <!--                                                            library(tidyverse)                    -->                                    cor <-                    psycho::affective                    %>%                                                                                                                                        correlation()                                                        summary(cor)                                                                          plot(cor)                                                                          print(cor)

                                                      summary(cor)                    %>%                                                                                                                    knitr::                    kable(format =                    "latex")                    %>%                                                                                                                    kableExtra::                    kable_styling(latex_options=                    "scale_down")                                                                          ggplot(mydata,                    aes(x =                    tx_zamani_verici_yasi,                    y =                    trombosit))                    +                                                                                                geom_point()                    +                                                                                                                                        geom_smooth(method =                    lm,                    size =                    1)

Models

Codes used in models ³⁶
Use these descriptions to add autoreporting of new models

Linear Model

generate automatic reporting of model via easystats/report 📦

                                                      library(report)                  model <-                                        lm(Sepal.Length                    ~                                        Species,                    data =                    iris)                  report::                    report(model)

              We fitted a linear model (estimated using OLS) to predict Sepal.Length with Species (formula = Sepal.Length ~ Species). Standardized parameters were obtained by fitting the model on a standardized version of the dataset. Effect sizes were labelled following Funder's (2019) recommendations.  The model explains a significant and substantial proportion of variance (R2 = 0.62, F(2, 147) = 119.26, p < .001, adj. R2 = 0.61). The model's intercept, corresponding to Sepal.Length = 0 and Species = setosa, is at 5.01 (SE = 0.07, 95% CI [4.86, 5.15], p < .001). Within this model:    - The effect of Speciesversicolor is positive and can be considered as very large and significant (beta = 1.12, SE = 0.12, 95% CI [0.88, 1.37], std. beta = 1.12, p < .001).   - The effect of Speciesvirginica is positive and can be considered as very large and significant (beta = 1.91, SE = 0.12, 95% CI [1.66, 2.16], std. beta = 1.91, p < .001).

Table report for a linear model

                                  model <-                                        lm(Sepal.Length                    ~                                        Petal.Length                    +                                        Species,                    data=iris)                  r <-                                        report(model)                                      to_text(r)                                      to_table(r)

General Linear Models (GLMs)

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/glm.html

                                  model <-                                        glm(vs                    ~                                        mpg                    +                                        cyl,                    data=mtcars,                    family=                    "binomial")                  r <-                                        report(model)                                                        to_fulltext(r)                                      to_fulltable(r)

                                  Where a multivariable model contains a subset of the variables specified                    in                    the full univariable set, this can be specified.                                    explanatory =                                        c("age.factor",                    "sex.factor",                    "obstruct.factor",                    "perfor.factor")                  explanatory.multi =                                        c("age.factor",                    "obstruct.factor")                  dependent =                      'mort_5yr'                                    colon_s                    %>%                                                                                                summarizer(dependent, explanatory, explanatory.multi)                                    Random effects.                                    e.g. lme4::                    glmer(dependent                    ~                                        explanatory                    +                                        (1                    |                                        random_effect),                    family=                    "binomial")                                    explanatory =                                        c("age.factor",                    "sex.factor",                    "obstruct.factor",                    "perfor.factor")                  explanatory.multi =                                        c("age.factor",                    "obstruct.factor")                  random.effect =                      "hospital"                                    dependent =                      'mort_5yr'                                    colon_s                    %>%                                                                                                summarizer(dependent, explanatory, explanatory.multi, random.effect)                                    metrics=TRUE                    provides common model metrics.                                    colon_s                    %>%                                                                                                summarizer(dependent, explanatory, explanatory.multi,                    metrics=                    TRUE)                                    Cox proportional hazards                                    e.g. survival::                    coxph(dependent                    ~                                        explanatory)                                    explanatory =                                        c("age.factor",                    "sex.factor",                    "obstruct.factor",                    "perfor.factor")                  dependent =                      "Surv(time, status)"                                                      colon_s                    %>%                                                                                                summarizer(dependent, explanatory)                                    Rather than going all-                    in                    -one, any number of subset models can be manually added on to a                    summary.factorlist() table using                    summarizer.merge(). This is particularly useful when models take a long-time to run or are complicated.                                    Note requirement                    for                    glm.id=TRUE. fit2df is a subfunction extracting most common models to a dataframe.                                    explanatory =                                        c("age.factor",                    "sex.factor",                    "obstruct.factor",                    "perfor.factor")                  explanatory.multi =                                        c("age.factor",                    "obstruct.factor")                  random.effect =                      "hospital"                                    dependent =                      'mort_5yr'                                                                          # Separate tables                                    colon_s                    %>%                                                                                                summary.factorlist(dependent, explanatory,                    glm.id=                    TRUE) ->                    example.summary                                    colon_s                    %>%                                                                                                glmuni(dependent, explanatory)                    %>%                                                                                                fit2df(estimate.suffix=                    " (univariable)") ->                    example.univariable                                    colon_s                    %>%                                                                                                glmmulti(dependent, explanatory)                    %>%                                                                                                fit2df(estimate.suffix=                    " (multivariable)") ->                    example.multivariable                                                      colon_s                    %>%                                                                                                glmmixed(dependent, explanatory, random.effect)                    %>%                                                                                                fit2df(estimate.suffix=                    " (multilevel") ->                    example.multilevel                                                        # Pipe together                                    example.summary                    %>%                                                                                                summarizer.merge(example.univariable)                    %>%                                                                                                summarizer.merge(example.multivariable)                    %>%                                                                                                summarizer.merge(example.multilevel)                    %>%                                                                                                select(-                    c(glm.id, index)) ->                    example.final                  example.final                                    Cox Proportional Hazards example with separate tables merged together.                                    explanatory =                                        c("age.factor",                    "sex.factor",                    "obstruct.factor",                    "perfor.factor")                  explanatory.multi =                                        c("age.factor",                    "obstruct.factor")                  dependent =                      "Surv(time, status)"                                                                          # Separate tables                                    colon_s                    %>%                                                                                                summary.factorlist(dependent, explanatory,                    glm.id=                    TRUE) ->                    example2.summary                                    colon_s                    %>%                                                                                                coxphuni(dependent, explanatory)                    %>%                                                                                                fit2df(estimate.suffix=                    " (univariable)") ->                    example2.univariable                                    colon_s                    %>%                                                                                                coxphmulti(dependent, explanatory.multi)                    %>%                                                                                                fit2df(estimate.suffix=                    " (multivariable)") ->                    example2.multivariable                                                        # Pipe together                                    example2.summary                    %>%                                                                                                summarizer.merge(example2.univariable)                    %>%                                                                                                summarizer.merge(example2.multivariable)                    %>%                                                                                                select(-                    c(glm.id, index)) ->                    example2.final                  example2.final                                                                                                                                                                    # OR plot                                    explanatory =                                        c("age.factor",                    "sex.factor",                    "obstruct.factor",                    "perfor.factor")                  dependent =                      'mort_5yr'                                    colon_s                    %>%                                                                                                or.plot(dependent, explanatory)                                      # Previously fitted models (`glmmulti()` or `glmmixed()`) can be provided directly to `glmfit`                                                                          # HR plot (not fully tested)                                    explanatory =                                        c("age.factor",                    "sex.factor",                    "obstruct.factor",                    "perfor.factor")                  dependent =                      "Surv(time, status)"                                    colon_s                    %>%                                                                                                hr.plot(dependent, explanatory,                    dependent_label =                    "Survival")                                      # Previously fitted models (`coxphmulti`) can be provided directly using `coxfit`

ANOVA

Bayesian

                                                      # Full report for a Bayesian logistic mixed model with effect sizes                                                        library(rstanarm)                                                        stan_glmer(vs                    ~                                        mpg                    +                                        (1                    |cyl),                    data=mtcars,                    family=                    "binomial")                    %>%                                                                                                                                        report(standardize=                    "smart",                    effsize=                    "cohen1988")                    %>%                                                                                                                                        to_fulltext()

indices of model quality and goodness of fit

Test if your model is a good model

https://easystats.github.io/performance/

Some Text ile sağkalım açısından bir ilişki bulunmamıştır (p = 0.22).

                                  my_text <-                    kableExtra::                    text_spec("Some Text",                    color =                    "red",                    background =                    "yellow")                                      # `r my_text`

Text Here

Since R Markdown use the bootstrap framework under the hood. It is possible to benefit its powerful grid system. Basically, you can consider that your row is divided in 12 subunits of same width. You can then choose to use only a few of this subunits.

Here, I use 3 subunits of size 4 (4x3=12). The last column is used for a plot. You can read more about the grid system here. I got this result showing the following code in my R Markdown document.

Tabs for sub-chapters

First

content of sub-chapter #1

Second

content of sub-chapter #2

Third

content of sub-chapter #3

Block rmdnote

Block rmdtip

Block warning

Discussion

Interpret the results in context of the working hypothesis elaborated in the introduction and other relevant studies; include a discussion of limitations of the study.
Discuss potential clinical applications and implications for future research

References

Knijn, N., F. Simmer, and I. D. Nagtegaal. 2015. "Recommendations for Reporting Histopathology Studies: A Proposal." Virchows Archiv 466 (6): 611–15. https://doi.org/10.1007/s00428-015-1762-3.

Schmidt, Robert L., Deborah J. Chute, Jorie M. Colbert-Getz, Adolfo Firpo-Betancourt, Daniel S. James, Julie K. Karp, Douglas C. Miller, et al. 2017. "Statistical Literacy Among Academic Pathologists: A Survey Study to Gauge Knowledge of Frequently Used Statistical Tests Among Trainees and Faculty." Archives of Pathology & Laboratory Medicine 141 (2): 279–87. https://doi.org/10.5858/arpa.2016-0200-OA.

A work by Serdar Balci

drserdarbalci@gmail.com

Survminer Drawing Survival Curves Using Ggplot2 R Package Version 0.4.3

Source: https://www.serdarbalci.com/histopathology-template/Report.html

Survminer Drawing Survival Curves Using Ggplot2 R Package Version 0.4.3

Introduction

Materials & Methods

Statistical Methods

Generate Fake Data

Import Data

Study Population

Report General Features

Ethics and IRB

Always Respect Patient Privacy

Define Variable Types

Find Key Columns

Find ID and key columns to exclude from analysis

Variable Types

Define Variable Types

Find character variables

Find categorical variables

Find continious variables

Find numeric variables

Find integer variables

Find list variables

Find date variables

Overview the Data

View Data

Overview / Exploratory Data Analysis (EDA)

Control Data

Explore Data

Statistical Analysis

Results

Data Dictionary

Clean and Recode Data

Impute Missing Data

transform

min -max

skewness

log

binning

optimal binning

standardize

data transformation report

inspectdf

Descriptive Statistics

Table One

Categorical Variables

Descriptive Statistics Sex

Descriptive Statistics Race

Descriptive Statistics PreinvasiveComponent

Descriptive Statistics LVI

Descriptive Statistics PNI

Descriptive Statistics Group

Descriptive Statistics Grade

Descriptive Statistics TStage

Descriptive Statistics Grade_Level

Descriptive Statistics DeathTime

Split-Group Stats Categorical

Grouped Categorical

Continious Variables

Split-Group Stats Continious

Grouped Continious

Cross Tables

chi-square posthoc pairwise

rmngb

RVAideMemoire

Plots

Categorical Variables

Plots

Continious Variables

Interactive graphics

Alluvial

Hypothesis Tests

Tests of Normality

jamovi

Survival Analysis

Kaplan-Meier

Univariate Cox-Regression

1-3-5-yr survival

Survival Analysis LVI

Pairwise comparison

Multivariate Analysis Survival

Interactive Survival Analysis

Find `character` variables

Find `categorical` variables

Find `continious` variables

Find `numeric` variables

Find `integer` variables

Find `list` variables

Find `date` variables