Survminer Drawing Survival Curves Using Ggplot2 R Package Version 0.4.3
\([![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3635430.svg)](https://doi.org/10.5281/zenodo.3635430)\)
https://doi.org/10.5281/zenodo.3635430
https://osf.io/3tjfk/
Histopathology Research Template 🔬
Introduction
- State the marker of interest, the study objectives, and hypotheses (Knijn, Simmer, and Nagtegaal 2015) .1
Materials & Methods
Describe Materials and Methods as highlighted in (Knijn, Simmer, and Nagtegaal 2015) .2
-
Describe patient characteristics, and inclusion and exclusion criteria
-
Describe treatment details
-
Describe the type of material used
-
Specify how expression of the biomarker was assessed
-
Describe the number of independent (blinded) scorers and how they scored
-
State the method of case selection, study design, origin of the cases, and time frame
-
Describe the end of the follow-up period and median follow-up time
-
Define all clinical endpoints examined
-
Specify all applied statistical methods
-
Describe how interactions with other clinical/pathological factors were analyzed
Statistical Methods
Generate Fake Data
Codes for generating fake data.5
Generate Fake Data
This code generates a fake histopathological data. Some sources for fake data generation here6 , here7 , here8 , here9 , here10 , here11 , here12 , here13 , and here14 .
Use this code to generate fake clinicopathologic data
source(file = here:: here("R", "gc_fake_data.R"))
wakefield:: table_heat(x = fakedata, palette = "Set1", flip = TRUE, print = TRUE)
Import Data
Codes for importing data. 15
Read the data
library(readxl) mydata <- readxl:: read_excel(here:: here("data", "mydata.xlsx")) # View(mydata) # Use to view data after importing
Add code for import multiple data purrr reduce
Study Population
Report General Features
Codes for reporting general features. 16
Dataframe Report
# Dataframe report mydata %>% dplyr:: select(- contains("Date")) %>% report:: report(.)
The data contains 250 observations of the following variables: - ID: 250 entries: 001, n = 1; 002, n = 1; 003, n = 1 and 247 others (0 missing) - Name: 249 entries: Aceyn, n = 1; Adalaide, n = 1; Adidas, n = 1 and 246 others (1 missing) - Sex: 2 entries: Male, n = 127; Female, n = 122 (1 missing) - Age: Mean = 49.54, SD = 14.16, Median = , MAD = 17.79, range: [25, 73], Skewness = 0.00, Kurtosis = -1.15, 1 missing - Race: 7 entries: White, n = 158; Hispanic, n = 38; Black, n = 30 and 4 others (1 missing) - PreinvasiveComponent: 2 entries: Absent, n = 203; Present, n = 46 (1 missing) - LVI: 2 entries: Absent, n = 147; Present, n = 102 (1 missing) - PNI: 2 entries: Absent, n = 171; Present, n = 78 (1 missing) - Death: 2 levels: FALSE (n = 83, 33.20%); TRUE (n = 166, 66.40%) and missing (n = 1, 0.40%) - Group: 2 entries: Treatment, n = 131; Control, n = 118 (1 missing) - Grade: 3 entries: 3, n = 109; 1, n = 78; 2, n = 62 (1 missing) - TStage: 4 entries: 4, n = 118; 3, n = 65; 2, n = 43 and 1 other (0 missing) - AntiX_intensity: Mean = 2.39, SD = 0.66, Median = , MAD = 1.48, range: [1, 3], Skewness = -0.63, Kurtosis = -0.65, 1 missing - AntiY_intensity: Mean = 2.02, SD = 0.80, Median = , MAD = 1.48, range: [1, 3], Skewness = -0.03, Kurtosis = -1.42, 1 missing - LymphNodeMetastasis: 2 entries: Absent, n = 144; Present, n = 105 (1 missing) - Valid: 2 levels: FALSE (n = 116, 46.40%); TRUE (n = 133, 53.20%) and missing (n = 1, 0.40%) - Smoker: 2 levels: FALSE (n = 130, 52.00%); TRUE (n = 119, 47.60%) and missing (n = 1, 0.40%) - Grade_Level: 3 entries: high, n = 109; low, n = 77; moderate, n = 63 (1 missing) - DeathTime: 2 entries: Within1Year, n = 149; MoreThan1Year, n = 101 (0 missing)
mydata %>% explore:: describe_tbl()
250 observations with 21 variables 18 variables containing missings (NA) 0 variables with no variance
Ethics and IRB
Always Respect Patient Privacy
Always Respect Patient Privacy
- Health Information Privacy17
- Kişisel Verilerin Korunması18
Define Variable Types
Codes for defining variable types.19
print column names as vector
c("ID", "Name", "Sex", "Age", "Race", "PreinvasiveComponent", "LVI", "PNI", "LastFollowUpDate", "Death", "Group", "Grade", "TStage", "AntiX_intensity", "AntiY_intensity", "LymphNodeMetastasis", "Valid", "Smoker", "Grade_Level", "SurgeryDate", "DeathTime")
Find Key Columns
Find ID and key columns to exclude from analysis
vctrs::vec_assert() dplyr::all_equal() arsenal::compare() visdat::vis_compare()
See the code as function in R/find_key.R
.
keycolumns <- mydata %>% sapply(., FUN = dataMaid::isKey) %>% tibble:: as_tibble() %>% dplyr:: select(which(.[1, ] == TRUE)) %>% names() keycolumns
[1] "ID" "Name"
Variable Types
Get variable types
mydata %>% dplyr:: select(-keycolumns) %>% inspectdf:: inspect_types()
# A tibble: 4 x 4 type cnt pcnt col_name <chr> <int> <dbl> <list> 1 character 11 57.9 <chr [11]> 2 logical 3 15.8 <chr [3]> 3 numeric 3 15.8 <chr [3]> 4 POSIXct POSIXt 2 10.5 <chr [2]>
mydata %>% dplyr:: select(-keycolumns, - contains("Date")) %>% describer:: describe() %>% knitr:: kable(format = "markdown")
.column_name | .column_class | .column_type | .count_elements | .mean_value | .sd_value | .q0_value | .q25_value | .q50_value | .q75_value | .q100_value |
---|---|---|---|---|---|---|---|---|---|---|
Sex | character | character | 250 | NA | NA | Female | NA | NA | NA | Male |
Age | numeric | double | 250 | 49.538153 | 14.1595015 | 25 | 37 | 49 | 61 | 73 |
Race | character | character | 250 | NA | NA | Asian | NA | NA | NA | White |
PreinvasiveComponent | character | character | 250 | NA | NA | Absent | NA | NA | NA | Present |
LVI | character | character | 250 | NA | NA | Absent | NA | NA | NA | Present |
PNI | character | character | 250 | NA | NA | Absent | NA | NA | NA | Present |
Death | logical | logical | 250 | NA | NA | FALSE | NA | NA | NA | TRUE |
Group | character | character | 250 | NA | NA | Control | NA | NA | NA | Treatment |
Grade | character | character | 250 | NA | NA | 1 | NA | NA | NA | 3 |
TStage | character | character | 250 | NA | NA | 1 | NA | NA | NA | 4 |
AntiX_intensity | numeric | double | 250 | 2.389558 | 0.6636071 | 1 | 2 | 2 | 3 | 3 |
AntiY_intensity | numeric | double | 250 | 2.016064 | 0.7980211 | 1 | 1 | 2 | 3 | 3 |
LymphNodeMetastasis | character | character | 250 | NA | NA | Absent | NA | NA | NA | Present |
Valid | logical | logical | 250 | NA | NA | FALSE | NA | NA | NA | TRUE |
Smoker | logical | logical | 250 | NA | NA | FALSE | NA | NA | NA | TRUE |
Grade_Level | character | character | 250 | NA | NA | high | NA | NA | NA | moderate |
DeathTime | character | character | 250 | NA | NA | MoreThan1Year | NA | NA | NA | Within1Year |
Plot variable types
mydata %>% dplyr:: select(-keycolumns) %>% inspectdf:: inspect_types() %>% inspectdf:: show_plot()
# https://github.com/ropensci/visdat # http://visdat.njtierney.com/articles/using_visdat.html # https://cran.r-project.org/web/packages/visdat/index.html # http://visdat.njtierney.com/ # visdat::vis_guess(mydata) visdat:: vis_dat(mydata)
mydata %>% explore:: explore_tbl()
Define Variable Types
Find character
variables
characterVariables <- mydata %>% dplyr:: select(-keycolumns) %>% inspectdf:: inspect_types() %>% dplyr:: filter(type == "character") %>% dplyr:: select(col_name) %>% dplyr:: pull() %>% unlist() characterVariables
[1] "Sex" "Race" "PreinvasiveComponent" [4] "LVI" "PNI" "Group" [7] "Grade" "TStage" "LymphNodeMetastasis" [10] "Grade_Level" "DeathTime"
Find categorical
variables
categoricalVariables <- mydata %>% dplyr:: select(-keycolumns, - contains("Date")) %>% describer:: describe() %>% janitor:: clean_names() %>% dplyr:: filter(column_type == "factor") %>% dplyr:: select(column_name) %>% dplyr:: pull() categoricalVariables
character(0)
Find continious
variables
continiousVariables <- mydata %>% dplyr:: select(-keycolumns, - contains("Date")) %>% describer:: describe() %>% janitor:: clean_names() %>% dplyr:: filter(column_type == "numeric" | column_type == "double") %>% dplyr:: select(column_name) %>% dplyr:: pull() continiousVariables
[1] "Age" "AntiX_intensity" "AntiY_intensity"
Find numeric
variables
numericVariables <- mydata %>% dplyr:: select(-keycolumns) %>% inspectdf:: inspect_types() %>% dplyr:: filter(type == "numeric") %>% dplyr:: select(col_name) %>% dplyr:: pull() %>% unlist() numericVariables
[1] "Age" "AntiX_intensity" "AntiY_intensity"
Find integer
variables
integerVariables <- mydata %>% dplyr:: select(-keycolumns) %>% inspectdf:: inspect_types() %>% dplyr:: filter(type == "integer") %>% dplyr:: select(col_name) %>% dplyr:: pull() %>% unlist() integerVariables
NULL
Find list
variables
listVariables <- mydata %>% dplyr:: select(-keycolumns) %>% inspectdf:: inspect_types() %>% dplyr:: filter(type == "list") %>% dplyr:: select(col_name) %>% dplyr:: pull() %>% unlist() listVariables
NULL
Find date
variables
is_date <- function(x) inherits(x, c("POSIXct", "POSIXt")) dateVariables <- names(which(sapply(mydata, FUN = is_date) == TRUE)) dateVariables
[1] "LastFollowUpDate" "SurgeryDate"
Overview the Data
Codes for overviewing the data. 20
View Data
reactable:: reactable(data = mydata, sortable = TRUE, resizable = TRUE, filterable = TRUE, searchable = TRUE, pagination = TRUE, paginationType = "numbers", showPageSizeOptions = TRUE, highlight = TRUE, striped = TRUE, outlined = TRUE, compact = TRUE, wrap = FALSE, showSortIcon = TRUE, showSortable = TRUE)
Overview / Exploratory Data Analysis (EDA)
Summary of Data via summarytools 📦
summarytools:: view(summarytools:: dfSummary(mydata %>% dplyr:: select(-keycolumns)))
if (! dir.exists(here:: here("out"))) { dir.create(here:: here("out")) } summarytools:: view(x = summarytools:: dfSummary(mydata %>% dplyr:: select(-keycolumns)), file = here:: here("out", "mydata_summary.html"))
Summary via dataMaid 📦
if (! dir.exists(here:: here("out"))) { dir.create(here:: here("out")) } dataMaid:: makeDataReport(data = mydata, file = here:: here("out", "dataMaid_mydata.Rmd"), replace = TRUE, openResult = FALSE, render = FALSE, quiet = TRUE)
Summary via explore 📦
if (! dir.exists(here:: here("out"))) { dir.create(here:: here("out")) } mydata %>% dplyr:: select(-dateVariables) %>% explore:: report(output_file = "mydata_report.html", output_dir = here:: here("out"))
Glimpse of Data
dplyr:: glimpse(mydata %>% dplyr:: select(-keycolumns, -dateVariables))
Observations: 250 Variables: 17 $ Sex <chr> "Female", "Female", "Female", "Female", "Male", … $ Age <dbl> 30, 32, 53, 57, 47, 58, 59, 54, 35, 27, 53, 55, … $ Race <chr> "White", "White", "White", "Hispanic", "White", … $ PreinvasiveComponent <chr> "Absent", "Absent", "Absent", "Absent", "Absent"… $ LVI <chr> "Present", "Absent", "Absent", "Present", "Absen… $ PNI <chr> "Absent", "Absent", "Absent", "Present", "Absent… $ Death <lgl> FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRU… $ Group <chr> "Control", "Control", "Control", "Control", "Con… $ Grade <chr> "1", "1", "2", "1", "2", "2", "3", "1", "2", "1"… $ TStage <chr> "4", "4", "3", "3", "1", "3", "3", "3", "4", "4"… $ AntiX_intensity <dbl> 2, 2, 2, 2, 3, 1, 1, 3, 2, 3, 2, 3, 1, 3, 1, 2, … $ AntiY_intensity <dbl> 2, 2, 2, 3, 2, 1, 2, 3, 3, 1, 1, 2, 1, 3, 1, 2, … $ LymphNodeMetastasis <chr> "Present", "Absent", "Present", "Present", "Pres… $ Valid <lgl> TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRU… $ Smoker <lgl> TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TR… $ Grade_Level <chr> "moderate", "moderate", "high", "low", "high", "… $ DeathTime <chr> "Within1Year", "Within1Year", "Within1Year", "Wi…
mydata %>% explore:: describe()
# A tibble: 21 x 8 variable type na na_pct unique min mean max <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl> 1 ID chr 0 0 250 NA NA NA 2 Name chr 1 0.4 250 NA NA NA 3 Sex chr 1 0.4 3 NA NA NA 4 Age dbl 1 0.4 50 25 49.5 73 5 Race chr 1 0.4 8 NA NA NA 6 PreinvasiveComponent chr 1 0.4 3 NA NA NA 7 LVI chr 1 0.4 3 NA NA NA 8 PNI chr 1 0.4 3 NA NA NA 9 LastFollowUpDate dat 1 0.4 13 NA NA NA 10 Death lgl 1 0.4 3 0 0.67 1 # … with 11 more rows
Explore
Control Data
Control Data if matching expectations
visdat:: vis_expect(data = mydata, expectation = ~.x == -1, show_perc = TRUE) visdat:: vis_expect(mydata, ~.x >= 25)
See missing values
visdat:: vis_miss(airquality, cluster = TRUE)
visdat:: vis_miss(airquality, sort_miss = TRUE)
$variables Variable q qNA pNA qZero pZero qBlank pBlank qInf pInf 1 Smoker 250 1 0.4% 130 52% 0 - 0 - 2 Valid 250 1 0.4% 116 46.4% 0 - 0 - 3 Death 250 1 0.4% 83 33.2% 0 - 0 - 4 Sex 250 1 0.4% 0 - 0 - 0 - 5 PreinvasiveComponent 250 1 0.4% 0 - 0 - 0 - 6 LVI 250 1 0.4% 0 - 0 - 0 - 7 PNI 250 1 0.4% 0 - 0 - 0 - 8 Group 250 1 0.4% 0 - 0 - 0 - 9 LymphNodeMetastasis 250 1 0.4% 0 - 0 - 0 - 10 Grade 250 1 0.4% 0 - 0 - 0 - 11 AntiX_intensity 250 1 0.4% 0 - 0 - 0 - 12 AntiY_intensity 250 1 0.4% 0 - 0 - 0 - 13 Grade_Level 250 1 0.4% 0 - 0 - 0 - 14 Race 250 1 0.4% 0 - 0 - 0 - 15 LastFollowUpDate 250 1 0.4% 0 - 0 - 0 - 16 Age 250 1 0.4% 0 - 0 - 0 - 17 SurgeryDate 250 1 0.4% 0 - 0 - 0 - 18 Name 250 1 0.4% 0 - 0 - 0 - 19 DeathTime 250 0 - 0 - 0 - 0 - 20 TStage 250 0 - 0 - 0 - 0 - 21 ID 250 0 - 0 - 0 - 0 - qDistinct type anomalous_percent 1 3 Logical 52.4% 2 3 Logical 46.8% 3 3 Logical 33.6% 4 3 Character 0.4% 5 3 Character 0.4% 6 3 Character 0.4% 7 3 Character 0.4% 8 3 Character 0.4% 9 3 Character 0.4% 10 4 Character 0.4% 11 4 Numeric 0.4% 12 4 Numeric 0.4% 13 4 Character 0.4% 14 8 Character 0.4% 15 13 Timestamp 0.4% 16 50 Numeric 0.4% 17 233 Timestamp 0.4% 18 250 Character 0.4% 19 2 Character - 20 4 Character - 21 250 Character - $problem_variables [1] Variable q qNA pNA [5] qZero pZero qBlank pBlank [9] qInf pInf qDistinct type [13] anomalous_percent problems <0 rows> (or 0-length row.names)
xray:: distributions(mydata)
================================================================================
[1] "Ignoring variable LastFollowUpDate: Unsupported type for visualization."
[1] "Ignoring variable SurgeryDate: Unsupported type for visualization."
Variable p_1 p_10 p_25 p_50 p_75 p_90 p_99 1 AntiX_intensity 1 1.8 2 2 3 3 3 2 AntiY_intensity 1 1 1 2 3 3 3 3 Age 25 30.8 37 49 61 70 73
Explore Data
Summary of Data via DataExplorer 📦
DataExplorer:: plot_str(mydata)
DataExplorer:: plot_str(mydata, type = "r")
DataExplorer:: introduce(mydata)
# A tibble: 1 x 9 rows columns discrete_columns continuous_colu… all_missing_col… <int> <int> <int> <int> <int> 1 250 21 18 3 0 # … with 4 more variables: total_missing_values <int>, complete_rows <int>, # total_observations <int>, memory_usage <dbl>
DataExplorer:: plot_intro(mydata)
DataExplorer:: plot_missing(mydata)
Drop columns
mydata2 <- DataExplorer:: drop_columns(mydata, "TStage")
DataExplorer:: plot_bar(mydata)
DataExplorer:: plot_bar(mydata, with = "Death")
DataExplorer:: plot_histogram(mydata)
Statistical Analysis
Learn these tests as highlighted in (Schmidt et al. 2017). 21
Results
Write results as described in (Knijn, Simmer, and Nagtegaal 2015) 22
-
Describe the number of patients included in the analysis and reason for dropout
-
Report patient/disease characteristics (including the biomarker of interest) with the number of missing values
-
Describe the interaction of the biomarker of interest with established prognostic variables
-
Include at least 90 % of initial cases included in univariate and multivariate analyses
-
Report the estimated effect (relative risk/odds ratio, confidence interval, and p value) in univariate analysis
-
Report the estimated effect (hazard rate/odds ratio, confidence interval, and p value) in multivariate analysis
-
Report the estimated effects (hazard ratio/odds ratio, confidence interval, and p value) of other prognostic factors included in multivariate analysis
Data Dictionary
Codes for generating data dictionary.23
Clean and Recode Data
Codes for clean and recode data.24
questionr::irec()
questionr::iorder()
questionr::icut()
iris %>% mutate(sumVar = rowSums(.[1:4]))
iris %>% mutate(sumVar = rowSums(select(., contains("Sepal")))) %>% head
iris %>% mutate(sumVar = select(., contains("Sepal")) %>% rowSums()) %>% head
iRenameColumn.R
iSelectColumn.R
<= 22 Low >= 23 & <= 41 Average >=42 High
Impute Missing Data
transform
min -max
skewness
log
binning
optimal binning
standardize
data transformation report
inspectdf
Descriptive Statistics
Codes for Descriptive Statistics.26
Table One
Report Data properties via report 📦
mydata %>% dplyr:: select(-dplyr:: contains("Date")) %>% report:: report()
The data contains 250 observations of the following variables: - ID: 250 entries: 001, n = 1; 002, n = 1; 003, n = 1 and 247 others (0 missing) - Name: 249 entries: Aceyn, n = 1; Adalaide, n = 1; Adidas, n = 1 and 246 others (1 missing) - Sex: 2 entries: Male, n = 127; Female, n = 122 (1 missing) - Age: Mean = 49.54, SD = 14.16, Median = , MAD = 17.79, range: [25, 73], Skewness = 0.00, Kurtosis = -1.15, 1 missing - Race: 7 entries: White, n = 158; Hispanic, n = 38; Black, n = 30 and 4 others (1 missing) - PreinvasiveComponent: 2 entries: Absent, n = 203; Present, n = 46 (1 missing) - LVI: 2 entries: Absent, n = 147; Present, n = 102 (1 missing) - PNI: 2 entries: Absent, n = 171; Present, n = 78 (1 missing) - Death: 2 levels: FALSE (n = 83, 33.20%); TRUE (n = 166, 66.40%) and missing (n = 1, 0.40%) - Group: 2 entries: Treatment, n = 131; Control, n = 118 (1 missing) - Grade: 3 entries: 3, n = 109; 1, n = 78; 2, n = 62 (1 missing) - TStage: 4 entries: 4, n = 118; 3, n = 65; 2, n = 43 and 1 other (0 missing) - AntiX_intensity: Mean = 2.39, SD = 0.66, Median = , MAD = 1.48, range: [1, 3], Skewness = -0.63, Kurtosis = -0.65, 1 missing - AntiY_intensity: Mean = 2.02, SD = 0.80, Median = , MAD = 1.48, range: [1, 3], Skewness = -0.03, Kurtosis = -1.42, 1 missing - LymphNodeMetastasis: 2 entries: Absent, n = 144; Present, n = 105 (1 missing) - Valid: 2 levels: FALSE (n = 116, 46.40%); TRUE (n = 133, 53.20%) and missing (n = 1, 0.40%) - Smoker: 2 levels: FALSE (n = 130, 52.00%); TRUE (n = 119, 47.60%) and missing (n = 1, 0.40%) - Grade_Level: 3 entries: high, n = 109; low, n = 77; moderate, n = 63 (1 missing) - DeathTime: 2 entries: Within1Year, n = 149; MoreThan1Year, n = 101 (0 missing)
Table 1 via arsenal 📦
# cat(names(mydata), sep = " + \n") library(arsenal) tab1 <- arsenal:: tableby( ~ Sex + Age + Race + PreinvasiveComponent + LVI + PNI + Death + Group + Grade + TStage + # `Anti-X-intensity` + # `Anti-Y-intensity` + LymphNodeMetastasis + Valid + Smoker + Grade_Level , data = mydata ) summary(tab1)
Overall (N=250) | |
---|---|
Sex | |
N-Miss | 1 |
Female | 122 (49.0%) |
Male | 127 (51.0%) |
Age | |
N-Miss | 1 |
Mean (SD) | 49.538 (14.160) |
Range | 25.000 - 73.000 |
Race | |
N-Miss | 1 |
Asian | 15 (6.0%) |
Bi-Racial | 5 (2.0%) |
Black | 30 (12.0%) |
Hispanic | 38 (15.3%) |
Native | 2 (0.8%) |
Other | 1 (0.4%) |
White | 158 (63.5%) |
PreinvasiveComponent | |
N-Miss | 1 |
Absent | 203 (81.5%) |
Present | 46 (18.5%) |
LVI | |
N-Miss | 1 |
Absent | 147 (59.0%) |
Present | 102 (41.0%) |
PNI | |
N-Miss | 1 |
Absent | 171 (68.7%) |
Present | 78 (31.3%) |
Death | |
N-Miss | 1 |
FALSE | 83 (33.3%) |
TRUE | 166 (66.7%) |
Group | |
N-Miss | 1 |
Control | 118 (47.4%) |
Treatment | 131 (52.6%) |
Grade | |
N-Miss | 1 |
1 | 78 (31.3%) |
2 | 62 (24.9%) |
3 | 109 (43.8%) |
TStage | |
1 | 24 (9.6%) |
2 | 43 (17.2%) |
3 | 65 (26.0%) |
4 | 118 (47.2%) |
LymphNodeMetastasis | |
N-Miss | 1 |
Absent | 144 (57.8%) |
Present | 105 (42.2%) |
Valid | |
N-Miss | 1 |
FALSE | 116 (46.6%) |
TRUE | 133 (53.4%) |
Smoker | |
N-Miss | 1 |
FALSE | 130 (52.2%) |
TRUE | 119 (47.8%) |
Grade_Level | |
N-Miss | 1 |
high | 109 (43.8%) |
low | 77 (30.9%) |
moderate | 63 (25.3%) |
Table 1 via tableone 📦
library(tableone) mydata %>% dplyr:: select(-keycolumns, -dateVariables) %>% tableone:: CreateTableOne(data = .)
Overall n 250 Sex = Male (%) 127 (51.0) Age (mean (SD)) 49.54 (14.16) Race (%) Asian 15 ( 6.0) Bi-Racial 5 ( 2.0) Black 30 (12.0) Hispanic 38 (15.3) Native 2 ( 0.8) Other 1 ( 0.4) White 158 (63.5) PreinvasiveComponent = Present (%) 46 (18.5) LVI = Present (%) 102 (41.0) PNI = Present (%) 78 (31.3) Death = TRUE (%) 166 (66.7) Group = Treatment (%) 131 (52.6) Grade (%) 1 78 (31.3) 2 62 (24.9) 3 109 (43.8) TStage (%) 1 24 ( 9.6) 2 43 (17.2) 3 65 (26.0) 4 118 (47.2) AntiX_intensity (mean (SD)) 2.39 (0.66) AntiY_intensity (mean (SD)) 2.02 (0.80) LymphNodeMetastasis = Present (%) 105 (42.2) Valid = TRUE (%) 133 (53.4) Smoker = TRUE (%) 119 (47.8) Grade_Level (%) high 109 (43.8) low 77 (30.9) moderate 63 (25.3) DeathTime = Within1Year (%) 149 (59.6)
Descriptive Statistics of Continuous Variables
mydata %>% dplyr:: select(continiousVariables, numericVariables, integerVariables) %>% summarytools:: descr(., style = "rmarkdown")
print(summarytools:: descr(mydata), method = "render", table.classes = "st-small")
mydata %>% summarytools:: descr(., stats = "common", transpose = TRUE, headings = FALSE)
mydata %>% summarytools:: descr(stats = "common") %>% summarytools:: tb()
mydata$Sex %>% summarytools:: freq(cumul = FALSE, report.nas = FALSE) %>% summarytools:: tb()
mydata %>% explore:: describe() %>% dplyr:: filter(unique < 5)
# A tibble: 15 x 8 variable type na na_pct unique min mean max <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl> 1 Sex chr 1 0.4 3 NA NA NA 2 PreinvasiveComponent chr 1 0.4 3 NA NA NA 3 LVI chr 1 0.4 3 NA NA NA 4 PNI chr 1 0.4 3 NA NA NA 5 Death lgl 1 0.4 3 0 0.67 1 6 Group chr 1 0.4 3 NA NA NA 7 Grade chr 1 0.4 4 NA NA NA 8 TStage chr 0 0 4 NA NA NA 9 AntiX_intensity dbl 1 0.4 4 1 2.39 3 10 AntiY_intensity dbl 1 0.4 4 1 2.02 3 11 LymphNodeMetastasis chr 1 0.4 3 NA NA NA 12 Valid lgl 1 0.4 3 0 0.53 1 13 Smoker lgl 1 0.4 3 0 0.48 1 14 Grade_Level chr 1 0.4 4 NA NA NA 15 DeathTime chr 0 0 2 NA NA NA
mydata %>% explore:: describe() %>% dplyr:: filter(na > 0)
# A tibble: 18 x 8 variable type na na_pct unique min mean max <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl> 1 Name chr 1 0.4 250 NA NA NA 2 Sex chr 1 0.4 3 NA NA NA 3 Age dbl 1 0.4 50 25 49.5 73 4 Race chr 1 0.4 8 NA NA NA 5 PreinvasiveComponent chr 1 0.4 3 NA NA NA 6 LVI chr 1 0.4 3 NA NA NA 7 PNI chr 1 0.4 3 NA NA NA 8 LastFollowUpDate dat 1 0.4 13 NA NA NA 9 Death lgl 1 0.4 3 0 0.67 1 10 Group chr 1 0.4 3 NA NA NA 11 Grade chr 1 0.4 4 NA NA NA 12 AntiX_intensity dbl 1 0.4 4 1 2.39 3 13 AntiY_intensity dbl 1 0.4 4 1 2.02 3 14 LymphNodeMetastasis chr 1 0.4 3 NA NA NA 15 Valid lgl 1 0.4 3 0 0.53 1 16 Smoker lgl 1 0.4 3 0 0.48 1 17 Grade_Level chr 1 0.4 4 NA NA NA 18 SurgeryDate dat 1 0.4 233 NA NA NA
mydata %>% explore:: describe()
# A tibble: 21 x 8 variable type na na_pct unique min mean max <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl> 1 ID chr 0 0 250 NA NA NA 2 Name chr 1 0.4 250 NA NA NA 3 Sex chr 1 0.4 3 NA NA NA 4 Age dbl 1 0.4 50 25 49.5 73 5 Race chr 1 0.4 8 NA NA NA 6 PreinvasiveComponent chr 1 0.4 3 NA NA NA 7 LVI chr 1 0.4 3 NA NA NA 8 PNI chr 1 0.4 3 NA NA NA 9 LastFollowUpDate dat 1 0.4 13 NA NA NA 10 Death lgl 1 0.4 3 0 0.67 1 # … with 11 more rows
Categorical Variables
Use R/gc_desc_cat.R
to generate gc_desc_cat.Rmd
containing descriptive statistics for categorical variables
source(here:: here("R", "gc_desc_cat.R"))
Descriptive Statistics Sex
mydata %>% janitor:: tabyl(Sex) %>% janitor:: adorn_pct_formatting(rounding = "half up", digits = 1) %>% knitr:: kable()
Sex | n | percent | valid_percent |
---|---|---|---|
Female | 122 | 48.8% | 49.0% |
Male | 127 | 50.8% | 51.0% |
NA | 1 | 0.4% | - |
Descriptive Statistics Race
mydata %>% janitor:: tabyl(Race) %>% janitor:: adorn_pct_formatting(rounding = "half up", digits = 1) %>% knitr:: kable()
Race | n | percent | valid_percent |
---|---|---|---|
Asian | 15 | 6.0% | 6.0% |
Bi-Racial | 5 | 2.0% | 2.0% |
Black | 30 | 12.0% | 12.0% |
Hispanic | 38 | 15.2% | 15.3% |
Native | 2 | 0.8% | 0.8% |
Other | 1 | 0.4% | 0.4% |
White | 158 | 63.2% | 63.5% |
NA | 1 | 0.4% | - |
Descriptive Statistics PreinvasiveComponent
mydata %>% janitor:: tabyl(PreinvasiveComponent) %>% janitor:: adorn_pct_formatting(rounding = "half up", digits = 1) %>% knitr:: kable()
PreinvasiveComponent | n | percent | valid_percent |
---|---|---|---|
Absent | 203 | 81.2% | 81.5% |
Present | 46 | 18.4% | 18.5% |
NA | 1 | 0.4% | - |
Descriptive Statistics LVI
mydata %>% janitor:: tabyl(LVI) %>% janitor:: adorn_pct_formatting(rounding = "half up", digits = 1) %>% knitr:: kable()
LVI | n | percent | valid_percent |
---|---|---|---|
Absent | 147 | 58.8% | 59.0% |
Present | 102 | 40.8% | 41.0% |
NA | 1 | 0.4% | - |
Descriptive Statistics PNI
mydata %>% janitor:: tabyl(PNI) %>% janitor:: adorn_pct_formatting(rounding = "half up", digits = 1) %>% knitr:: kable()
PNI | n | percent | valid_percent |
---|---|---|---|
Absent | 171 | 68.4% | 68.7% |
Present | 78 | 31.2% | 31.3% |
NA | 1 | 0.4% | - |
Descriptive Statistics Group
mydata %>% janitor:: tabyl(Group) %>% janitor:: adorn_pct_formatting(rounding = "half up", digits = 1) %>% knitr:: kable()
Group | n | percent | valid_percent |
---|---|---|---|
Control | 118 | 47.2% | 47.4% |
Treatment | 131 | 52.4% | 52.6% |
NA | 1 | 0.4% | - |
Descriptive Statistics Grade
mydata %>% janitor:: tabyl(Grade) %>% janitor:: adorn_pct_formatting(rounding = "half up", digits = 1) %>% knitr:: kable()
Grade | n | percent | valid_percent |
---|---|---|---|
1 | 78 | 31.2% | 31.3% |
2 | 62 | 24.8% | 24.9% |
3 | 109 | 43.6% | 43.8% |
NA | 1 | 0.4% | - |
Descriptive Statistics TStage
mydata %>% janitor:: tabyl(TStage) %>% janitor:: adorn_pct_formatting(rounding = "half up", digits = 1) %>% knitr:: kable()
TStage | n | percent |
---|---|---|
1 | 24 | 9.6% |
2 | 43 | 17.2% |
3 | 65 | 26.0% |
4 | 118 | 47.2% |
Descriptive Statistics Grade_Level
mydata %>% janitor:: tabyl(Grade_Level) %>% janitor:: adorn_pct_formatting(rounding = "half up", digits = 1) %>% knitr:: kable()
Grade_Level | n | percent | valid_percent |
---|---|---|---|
high | 109 | 43.6% | 43.8% |
low | 77 | 30.8% | 30.9% |
moderate | 63 | 25.2% | 25.3% |
NA | 1 | 0.4% | - |
Descriptive Statistics DeathTime
mydata %>% janitor:: tabyl(DeathTime) %>% janitor:: adorn_pct_formatting(rounding = "half up", digits = 1) %>% knitr:: kable()
DeathTime | n | percent |
---|---|---|
MoreThan1Year | 101 | 40.4% |
Within1Year | 149 | 59.6% |
race_stats <- summarytools:: freq(mydata$Race) print(race_stats, report.nas = FALSE, totals = FALSE, display.type = FALSE, Variable.label = "Race Group")
mydata %>% explore:: describe(PreinvasiveComponent)
variable = PreinvasiveComponent type = character na = 1 of 250 (0.4%) unique = 3 Absent = 203 (81.2%) Present = 46 (18.4%) NA = 1 (0.4%)
## Frequency or custom tables for categorical variables SmartEDA:: ExpCTable(mydata, Target = NULL, margin = 1, clim = 10, nlim = 5, round = 2, bin = NULL, per = T)
Variable Valid Frequency Percent CumPercent 1 Sex Female 122 48.8 48.8 2 Sex Male 127 50.8 99.6 3 Sex NA 1 0.4 100.0 4 Sex TOTAL 250 NA NA 5 Race Asian 15 6.0 6.0 6 Race Bi-Racial 5 2.0 8.0 7 Race Black 30 12.0 20.0 8 Race Hispanic 38 15.2 35.2 9 Race NA 1 0.4 35.6 10 Race Native 2 0.8 36.4 11 Race Other 1 0.4 36.8 12 Race White 158 63.2 100.0 13 Race TOTAL 250 NA NA 14 PreinvasiveComponent Absent 203 81.2 81.2 15 PreinvasiveComponent NA 1 0.4 81.6 16 PreinvasiveComponent Present 46 18.4 100.0 17 PreinvasiveComponent TOTAL 250 NA NA 18 LVI Absent 147 58.8 58.8 19 LVI NA 1 0.4 59.2 20 LVI Present 102 40.8 100.0 21 LVI TOTAL 250 NA NA 22 PNI Absent 171 68.4 68.4 23 PNI NA 1 0.4 68.8 24 PNI Present 78 31.2 100.0 25 PNI TOTAL 250 NA NA 26 Group Control 118 47.2 47.2 27 Group NA 1 0.4 47.6 28 Group Treatment 131 52.4 100.0 29 Group TOTAL 250 NA NA 30 Grade 1 78 31.2 31.2 31 Grade 2 62 24.8 56.0 32 Grade 3 109 43.6 99.6 33 Grade NA 1 0.4 100.0 34 Grade TOTAL 250 NA NA 35 TStage 1 24 9.6 9.6 36 TStage 2 43 17.2 26.8 37 TStage 3 65 26.0 52.8 38 TStage 4 118 47.2 100.0 39 TStage TOTAL 250 NA NA 40 LymphNodeMetastasis Absent 144 57.6 57.6 41 LymphNodeMetastasis NA 1 0.4 58.0 42 LymphNodeMetastasis Present 105 42.0 100.0 43 LymphNodeMetastasis TOTAL 250 NA NA 44 Grade_Level high 109 43.6 43.6 45 Grade_Level low 77 30.8 74.4 46 Grade_Level moderate 63 25.2 99.6 47 Grade_Level NA 1 0.4 100.0 48 Grade_Level TOTAL 250 NA NA 49 DeathTime MoreThan1Year 101 40.4 40.4 50 DeathTime Within1Year 149 59.6 100.0 51 DeathTime TOTAL 250 NA NA 52 AntiX_intensity 1 25 10.0 10.0 53 AntiX_intensity 2 102 40.8 50.8 54 AntiX_intensity 3 122 48.8 99.6 55 AntiX_intensity NA 1 0.4 100.0 56 AntiX_intensity TOTAL 250 NA NA 57 AntiY_intensity 1 77 30.8 30.8 58 AntiY_intensity 2 91 36.4 67.2 59 AntiY_intensity 3 81 32.4 99.6 60 AntiY_intensity NA 1 0.4 100.0 61 AntiY_intensity TOTAL 250 NA NA
inspectdf:: inspect_cat(mydata)
# A tibble: 16 x 5 col_name cnt common common_pcnt levels <chr> <int> <chr> <dbl> <named list> 1 Death 3 TRUE 66.4 <tibble [3 × 3]> 2 DeathTime 2 Within1Year 59.6 <tibble [2 × 3]> 3 Grade 4 3 43.6 <tibble [4 × 3]> 4 Grade_Level 4 high 43.6 <tibble [4 × 3]> 5 Group 3 Treatment 52.4 <tibble [3 × 3]> 6 ID 250 001 0.4 <tibble [250 × 3]> 7 LVI 3 Absent 58.8 <tibble [3 × 3]> 8 LymphNodeMetastasis 3 Absent 57.6 <tibble [3 × 3]> 9 Name 250 Aceyn 0.4 <tibble [250 × 3]> 10 PNI 3 Absent 68.4 <tibble [3 × 3]> 11 PreinvasiveComponent 3 Absent 81.2 <tibble [3 × 3]> 12 Race 8 White 63.2 <tibble [8 × 3]> 13 Sex 3 Male 50.8 <tibble [3 × 3]> 14 Smoker 3 FALSE 52 <tibble [3 × 3]> 15 TStage 4 4 47.2 <tibble [4 × 3]> 16 Valid 3 TRUE 53.2 <tibble [3 × 3]>
inspectdf:: inspect_cat(mydata)$levels$Group
# A tibble: 3 x 3 value prop cnt <chr> <dbl> <int> 1 Treatment 0.524 131 2 Control 0.472 118 3 <NA> 0.004 1
Split-Group Stats Categorical
library(summarytools) grouped_freqs <- stby(data = mydata$Smoker, INDICES = mydata$Sex, FUN = freq, cumul = FALSE, report.nas = FALSE) grouped_freqs %>% tb(order = 2)
Grouped Categorical
summarytools:: stby(list(x = mydata$LVI, y = mydata$LymphNodeMetastasis), mydata$PNI, summarytools::ctable)
with(mydata, summarytools:: stby(list(x = LVI, y = LymphNodeMetastasis), PNI, summarytools::ctable))
mydata %>% dplyr:: select(characterVariables) %>% dplyr:: select(PreinvasiveComponent, PNI, LVI) %>% reactable:: reactable(data = ., groupBy = c("PreinvasiveComponent", "PNI"), columns = list(LVI = reactable:: colDef(aggregate = "count")))
Continious Variables
source(here:: here("R", "gc_desc_cont.R"))
Descriptive Statistics Age
mydata %>% jmv:: descriptives(data = ., vars = "Age", hist = TRUE, dens = TRUE, box = TRUE, violin = TRUE, dot = TRUE, mode = TRUE, sd = TRUE, variance = TRUE, skew = TRUE, kurt = TRUE, quart = TRUE)
DESCRIPTIVES Descriptives ────────────────────────────────── Age ────────────────────────────────── N 249 Missing 1 Mean 49.5 Median 49.0 Mode 72.0 Standard deviation 14.2 Variance 200 Minimum 25.0 Maximum 73.0 Skewness 0.00389 Std. error skewness 0.154 Kurtosis -1.15 Std. error kurtosis 0.307 25th percentile 37.0 50th percentile 49.0 75th percentile 61.0 ──────────────────────────────────
Descriptive Statistics AntiX_intensity
mydata %>% jmv:: descriptives(data = ., vars = "AntiX_intensity", hist = TRUE, dens = TRUE, box = TRUE, violin = TRUE, dot = TRUE, mode = TRUE, sd = TRUE, variance = TRUE, skew = TRUE, kurt = TRUE, quart = TRUE)
DESCRIPTIVES Descriptives ────────────────────────────────────────── AntiX_intensity ────────────────────────────────────────── N 249 Missing 1 Mean 2.39 Median 2.00 Mode 3.00 Standard deviation 0.664 Variance 0.440 Minimum 1.00 Maximum 3.00 Skewness -0.631 Std. error skewness 0.154 Kurtosis -0.640 Std. error kurtosis 0.307 25th percentile 2.00 50th percentile 2.00 75th percentile 3.00 ──────────────────────────────────────────
Descriptive Statistics AntiY_intensity
mydata %>% jmv:: descriptives(data = ., vars = "AntiY_intensity", hist = TRUE, dens = TRUE, box = TRUE, violin = TRUE, dot = TRUE, mode = TRUE, sd = TRUE, variance = TRUE, skew = TRUE, kurt = TRUE, quart = TRUE)
DESCRIPTIVES Descriptives ────────────────────────────────────────── AntiY_intensity ────────────────────────────────────────── N 249 Missing 1 Mean 2.02 Median 2.00 Mode 2.00 Standard deviation 0.798 Variance 0.637 Minimum 1.00 Maximum 3.00 Skewness -0.0289 Std. error skewness 0.154 Kurtosis -1.43 Std. error kurtosis 0.307 25th percentile 1.00 50th percentile 2.00 75th percentile 3.00 ──────────────────────────────────────────
tab <- tableone:: CreateTableOne(data = mydata) # ?print.ContTable tab$ContTable
Overall n 250 Age (mean (SD)) 49.54 (14.16) AntiX_intensity (mean (SD)) 2.39 (0.66) AntiY_intensity (mean (SD)) 2.02 (0.80)
print(tab$ContTable, nonnormal = c("Anti-X-intensity"))
Overall n 250 Age (mean (SD)) 49.54 (14.16) AntiX_intensity (mean (SD)) 2.39 (0.66) AntiY_intensity (mean (SD)) 2.02 (0.80)
mydata %>% explore:: describe(Age)
variable = Age type = double na = 1 of 250 (0.4%) unique = 50 min|max = 25 | 73 q05|q95 = 28 | 72 q25|q75 = 37 | 61 median = 49 mean = 49.53815
mydata %>% dplyr:: select(continiousVariables) %>% SmartEDA:: ExpNumStat(data = ., by = "A", gp = NULL, Qnt = seq(0, 1, 0.1), MesofShape = 2, Outlier = TRUE, round = 2)
inspectdf:: inspect_num(mydata, breaks = 10)
# A tibble: 3 x 10 col_name min q1 median mean q3 max sd pcnt_na hist <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <named list> 1 Age 25 37 49 49.5 61 73 14.2 0.4 <tibble [12… 2 AntiX_intens… 1 2 2 2.39 3 3 0.664 0.4 <tibble [12… 3 AntiY_intens… 1 1 2 2.02 3 3 0.798 0.4 <tibble [12…
inspectdf:: inspect_num(mydata)$hist$Age
# A tibble: 27 x 2 value prop <chr> <dbl> 1 [-Inf, 24) 0 2 [24, 26) 0.0201 3 [26, 28) 0.0281 4 [28, 30) 0.0361 5 [30, 32) 0.0361 6 [32, 34) 0.0602 7 [34, 36) 0.0482 8 [36, 38) 0.0241 9 [38, 40) 0.0161 10 [40, 42) 0.0602 # … with 17 more rows
inspectdf:: inspect_num(mydata, breaks = 10) %>% inspectdf:: show_plot()
Split-Group Stats Continious
grouped_descr <- summarytools:: stby(data = mydata, INDICES = mydata$Sex, FUN = summarytools::descr, stats = "common") # grouped_descr %>% summarytools::tb(order = 2) grouped_descr %>% summarytools:: tb()
Grouped Continious
summarytools:: stby(data = mydata, INDICES = mydata$PreinvasiveComponent, FUN = summarytools::descr, stats = c("mean", "sd", "min", "med", "max"), transpose = TRUE)
with(mydata, summarytools:: stby(Age, PreinvasiveComponent, summarytools::descr), stats = c("mean", "sd", "min", "med", "max"), transpose = TRUE)
mydata %>% group_by(PreinvasiveComponent) %>% summarytools:: descr(stats = "fivenum")
## Summary statistics by – category SmartEDA:: ExpNumStat(mydata, by = "GA", gp = "PreinvasiveComponent", Qnt = seq(0, 1, 0.1), MesofShape = 2, Outlier = TRUE, round = 2)
Vname Group TN nNeg nZero nPos NegInf PosInf NA_Value 1 Age PreinvasiveComponent:All 250 0 0 249 0 0 1 2 Age PreinvasiveComponent:Absent 203 0 0 203 0 0 0 3 Age PreinvasiveComponent:Present 46 0 0 45 0 0 1 4 Age PreinvasiveComponent:NA 0 0 0 0 0 0 0 Per_of_Missing sum min max mean median SD CV IQR Skewness Kurtosis 1 0.40 12335 25 73 49.54 49 14.16 0.29 24.0 0.00 -1.16 2 0.00 10117 25 73 49.84 51 14.34 0.29 23.5 -0.02 -1.20 3 2.17 2170 25 72 48.22 49 13.55 0.28 22.0 0.08 -0.98 4 NaN 0 Inf -Inf NaN NA NA NA NA NaN NaN 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% LB.25% UB.75% nOutliers 1 25 30.8 34.0 40.4 45.0 49 54.0 59.0 64 70.0 73 1.00 97.00 0 2 25 31.0 34.0 40.6 45.0 51 54.0 59.0 65 70.8 73 2.25 96.25 0 3 25 30.8 34.8 40.2 43.6 49 51.8 56.8 59 68.6 72 3.00 91.00 0 4 NA NA NA NA NA NA NA NA NA NA NA NA NA 0
Cross Tables
Codes for cross tables.27
# dependent <- c('dependent1', 'dependent2' ) # explanatory <- c('explanatory1', 'explanatory2' ) dependent <- "PreinvasiveComponent" explanatory <- c("Sex", "Age", "Grade", "TStage")
Change column = TRUE
argument to get row or column percentages.
source(here:: here("R", "gc_table_cross.R"))
Cross Table PreinvasiveComponent
mydata %>% summary_factorlist(dependent = 'PreinvasiveComponent', explanatory = explanatory, # column = TRUE, total_col = TRUE, p = TRUE, add_dependent_label = TRUE, na_include= FALSE # catTest = catTestfisher ) -> table knitr:: kable(table, row.names = FALSE, align = c('l', 'l', 'r', 'r', 'r'))
Dependent: PreinvasiveComponent | Absent | Present | Total | p | |
---|---|---|---|---|---|
Sex | Female | 104 (51.2) | 17 (37.8) | 121 (48.8) | 0.102 |
Male | 99 (48.8) | 28 (62.2) | 127 (51.2) | ||
Age | Mean (SD) | 49.8 (14.3) | 48.2 (13.6) | 49.5 (14.2) | 0.492 |
Grade | 1 | 68 (33.7) | 9 (19.6) | 77 (31.0) | 0.100 |
2 | 46 (22.8) | 16 (34.8) | 62 (25.0) | ||
3 | 88 (43.6) | 21 (45.7) | 109 (44.0) | ||
TStage | 1 | 18 (8.9) | 6 (13.0) | 24 (9.6) | 0.117 |
2 | 38 (18.7) | 4 (8.7) | 42 (16.9) | ||
3 | 48 (23.6) | 17 (37.0) | 65 (26.1) | ||
4 | 99 (48.8) | 19 (41.3) | 118 (47.4) |
library(DT) datatable(mtcars, rownames = FALSE, filter= "top", options = list(pageLength = 5, scrollX=T) )
chi-square posthoc pairwise
rmngb
RVAideMemoire
Plots
Codes for generating Plots.28
Categorical Variables
Plots
Continious Variables
Interactive graphics
R allows to build any type of interactive graphic. My favourite library is plotly that will turn any of your ggplot2 graphic interactive in one supplementary line of code. Try to hover points, to select a zone, to click on the legend.
library(ggplot2) library(plotly) library(gapminder) p <- gapminder %>% filter(year == 1977) %>% ggplot(aes(gdpPercap, lifeExp, size = pop, color = continent)) + geom_point() + scale_x_log10() + theme_bw() ggplotly(p)
scales:: show_col(colours(), cex_label = 0.35)
embedgist <- gistr:: gist("https://gist.github.com/sbalci/834ebc154c0ffcb7d5899c42dd3ab75e") %>% gistr:: embed()
Alluvial
# https://stackoverflow.com/questions/43053375/weighted-sankey-alluvial-diagram-for-visualizing-discrete-and-continuous-panel/48133004 library(tidyr) library(dplyr) library(alluvial) library(ggplot2) library(forcats) set.seed(42) individual <- rep(LETTERS[1 : 10], each = 2) timeperiod <- paste0("time_", rep(1 : 2, 10)) discretechoice <- factor(paste0("choice_", sample(letters[1 : 3], 20, replace = T))) continuouschoice <- ceiling(runif(20, 0, 100)) d <- data.frame(individual, timeperiod, discretechoice, continuouschoice)
# stacked bar diagram of discrete choice by individual g <- ggplot(data = d, aes(timeperiod, fill = fct_rev(discretechoice))) g + geom_bar(position = "stack") + guides(fill = guide_legend(title = NULL))
# alluvial diagram of discrete choice by individual d_alluvial <- d %>% select(individual, timeperiod, discretechoice) %>% spread(timeperiod, discretechoice) %>% group_by(time_1, time_2) %>% summarize(count = n()) %>% ungroup()
Error in UseMethod("ungroup"): no applicable method for 'ungroup' applied to an object of class "list"
alluvial(select(d_alluvial, -count), freq = d_alluvial$count)
Error in log_select(.data, .fun = dplyr::select, .funname = "select", : object 'd_alluvial' not found
# stacked bar diagram of discrete choice, weighting by continuous choice g + geom_bar(position = "stack", aes(weight = continuouschoice))
library(ggalluvial) ggplot(data = d, aes(x = timeperiod, stratum = discretechoice, alluvium = individual, y = continuouschoice)) + geom_stratum(aes(fill = discretechoice)) + geom_flow()
CD44changes <- mydata %>% dplyr:: select(TumorCD44, TomurcukCD44, PeritumoralTomurcukGr4) %>% dplyr:: filter(complete.cases(.)) %>% dplyr:: group_by(TumorCD44, TomurcukCD44, PeritumoralTomurcukGr4) %>% dplyr:: tally()
Error: Can't subset columns that don't exist. [31mx[39m The column `TumorCD44` doesn't exist.
library(ggalluvial) ggplot(data = CD44changes, aes(axis1 = TumorCD44, axis2 = TomurcukCD44, y = n)) + scale_x_discrete(limits = c("TumorCD44", "TomurcukCD44"), expand = c(0.1, 0.05)) + xlab("Tumor Tomurcuk") + geom_alluvium(aes(fill = PeritumoralTomurcukGr4, colour = PeritumoralTomurcukGr4)) + geom_stratum(alpha = 0.5) + geom_text(stat = "stratum", infer.label = TRUE) + # geom_text(stat = 'alluvium', infer.label = TRUE) + theme_minimal() + ggtitle("Changes in CD44")
Error in ggplot(data = CD44changes, aes(axis1 = TumorCD44, axis2 = TomurcukCD44, : object 'CD44changes' not found
Codes for generating paired tests.29
Codes for generating hypothesis tests.30
Hypothesis Tests
Tests of Normality
jamovi
mytable <- jmv:: ttestIS(formula = HindexCTLA4 ~ PeritumoralTomurcukGr4, data = mydata, vars = HindexCTLA4, students = FALSE, mann = TRUE, norm = TRUE, meanDiff = TRUE, desc = TRUE, plots = TRUE)
Error: Argument 'vars' contains 'HindexCTLA4' which is not present in the dataset
cat("<pre class='jamovitable'>")
```r print(jtable(mytable$ttest)) ``` ``` Error in lapply(X = X, FUN = FUN, ...): object 'mytable' not found ``` ```r cat("
")
</pre> ## Categorical ### Chi-Square Cramer Association Predictive Power ## Continious https://stat.ethz.ch/R-manual/R-devel/library/stats/html/t.test.html t.test(mtcars$mpg ~ mtcars$am) %>% report::report() report(t.test(iris$Sepal.Length, iris$Petal.Length)) ## Odds # Frequently Used Statistical Tests By Pathologists Frequently Used Statistical Tests^[Statistical Literacy Among Academic Pathologists: A Survey Study to Gauge Knowledge of Frequently Used Statistical Tests Among Trainees and Faculty. Archives of Pathology & Laboratory Medicine: February 2017, Vol. 141, No. 2, pp. 279-287. https://doi.org/10.5858/arpa.2016-0200-OA] by [@Schmidt2017] - Student t test - Regression/ANOVA - Chi-square test - Mann-Whitney test (rank sum) - Fisher exact test - Survival analysis - Kaplan-Meier/log-rank - Cox regression - Multiple comparison adjustment - Tukey - Bonferroni - Newman-Keuls - Kappa Statistic - ROC analysis - Logistic regression - Spearman rank correlation - Kruskal-Wallis test - Pearson correlation statistic - Normality test - McNemar test # Consider Adding: - https://cran.r-project.org/web/packages/sm/sm.pdf - https://cran.r-project.org/web/packages/Rfit/Rfit.pdf \newpage \blandscape **Codes for ROC**.^[See [`childRmd/_16ROC.Rmd`](https://github.com/sbalci/histopathology-template/blob/master/childRmd/_16ROC.Rmd) file for other codes] # ROC **Codes for Decision Tree**.^[See [`childRmd/_17decisionTree.Rmd`](https://github.com/sbalci/histopathology-template/blob/master/childRmd/_17decisionTree.Rmd)] # Decision Tree **Explore** ```r explore::explore(mydata)
Survival Analysis
Codes for Survival Analysis 31
- Survival analysis with strata, clusters, frailties and competing risks in in Finalfit
https://www.datasurg.net/2019/09/12/survival-analysis-with-strata-clusters-frailties-and-competing-risks-in-in-finalfit/
- Intracranial WHO grade I meningioma: a competing risk analysis of progression and disease-specific survival
https://link.springer.com/article/10.1007/s00701-019-04096-9
Calculate survival time
mydata$int <- lubridate:: interval(lubridate:: ymd(mydata$SurgeryDate), lubridate:: ymd(mydata$LastFollowUpDate)) mydata$OverallTime <- lubridate:: time_length(mydata$int, "month") mydata$OverallTime <- round(mydata$OverallTime, digits = 1)
recode death status outcome as numbers for survival analysis
## Recoding mydata$Death into mydata$Outcome mydata$Outcome <- forcats:: fct_recode(as.character(mydata$Death), ` 1 ` = "TRUE", ` 0 ` = "FALSE") mydata$Outcome <- as.numeric(as.character(mydata$Outcome))
it is always a good practice to double-check after recoding 32
table(mydata$Death, mydata$Outcome)
0 1 FALSE 83 0 TRUE 0 166
Kaplan-Meier
library(survival) # data(lung) km <- with(lung, Surv(time, status)) km <- with(mydata, Surv(OverallTime, Outcome)) head(km, 80)
[1] 4.5+ 7.8 7.1 7.9 10.6 6.9+ 8.4+ 11.0 3.5 7.6 8.4 6.0 [13] NA 9.5 11.2 11.7 9.2 7.6? 4.1 4.7 9.7+ 8.3+ 6.0+ 5.5+ [25] 6.4 11.4 3.8+ 10.2 3.0 6.4 11.3 6.5+ 9.7 6.7 3.3+ 11.2+ [37] 7.8 7.0 6.3 10.2 7.0 11.2 9.7+ 6.8 3.1 3.6 7.8 9.5+ [49] 6.0 10.4+ 11.2+ 3.3+ 7.4 9.2+ 9.9 11.2+ 10.0 5.4 9.5 5.4 [61] 5.9 8.4 4.1 9.2 7.3+ 6.6 7.0+ 8.6+ 4.0 4.1 10.7 4.7 [73] 6.9 6.6 5.3 8.0 9.3 8.4+ 8.6+ 8.8
Kaplan-Meier Plot Log-Rank Test
# Drawing Survival Curves Using ggplot2 # https://rpkgs.datanovia.com/survminer/reference/ggsurvplot.html dependentKM <- "Surv(OverallTime, Outcome)" explanatoryKM <- "LVI" mydata %>% finalfit:: surv_plot(.data = ., dependent = dependentKM, explanatory = explanatoryKM, xlab= 'Time (months)', pval= TRUE, legend = 'none', break.time.by = 12, xlim = c(0,60) # legend.labs = c('a','b') )
# Drawing Survival Curves Using ggplot2 # https://rpkgs.datanovia.com/survminer/reference/ggsurvplot.html mydata %>% finalfit:: surv_plot(.data = ., dependent = "Surv(OverallTime, Outcome)", explanatory = "LVI", xlab= 'Time (months)', pval= TRUE, legend = 'none', break.time.by = 12, xlim = c(0,60) # legend.labs = c('a','b') )
Univariate Cox-Regression
library(finalfit) library(survival) explanatoryUni <- "LVI" dependentUni <- "Surv(OverallTime, Outcome)" tUni <- mydata %>% finalfit:: finalfit(dependentUni, explanatoryUni) knitr:: kable(tUni[, 1 : 4], row.names = FALSE, align = c("l", "l", "r", "r", "r", "r"))
Dependent: Surv(OverallTime, Outcome) | all | HR (univariable) | |
---|---|---|---|
LVI | Absent | 147 (100.0) | - |
Present | 102 (100.0) | 1.59 (1.15-2.20, p=0.005) |
tUni_df <- tibble:: as_tibble(tUni, .name_repair = "minimal") %>% janitor:: clean_names() tUni_df_descr <- paste0("When ", tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[2], ", there is ", tUni_df$hr_univariable[2], " times risk than ", "when ", tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[1], ".")
When LVI is Present, there is 1.59 (1.15-2.20, p=0.005) times risk than when LVI is Absent.
1-3-5-yr survival
summary(km_fit, times = c(12, 36, 60))
Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata) 4 observations deleted due to missingness LVI=Absent time n.risk n.event survival std.err lower 95% CI upper 95% CI 12 75 52 0.617 0.0421 0.539 0.705 36 19 35 0.252 0.0452 0.177 0.358 LVI=Present time n.risk n.event survival std.err lower 95% CI upper 95% CI 12 23 49 0.383 0.0566 0.2870 0.512 36 4 12 0.134 0.0488 0.0657 0.274
km_fit_summary <- summary(km_fit, times = c(12, 36, 60)) km_fit_df <- as.data.frame(km_fit_summary[c("strata", "time", "n.risk", "n.event", "surv", "std.err", "lower", "upper")])
km_fit_definition <- km_fit_df %>% dplyr:: mutate(description = glue:: glue("When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI].")) %>% dplyr:: select(description) %>% dplyr:: pull()
When LVI=Absent, 12 month survival is 62% [54%-70.5%, 95% CI]., When LVI=Absent, 36 month survival is 25% [18%-35.8%, 95% CI]., When LVI=Present, 12 month survival is 38% [29%-51.2%, 95% CI]., When LVI=Present, 36 month survival is 13% [7%-27.4%, 95% CI].
source(here:: here("R", "gc_survival.R"))
Survival Analysis LVI
Kaplan-Meier Plot Log-Rank Test
library(survival) library(survminer) library(finalfit) mydata %>% finalfit:: surv_plot('Surv(OverallTime, Outcome)', 'LVI', xlab= 'Time (months)', pval= TRUE, legend = 'none', break.time.by = 12, xlim = c(0,60) # legend.labs = c('a','b') )
Univariate Cox-Regression
explanatoryUni <- "LVI" dependentUni <- "Surv(OverallTime, Outcome)" tUni <- mydata %>% finalfit(dependentUni, explanatoryUni, metrics = TRUE) knitr:: kable(tUni[, 1 : 4], row.names = FALSE, align = c("l", "l", "r", "r", "r", "r"))
Error in tUni[, 1:4]: incorrect number of dimensions
Univariate Cox-Regression Summary
tUni_df <- tibble:: as_tibble(tUni, .name_repair = "minimal") %>% janitor:: clean_names(dat = ., case = "snake") n_level <- dim(tUni_df)[1] tUni_df_descr <- function(n) { paste0("When ", tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[n + 1], ", there is ", tUni_df$hr_univariable[n + 1], " times risk than ", "when ", tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[1], ".") } results5 <- purrr:: map(.x = c(2 :n_level - 1), .f = tUni_df_descr) print(unlist(results5))
[1] "When is c(\"Absent\", \"Present\"), there is times risk than when is c(\"LVI\", \"\")."
Median Survival
km_fit <- survfit(Surv(OverallTime, Outcome) ~ LVI, data = mydata) km_fit
Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata) 4 observations deleted due to missingness n events median 0.95LCL 0.95UCL LVI=Absent 144 100 22.0 14.3 31.0 LVI=Present 102 64 10.5 9.9 13.8
km_fit_median_df <- summary(km_fit) km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>% janitor:: clean_names(dat = ., case = "snake") %>% tibble:: rownames_to_column(.data = ., var = "LVI") km_fit_median_definition <- km_fit_median_df %>% dplyr:: mutate(description = glue:: glue("When, LVI, {LVI}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months.")) %>% dplyr:: mutate(description = gsub(pattern = "thefactor=", replacement = " is ", x = description)) %>% dplyr:: select(description) %>% dplyr:: pull() km_fit_median_definition
When, LVI, LVI=Absent, median survival is 22 [14.3 - 31, 95% CI] months. When, LVI, LVI=Present, median survival is 10.5 [9.9 - 13.8, 95% CI] months.
1-3-5-yr survival
summary(km_fit, times = c(12, 36, 60))
Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata) 4 observations deleted due to missingness LVI=Absent time n.risk n.event survival std.err lower 95% CI upper 95% CI 12 75 52 0.617 0.0421 0.539 0.705 36 19 35 0.252 0.0452 0.177 0.358 LVI=Present time n.risk n.event survival std.err lower 95% CI upper 95% CI 12 23 49 0.383 0.0566 0.2870 0.512 36 4 12 0.134 0.0488 0.0657 0.274
km_fit_summary <- summary(km_fit, times = c(12, 36, 60)) km_fit_df <- as.data.frame(km_fit_summary[c("strata", "time", "n.risk", "n.event", "surv", "std.err", "lower", "upper")]) km_fit_df
strata time n.risk n.event surv std.err lower upper 1 LVI=Absent 12 75 52 0.6165782 0.04211739 0.53931696 0.7049078 2 LVI=Absent 36 19 35 0.2520087 0.04515881 0.17737163 0.3580528 3 LVI=Present 12 23 49 0.3833784 0.05662684 0.28701265 0.5120993 4 LVI=Present 36 4 12 0.1340646 0.04881983 0.06566707 0.2737036
km_fit_definition <- km_fit_df %>% dplyr:: mutate(description = glue:: glue("When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI].")) %>% dplyr:: select(description) %>% dplyr:: pull() km_fit_definition
When LVI=Absent, 12 month survival is 62% [54%-70.5%, 95% CI]. When LVI=Absent, 36 month survival is 25% [18%-35.8%, 95% CI]. When LVI=Present, 12 month survival is 38% [29%-51.2%, 95% CI]. When LVI=Present, 36 month survival is 13% [7%-27.4%, 95% CI].
records n.max n.start events *rmean *se(rmean) median 0.95LCL LVI=Absent 144 144 144 100 24.71341 1.571856 22.0 14.3 LVI=Present 102 102 102 64 17.48672 1.904576 10.5 9.9 0.95UCL LVI=Absent 31.0 LVI=Present 13.8
km_fit_median_df <- summary(km_fit) results1html <- as.data.frame(km_fit_median_df$table) %>% janitor:: clean_names(dat = ., case = "snake") %>% tibble:: rownames_to_column(.data = ., var = "LVI") results1html[, 1] <- gsub(pattern = "thefactor=", replacement = "", x = results1html[, 1]) knitr:: kable(results1html, row.names = FALSE, align = c("l", rep("r", 9)), format = "html", digits = 1)
LVI | records | n_max | n_start | events | rmean | se_rmean | median | x0_95lcl | x0_95ucl |
---|---|---|---|---|---|---|---|---|---|
LVI=Absent | 144 | 144 | 144 | 100 | 24.7 | 1.6 | 22.0 | 14.3 | 31.0 |
LVI=Present | 102 | 102 | 102 | 64 | 17.5 | 1.9 | 10.5 | 9.9 | 13.8 |
Pairwise Comparisons
Pairwise comparison
dependentKM <- "Surv(OverallTime, Outcome)" explanatoryKM <- "TStage" mydata %>% finalfit:: surv_plot(.data = ., dependent = dependentKM, explanatory = explanatoryKM, xlab= 'Time (months)', pval= TRUE, legend = 'none', break.time.by = 12, xlim = c(0,60) # legend.labs = c('a','b') )
Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata) 4 observations deleted due to missingness n events median 0.95LCL 0.95UCL LVI=Absent 144 100 22.0 14.3 31.0 LVI=Present 102 64 10.5 9.9 13.8
print(km_fit, scale= 1, digits = max(options()$digits - 4,3), print.rmean= getOption("survfit.print.rmean"), rmean = getOption('survfit.rmean'), print.median= getOption("survfit.print.median"), median = getOption('survfit.median') )
Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata) 4 observations deleted due to missingness n events median 0.95LCL 0.95UCL LVI=Absent 144 100 22.0 14.3 31.0 LVI=Present 102 64 10.5 9.9 13.8
Multivariate Analysis Survival
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") dependent = 'mort_5yr' colon_s %>% hr_plot(dependent, explanatory)
library(survival) library(survminer) library(finalfit) mb_followup %>% finalfit:: surv_plot('Surv(OverallTime, Outcome)', 'Operation', xlab= 'Time (months)', pval= TRUE, legend = 'none', # pval.coord break.time.by = 12, xlim = c(0,60), ylim = c(0.8, 1) # legend.labs = c('a','b') )
Univariate Cox-Regression
explanatoryUni <- "Operation" dependentUni <- "Surv(OverallTime, Outcome)" tUni <- mb_followup %>% finalfit(dependentUni, explanatoryUni) knitr:: kable(tUni[, 1 : 4], row.names = FALSE, align = c("l", "l", "r", "r", "r", "r"))
Univariate Cox-Regression Summary
tUni_df <- tibble:: as_tibble(tUni, .name_repair = "minimal") %>% janitor:: clean_names(dat = ., case = "snake") n_level <- dim(tUni_df)[1] tUni_df_descr <- function(n) { paste0("When ", tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[n + 1], ", there is ", tUni_df$hr_univariable[n + 1], " times risk than ", "when ", tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[1], ".") } results5 <- purrr:: map(.x = c(2 :n_level - 1), .f = tUni_df_descr) print(unlist(results5))
Median Survival
km_fit <- survfit(Surv(OverallTime, Outcome) ~ Operation, data = mb_followup) # km_fit # summary(km_fit) km_fit_median_df <- summary(km_fit) km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>% janitor:: clean_names(dat = ., case = "snake") %>% tibble:: rownames_to_column(.data = ., var = "Derece") km_fit_median_df # km_fit_median_df %>% knitr::kable(format = 'latex') %>% # kableExtra::kable_styling(latex_options='scale_down') km_fit_median_definition <- km_fit_median_df %>% dplyr:: mutate(description = glue:: glue("When, Derece, {Derece}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months.")) %>% dplyr:: mutate(description = gsub(pattern = "thefactor=", replacement = " is ", x = description)) %>% dplyr:: select(description) %>% dplyr:: pull() # km_fit_median_definition
1-3-5-yr survival
summary(km_fit, times = c(12, 36, 60)) km_fit_summary <- summary(km_fit, times = c(12, 36, 60)) km_fit_df <- as.data.frame(km_fit_summary[c("strata", "time", "n.risk", "n.event", "surv", "std.err", "lower", "upper")]) km_fit_df %>% knitr:: kable(format = "latex") %>% kableExtra:: kable_styling(latex_options = "scale_down") km_fit_definition <- km_fit_df %>% dplyr:: mutate(description = glue:: glue("When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI].")) %>% dplyr:: select(description) %>% dplyr:: pull() km_fit_definition
Pairwise Comparisons
survminer:: pairwise_survdiff(formula = Surv(OverallTime, Outcome) ~ Operation, data = mb_followup, p.adjust.method = "BH")
library(gt) library(gtsummary) library(survival) fit1 <- survfit(Surv(ttdeath, death) ~ trt, trial) tbl_strata_ex1 <- tbl_survival(fit1, times = c(12, 24), label = "{time} Months") fit2 <- survfit(Surv(ttdeath, death) ~ 1, trial) tbl_nostrata_ex2 <- tbl_survival(fit2, probs = c(0.1, 0.2, 0.5), header_estimate = "**Months**")
Interactive Survival Analysis
Codes for generating Survival Analysis.33
Codes for generating Shiny Survival Analysis.34
Correlation
Correlation Analysis
Codes for generating correlation analysis.35
https://stat.ethz.ch/R-manual/R-patched/library/stats/html/cor.test.html
https: //neuropsychology.github.io/psycho.R/ 2018 / 05 / 20 /correlation.html devtools:: install_github("neuropsychology/psycho.R") # Install the newest version remove.packages("psycho") renv:: install("neuropsychology/psycho.R@0.4.0") # devtools::install_github("neuropsychology/psycho.R@0.4.0") library(psycho) <!-- library(tidyverse) --> cor <- psycho::affective %>% correlation() summary(cor) plot(cor) print(cor)
summary(cor) %>% knitr:: kable(format = "latex") %>% kableExtra:: kable_styling(latex_options= "scale_down") ggplot(mydata, aes(x = tx_zamani_verici_yasi, y = trombosit)) + geom_point() + geom_smooth(method = lm, size = 1)
Models
Codes used in models 36
Use these descriptions to add autoreporting of new models
Linear Model
generate automatic reporting of model via easystats/report 📦
library(report) model <- lm(Sepal.Length ~ Species, data = iris) report:: report(model)
We fitted a linear model (estimated using OLS) to predict Sepal.Length with Species (formula = Sepal.Length ~ Species). Standardized parameters were obtained by fitting the model on a standardized version of the dataset. Effect sizes were labelled following Funder's (2019) recommendations. The model explains a significant and substantial proportion of variance (R2 = 0.62, F(2, 147) = 119.26, p < .001, adj. R2 = 0.61). The model's intercept, corresponding to Sepal.Length = 0 and Species = setosa, is at 5.01 (SE = 0.07, 95% CI [4.86, 5.15], p < .001). Within this model: - The effect of Speciesversicolor is positive and can be considered as very large and significant (beta = 1.12, SE = 0.12, 95% CI [0.88, 1.37], std. beta = 1.12, p < .001). - The effect of Speciesvirginica is positive and can be considered as very large and significant (beta = 1.91, SE = 0.12, 95% CI [1.66, 2.16], std. beta = 1.91, p < .001).
Table report for a linear model
model <- lm(Sepal.Length ~ Petal.Length + Species, data=iris) r <- report(model) to_text(r) to_table(r)
General Linear Models (GLMs)
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/glm.html
model <- glm(vs ~ mpg + cyl, data=mtcars, family= "binomial") r <- report(model) to_fulltext(r) to_fulltable(r)
Where a multivariable model contains a subset of the variables specified in the full univariable set, this can be specified. explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") explanatory.multi = c("age.factor", "obstruct.factor") dependent = 'mort_5yr' colon_s %>% summarizer(dependent, explanatory, explanatory.multi) Random effects. e.g. lme4:: glmer(dependent ~ explanatory + (1 | random_effect), family= "binomial") explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") explanatory.multi = c("age.factor", "obstruct.factor") random.effect = "hospital" dependent = 'mort_5yr' colon_s %>% summarizer(dependent, explanatory, explanatory.multi, random.effect) metrics=TRUE provides common model metrics. colon_s %>% summarizer(dependent, explanatory, explanatory.multi, metrics= TRUE) Cox proportional hazards e.g. survival:: coxph(dependent ~ explanatory) explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") dependent = "Surv(time, status)" colon_s %>% summarizer(dependent, explanatory) Rather than going all- in -one, any number of subset models can be manually added on to a summary.factorlist() table using summarizer.merge(). This is particularly useful when models take a long-time to run or are complicated. Note requirement for glm.id=TRUE. fit2df is a subfunction extracting most common models to a dataframe. explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") explanatory.multi = c("age.factor", "obstruct.factor") random.effect = "hospital" dependent = 'mort_5yr' # Separate tables colon_s %>% summary.factorlist(dependent, explanatory, glm.id= TRUE) -> example.summary colon_s %>% glmuni(dependent, explanatory) %>% fit2df(estimate.suffix= " (univariable)") -> example.univariable colon_s %>% glmmulti(dependent, explanatory) %>% fit2df(estimate.suffix= " (multivariable)") -> example.multivariable colon_s %>% glmmixed(dependent, explanatory, random.effect) %>% fit2df(estimate.suffix= " (multilevel") -> example.multilevel # Pipe together example.summary %>% summarizer.merge(example.univariable) %>% summarizer.merge(example.multivariable) %>% summarizer.merge(example.multilevel) %>% select(- c(glm.id, index)) -> example.final example.final Cox Proportional Hazards example with separate tables merged together. explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") explanatory.multi = c("age.factor", "obstruct.factor") dependent = "Surv(time, status)" # Separate tables colon_s %>% summary.factorlist(dependent, explanatory, glm.id= TRUE) -> example2.summary colon_s %>% coxphuni(dependent, explanatory) %>% fit2df(estimate.suffix= " (univariable)") -> example2.univariable colon_s %>% coxphmulti(dependent, explanatory.multi) %>% fit2df(estimate.suffix= " (multivariable)") -> example2.multivariable # Pipe together example2.summary %>% summarizer.merge(example2.univariable) %>% summarizer.merge(example2.multivariable) %>% select(- c(glm.id, index)) -> example2.final example2.final # OR plot explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") dependent = 'mort_5yr' colon_s %>% or.plot(dependent, explanatory) # Previously fitted models (`glmmulti()` or `glmmixed()`) can be provided directly to `glmfit` # HR plot (not fully tested) explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") dependent = "Surv(time, status)" colon_s %>% hr.plot(dependent, explanatory, dependent_label = "Survival") # Previously fitted models (`coxphmulti`) can be provided directly using `coxfit`
ANOVA
Bayesian
# Full report for a Bayesian logistic mixed model with effect sizes library(rstanarm) stan_glmer(vs ~ mpg + (1 |cyl), data=mtcars, family= "binomial") %>% report(standardize= "smart", effsize= "cohen1988") %>% to_fulltext()
indices of model quality and goodness of fit
Test if your model is a good model
https://easystats.github.io/performance/
Some Text ile sağkalım açısından bir ilişki bulunmamıştır (p = 0.22).
my_text <- kableExtra:: text_spec("Some Text", color = "red", background = "yellow") # `r my_text`
Text Here
Since R Markdown use the bootstrap framework under the hood. It is possible to benefit its powerful grid system. Basically, you can consider that your row is divided in 12 subunits of same width. You can then choose to use only a few of this subunits.
Here, I use 3 subunits of size 4 (4x3=12). The last column is used for a plot. You can read more about the grid system here. I got this result showing the following code in my R Markdown document.
Discussion
-
Interpret the results in context of the working hypothesis elaborated in the introduction and other relevant studies; include a discussion of limitations of the study.
-
Discuss potential clinical applications and implications for future research
References
Knijn, N., F. Simmer, and I. D. Nagtegaal. 2015. "Recommendations for Reporting Histopathology Studies: A Proposal." Virchows Archiv 466 (6): 611–15. https://doi.org/10.1007/s00428-015-1762-3.
Schmidt, Robert L., Deborah J. Chute, Jorie M. Colbert-Getz, Adolfo Firpo-Betancourt, Daniel S. James, Julie K. Karp, Douglas C. Miller, et al. 2017. "Statistical Literacy Among Academic Pathologists: A Survey Study to Gauge Knowledge of Frequently Used Statistical Tests Among Trainees and Faculty." Archives of Pathology & Laboratory Medicine 141 (2): 279–87. https://doi.org/10.5858/arpa.2016-0200-OA.
A work by Serdar Balci
drserdarbalci@gmail.com
Survminer Drawing Survival Curves Using Ggplot2 R Package Version 0.4.3
Source: https://www.serdarbalci.com/histopathology-template/Report.html
0 Response to "Survminer Drawing Survival Curves Using Ggplot2 R Package Version 0.4.3"
Post a Comment