Title: | Statistical Methods for the Analysis of Excess Lifetimes |
---|---|
Description: | A collection of parametric and nonparametric methods for the analysis of survival data. Parametric families implemented include Gompertz-Makeham, exponential and generalized Pareto models and extended models. The package includes an implementation of the nonparametric maximum likelihood estimator for arbitrary truncation and censoring pattern based on Turnbull (1976) <doi:10.1111/j.2517-6161.1976.tb01597.x>, along with graphical goodness-of-fit diagnostics. Parametric models for positive random variables and peaks over threshold models based on extreme value theory are described in Rootzén and Zholud (2017) <doi:10.1007/s10687-017-0305-5>; Belzile et al. (2021) <doi:10.1098/rsos.202097> and Belzile et al. (2022) <doi:10.1146/annurev-statistics-040120-025426>. |
Authors: | Leo Belzile [aut, cre] |
Maintainer: | Leo Belzile <[email protected]> |
License: | GPL-3 |
Version: | 1.1.1 |
Built: | 2025-01-12 06:07:08 UTC |
Source: | https://github.com/lbelzile/longevity |
The Northrop-Coleman tests for penultimate models are comparing the piece-wise generalized Pareto distribution to the generalized Pareto above the lower threshold.
autoplot.elife_northropcoleman(object, ...) ## S3 method for class 'elife_northropcoleman' plot(x, plot.type = c("base", "ggplot"), plot = TRUE, ...)
autoplot.elife_northropcoleman(object, ...) ## S3 method for class 'elife_northropcoleman' plot(x, plot.type = c("base", "ggplot"), plot = TRUE, ...)
object |
object of class |
... |
additional arguments for base |
x |
an object of class |
plot.type |
string indicating the type of plot |
plot |
logical; should the routine print the graph if |
a base R or ggplot object with p-values for the Northrop-Coleman test against thresholds.
Because of censoring and truncation, the plotting positions must be adjusted accordingly. For right-censored data, the methodology is described in Waller & Turnbull (1992). Only non-censored observations are displayed, which can create distortion.
autoplot.elife_par(object, ...) ## S3 method for class 'elife_par' plot( x, plot.type = c("base", "ggplot"), which.plot = c("pp", "qq"), confint = c("none", "pointwise", "simultaneous"), plot = TRUE, ... )
autoplot.elife_par(object, ...) ## S3 method for class 'elife_par' plot( x, plot.type = c("base", "ggplot"), which.plot = c("pp", "qq"), confint = c("none", "pointwise", "simultaneous"), plot = TRUE, ... )
object |
an object of class |
... |
additional arguments, currently ignored by the function. |
x |
a parametric model of class |
plot.type |
string, one of |
which.plot |
vector of string indicating the plots, among |
confint |
logical; if |
plot |
logical; if |
For truncated data, we first estimate the distribution function
nonparametrically, . The uniform plotting positions of the data
For probability-probability plots, the empirical quantiles are transformed
using the same transformation, with replaced by the postulated or estimated
distribution function
.
For quantile-quantile plots, the plotting positions
are mapped back
to the data scale viz.
When data are truncated and observations are mapped back to the untruncated scale (with, e.g., exp
), the plotting positions need not be in the same order as the order statistics of the data.
The function produces graphical goodness-of-fit plots using base R or ggplot objects (returned as an invisible list).
set.seed(1234) samp <- samp_elife( n = 200, scale = 2, shape = 0.3, family = "gomp", lower = 0, upper = runif(200, 0, 10), type2 = "ltrc") fitted <- fit_elife( time = samp$dat, thresh = 0, event = ifelse(samp$rcens, 0L, 1L), type = "right", family = "exp", export = TRUE) plot(fitted, plot.type = "ggplot") # Left- and right-truncated data n <- 40L samp <- samp_elife( n = n, scale = 2, shape = 0.3, family = "gp", lower = ltrunc <- runif(n), upper = rtrunc <- ltrunc + runif(n, 0, 15), type2 = "ltrt") fitted <- fit_elife( time = samp, thresh = 0, ltrunc = ltrunc, rtrunc = rtrunc, family = "gp", export = TRUE) plot(fitted, which.plot = c("tmd", "dens"))
set.seed(1234) samp <- samp_elife( n = 200, scale = 2, shape = 0.3, family = "gomp", lower = 0, upper = runif(200, 0, 10), type2 = "ltrc") fitted <- fit_elife( time = samp$dat, thresh = 0, event = ifelse(samp$rcens, 0L, 1L), type = "right", family = "exp", export = TRUE) plot(fitted, plot.type = "ggplot") # Left- and right-truncated data n <- 40L samp <- samp_elife( n = n, scale = 2, shape = 0.3, family = "gp", lower = ltrunc <- runif(n), upper = rtrunc <- ltrunc + runif(n, 0, 15), type2 = "ltrt") fitted <- fit_elife( time = samp, thresh = 0, ltrunc = ltrunc, rtrunc = rtrunc, family = "gp", export = TRUE) plot(fitted, which.plot = c("tmd", "dens"))
Threshold stability plots
autoplot.elife_tstab(object, ...) ## S3 method for class 'elife_tstab' plot( x, plot.type = c("base", "ggplot"), which.plot = c("scale", "shape"), plot = TRUE, ... )
autoplot.elife_tstab(object, ...) ## S3 method for class 'elife_tstab' plot( x, plot.type = c("base", "ggplot"), which.plot = c("scale", "shape"), plot = TRUE, ... )
object |
object of class |
... |
additional arguments, currently ignored by the function. |
x |
an object of class |
plot.type |
string, one of |
which.plot |
vector of string indicating the plots, among |
plot |
logical; if |
This data frame contains information about all Dutch who died
above age 92 years between 1986 and 2015. Observations are
doubly truncated and such bounds are calculated based on the
range of plausible values for these variables.
There are 226 records that are interval-censored and interval-truncated
for which bdate
, ddate
and ndays
is missing (NA
).
dutch
dutch
A data frame with 305143 rows and 11 variables:
survival time (in days)
the smallest plausible birth date given information
about month of birth and death and survival (Date
)
month of birth
year of birth
the largest plausible death date given information
about month of birth and death and survival (Date
)
month of death
year of death
minimum age (in days); the maximum of either 92 years or the number of days reached in 1986
maximum age (in days) an individual could have reached by the end of 2015
factor indicating gender of individual, either
female
or male
quality flag; A
for individuals born in
the Netherlands, B
for individuals born abroad who died
in the Netherlands
Statistics Netherlands (CBS). Accessed via the Supplemental material of Einmahl, Einmahl and de Haan (2019)
Einmahl, J.J., J.H.J. Einmahl and L. de Haan (2019). Limits to Human Life Span Through Extreme Value Theory, Journal of the American Statistical Association, 114(527), 1075-1080. doi:10.1080/01621459.2018.1537912
This data frame contains information about 179 fake records mimicking Welsh and English who died age 110 and above
ewsim
ewsim
A data frame with 179 rows and 3 variables:
survival time above 110 (in years)
minimum age above 110 (in years), or zero
;
maximum age (in years) an individual could have reached by the end of the time frame
This function is a wrapper around constrained optimization routines for different models with non-informative censoring and truncation patterns.
fit_elife( time, time2 = NULL, event = NULL, type = c("right", "left", "interval", "interval2"), ltrunc = NULL, rtrunc = NULL, thresh = 0, status = NULL, family = c("exp", "gp", "weibull", "gomp", "gompmake", "extgp", "gppiece", "extweibull", "perks", "perksmake", "beard", "beardmake"), weights = NULL, export = FALSE, start = NULL, restart = FALSE, arguments = NULL, check = FALSE, ... )
fit_elife( time, time2 = NULL, event = NULL, type = c("right", "left", "interval", "interval2"), ltrunc = NULL, rtrunc = NULL, thresh = 0, status = NULL, family = c("exp", "gp", "weibull", "gomp", "gompmake", "extgp", "gppiece", "extweibull", "perks", "perksmake", "beard", "beardmake"), weights = NULL, export = FALSE, start = NULL, restart = FALSE, arguments = NULL, check = FALSE, ... )
time |
excess time of the event of follow-up time, depending on the value of event |
time2 |
ending excess time of the interval for interval censored data only. |
event |
status indicator, normally 0=alive, 1=dead. Other choices are |
type |
character string specifying the type of censoring. Possible values are " |
ltrunc |
lower truncation limit, default to |
rtrunc |
upper truncation limit, default to |
thresh |
vector of thresholds |
status |
integer vector giving status of an observation. If |
family |
string; choice of parametric family |
weights |
weights for observations |
export |
logical; should data be included in the returned object to produce diagnostic plots? Default to |
start |
vector of starting values for the optimization routine. If |
restart |
logical; should multiple starting values be attempted? Default to |
arguments |
a named list specifying default arguments of the function that are common to all |
check |
logical; if |
... |
additional parameters, currently ignored |
an object of class elife_par
The extended generalized Pareto model is constrained
to avoid regions where the likelihood is flat so in
the optimization algorithm.
The standard errors are obtained via the observed information matrix, calculated using the hessian. In many instances, such as when the shape parameter is zero or negative, the hessian is singular and no estimates are returned.
data(ewsim, package = "longevity") fit1 <- fit_elife( arguments = ewsim, export = TRUE, family = "exp") fit2 <- fit_elife( arguments = ewsim, export = TRUE, family = "gp") plot(fit1) summary(fit1) anova(fit2, fit1)
data(ewsim, package = "longevity") fit1 <- fit_elife( arguments = ewsim, export = TRUE, family = "exp") fit2 <- fit_elife( arguments = ewsim, export = TRUE, family = "gp") plot(fit1) summary(fit1) anova(fit2, fit1)
This data frame contains country codes and the associated data collection period corresponding to the range for age at death.
idlmetadata
idlmetadata
A data frame with 21 rows and 4 variables:
factor, one of AUT
(Austria), BEL
(Belgium), CAN
(Quebec), DEU
(Germany), DNK
(Denmark), ESP
(Spain), FIN
(Finland), FRA
(France), JPN
(Japan), NOR
(Norway), SWE
(Sweden), EAW
(England and Wales) and USA
(United States of America)
factor, either 105-109
for semi-supercentenarians or 110+
for supercentenarians"
Date, smallest death date
Date, latest death date
Due to confidentiality restrictions, some data that were available in previous versions of the IDL for Switzerland, Italy and some entries for Japan and Belgium have been removed. As the IDL metadata are updated somewhat regularly and former versions of the database are not preserved, results from published analyses are replicable but not reproducible.
International Database on Longevity, extracted on February 13th, 2023
This data frame contains information about the counts of dead Japanese by gender and year of birth (cohort), categorized by the whole part of age attained at death.
japanese
japanese
A data frame with 1038 rows and 4 variables:
integer, age (to the smallest year) at death (in years)
integer, birth year
integer, number of death for cohort at given age
factor, the gender of the individuals; either male
or female
These data were obtained from the Annual Vital Statistics Report of Japan, released by the Japanese government every year since 1947. The authors note that "All the members of that cohort have died by the end of the observation period, a procedure referred to as the extinct cohort method". The data were obtained from the Human Mortality Database by the authors. Only positive counts are reported and two records (Misao Okawa and Jiroemon Kimura) are excluded because they do not correspond to the same selection mechanism.
Table extracted from Hanayama & Sibuya (2016).
Hanayama, N. and M. Sibuya (2016). Estimating the Upper Limit of Lifetime Probability Distribution, Based on Data of Japanese Centenarians, The Journals of Gerontology: Series A, 71(8), 1014–1021. doi:10.1093/gerona/glv113
This data frame is extracted from Table 10.3 from Chapter 10, "Centenarians and Supercentenarians in Japan", in the Monograph Exceptional lifespans. The data were constructed by the extinct cohort method and are stratified by age cohort (five year group, except 1899-1900) and by sex. Note that the family registry system (KOSEKI), introduced in 1872, was standardized in 1886.
japanese2
japanese2
A data frame with 216 rows and 4 variables:
integer, age (to the smallest year) at death (in years)
factor, birth cohort
integer, number of death for cohort at given age
factor, the gender of the individuals; either male
or female
Table 10.3
Saito, Yasuhiko and Futoshi Ishii, and Jean-Marie Robine (2021). Centenarians and Supercentenarians in Japan. In Exceptional lifespans, Maier, H., Jeune, B., Vaupel, J. W. (Eds.), Demographic research monographs 17 VII, pp. 125-145. Cham, Springer.
Log of the posterior distribution for excess lifetime distribution with maximal data information priors.
lpost_elife( par, time, time2 = NULL, event = NULL, type = c("right", "left", "interval", "interval2"), ltrunc = NULL, rtrunc = NULL, family = c("exp", "gp", "gomp"), thresh = 0, weights = rep(1, length(time)), status = NULL, arguments = NULL, ... )
lpost_elife( par, time, time2 = NULL, event = NULL, type = c("right", "left", "interval", "interval2"), ltrunc = NULL, rtrunc = NULL, family = c("exp", "gp", "gomp"), thresh = 0, weights = rep(1, length(time)), status = NULL, arguments = NULL, ... )
par |
vector of parameters, in the following order: scale, rate and shape |
time |
excess time of the event of follow-up time, depending on the value of event |
time2 |
ending excess time of the interval for interval censored data only. |
event |
status indicator, normally 0=alive, 1=dead. Other choices are |
type |
character string specifying the type of censoring. Possible values are " |
ltrunc |
lower truncation limit, default to |
rtrunc |
upper truncation limit, default to |
family |
string; choice of parametric family |
thresh |
vector of thresholds |
weights |
weights for observations |
status |
integer vector giving status of an observation. If |
arguments |
a named list specifying default arguments of the function that are common to all |
... |
additional arguments for optimization, currently ignored. |
a vector proportional to the log posterior (the sum of the log likelihood and log prior)
This function computes the score test with the piecewise generalized Pareto distribution under the null hypothesis that the generalized Pareto with a single shape parameter is an adequate simplification. The score test statistic is calculated using the observed information matrix; both hessian and score vector are obtained through numerical differentiation.
nc_test( time, time2 = NULL, event = NULL, thresh = 0, ltrunc = NULL, rtrunc = NULL, type = c("right", "left", "interval", "interval2"), weights = rep(1, length(time)), test = c("score", "lrt"), arguments = NULL, ... )
nc_test( time, time2 = NULL, event = NULL, thresh = 0, ltrunc = NULL, rtrunc = NULL, type = c("right", "left", "interval", "interval2"), weights = rep(1, length(time)), test = c("score", "lrt"), arguments = NULL, ... )
time |
excess time of the event of follow-up time, depending on the value of event |
time2 |
ending excess time of the interval for interval censored data only. |
event |
status indicator, normally 0=alive, 1=dead. Other choices are |
thresh |
a vector of thresholds |
ltrunc |
lower truncation limit, default to |
rtrunc |
upper truncation limit, default to |
type |
character string specifying the type of censoring. Possible values are " |
weights |
weights for observations |
test |
string, either |
arguments |
a named list specifying default arguments of the function that are common to all |
... |
additional parameters, currently ignored |
The score test is much faster and perhaps less fragile than the likelihood ratio test: fitting the piece-wise generalized Pareto model is difficult due to the large number of parameters and multimodal likelihood surface.
The reference distribution is chi-square
a data frame with the following variables:
thresh
: threshold for the generalized Pareto distribution
nexc
: number of exceedances
score
: score statistic
df
: degrees of freedom
pval
: the p-value obtained from the asymptotic chi-square approximation.
set.seed(1234) n <- 100L x <- samp_elife(n = n, scale = 2, shape = -0.2, lower = low <- runif(n), upper = upp <- runif(n, min = 3, max = 20), type2 = "ltrt", family = "gp") test <- nc_test( time = x, ltrunc = low, rtrunc = upp, thresh = quantile(x, seq(0, 0.5, by = 0.1))) print(test) plot(test)
set.seed(1234) n <- 100L x <- samp_elife(n = n, scale = 2, shape = -0.2, lower = low <- runif(n), upper = upp <- runif(n, min = 3, max = 20), type2 = "ltrt", family = "gp") test <- nc_test( time = x, ltrunc = low, rtrunc = upp, thresh = quantile(x, seq(0, 0.5, by = 0.1))) print(test) plot(test)
Computes the log-likelihood for various parametric models suitable for threshold exceedances. If threshold is non-zero, then only right-censored, observed event time and interval censored data whose timing exceeds the thresholds are kept.
nll_elife( par, time, time2 = NULL, event = NULL, type = c("right", "left", "interval", "interval2"), ltrunc = NULL, rtrunc = NULL, family = c("exp", "gp", "gomp", "gompmake", "weibull", "extgp", "gppiece", "extweibull", "perks", "beard", "perksmake", "beardmake"), thresh = 0, weights = NULL, status = NULL, arguments = NULL, ... )
nll_elife( par, time, time2 = NULL, event = NULL, type = c("right", "left", "interval", "interval2"), ltrunc = NULL, rtrunc = NULL, family = c("exp", "gp", "gomp", "gompmake", "weibull", "extgp", "gppiece", "extweibull", "perks", "beard", "perksmake", "beardmake"), thresh = 0, weights = NULL, status = NULL, arguments = NULL, ... )
par |
vector of parameters, in the following order: scale, rate and shape |
time |
excess time of the event of follow-up time, depending on the value of event |
time2 |
ending excess time of the interval for interval censored data only. |
event |
status indicator, normally 0=alive, 1=dead. Other choices are |
type |
character string specifying the type of censoring. Possible values are " |
ltrunc |
lower truncation limit, default to |
rtrunc |
upper truncation limit, default to |
family |
string; choice of parametric family |
thresh |
vector of thresholds |
weights |
weights for observations |
status |
integer vector giving status of an observation. If |
arguments |
a named list specifying default arguments of the function that are common to all |
... |
additional arguments for optimization, currently ignored. |
log-likelihood values
data(ewsim, package = "longevity") nll_elife(par = c(5, 0.3), family = "gp", arguments = ewsim)
data(ewsim, package = "longevity") nll_elife(par = c(5, 0.3), family = "gp", arguments = ewsim)
The survival function is obtained through the EM algorithm
described in Turnbull (1976); censoring and truncation are
assumed to be non-informative.
The survival function changes only at the J
distinct
exceedances and truncation points.
np_elife( time, time2 = NULL, event = NULL, type = c("right", "left", "interval", "interval2"), thresh = 0, ltrunc = NULL, rtrunc = NULL, tol = 1e-12, weights = NULL, method = c("em", "sqp"), arguments = NULL, maxiter = 100000L, ... )
np_elife( time, time2 = NULL, event = NULL, type = c("right", "left", "interval", "interval2"), thresh = 0, ltrunc = NULL, rtrunc = NULL, tol = 1e-12, weights = NULL, method = c("em", "sqp"), arguments = NULL, maxiter = 100000L, ... )
time |
excess time of the event of follow-up time, depending on the value of event |
time2 |
ending excess time of the interval for interval censored data only. |
event |
status indicator, normally 0=alive, 1=dead. Other choices are |
type |
character string specifying the type of censoring. Possible values are " |
thresh |
double thresh |
ltrunc |
lower truncation limit, default to |
rtrunc |
upper truncation limit, default to |
tol |
double, relative tolerance for convergence of the EM algorithm |
weights |
double, vector of weights for the observations |
method |
string, one of |
arguments |
a named list specifying default arguments of the function that are common to all |
maxiter |
integer, maximum number of iterations for the EM algorithm |
... |
additional arguments, currently ignored |
The unknown parameters of the model are
subject to the constraint that
.
a list with elements
cdf
: right-continuous stepfun
object defined by probabilities
time
: matrix of unique values for the Turnbull intervals defining equivalence classes; only those with non-zero probabilities are returned
prob
: J
vector of non-zero probabilities
niter
: number of iterations
Turnbull, B. W. (1976). The Empirical Distribution Function with Arbitrarily Grouped, Censored and Truncated Data. Journal of the Royal Statistical Society. Series B (Methodological) 38(3), 290–295.
Gentleman, R. and C. J. Geyer (1994). Maximum likelihood for interval censored data: Consistency and computation, Biometrika, 81(3), 618–623.
Frydman, H. (1994). A Note on Nonparametric Estimation of the Distribution Function from Interval-Censored and Truncated Observations, Journal of the Royal Statistical Society. Series B (Methodological) 56(1), 71-74.
set.seed(2021) n <- 20L # Create fake data ltrunc <- pmax(0, runif(n, -0.5, 1)) rtrunc <- runif(n, 6, 10) dat <- samp_elife(n = n, scale = 1, shape = -0.1, lower = ltrunc, upper = rtrunc, family = "gp", type2 = "ltrt") npi <- np_elife(time = dat, rtrunc = rtrunc, ltrunc = ltrunc) print(npi) summary(npi) plot(npi)
set.seed(2021) n <- 20L # Create fake data ltrunc <- pmax(0, runif(n, -0.5, 1)) rtrunc <- runif(n, 6, 10) dat <- samp_elife(n = n, scale = 1, shape = -0.1, lower = ltrunc, upper = rtrunc, family = "gp", type2 = "ltrt") npi <- np_elife(time = dat, rtrunc = rtrunc, ltrunc = ltrunc) print(npi) summary(npi) plot(npi)
The syntax is reminiscent of the Surv function, with additional vectors for left-truncation and right-truncation.
npsurv( time, time2 = NULL, event = NULL, type = c("right", "left", "interval", "interval2"), ltrunc = NULL, rtrunc = NULL, weights = NULL, arguments = NULL, ... )
npsurv( time, time2 = NULL, event = NULL, type = c("right", "left", "interval", "interval2"), ltrunc = NULL, rtrunc = NULL, weights = NULL, arguments = NULL, ... )
time |
excess time of the event of follow-up time, depending on the value of event |
time2 |
ending excess time of the interval for interval censored data only. |
event |
status indicator, normally 0=alive, 1=dead. Other choices are |
type |
character string specifying the type of censoring. Possible values are " |
ltrunc |
lower truncation limit, default to |
rtrunc |
upper truncation limit, default to |
weights |
vector of weights, default to |
arguments |
a named list specifying default arguments of the function that are common to all |
... |
additional arguments passed to the functions |
a list with components
xval
: unique ordered values of sets on which the distribution function is defined
prob
: estimated probability of failure on intervals
convergence
: logical; TRUE
if the EM algorithm iterated until convergence
niter
: logical; number of iterations for the EM algorithm
cdf
: nonparametric maximum likelihood estimator of the distribution function
Contrary to the Kaplan-Meier estimator, the mass is placed in the interval
[max(time), Inf
) so the resulting distribution function is not deficient.
#' # Toy example with interval censoring and right censoring # Two observations: A1: [1,3], A2: 4 # Probability of 0.5 test_simple2 <- npsurv( time = c(1,4), time2 = c(3,4), event = c(3,1), type = "interval" )
#' # Toy example with interval censoring and right censoring # Two observations: A1: [1,3], A2: 4 # Probability of 0.5 test_simple2 <- npsurv( time = c(1,4), time2 = c(3,4), event = c(3,1), type = "interval" )
Plot empirical distribution function
## S3 method for class 'elife_ecdf' plot(x, ...)
## S3 method for class 'elife_ecdf' plot(x, ...)
x |
argument of class |
... |
additional arguments for the plot |
base R plot of the empirical distribution function
Plot profile of endpoint
## S3 method for class 'elife_profile' plot(x, plot.type = c("base", "ggplot"), plot = TRUE, ...) autoplot.elife_profile(object, ...)
## S3 method for class 'elife_profile' plot(x, plot.type = c("base", "ggplot"), plot = TRUE, ...) autoplot.elife_profile(object, ...)
x |
an object of class |
plot.type |
string indicating whether to use base R for plots or |
plot |
logical; if |
... |
additional arguments to pass to |
object |
object of class |
base R or ggplot object for a plot of the profile log likelihood of the endpoint of the generalized Pareto distribution
This function returns the profile log likelihood over a grid of values of psi
, the endpoints.
prof_gp_endpt( time, time2 = NULL, event = NULL, thresh = 0, type = c("right", "left", "interval", "interval2"), ltrunc = NULL, rtrunc = NULL, weights = rep(1, length(time)), psi = NULL, confint = FALSE, level = 0.95, arguments = NULL, ... )
prof_gp_endpt( time, time2 = NULL, event = NULL, thresh = 0, type = c("right", "left", "interval", "interval2"), ltrunc = NULL, rtrunc = NULL, weights = rep(1, length(time)), psi = NULL, confint = FALSE, level = 0.95, arguments = NULL, ... )
time |
excess time of the event of follow-up time, depending on the value of event |
time2 |
ending excess time of the interval for interval censored data only. |
event |
status indicator, normally 0=alive, 1=dead. Other choices are |
thresh |
vector of thresholds |
type |
character string specifying the type of censoring. Possible values are " |
ltrunc |
lower truncation limit, default to |
rtrunc |
upper truncation limit, default to |
weights |
weights for observations |
psi |
mandatory vector of endpoints at which to compute the profile |
confint |
logical; if |
level |
numeric; the level for the confidence intervals |
arguments |
a named list specifying default arguments of the function that are common to all |
... |
additional parameters, currently ignored |
a list with the maximum likelihood estimate of the endpoint and the profile log-likelihood
set.seed(2023) time <- relife(n = 100, scale = 3, shape = -0.3, family = "gp") endpt <- prof_gp_endpt( time = time, psi = seq(max(time) + 1e-4, max(time) + 40, length.out = 51L)) print(endpt) plot(endpt) confint(endpt)
set.seed(2023) time <- relife(n = 100, scale = 3, shape = -0.3, family = "gp") endpt <- prof_gp_endpt( time = time, psi = seq(max(time) + 1e-4, max(time) + 40, length.out = 51L)) print(endpt) plot(endpt) confint(endpt)
This function dispatches simulations accounting for potential left-truncation (remove by setting lower to zero).
If type2=ltrt
, simulated observations will be lower than the upper bounds upper
.
If type2=ltrc
, simulated observations are capped at upper
and the observation is right-censored (rcens=TRUE
).
samp_elife( n, scale, rate, shape = NULL, lower = 0, upper = Inf, family = c("exp", "gp", "gomp", "gompmake", "weibull", "extgp", "gppiece", "extweibull", "perks", "beard", "perksmake", "beardmake"), type2 = c("none", "ltrt", "ltrc", "ditrunc") )
samp_elife( n, scale, rate, shape = NULL, lower = 0, upper = Inf, family = c("exp", "gp", "gomp", "gompmake", "weibull", "extgp", "gppiece", "extweibull", "perks", "beard", "perksmake", "beardmake"), type2 = c("none", "ltrt", "ltrc", "ditrunc") )
n |
sample size |
scale |
scale parameter(s) |
rate |
rate parameter(s) |
shape |
shape parameter(s) |
lower |
vector of lower bounds |
upper |
vector of upper bounds |
family |
string; choice of parametric family |
type2 |
string, either |
either a vector of observations or, if type2=ltrc
, a list with n
observations dat
and a logical vector of the same length with TRUE
for right-censored observations and FALSE
otherwise.
As the tails of the Gompertz and Gompertz–Makeham models decrease exponentially fast, the method fails in the rare event case if the lower bound is too large (say larger than the 99.99
set.seed(1234) n <- 500L # Simulate interval truncated data x <- samp_elife(n = n, scale = 2, shape = 1.5, lower = low <- runif(n), upper = upp <- runif(n, min = 3, max = 15), type2 = "ltrt", family = "weibull") coef(fit_elife( time = x, ltrunc = low, rtrunc = upp, family = "weibull")) # Simulate left-truncated right-censored data x <- samp_elife(n = n, scale = 2, shape = 1.5, lower = low <- runif(n), upper = upp <- runif(n, min = 3, max = 15), type2 = "ltrc", family = "gomp") #note that the return value is a list... coef(fit_elife( time = x$dat, ltrunc = low, event = !x$rcens, family = "gomp"))
set.seed(1234) n <- 500L # Simulate interval truncated data x <- samp_elife(n = n, scale = 2, shape = 1.5, lower = low <- runif(n), upper = upp <- runif(n, min = 3, max = 15), type2 = "ltrt", family = "weibull") coef(fit_elife( time = x, ltrunc = low, rtrunc = upp, family = "weibull")) # Simulate left-truncated right-censored data x <- samp_elife(n = n, scale = 2, shape = 1.5, lower = low <- runif(n), upper = upp <- runif(n, min = 3, max = 15), type2 = "ltrc", family = "gomp") #note that the return value is a list... coef(fit_elife( time = x$dat, ltrunc = low, event = !x$rcens, family = "gomp"))
This function fits separate models for each distinct
value of the factor covariate
and computes a likelihood ratio test
to test whether there are significant differences between
groups.
test_elife( time, time2 = NULL, event = NULL, covariate, thresh = 0, ltrunc = NULL, rtrunc = NULL, type = c("right", "left", "interval", "interval2"), family = c("exp", "gp", "weibull", "gomp", "gompmake", "extgp", "extweibull", "perks", "perksmake", "beard", "beardmake"), weights = rep(1, length(time)), arguments = NULL, ... )
test_elife( time, time2 = NULL, event = NULL, covariate, thresh = 0, ltrunc = NULL, rtrunc = NULL, type = c("right", "left", "interval", "interval2"), family = c("exp", "gp", "weibull", "gomp", "gompmake", "extgp", "extweibull", "perks", "perksmake", "beard", "beardmake"), weights = rep(1, length(time)), arguments = NULL, ... )
time |
excess time of the event of follow-up time, depending on the value of event |
time2 |
ending excess time of the interval for interval censored data only. |
event |
status indicator, normally 0=alive, 1=dead. Other choices are |
covariate |
vector of factors, logical or integer whose distinct values define groups |
thresh |
vector of thresholds |
ltrunc |
lower truncation limit, default to |
rtrunc |
upper truncation limit, default to |
type |
character string specifying the type of censoring. Possible values are " |
family |
string; choice of parametric family |
weights |
weights for observations |
arguments |
a named list specifying default arguments of the function that are common to all |
... |
additional arguments for optimization, currently ignored. |
a list with elements
stat
likelihood ratio statistic
df
degrees of freedom
pval
the p-value obtained from the asymptotic chi-square approximation.
test <- with(subset(dutch, ndays > 39082), test_elife( time = ndays, thresh = 39082L, covariate = gender, ltrunc = ltrunc, rtrunc = rtrunc, family = "exp")) test
test <- with(subset(dutch, ndays > 39082), test_elife( time = ndays, thresh = 39082L, covariate = gender, ltrunc = ltrunc, rtrunc = rtrunc, family = "exp")) test
The generalized Pareto and exponential distribution
are threshold stable. This property, which is used
for extrapolation purposes, can also be used to diagnose
goodness-of-fit: we expect the parameters and
to be constant over a range of thresholds. The threshold stability
plot consists in plotting maximum likelihood estimates with pointwise confidence interval.
This function handles interval truncation and right-censoring.
tstab( time, time2 = NULL, event = NULL, thresh = 0, ltrunc = NULL, rtrunc = NULL, type = c("right", "left", "interval", "interval2"), family = c("gp", "exp"), method = c("wald", "profile"), level = 0.95, plot = TRUE, plot.type = c("base", "ggplot"), which.plot = c("scale", "shape"), weights = NULL, arguments = NULL, ... )
tstab( time, time2 = NULL, event = NULL, thresh = 0, ltrunc = NULL, rtrunc = NULL, type = c("right", "left", "interval", "interval2"), family = c("gp", "exp"), method = c("wald", "profile"), level = 0.95, plot = TRUE, plot.type = c("base", "ggplot"), which.plot = c("scale", "shape"), weights = NULL, arguments = NULL, ... )
time |
excess time of the event of follow-up time, depending on the value of event |
time2 |
ending excess time of the interval for interval censored data only. |
event |
status indicator, normally 0=alive, 1=dead. Other choices are |
thresh |
vector of thresholds |
ltrunc |
lower truncation limit, default to |
rtrunc |
upper truncation limit, default to |
type |
character string specifying the type of censoring. Possible values are " |
family |
string; distribution, either generalized Pareto ( |
method |
string; the type of pointwise confidence interval, either Wald ( |
level |
probability level for the pointwise confidence intervals |
plot |
logical; should a plot be returned alongside with the estimates? Default to |
plot.type |
string; either |
which.plot |
string; which parameters to plot; |
weights |
weights for observations |
arguments |
a named list specifying default arguments of the function that are common to all |
... |
additional arguments for optimization, currently ignored. |
The shape estimates are constrained
an invisible list with pointwise estimates and confidence intervals for the scale and shape parameters
tstab.gpd
from package mev
, gpd.fitrange
from package ismev
or tcplot
from package evd
, among others.
set.seed(1234) n <- 100L x <- samp_elife(n = n, scale = 2, shape = -0.2, lower = low <- runif(n), upper = upp <- runif(n, min = 3, max = 20), type2 = "ltrt", family = "gp") tstab_plot <- tstab(time = x, ltrunc = low, rtrunc = upp, thresh = quantile(x, seq(0, 0.5, length.out = 4))) plot(tstab_plot, plot.type = "ggplot")
set.seed(1234) n <- 100L x <- samp_elife(n = n, scale = 2, shape = -0.2, lower = low <- runif(n), upper = upp <- runif(n, min = 3, max = 20), type2 = "ltrt", family = "gp") tstab_plot <- tstab(time = x, ltrunc = low, rtrunc = upp, thresh = quantile(x, seq(0, 0.5, length.out = 4))) plot(tstab_plot, plot.type = "ggplot")