The HPSEVERITY Procedure

Predefined Utility Functions

The following predefined utility functions are provided with the HPSEVERITY procedure and are available in the Sashelp.Svrtdist library:

SVRTUTIL_EDF:

This function computes the empirical distribution function (EDF) estimate at the specified value of the random variable given the EDF estimate for a sample.

  • Type: Function

  • Signature: SVRTUTIL_EDF(y, n, x{*}, F{*}, Ftype)

  • Argument Description:

    y

    Value of the random variable at which the EDF estimate is desired.

    n

    Dimension of the x and F input arrays.

    x{*}

    Input numeric array of dimension n that contains values of the random variable observed in the sample. These values are sorted in nondecreasing order.

    F{*}

    Input numeric array of dimension n in which each F[$i$] contains the EDF estimate for x[$i$]. These values must be sorted in nondecreasing order.

    Ftype

    Type of the empirical estimate that is stored in the x and F arrays. See the section Supplying EDF Estimates to Functions and Subroutines for definition of types.

  • Return value: The EDF estimate at $y$.

The type of the sample EDF estimate determines how the EDF estimate at $y$ is computed. For more information, see the section Supplying EDF Estimates to Functions and Subroutines.

SVRTUTIL_EMPLIMMOMENT:

This function computes the empirical estimate of the limited moment of specified order for the specified upper limit, given the EDF estimate for a sample.

  • Type: Function

  • Signature: SVRTUTIL_EMPLIMMOMENT(k, u, n, x{*}, F{*}, Ftype)

  • Argument Description:

    k

    Order of the desired empirical limited moment.

    u

    Upper limit on the value of the random variable to be used in the computation of the desired empirical limited moment.

    n

    Dimension of the x and F input arrays.

    x{*}

    Input numeric array of dimension n that contains values of the random variable observed in the sample. These values are sorted in nondecreasing order.

    F{*}

    Input numeric array of dimension n in which each F[$i$] contains the EDF estimate for x[$i$]. These values must be sorted in nondecreasing order.

    Ftype

    Type of the empirical estimate that is stored in the x and F arrays. See the section Supplying EDF Estimates to Functions and Subroutines for definition of types.

  • Return value: The desired empirical limited moment.

The empirical limited moment is computed by using the empirical estimate of the CDF. If $F_ n(x)$ denotes the EDF at $x$, then the empirical limited moment of order $k$ with upper limit $u$ is defined as

\[  E_ n[(X \wedge u)^ k] = k \int _{0}^{u} (1-F_ n(x)) x^{k-1} dx  \]

The SVRTUTIL_EMPLIMMOMENT function uses the piecewise linear nature of $F_ n(x)$ as described in the section Supplying EDF Estimates to Functions and Subroutines to compute the integration.

SVRTUTIL_HILLCUTOFF:

This function computes an estimate of the value where the right tail of a distribution is expected to begin. The function implements the algorithm described in Danielsson et al. 2001. The description of the algorithm uses the following notation:

$n$

Number of observations in the original sample.

$B$

Number of bootstrap samples to draw.

$m_1$

Size of the bootstrap sample in the first step of the algorithm ($m_1 < n$).

$x^{j,m}_{(i)}$

$i$th order statistic of $j$th bootstrap sample of size $m$ ($1 \leq i \leq m, 1 \leq j \leq B$).

$x_{(i)}$

$i$th order statistic of the original sample ($1 \leq i \leq n$).

Given the input sample $x$ and values of $B$ and $m_1$, the steps of the algorithm are as follows:

  1. Take $B$ bootstrap samples of size $m_1$ from the original sample.

  2. Find the integer $k_1$ that minimizes the bootstrap estimate of the mean squared error:

    \[  k_1 = \arg \min _{1 \leq k < m_1} Q(m_1, k)  \]
  3. Take $B$ bootstrap samples of size $m_2 = m_1^2/n$ from the original sample.

  4. Find the integer $k_2$ that minimizes the bootstrap estimate of the mean squared error:

    \[  k_2 = \arg \min _{1 \leq k < m_2} Q(m_2, k)  \]
  5. Compute the integer $k_{\text {opt}}$, which is used for computing the cutoff point:

    \[  k_{\text {opt}} = \frac{k_1^2}{k_2} \left( \frac{\log (k_1)}{2 \log (m_1) - \log (k_1)} \right)^{2 - 2\log (k_1)/\log (m_1)}  \]
  6. Set the cutoff point equal to $x_{(k_{\text {opt}} + 1)}$.

The bootstrap estimate of the mean squared error is computed as

\[  Q(m,k) = \frac{1}{B} \sum _{j=1}^{B} \text {MSE}_ j(m,k)  \]

The mean squared error of $j$th bootstrap sample is computed as

\[  \text {MSE}_ j(m,k) = (M_ j(m,k) - 2 (\gamma _ j(m,k))^2)^2  \]

where $M_ j(m,k)$ is a control variate proposed by Danielsson et al. 2001,

\[  M_ j(m,k) = \frac{1}{k} \sum _{i=1}^{k} \left( \log (x^{j,m}_{(m-i+1)}) - \log (x^{j,m}_{(m-k)}) \right)^2  \]

and $\gamma _ j(m,k)$ is the Hill’s estimator of the tail index (Hill, 1975),

\[  \gamma _ j(m,k) = \frac{1}{k} \sum _{i=1}^{k} \log (x^{j,m}_{(m-i+1)}) - \log (x^{j,m}_{(m-k)})  \]

This algorithm has two tuning parameters, $B$ and $m_1$. The number of bootstrap samples $B$ is chosen based on the availability of computational resources. The optimal value of $m_1$ is chosen such that the following ratio, $R(m_1)$, is minimized:

\[  R(m_1) = \frac{(Q(m_1,k_1))^2}{Q(m_2,k_2)}  \]

The SVRTUTIL_HILLCUTOFF utility function implements the preceding algorithm. It uses the grid search method to compute the optimal value of $m_1$.

  • Type: Function

  • Signature: SVRTUTIL_HILLCUTOFF(n, x{*}, b, s, status)

  • Argument Description:

    n

    Dimension of the array x.

    x{*}

    Input numeric array of dimension $n$ that contains the sample.

    b

    Number of bootstrap samples used to estimate the mean squared error. If b is less than 10, then a default value of 50 is used.

    s

    Approximate number of steps used to search the optimal value of $m_1$ in the range $[n^{0.75}, n-1]$. If s is less than or equal to 1, then a default value of 10 is used.

    status

    Output argument that contains the status of the algorithm. If the algorithm succeeds in computing a valid cutoff point, then status is set to 0. If the algorithm fails, then status is set to 1.

  • Return value: The cutoff value where the right tail is estimated to start. If the size of the input sample is inadequate ($n \leq 5$), then a missing value is returned and status is set to a missing value. If the algorithm fails to estimate a valid cutoff value (status = 1), then the fifth largest value in the input sample is returned.

SVRTUTIL_PERCENTILE:

This function computes the specified empirical percentile given the EDF estimates.

  • Type: Function

  • Signature: SVRTUTIL_PERCENTILE(p, n, x{*}, F{*}, Ftype)

  • Argument Description:

    p

    Desired percentile. The value must be in the interval (0,1). The function returns the 100$p$th percentile.

    n

    Dimension of the x and F input arrays.

    x{*}

    Input numeric array of dimension n that contains values of the random variable observed in the sample. These values are sorted in nondecreasing order.

    F{*}

    Input numeric array of dimension n in which each F[$i$] contains the EDF estimate for x[$i$]. These values must be sorted in nondecreasing order.

    Ftype

    Type of the empirical estimate that is stored in the x and F arrays. See the section Supplying EDF Estimates to Functions and Subroutines for definition of types.

  • Return value: The 100$p$th percentile of the input sample.

The method used to compute the percentile depends on the type of the EDF estimate (Ftype argument).

Ftype = 1

Smoothed empirical estimates are computed using the method described in Klugman, Panjer, and Willmot (1998). Let $\lfloor x \rfloor $ denote the greatest integer less than or equal to $x$. Define $g = \lfloor p(n+1) \rfloor $ and $h = p(n+1) - g$. Then the empirical percentile $\hat{\pi }_ p$ is defined as

\[  \hat{\pi }_ p = (1-h) x[g] + h x[g+1]  \]

This method does not work if $p < 1/(n+1)$ or $p > n/(n+1)$. If $p < 1/(n+1)$, then the function returns $\hat{\pi }_ p = x[1]/2$, which assumes that the EDF is 0 in the interval $[0,x[1])$. If $p > n/(n+1)$, then $\hat{\pi }_ p = x[n]$.

Ftype = 2

If $p < F[1]$, then $\hat{\pi }_ p = x[1]/2$, which assumes that the EDF is 0 in the interval $[0,x[1])$. If $|p-F[i]| < \epsilon $ for some value of $i$ and $i < n$, then $\hat{\pi }_ p$ is computed as

\[  \hat{\pi }_ p = \frac{x[i] + x[i+1]}{2}  \]

where $\epsilon $ is a machine-precision constant as returned by the SAS function CONSTANT('MACEPS'). If $F[i-1] < p < F[i]$, then $\hat{\pi }_ p$ is computed as

\[  \hat{\pi }_ p = x[i]  \]

If $p \geq F[n]$ , then $\hat{\pi }_ p = x[n]$.

SVRTUTIL_RAWMOMENTS:

This subroutine computes the raw moments of a sample.

  • Type: Subroutine

  • Signature: SVRTUTIL_RAWMOMENTS(n, x{*}, nx{*}, nRaw, raw{*})

  • Argument Description:

    n

    Dimension of the x and nx input arrays.

    x{*}

    Input numeric array of dimension n that contains distinct values of the random variable that are observed in the sample.

    nx{*}

    Input numeric array of dimension n in which each nx[$i$] contains the number of observations in the sample that have the value x[$i$].

    nRaw

    Desired number of raw moments. The output array raw contains the first nRaw raw moments.

    raw{*}

    Output array of raw moments. The $k$th element in the array (raw{$k$}) contains the $k$th raw moment, where $1 \leq k \leq \text {nRaw}$.

  • Return value: Numeric array raw that contains the first nRaw raw moments. The array contains missing values if the sample has no observations (that is, if all the values in the nx array add up to zero).

SVRTUTIL_SORT:

This function sorts the given array of numeric values in an ascending or descending order.

  • Type: Subroutine

  • Signature: SVRTUTIL_SORT(n, x{*}, flag)

  • Argument Description:

    n

    Dimension of the input array x.

    x{*}

    Numeric array that contains the values to be sorted at input. The subroutine uses the same array to return the sorted values.

    flag

    A numeric value that controls the sort order. If flag is 0, then the values are sorted in an ascending order. If flag has any value other than 0, then the values are sorted in descending order.

  • Return value: Numeric array x, which is sorted in place (that is, the sorted array is stored in the same storage area occupied by the input array x).

You can use the following predefined functions when you use the FCMP procedure to define functions and subroutines. They are summarized here for your information. For more information, see the FCMP procedure documentation in Base SAS Procedures Guide.

INVCDF:

This function computes the quantile from any continuous probability distribution by numerically inverting the CDF of that distribution. You need to specify the CDF function of the distribution, the values of its parameters, and the cumulative probability to compute the quantile.

LIMMOMENT:

This function computes the limited moment of order $k$ with upper limit $u$ for any continuous probability distribution. The limited moment is defined as

$\displaystyle  E[(X \wedge u)^ k]  $
$\displaystyle = \int _{0}^{u} x^ k f(x) dx + \int _{u}^{\infty } u^ k f(x) dx  $
$\displaystyle  $
$\displaystyle = \int _{0}^{u} x^ k f(x) dx + u^ k (1 - F(u))  $

where $f(x)$ and $F(x)$ denote the PDF and the CDF of the distribution, respectively. The LIMMOMENT function uses the following alternate definition, which can be derived using integration-by-parts:

\[  E[(X \wedge u)^ k] = k \int _{0}^{u} (1-F(x)) x^{k-1} dx  \]

You need to specify the CDF function of the distribution, the values of its parameters, and the values of $k$ and $u$ to compute the limited moment.