The HPFRECONCILE Procedure
Data Set Input/Output
The AGGDATA= data set contains either a proper subset of or none of the variables specified in the BY statement, the time ID variable in the ID statement (when this statement is specified), and the following variables:
_NAME_: variable name
PREDICT: predicted values
The following variables can optionally be present in the AGGDATA= data set and are used when available. If not present, their value is assumed to be missing for computational purposes.
ACTUAL: actual values
LOWER: lower confidence limits
UPPER: upper confidence limits
ERROR: prediction errors
STD: prediction standard errors
Typically, the AGGDATA= data set is generated by the OUTFOR= option of the HPFENGINE procedure. See Chapter 5, The HPFENGINE Procedure, for more details.
The AGGDATA= data set must be either sorted by the AGGBY variables and by the ID variable (when the latter is specified) or indexed on the AGGBY variables. Even when the data set is indexed, if the ID variable is specified, its values must be sorted in ascending order within each AGGBY group. See the section BY Statement for details about AGGBY variables and AGGBY groups.
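The two ordering requirements above can be illustrated outside SAS with a minimal Python sketch. It is not part of PROC HPFRECONCILE. The function names and the `region` and `date` variables in the usage note are hypothetical, and rows are assumed to be dictionaries keyed by variable name.

```python
def is_fully_sorted(rows, aggby, id_var=None):
    """Sorted case: rows ascend by the AGGBY variables, then by the ID variable."""
    keys = [
        tuple(r[v] for v in aggby) + ((r[id_var],) if id_var else ())
        for r in rows
    ]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def ids_ascending_within_groups(rows, aggby, id_var):
    """Indexed case: only the ID values inside each AGGBY group must ascend."""
    last_seen = {}
    for r in rows:
        group = tuple(r[v] for v in aggby)
        if group in last_seen and r[id_var] < last_seen[group]:
            return False
        last_seen[group] = r[id_var]
    return True
```

For example, rows stored in the order (B, 1), (A, 1), (B, 2), (A, 2) for (`region`, `date`) fail the first check but pass the second, which corresponds to a data set that would need an index on the AGGBY variables.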
You can specify custom names for the variables in the AGGDATA= data set by using the AGGDATA statement. See the section AGGDATA Statement for more details.
The DISAGGDATA= data set contains the variables specified in the BY statement, the variable in the ID statement (when this statement is specified), and the following variables:
_NAME_: variable name
PREDICT: predicted values
The following variables can optionally be present in the DISAGGDATA= data set and are used when available. If not present, their value is assumed to be missing for computational purposes.
ACTUAL: actual values
LOWER: lower confidence limits
UPPER: upper confidence limits
ERROR: prediction errors
STD: prediction standard errors
Typically, the DISAGGDATA= data set is generated by the OUTFOR= option of the HPFENGINE procedure. See Chapter 5, The HPFENGINE Procedure, for more details.
The DISAGGDATA= data set must be either sorted by the BY variables and by the ID variable (when the latter is specified) or indexed on the BY variables. If the variable _NAME_ is present and has multiple values, then the index must be a composite index on BY variables and _NAME_, in that order. If _NAME_ is present and has only one value, then the index can contain only BY variables. Even when the data set is indexed, if the ID variable is specified, its values must be sorted in ascending order within each BY, or BY and _NAME_ group, as applicable.
Indexing the DISAGGDATA= data set on the BY variables when it is already sorted by the BY variables leads to less efficient and less scalable operation if the available memory is not sufficient to hold the disaggregated data for the AGGBY group that is being processed. The amount of memory required depends on, among other things, the length of the series, the number of BY groups for each AGGBY group, and the number and format of the BY variables. For example, if there are four BY variables, each 16 characters long, 10,000 BY groups within each AGGBY group, and each series has length 100, then the minimum required memory for efficient processing is approximately 100 MB. If the memory is not sufficient, sorting the DISAGGDATA= data set, not indexing, is more efficient.
You can specify custom names for the variables in the DISAGGDATA= data set by using the DISAGGDATA statement. See the section DISAGGDATA Statement for more details.
The CONSTRAINT= data set specifies the constraints to be applied to the reconciled forecasts. It contains the BY variables for the level at which reconciled forecasts are generated. That is, it contains the AGGBY variables when DIRECTION=BU, and the variables specified in the BY statement when DIRECTION=TD. If the _NAME_ variable is present in the AGGDATA= and DISAGGDATA= data sets, it must also be present in the CONSTRAINT= data set. Additionally, the CONSTRAINT= data set contains the variable in the ID statement (when this statement is specified), and the following variables:
EQUALITY: an equality constraint for the predicted reconciled value
UNLOCK: a flag that specifies whether the equality constraint should be strictly enforced. Admissible values are as follows:
0: The equality constraint is locked.
1: The equality constraint is unlocked.
When EQUALITY is nonmissing and the UNLOCK flag is missing, the equality is treated as locked.
LOWERBD: lower bounds for the reconciled forecasts
UPPERBD: upper bounds for the reconciled forecasts
Locked equality constraints are treated as constraints, and therefore their value is honored. Unlocked equalities are instead treated as regular forecasts and, in general, are changed by the reconciliation process.
A constraint is said to be active when the reconciled prediction lies on the constraint. By definition, locked equalities are always active constraints.
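The locking and activity rules can be sketched in Python as follows. This is an illustration only, not the procedure's implementation. The function names and the tolerance are hypothetical, missing values are represented by `None` or NaN, and the coding of the UNLOCK flag (0 for locked, any other nonmissing value for unlocked) is an assumption.

```python
import math


def is_missing(x):
    """Treat None and NaN as missing values."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def is_locked(equality, unlock):
    """Locked when EQUALITY is nonmissing and UNLOCK is missing or 0 (assumed coding)."""
    if is_missing(equality):
        return False  # no equality constraint, so nothing to lock
    return is_missing(unlock) or unlock == 0


def is_active(reconciled, constraint, tol=1e-9):
    """A constraint is active when the reconciled prediction lies on it."""
    if is_missing(constraint):
        return False
    return abs(reconciled - constraint) <= tol
```

Under these assumptions, `is_locked(10.0, None)` is true because a nonmissing EQUALITY with a missing UNLOCK flag is treated as locked, and a locked equality is active by definition because the reconciled prediction is set to its value.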
If the NOTSORTED option is specified in the BY statement, then any BY group in the CONSTRAINT= data set that is out of order with respect to the BY groups in the AGGDATA= or DISAGGDATA= data set is ignored without any error or warning message. If the NOTSORTED option is not specified, then the BY groups in the CONSTRAINT= data set must be in the same sorted order as the AGGBY groups in the AGGDATA= data set when DIRECTION=BU, and in the same sorted order as the BY groups in the DISAGGDATA= data set when DIRECTION=TD; otherwise processing stops at the first such occurrence of a mismatch.
The OUTFOR= data set contains the following variables:
_NAME_: variable name
ACTUAL: actual values
PREDICT: predicted values
LOWER: lower confidence limits
UPPER: upper confidence limits
ERROR: prediction errors
STD: prediction standard errors
_RECONSTATUS_: reconciliation status
Additionally, it contains any other variable that was present in the input data set at the same level—that is, the DISAGGDATA= data set when DIRECTION=TD and the AGGDATA= data set when DIRECTION=BU.
When DIRECTION=BU and the AGGDATA= data set has not been specified, the OUTFOR= data set contains the variables in the previous list, the BY variables specified in the AGGBY statement, and the time ID variable in the ID statement.
If reconciliation fails with _RECONSTATUS_ between 1000 and 6000, PROC HPFRECONCILE copies the input values of the relevant variables to the OUTFOR= data set. If a variable is not present in the input data set, its value is set to missing in the OUTFOR= data set. The only exception to this rule is when the problem is infeasible and the FORCECONSTRAINT option is specified. See the section The FORCECONSTRAINT Option for more details on the latter case.
The OUTFOR= data set is always sorted by the BY variables (and by the _NAME_ variable and time ID variable when these variables are present) even if input data sets are indexed and not sorted.
If the ID statement is specified, then the values of the ID variable in the OUTFOR= data set are aligned based on the ALIGN= and INTERVAL= options specified in the ID statement. If the ALIGN= option is not specified, then the values are aligned to the beginning of the interval.
If the RECDIFF option of the HPFRECONCILE statement has been specified, the OUTFOR= data set also contains the following variable:
RECDIFF: difference between the reconciled predicted value and the original predicted value
The _RECONSTATUS_ variable contains a code that specifies whether the reconciliation was successful or not. A corresponding message is also displayed in the log. You can use the ERRORTRACE= option to define how often the error and warning messages are displayed in the log. The _RECONSTATUS_ variable can take the following values:
Reconciliation was successful.
An unlocked equality constraint has been imposed.
A locked equality constraint has been imposed.
A lower bound is active.
An upper bound is active.
The ID value is out of the range with respect to the START= and END= interval.
There is insufficient data to reconcile.
Reconciliation failed for the predicted value. This implies that it also failed for the confidence limits and standard error.
Reconciliation failed for the standard error.
Reconciliation failed for the confidence limits.
The constrained optimization problem is infeasible.
The option DISAGGREGATION=PROPORTION has been changed to DISAGGREGATION=DIFFERENCE for this observation because of a discordant sign in the input.
The option STDMETHOD= provided by the user has been changed for this observation.
The option CLMETHOD= provided by the user has been changed for this observation.
The standard error hit the limits imposed by the STDDIFBD= option.
Multiple warnings have been displayed in the log for this observation.
The number of missing values in the STD variable in the DISAGGDATA= data set is different from the number of missing values in the union of the PREDICT and ACTUAL variables.
The solution might be suboptimal. This means that the optimizer did not find an optimal solution, but the solution provided satisfies all constraints.
A failed forecast ".F" has been detected in a relevant input variable.
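When post-processing an OUTFOR= data set, the _RECONSTATUS_ ranges that this section states explicitly can be turned into a small classifier. The Python sketch below is hypothetical and uses only three facts from the text: values of 1000 or more indicate that some problem was encountered, values between 1000 and 6000 mean that reconciliation failed and the input values were copied, and 6000 marks an infeasible problem. It does not decode any other individual status code.

```python
def classify_reconstatus(code):
    """Summarize a _RECONSTATUS_ value using only the ranges stated in the text."""
    return {
        # A value greater than or equal to 1000 counts as a problem.
        "problem": code >= 1000,
        # Between 1000 and 6000 the input values are copied to OUTFOR=.
        "failed_input_copied": 1000 <= code <= 6000,
        # 6000 is the value reported for an infeasible problem.
        "infeasible": code == 6000,
    }
```

Note that the `failed_input_copied` flag ignores the FORCECONSTRAINT exception, in which an infeasible observation receives the user-specified constraint values instead of the input values.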
The FORCECONSTRAINT option applies when there are conflicts between the aggregation constraint and one or more constraints that you specify using the CONSTRAINT= data set, the SIGN= option, or the WEIGHTED option with zero weights. By default, when reconciliation is impossible, PROC HPFRECONCILE copies the input to the OUTFOR= data set without modification. However, if the reconciliation is infeasible because of a conflict between the constraints you specified and the aggregation constraint, you can ask PROC HPFRECONCILE to impose your constraints on the output even though that results in a violation of the aggregation constraint. For example, assume the input is described by the diagram in Figure 10.1 and assume you want to impose the following constraints on the reconciled forecasts:
.
The constraints are clearly in conflict with the aggregation constraint
; therefore, PROC HPFRECONCILE will consider the problem infeasible. If you do not specify the FORCECONSTRAINT option, the predicted values in the OUTFOR= data set will equal the input predicted values (that is,
) and the _RECONSTATUS_ variable will take the value 6000. If you specify the FORCECONSTRAINT option, the OUTFOR= data set will contain the values
.
The OUTINFEASIBLE= data set contains summary information about the nodes in the hierarchy for which reconciliation is infeasible because the aggregation constraint is incompatible with the constraints supplied by the user.
The OUTINFEASIBLE= data set is always produced at the level of the AGGDATA= data set.
The OUTINFEASIBLE= data set contains the AGGBY variables present in the AGGDATA= data set, the time ID variable, when it is specified, and the following variables:
variable name
takes value 1 when the node is reconciled, and value 0 when it is not
the predicted value for the parent node
the aggregated prediction of the children nodes
the lower bound implied by the constraints on FINALPREDICT
the upper bound implied by the constraints on FINALPREDICT
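One way to read the implied bounds is sketched below in Python. This is an interpretation, not the documented algorithm. It assumes that the aggregation constraint is a plain sum of the children, that each child carries an optional lower and upper bound (`None` meaning unbounded), and that a node is infeasible when the parent prediction falls outside the range the children can reach. The function names are hypothetical.

```python
import math


def implied_bounds(child_bounds):
    """Sum the children's (lower, upper) bounds; None means no bound on that side."""
    lower = sum(-math.inf if lo is None else lo for lo, _ in child_bounds)
    upper = sum(math.inf if hi is None else hi for _, hi in child_bounds)
    return lower, upper


def is_feasible(final_predict, child_bounds):
    """The parent value must lie inside the bounds implied by the children."""
    lower, upper = implied_bounds(child_bounds)
    return lower <= final_predict <= upper
```

With two children bounded by (0, 5) and (2, 4), the implied range under these assumptions is 2 to 9, so a parent prediction of 10 could not be reconciled without violating one of the bounds.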
If the ID statement is specified, then the values of the ID variable in the OUTINFEASIBLE= data set are aligned based on the ALIGN= and INTERVAL= options specified in the ID statement. If the ALIGN= option is not specified, then the values are aligned to the beginning of the interval.
The OUTNODESUM= data set contains the BY variables in the AGGDATA= data set (or in the AGGBY statement if the AGGDATA= data set is not specified), the time ID variable in the ID statement when this statement is specified, and the following variables:
variable name
number of nonmissing children of the current AGGBY group
The OUTPROCINFO= data set contains the following variables:
_SOURCE_: source procedure that produces this data set
_STAGE_: stage of the procedure execution for which the summary variable is reported
_NAME_: name of the summary variable
_LABEL_: description of the summary variable
_VALUE_: value of the summary variable
For PROC HPFRECONCILE, the value of the _SOURCE_ variable is HPFRECONCILE and the value of the _STAGE_ variable is ALL for all observations. The data set contains one observation for each of the following values of the _NAME_ variable:
total number of observations subject to reconciliation
number of observations with successful reconciliation
number of observations for which reconciliation failed for PREDICT. This number does not include failures to reconcile due to an infeasible problem or a failed (".F") forecast.
number of observations for which some problem was encountered. This is the number of observations in the OUTFOR= data set that have a _RECONSTATUS_ value greater than or equal to 1000.
number of observations for which reconciliation is infeasible due to incompatible constraints
number of observations for which the optimizer did not find an optimal solution
number of observations subject to a locked equality constraint
number of observations for which a locked equality was specified in the CONSTRAINT= data set
number of observations for which a lower bound was imposed
number of observations for which a lower bound was specified in the CONSTRAINT= data set
number of observations for which an upper bound was imposed
number of observations for which an upper bound was specified in the CONSTRAINT= data set
number of observations for which a lower bound is active
number of observations for which an upper bound is active
number of observations for which a failed forecast (".F") was written
total number of possible problems. One reconciliation problem is possible for each distinct value of the time ID variable that appears in the AGGDATA= or DISAGGDATA= data sets.
number of reconciliation problems for which the DISAGGREGATION= option was changed internally because the supplied or default option was not feasible
number of reconciliation problems for which the CLMETHOD= option was changed internally because the supplied or default option was not feasible
number of reconciliation problems for which the STDMETHOD= option was changed internally because the supplied or default option was not feasible
total number of problems subject to reconciliation
number of infeasible reconciliation problems due to incompatible constraints
number of reconciliation problems for which the optimizer did not find an optimal solution
number of reconciliation problems solved by using the optimizer
minimum value of time ID for which reconciliation was attempted
maximum value of time ID for which reconciliation was attempted
total number of AGGBY groups processed
average number of BY groups per AGGBY group
number of BY groups processed partially because of irregular ID values
number of AGGBY groups processed partially because of irregular ID values
average number of active BY groups per time ID for which reconciliation was attempted
total number of constraints read in the CONSTRAINT= data set
total number of constraints in the CONSTRAINT= data set with time ID values in the [START=,END=] range
number of constraints used
If any of the constraints in the CONSTRAINT= data set is left unmatched and unprocessed, then _VALUE_ for this observation is set to 1; otherwise, it is set to 0.
return code of PROC HPFRECONCILE
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.