The METHOD statement adds a method for computing recommendations for a recommender system. You can use the METHOD statement to specify details for a method definition, rather than using the default settings for that method.
specifies the name of the method to add to a recommender system. The method name can be one of the following values:
AVERAGE | AVE | AVG | a default method that is used to produce recommendations for users who have insufficient information in the recommender system. For example, a method might require that at least two ratings are on record for a user. If that is not the case, the recommendation request is served by the AVG method instead. |
SLOPEONE | SLOPE1 | a simple regression-based method. |
NEAREST | KNN | a k-nearest-neighbor method that is based on measures of association between items or users. This method is also called a collaborative filter. |
SVD | a recommender method that is based on a singular-value decomposition of a user-item-ratings matrix. |
ENSEMBLE | a collection of other methods that you specify. |
ARM | a method that performs association rule mining (ARM). |
CLUSTER | a cluster-based method that uses item or user profiles. Items or users are clustered. Then the similarity information between items or users for each cluster is computed to make recommendations. |
specifies an optional WHERE clause for each method. All of the data is filtered by this WHERE clause.
requests additional details for the numerically intensive SVD and ensemble methods.
specifies a label by which the method can be identified. A label is important if you have multiple instances of a method definition (with different parameter values) in the recommender system.
specifies a relative function convergence criterion for the numerical optimization in SVD and ensemble methods.
specifies a relative gradient convergence criterion for the numerical optimization in SVD and ensemble methods.
specifies the maximum number of iterations for the numerical optimization in SVD and ensemble methods.
Default | 1 (a one-step update) |
specifies the maximum number of function evaluations for the numerical optimization in SVD and ensemble methods.
specifies the seed for random number generation in SVD and ensemble methods.
specifies the name of the recommender system in the SAS LASR Analytic Server that the procedure works with. Specify a two-level name, similar to a LIBNAME.MEMBER construct.
Alias | RECOM= |
Default | RECOM.SYSTEM |
specifies the number of ratings to hold for users that are selected by the WITHHOLD= option. The specified number of ratings are selected at random to be held in a validation data set, which is a subset of the original data set.
Default | 1 |
Interaction | The HOLD= option is ignored if the WITHHOLD= option is not also specified. |
specifies a relative percentage of users whose ratings are included in a validation data set, which is a subset of the original data set. For example, WITHHOLD=0.1 indicates that 10% of users should be selected at random. A portion of the selected users’ ratings are held in the validation data set. The number of ratings to select is specified by the HOLD= option.
Range | 0–1, exclusive |
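The interaction between WITHHOLD= and HOLD= can be illustrated with a minimal Python sketch. This is not the procedure's implementation: the function name `split_holdout`, the nested-dictionary data layout, and the choice to select at least one user are assumptions made only for illustration.

```python
import random


def split_holdout(ratings, withhold=0.1, hold=1, seed=0):
    """Split ratings into training and validation parts.

    ratings:  dict mapping user -> dict mapping item -> rating (hypothetical layout)
    withhold: fraction of users selected at random, strictly between 0 and 1
    hold:     number of ratings moved to validation for each selected user
    """
    if not 0 < withhold < 1:
        raise ValueError("withhold must lie strictly between 0 and 1")
    rng = random.Random(seed)
    users = sorted(ratings)
    # Assumption: always select at least one user.
    n_selected = max(1, round(withhold * len(users)))
    selected = set(rng.sample(users, n_selected))

    train, validation = {}, {}
    for user in users:
        items = dict(ratings[user])
        if user in selected:
            # Pick `hold` ratings at random and move them to the validation set.
            held = rng.sample(sorted(items), min(hold, len(items)))
            validation[user] = {item: items.pop(item) for item in held}
        train[user] = items
    return train, validation


# Usage: 10 users with 3 ratings each, 20% of users selected, 1 rating held per user.
data = {f"u{k}": {"a": 1, "b": 2, "c": 3} for k in range(10)}
train, validation = split_holdout(data, withhold=0.2, hold=1, seed=42)
```

In this sketch, omitting the selection step leaves nothing for `hold` to act on, which mirrors the documented behavior that HOLD= is ignored unless WITHHOLD= is also specified.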
specifies the number of ratings to hold for users that are selected by the WITHHOLD= option. The specified number of ratings are selected at random to be held in a validation data set, which is a subset of the original data set.
Default | 1 |
Interaction | The HOLD= option is ignored if the WITHHOLD= option is not also specified. |
specifies the parameter k for a k-nearest-neighbor method. Only the k nearest neighbors are considered in deriving a recommendation for a particular user.
Alias | K= |
requests that only positive associations are used when computing a neighborhood in a k-nearest-neighbor method.
Alias | POSITIVE |
specifies the type of prefiltering to apply when computing a neighborhood. If you specify PREFILTER=TOP(n), then a list of only the n nearest neighbors and their similarities are kept. If you specify PREFILTER=THRESHOLD(r), then the list of nearest neighbors includes items or users with similarities that exceed the threshold value r. If you specify PREFILTER=NONE, then neighborhoods are formed based on all similarities.
Default | TOP(10) |
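The three prefiltering modes can be sketched in Python as follows. The function `prefilter` and its arguments are hypothetical names chosen for this illustration; the sketch only mirrors the behavior described above and assumes that similarities are already computed.

```python
def prefilter(similarities, mode="top", n=10, r=0.0):
    """Reduce a neighbor list before a neighborhood is formed.

    similarities: dict mapping neighbor name -> similarity value
    mode: "top" keeps the n most similar neighbors,
          "threshold" keeps neighbors whose similarity exceeds r,
          "none" keeps every neighbor.
    """
    ranked = sorted(similarities.items(), key=lambda pair: pair[1], reverse=True)
    if mode == "top":
        return ranked[:n]
    if mode == "threshold":
        return [(name, value) for name, value in ranked if value > r]
    if mode == "none":
        return ranked
    raise ValueError(f"unknown prefilter mode: {mode}")


sims = {"a": 0.9, "b": 0.2, "c": 0.5, "d": -0.3}
top_two = prefilter(sims, mode="top", n=2)
above = prefilter(sims, mode="threshold", r=0.2)
everything = prefilter(sims, mode="none")
```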
specifies the similarity measure that is used in k-nearest-neighbor collaborative filtering. If you specify SIMILARITY=COSINE (or COS or CV), then the cosine measure is the similarity measure. If you specify SIMILARITY=CORR (or PEARSON or PC), then Pearson's correlation coefficient, or product-moment correlation, is the similarity measure. If you specify SIMILARITY=ADJCOS (or AC), then the adjusted cosine measure is the similarity measure. For more information, see How Similarity Measures Are Calculated.
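As a rough illustration of how the three measures differ, the following Python sketch uses their common textbook definitions. The exact formulas that the procedure applies (for example, how missing ratings are handled) are described in How Similarity Measures Are Calculated and might differ; the function names here are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equally long rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson(x, y):
    """Pearson correlation: cosine after centering each vector at its own mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


def adjusted_cosine(x, y, user_means):
    """Adjusted cosine between two items.

    x and y hold the ratings that the same users gave to two items;
    each rating is centered at the mean rating of the user who gave it.
    """
    return cosine(
        [a - m for a, m in zip(x, user_means)],
        [b - m for b, m in zip(y, user_means)],
    )
```

Pearson correlation centers each vector at its own mean, whereas the adjusted cosine centers each rating at the rating user's mean, which compensates for users who rate systematically high or low.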
specifies a relative percentage of users whose ratings are included in a validation data set, which is a subset of the original data set. For example, WITHHOLD=0.1 indicates that 10% of users should be selected at random. A portion of the selected users’ ratings are held in the validation data set. The number of ratings to select is specified by the HOLD= option.
Range | 0–1, exclusive |
specifies a rule to generate a binary rating. If a numeric rating exceeds n, then the binary rating is set to 1. Otherwise, the binary rating is set to 0.
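The binary rating rule amounts to a strict threshold comparison. A minimal Python sketch, in which the function name `binarize` is hypothetical:

```python
def binarize(rating, n):
    """Return 1 if the numeric rating exceeds n, otherwise 0."""
    return 1 if rating > n else 0


# A rating equal to the threshold does not exceed it, so it maps to 0.
binary = [binarize(r, 3) for r in [1, 3, 4, 5]]
```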
specifies a weighting factor for the squared errors in the loss function of the matrix factorization.
specifies the number of features for the user-item matrix. Values between 50 and 100 are typical. A value as low as 10 is useful for evaluating this option. Larger values increase the computational complexity and require more time to run.
specifies the number of ratings to hold for users that are selected by the WITHHOLD= option. The specified number of ratings are selected at random to be held in a validation data set, which is a subset of the original data set.
Default | 1 |
Interaction | The HOLD= option is ignored if the WITHHOLD= option is not also specified. |
specifies the loss function for the matrix factorization. The LOSS=SE option indicates that the squared-error function is the loss function. The LOSS=SEREG and LOSS=SEWREG options are modifications of the squared-error loss function that include regularization terms in matrix norms or weighted matrix norms, respectively. Weighted regularization terms are weighted by λ, and you set the value of this parameter with the LAMBDA= option. The LOSS=KL (or ENTROPY) option indicates that the Kullback-Leibler divergence, or relative entropy, is the loss function.
specifies the regularization factor for the loss functions.
Applies to | LOSS=SEREG or LOSS=SEWREG |
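To make the role of the regularization factor concrete, the following Python sketch evaluates a plain squared-error loss and a regularized variant for a factorization in which a rating is approximated by the dot product of a user factor vector and an item factor vector. The exact form of the loss functions in the procedure is not given here, so the squared Frobenius norm penalty is an assumption, the weighted variant is not shown, and all names are hypothetical.

```python
def squared_error(ratings, user_factors, item_factors):
    """Sum of squared errors over the observed ratings.

    ratings: list of (user_index, item_index, rating) triples
    user_factors, item_factors: lists of equally long factor vectors
    """
    total = 0.0
    for user, item, rating in ratings:
        predicted = sum(a * b for a, b in zip(user_factors[user], item_factors[item]))
        total += (rating - predicted) ** 2
    return total


def frobenius_squared(matrix):
    """Squared Frobenius norm: the sum of all squared entries."""
    return sum(value * value for row in matrix for value in row)


def regularized_squared_error(ratings, user_factors, item_factors, lam):
    """Squared error plus lam times the squared norms of both factor matrices.

    Assumption: an unweighted penalty, shown only to illustrate what lam scales.
    """
    penalty = frobenius_squared(user_factors) + frobenius_squared(item_factors)
    return squared_error(ratings, user_factors, item_factors) + lam * penalty


U = [[1.0, 0.0], [0.0, 1.0]]
V = [[2.0, 0.0], [0.0, 3.0]]
observed = [(0, 0, 3.0), (1, 1, 3.0)]
plain = squared_error(observed, U, V)
penalized = regularized_squared_error(observed, U, V, lam=0.1)
```

Larger values of `lam` penalize large factor entries more strongly, which trades fit on the observed ratings for smaller, more stable factors.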
specifies the optimization method for the singular-value decomposition. The TECHNIQUE=LBFGS option indicates a limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization method. This method is often used for solving neural network problems. The TECHNIQUE=ALS option indicates an alternating least squares optimization method.
specifies a relative percentage of users whose ratings are included in a validation data set, which is a subset of the original data set. For example, WITHHOLD=0.1 indicates that 10% of users should be selected at random. A portion of the selected users’ ratings are held in the validation data set. The number of ratings to select is specified by the HOLD= option.
Range | 0–1, exclusive |
restricts the weights in the ensemble to lie between 0 and 1.
specifies the number of ratings to hold for users that are selected by the WITHHOLD= option. The specified number of ratings are selected at random to be held in a validation data set, which is a subset of the original data set.
Default | 1 |
Interaction | The HOLD= option is ignored if the WITHHOLD= option is not also specified. |
specifies the methods that participate in the ensemble. Enclose each method in quotation marks, and separate multiple values with a comma.
Default | All methods except the AVERAGE method. |
Restriction | The AVERAGE method is not part of any ensemble. |
specifies a relative percentage of users whose ratings are included in a validation data set, which is a subset of the original data set. For example, WITHHOLD=0.1 indicates that 10% of users should be selected at random. A portion of the selected users’ ratings are held in the validation data set. The number of ratings to select is specified by the HOLD= option.
Range | 0–1, exclusive |
specifies the maximum number of points in each bubble. This number must exceed the value of the BUBMINPTS= option.
specifies the minimum number of points in each bubble.
Default | 1 |
generates the temporary table that contains the cluster results for each user or item.
lists the variables to use with the CLUSTER method.
specifies the clustering technique.
Default | KMEANS |
specifies the convergence criterion c for the k-means analysis. When the relative change in the within-cluster sum of squares (WCSS) between successive iterations is less than c, the analysis is presumed to have converged.
Default | 0.00001 |
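The stopping rule can be sketched with a small k-means loop in Python. This is an illustrative implementation under simple assumptions (squared Euclidean distance, caller-supplied initial centers, hypothetical function names), not the server's algorithm.

```python
def wcss(points, centers, labels):
    """Within-cluster sum of squares for a given assignment."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(point, centers[label]))
        for point, label in zip(points, labels)
    )


def kmeans(points, centers, c=1e-5, max_iter=100):
    """Run k-means until the relative change in WCSS drops below c."""
    previous = None
    labels = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(
                range(len(centers)),
                key=lambda k: sum((p - q) ** 2 for p, q in zip(point, centers[k])),
            )
            for point in points
        ]
        # Update step: move each center to the mean of its members.
        updated = []
        for k in range(len(centers)):
            members = [pt for pt, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centers[k])  # keep an empty cluster's center in place
        centers = updated

        current = wcss(points, centers, labels)
        if previous is not None:
            change = abs(previous - current) / previous if previous > 0 else 0.0
            if change < c:
                return centers, labels, iteration
        previous = current
    return centers, labels, max_iter


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels, iterations = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

A smaller value of `c` demands a more stable WCSS before stopping and therefore can require more iterations.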
specifies the distance measure that is used in the clustering method. The k-means method uses DIST=EUC.
Applies to | CLUSTERTECH=DBSCAN |
specifies the maximum diameter of bubbles with the given distance measure.
Default | 0 |
specifies the distance value for neighborhood querying. For more information, see CLUSTER Statement.
Applies to | CLUSTERTECH=DBSCAN |
specifies the method for obtaining the initial estimate of cluster assignment. For more information, see CLUSTER Statement.
Alias | INIT= |
specifies the minimum number of points that are required in one cluster.
Applies to | CLUSTERTECH=DBSCAN |
specifies that the comparisons between terms and the values of character variables are case-insensitive. By default, comparisons are case-sensitive.
specifies that only the term frequency is used to construct the vectors and that inverse document frequency is not used.
specifies that the TF-IDF vectors are not normalized.
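The effect of the three preceding options (case-insensitive matching, term frequency only, and no normalization) can be sketched in Python. The function `tfidf_vectors`, its keyword arguments, whitespace tokenization, and the logarithmic form of the inverse document frequency are assumptions for illustration; the procedure's exact weighting might differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs, terms, use_idf=True, normalize=True, ignore_case=False):
    """Build one vector per document over a fixed term list.

    use_idf=False    -> raw term frequencies only
    normalize=False  -> vectors are not scaled to unit length
    ignore_case=True -> terms and text are compared case-insensitively
    """
    def fold(text):
        return text.lower() if ignore_case else text

    tokenized = [[fold(token) for token in doc.split()] for doc in docs]
    terms = [fold(term) for term in terms]
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    doc_freq = {t: sum(1 for tokens in tokenized if t in tokens) for t in terms}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vector = []
        for t in terms:
            weight = float(counts[t])
            if use_idf:
                # Assumption: idf = log(N / df), one common variant.
                idf = math.log(n_docs / doc_freq[t]) if doc_freq[t] else 0.0
                weight *= idf
            vector.append(weight)
        if normalize:
            norm = math.sqrt(sum(w * w for w in vector))
            if norm > 0:
                vector = [w / norm for w in vector]
        vectors.append(vector)
    return vectors


docs = ["Action Comedy", "action drama", "drama drama"]
terms = ["action", "drama"]
tf_sensitive = tfidf_vectors(docs, terms, use_idf=False, normalize=False)
tf_insensitive = tfidf_vectors(docs, terms, use_idf=False, normalize=False, ignore_case=True)
full = tfidf_vectors(docs, terms, ignore_case=True)
```

With case-sensitive matching, "Action" in the first document does not match the term "action", so that document's vector is all zeros until `ignore_case=True` is set.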
specifies the number of representative points for each bubble.
Default | 1 |
specifies the number of clusters for the k-means analysis.
Alias | NUMCLUS= |
Default | 2 |
saves the TF-IDF vectors in the temporary table when the CLUSTINFO option is enabled.
specifies terms that are used to compute term frequency. Each string represents one term. For more information, see CLUSTER Statement.
specifies an in-memory table in the server that contains the term list. For more information, see CLUSTER Statement.
specifies the tokens that separate terms when scanning character variables. For more information, see CLUSTER Statement.
specifies an in-memory table in the server that contains the tokens list.
specifies which type of profile is used for the CLUSTER method. The CLUSTER method that uses a user profile table cannot be used in the ensemble model with other methods.
Requirement | The user or item table must be added into the recommender system. |