This sample describes how to use PROC RECOMMEND to build and deploy recommendation models. A recommender system is an engine that provides personalized suggestions to individual users from a huge number of candidate items. It can also work in the other direction and identify, from a vast pool of users, those to whom an item should be suggested. One of the main features of these systems is that they rely on understanding user preferences in order to estimate the utility of items and decide whether they should be recommended. These user preferences are inferred from feedback provided by the user, in either explicit or implicit form. Explicit feedback refers to preferences that a user makes known by rating subsets of the available items on a fixed scale. A scale of one to five stars is a common rating practice for explicit feedback. With implicit feedback, by contrast, preferences are inferred from signals such as browsing history, page view counts, or purchases, where the user does not express a preference explicitly.
The recommendations relate to various decision-making processes, such as what movies to watch, what items to buy, what music to listen to, what papers or online news to read, or what people to contact for advertising and promoting an item. "Item" is the general term used to denote what the system recommends to users, which can include products, services, packages and so on.
Collaborative filtering is the main technique used by PROC RECOMMEND. Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if user A has the same opinion as user B on an item, then A is more likely to share B's opinion on a different item X than to share an opinion with a person chosen randomly. PROC RECOMMEND takes the explicit ratings of users for items as inputs and outputs the recommendations for individual users.
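The assumption behind collaborative filtering can be illustrated with a small sketch. The following Python code is only an illustration of the general idea, not the algorithm that PROC RECOMMEND implements: the toy ratings, the choice of cosine similarity, and the function names are all assumptions made for this example. User A agrees closely with user B and less with user C, so the predicted rating of A for item X leans toward B's opinion.

```python
from math import sqrt

# Hypothetical toy ratings (1 to 5 scale): user -> {item: rating}
ratings = {
    "A": {"m1": 5, "m2": 4, "m3": 1},
    "B": {"m1": 5, "m2": 5, "m3": 1, "X": 5},
    "C": {"m1": 1, "m2": 2, "m3": 5, "X": 1},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items that both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of the other users' ratings for the item."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        num += weight * other_ratings[item]
        den += weight
    return num / den if den else None

sim_ab = cosine_similarity(ratings["A"], ratings["B"])
sim_ac = cosine_similarity(ratings["A"], ratings["C"])
prediction = predict("A", "X")
print(f"sim(A,B)={sim_ab:.3f} sim(A,C)={sim_ac:.3f} predicted A->X={prediction:.2f}")
```

Because A is more similar to B than to C, B's rating of 5 carries more weight than C's rating of 1 in the prediction.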
PROC RECOMMEND supports several types of methods, including the KNN, SVD, and ENSEMBLE methods that are used in this sample.
Details about the methods and an introduction can be found in SAS LASR Analytic Server: Reference Guide.
This sample demonstrates creating a recommender system with the MovieLens 1M data set from GroupLens Research. The data set has 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 MovieLens users. The users' ratings for movies are on a 1 to 5 scale, with 5 being the highest rating and 1 the lowest rating. The movie profile contains some simple information about each movie such as the year of production, title, and category.
This example demonstrates two different methods for generating recommendations from data with explicit ratings. The PROC RECOMMEND statement starts the recommender system. Each METHOD statement specifies a recommendation model to build. There are different options corresponding to the different methods.
For the KNN method, the K= option specifies the number of neighbors used (20 in this example). The SIMILARITY=PC option specifies that the Pearson correlation coefficient is used as the similarity measure between two items. For the SVD method, the FACTORS=20 option specifies the number of latent factors used in the matrix factorization.
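To make the roles of the number of neighbors and the Pearson similarity concrete, here is a minimal Python sketch of item-based nearest-neighbor prediction. It is an illustration under stated assumptions, not the computation that PROC RECOMMEND performs: the toy data, the function names, and the simple similarity-weighted average are all hypothetical.

```python
from math import sqrt

# Hypothetical toy data: item -> {user: rating}
item_ratings = {
    "m1": {"u1": 5, "u2": 4, "u3": 1, "u4": 2},
    "m2": {"u1": 4, "u2": 5, "u3": 2, "u4": 1},
    "m3": {"u1": 1, "u2": 2, "u3": 5, "u4": 4},
    "m4": {"u1": 5, "u2": 5, "u3": 1},  # u4 has not rated m4
}

def pearson(a, b):
    """Pearson correlation over the users who rated both items."""
    common = set(a) & set(b)
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[u] for u in common) / n
    mean_b = sum(b[u] for u in common) / n
    cov = sum((a[u] - mean_a) * (b[u] - mean_b) for u in common)
    var_a = sum((a[u] - mean_a) ** 2 for u in common)
    var_b = sum((b[u] - mean_b) ** 2 for u in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def nearest_items(target, k):
    """Return the k items most similar to the target, most similar first."""
    sims = [(other, pearson(item_ratings[target], item_ratings[other]))
            for other in item_ratings if other != target]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]

def predict(user, target, k):
    """Similarity-weighted average of the user's ratings on the k neighbors."""
    num = den = 0.0
    for other, sim in nearest_items(target, k):
        if sim > 0 and user in item_ratings[other]:
            num += sim * item_ratings[other][user]
            den += sim
    return num / den if den else None

neighbors = nearest_items("m4", k=2)
prediction = predict("u4", "m4", k=2)
print(neighbors, prediction)
```

In this toy data, m1 and m2 are rated much like m4, while m3 is rated in the opposite way, so only m1 and m2 contribute to the prediction for user u4.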
The WITHHOLD= and HOLD= options in the ENSEMBLE method and the SVD method hold out ratings for validation. The WITHHOLD= option specifies the proportion of users from whom ratings are held out, and the HOLD= option specifies the number of ratings that are held out for each selected user. In this example, ten percent of users are selected (WITHHOLD=0.1). For each of the selected users, one rating is held out for validation (HOLD=1), and the remaining ratings of those users, together with all ratings from the other users, stay in the training data set.
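The holdout scheme described above can be sketched in a few lines of Python. This is a hedged illustration of the idea only: how PROC RECOMMEND actually samples users and ratings is not described in this sample, and the function name, the synthetic data, and details such as always selecting at least one user are assumptions.

```python
import random

def split_holdout(ratings, withhold=0.1, hold=1, seed=1234):
    """Split (user, item, rating) tuples into training and holdout lists.

    withhold: proportion of users from whom ratings are held out
    hold:     number of ratings held out per selected user
    """
    rng = random.Random(seed)
    by_user = {}
    for idx, (user, _item, _rating) in enumerate(ratings):
        by_user.setdefault(user, []).append(idx)

    users = sorted(by_user)
    n_selected = max(1, round(withhold * len(users)))  # assumption: at least one user
    selected = rng.sample(users, n_selected)

    held = set()
    for user in selected:
        idxs = by_user[user]
        n_hold = min(hold, len(idxs) - 1)  # keep at least one rating in training
        held.update(rng.sample(idxs, n_hold))

    train = [r for i, r in enumerate(ratings) if i not in held]
    holdout = [r for i, r in enumerate(ratings) if i in held]
    return train, holdout

# Synthetic data: 50 users with 4 ratings each
ratings = [(f"u{u}", f"m{m}", 1 + (u + m) % 5) for u in range(50) for m in range(4)]
train, holdout = split_holdout(ratings, withhold=0.1, hold=1)
print(len(train), len(holdout))
```

With 50 users, a proportion of 0.1, and one rating per selected user, five ratings from five distinct users land in the holdout set and the other 195 ratings remain for training.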
The NUM= option in the PREDICT statement specifies the number of recommendations to generate for each user that is specified in the USERS= option. Several different methods can be added to the recommender system, each receiving a name with the LABEL= option in the METHOD statement. When you use the PREDICT statement, you specify the same name with the LABEL= option to select the model to use.
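Conceptually, generating a fixed number of recommendations for a user means ranking the items that the user has not yet rated by predicted rating and keeping the top few. The Python sketch below illustrates that step only; the predicted scores, item names, and function name are hypothetical and do not come from PROC RECOMMEND output.

```python
import heapq

def top_n(predicted, already_rated, num=5):
    """Return the num unrated items with the highest predicted ratings."""
    candidates = [(item, score) for item, score in predicted.items()
                  if item not in already_rated]
    best = heapq.nlargest(num, candidates, key=lambda pair: pair[1])
    return [item for item, _score in best]

# Hypothetical predicted ratings for one user
predicted = {"m1": 4.8, "m2": 3.1, "m3": 4.9, "m4": 2.0, "m5": 4.2}
recommendations = top_n(predicted, already_rated={"m3"}, num=2)
print(recommendations)
```

Item m3 has the highest predicted rating but is skipped because the user already rated it.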
It is important to examine the model performance before you proceed with using a model to generate recommendations. PROC RECOMMEND produces ODS tables that show the performance of the models and enable you to assess the model performance with several different criteria. For the KNN method, the ENSEMBLE method is used to evaluate the numeric performance on the training and validation data sets. The evaluation is performed by specifying the METHODS=("KNN") option and the CONSTRAINT option. The numeric evaluation for KNN is listed in Table 1.
Table 1: Movie Lens - Numerical Evaluations using Method KNN
The model performance for the SVD method is evaluated directly within the method. The example code shows using the ODS OUTPUT statement to save the numeric performance table as a SAS table that can be used for visual examination. Figure 1 shows a plot of the root mean square error and mean absolute error as a function of iterations for the training and validation data sets for the SVD method.
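The idea of tracking training and holdout error over iterations can be sketched with a small matrix factorization in Python. This is an illustration only and differs from the sample in important ways: it uses plain stochastic gradient descent instead of the L-BFGS technique requested in the SAS code, and the toy data, the hyperparameter values, and all function names are assumptions.

```python
import random
from math import sqrt

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rmse(data, P, Q):
    """Root mean square error of the predictions P[u] . Q[i] on (u, i, r) triples."""
    return sqrt(sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in data) / len(data))

def factorize(train, hold, n_users, n_items, factors=2, lam=0.02,
              lr=0.05, epochs=200, seed=1234):
    """L2-regularized matrix factorization fitted with SGD.

    Returns a history of (epoch, training RMSE, holdout RMSE) tuples.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(factors)] for _ in range(n_items)]
    history = [(0, rmse(train, P, Q), rmse(hold, P, Q))]
    for epoch in range(1, epochs + 1):
        for u, i, r in train:
            err = r - dot(P[u], Q[i])
            for f in range(factors):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - lam * pu)
                Q[i][f] += lr * (err * pu - lam * qi)
        history.append((epoch, rmse(train, P, Q), rmse(hold, P, Q)))
    return history

# Toy data: users 0-2 like items 0-1, users 3-5 like items 2-4
all_ratings = [(u, i, 5 if (u < 3) == (i < 2) else 1)
               for u in range(6) for i in range(5)]
hold = [(0, 1, 5), (4, 3, 5), (2, 4, 1)]
train = [r for r in all_ratings if r not in hold]

history = factorize(train, hold, n_users=6, n_items=5)
for epoch, train_rmse, hold_rmse in history[::50]:
    print(epoch, round(train_rmse, 4), round(hold_rmse, 4))
```

The rows of `history` play the same role as the saved numeric performance table: plotting the two RMSE columns against the iteration number gives curves of the kind shown in Figure 1.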
You might decide not to use a model if it does not perform well according to the selected numeric criteria. In this example, the KNN method has a lower root mean square error on the holdout sample (RMSE: 0.7705) than the SVD method (RMSE: 0.9151). Because of this better performance, the example code uses the KNN method for scoring and generating recommendations with the PREDICT statement.
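For reference, the two error measures used in this comparison can be computed as in the following Python sketch. The ratings and predictions are made-up numbers for illustration, not values from the MovieLens run, and the model names are hypothetical.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean square error: penalizes large errors more heavily."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical held-out ratings and the predictions of two models
actual = [5, 3, 4, 1]
predictions = {
    "model_a": [4.5, 3.5, 4.0, 2.0],
    "model_b": [3.0, 3.0, 3.0, 3.0],
}

scores = {name: (rmse(actual, pred), mae(actual, pred))
          for name, pred in predictions.items()}
best = min(scores, key=lambda name: scores[name][0])  # lowest holdout RMSE
print(scores, best)
```

Selecting the model with the lowest holdout RMSE mirrors the choice made in this sample, where KNN is preferred over SVD.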
Figure 1(a): Root Mean Square Error for SVD
Figure 1(b): Mean Absolute Error for SVD
The PREDICT statement produces a recommendation table for each user. See Table 2 for recommendations using the KNN method.
Table 2: Movie Lens - Recommendation Table Using Method KNN
This example provides the PROC RECOMMEND syntax for creating a recommender system. This section includes a few helpful hints for working with data and viewing data in SAS LASR Analytic Server.
To view the rating history for users 1 and 33 that were used for generating recommendations, you can use the following example code.
/* Continue to use the lasr libref from the previous code example */
/* Use SCHEMA to join the ratings and movies tables. */
proc imstat data=lasr.ratings;
schema movies (movieid=movieid);
run;
/* Find the number of ratings for users 1 and 33 and view them. */
table lasr.&_templast_;
where userid in (1 33);
numrows / save=numrowstab;
run;
store numrowstab(_last_, 1) = obscount;
run;
fetch / format orderby=(userid rating movies_genres) descending=(rating) to=&obscount.;
run;
quit;
SAS Institute Inc. 2014. SAS® LASR™ Analytic Server 2.4: Reference Guide. Cary, NC: SAS Institute Inc. Available at http://support.sas.com/documentation/cdl/en/inmsref/67597/HTML/default/viewer.htm
GroupLens Research. MovieLens data sets. Available at http://grouplens.org/datasets/movielens/
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
/* Start a LASR Analytic Server */
options set=GRIDHOST="grid001.example.com";
options set=GRIDINSTALLLOC="/opt/TKGrid";
%Let portNumber = 10010;
libname hdfs sashdat path="/hps";
proc lasr create path="/tmp" port=&portNumber noclass;
performance nodes=all;
run;
/* Load rating table into memory */
proc lasr add port=&portNumber data = hdfs.ratings;
/* Load movies table into memory */
proc lasr add port=&portNumber data = hdfs.movies;
run;
/* Assign a libref to access tables in the server */
libname lasr sasiola port = &portNumber tag="hps";
/* Invoke PROC RECOMMEND */
proc recommend port=&portNumber recom = rs.movie;
/* Add a new recommendation project */
add rs.movie / item = movieid user = userid rating = rating;
/* Add tables */
addtable lasr.ratings / recom = rs.movie type = rating vars = (movieid userid rating);
addtable lasr.movies / recom = rs.movie type = item;
run;
/* Method -- Develop a KNN model with 20 neighbors */
method knn / label="knn"
k=20
positive
similarity=pc
seed=1234;
/* Method -- Use ensemble to evaluate KNN for training and validation*/
method ensemble /label="knn_eva"
methods=("knn")
withhold=0.1
hold=1
details
constraint
seed=1234
fconv=1e-3
gconv=1e-3
maxiter=100
;
run;
/* Method -- Develop and evaluate an SVD model (LBFGS technique, 20 factors) on training and validation data */
method svd / label="svd"
factors=20
fconv=1e-3
gconv=1e-3
maxiter=100
seed=1234
MAXFEVAL=5000
function=L2
lamda=0.2
technique=lbfgs
withhold=0.1
hold=1
details
;
ods output RecommenderFuncEvalInfo = movie_funcEval_svd;
run;
/* Score and make recommendations with KNN */
predict /label = "knn"
method = knn
Num = 5
users = ("1","33");
run;
remove rs.movie;
run;
quit;
/* Plot the numeric results of svd */
proc sgplot data = movie_funcEval_svd;
title "Movie Lens - Matrix Factorization Model";
title2 "Root Mean Square Error";
series x=NumFunc y=RMSE /MARKERS LINEATTRS = (THICKNESS = 2) legendlabel ="RMSE for Training Data";
series x=NumFunc y=RMSEhold /MARKERS LINEATTRS = (THICKNESS = 2) legendlabel ="RMSE for Hold Out";
xaxis label = 'Number of Iterations' type = discrete grid values = (0 to 100 by 5);
yaxis label = 'Root Mean Square Error';
run;
proc sgplot data = movie_funcEval_svd;
title "Movie Lens - Matrix Factorization Model";
title2 "Mean Absolute Error";
series x=NumFunc y=MAE /MARKERS LINEATTRS = (THICKNESS = 2) legendlabel ="MAE for Training Data";
series x=NumFunc y=MAEhold /MARKERS LINEATTRS = (THICKNESS = 2) legendlabel ="MAE for Hold Out";
xaxis label = 'Number of Iterations' type = discrete grid values = (0 to 100 by 5) ;
yaxis label = 'Mean Absolute Error';
run;
/* Stop the LASR Analytic Server */
proc lasr term port = &portNumber;
run;
Type: Sample
Topic: Analytics ==> Data Mining
Date Modified: 2014-08-18 14:34:04
Date Created: 2014-08-14 09:50:00
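| Product Family | Product | Host | Product Release (Starting) | Product Release (Ending) | SAS Release (Starting) | SAS Release (Ending) |
| SAS System | SAS LASR Analytic Server | Solaris for x64 | 2.3_M1 | | 9.4 TS1M2 | |
| SAS System | SAS LASR Analytic Server | Linux for x64 | 2.3_M1 | | 9.4 TS1M2 | |
| SAS System | SAS LASR Analytic Server | 64-bit Enabled Solaris | 2.3_M1 | | 9.4 TS1M2 | |
| SAS System | SAS LASR Analytic Server | 64-bit Enabled AIX | 2.3_M1 | | 9.4 TS1M2 | |
| SAS System | SAS LASR Analytic Server | Microsoft® Windows® for x64 | 2.3_M1 | | 9.4 TS1M2 | |
| SAS System | SAS In-Memory Statistics for Hadoop | Solaris for x64 | 2.2 | | 9.4 TS1M2 | |
| SAS System | SAS In-Memory Statistics for Hadoop | Linux for x64 | 2.2 | | 9.4 TS1M2 | |
| SAS System | SAS In-Memory Statistics for Hadoop | 64-bit Enabled Solaris | 2.2 | | 9.4 TS1M2 | |
| SAS System | SAS In-Memory Statistics for Hadoop | Microsoft® Windows® for x64 | 2.2 | | 9.4 TS1M2 | |
| SAS System | SAS In-Memory Statistics for Hadoop | 64-bit Enabled AIX | 2.2 | | 9.4 TS1M2 | |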