This paper benchmarks SAS and open-source products for big data analysis by modeling four classification problems from real customers. The benchmarked products are SAS Rapid Predictive Modeler (a component of SAS Enterprise Miner), SAS High-Performance Analytics Server (using Hadoop), R, and Apache Mahout. Results are compared in terms of model quality, modeler effort, scalability, and completeness.
This paper compares the performance of the HPGENSELECT procedure with results cited for the RevoScaleR package by using data that are similar to the insurer's data. The paper also demonstrates the scalability of the HPGENSELECT procedure by using two sizes of data sets and three different computing environments.
This paper discusses the options and methods available for use in High-Performance Data Mining and uses real data for performance benchmarks.