Using the Sample Data

About the Sample Data

Several sample projects and data sets are included with SAS/GRAPH Network Visualization Workshop.
The samples are located in the Samples directory of your installation. This directory contains the following subdirectories:
Projects
Contains the sample project files. When you open a project file, SAS/GRAPH Network Visualization Workshop automatically opens the data sets and all graphs that have been associated with the project.
Data
Contains the sample data sets. You can load individual data sets without opening a project.
You can explore the following samples:

Credit Card Fraud

This sample enables you to investigate patterns of credit card fraud. The data contains transactions, both valid and fraudulent, related to customers with at least one fraudulent credit card transaction.
This sample uses the following files:
Project File: ccFraudData.nvw
Link Data Set: cclinks.sas7bdat
Node Data Set: ccnodes.sas7bdat
For descriptions of the variables and an example use case, see Example Use Case: Credit Card Fraud.

Credit Card Fraud (Simplified)

This sample is similar to the previous sample but contains less data and is easier to examine. For this sample, the node data is generated from the link data set.
This sample uses the following files:
Project File: Fraud.nvw
Link Data Set: fraud.sas7bdat
Node Data Set: auto-generated
The following table summarizes the variables in the link data set:
Link Variables for the Credit Card Fraud Sample
| Variable | Description |
| --- | --- |
| Customer ID | Serves as the From variable that originates the link. |
| Transaction ID | A value that identifies every transaction. |
| Merchant ID | Serves as the To variable that terminates the link. |
| Fraud | Indicates whether the transaction is fraudulent (1) or not fraudulent (0). This variable also determines link colors. |
SAS/GRAPH Network Visualization Workshop generates the node data from the link data. The following table summarizes the variables in the node data set.
Node Variables for the Credit Card Fraud Sample
| Variable | Description |
| --- | --- |
| Node | Serves as the Node ID variable that identifies all customers and merchants. |
| Label | Determines the text that appears when you apply a label to nodes in a network graph. |
| In Arcs; Out Arcs; Total Arcs | Created by SAS/GRAPH Network Visualization Workshop to keep track of the links that are associated with each node. |
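To make the relationship between the link data and the generated node data concrete, here is a minimal Python sketch of how a node table with arc counts could be derived from link rows. It is not the product's actual algorithm, which is not documented here. The sample rows, the `build_nodes` function, and the rule that the label defaults to the node ID are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical link rows: (customer_id, transaction_id, merchant_id, fraud).
links = [
    ("C1", "T1", "M1", 0),
    ("C1", "T2", "M2", 1),
    ("C2", "T3", "M1", 0),
    ("C2", "T4", "M2", 1),
]


def build_nodes(link_rows):
    """Derive one node record per distinct From or To value in the links."""
    counts = defaultdict(lambda: {"in_arcs": 0, "out_arcs": 0})
    for customer, _txn, merchant, _fraud in link_rows:
        counts[customer]["out_arcs"] += 1  # the link originates at the customer
        counts[merchant]["in_arcs"] += 1   # the link terminates at the merchant
    return {
        node_id: {
            "label": node_id,  # assumption: the label defaults to the node ID
            "in_arcs": c["in_arcs"],
            "out_arcs": c["out_arcs"],
            "total_arcs": c["in_arcs"] + c["out_arcs"],
        }
        for node_id, c in counts.items()
    }


nodes = build_nodes(links)
for node_id, record in sorted(nodes.items()):
    print(node_id, record)
```

In this sketch, customers only ever have outgoing arcs and merchants only incoming arcs, because every link runs from a Customer ID to a Merchant ID.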
Here are suggestions for exploring this data:
  1. Open the project. The project loads the link data set, generates the node data from it, and opens a hierarchical network graph.
  2. Use the label tool to identify customers and merchants in the graph. Observe that the different customers who have fraudulent transactions also have transactions with a common merchant.
    Although this pattern is not proof of fraud, it identifies specific merchants that had access to the credit card numbers of customers who have fraudulent transactions. These merchants warrant further investigation.
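The same pattern can be checked outside the graph. The following Python sketch, offered only as an illustration, lists merchants that have transactions with more than one customer who has a fraudulent transaction. The sample rows, the `shared_merchants` function, and the `min_customers` threshold are hypothetical and not part of the sample data or the product.

```python
from collections import defaultdict

# Hypothetical link rows: (customer_id, transaction_id, merchant_id, fraud).
links = [
    ("C1", "T1", "M1", 0),
    ("C1", "T2", "M9", 1),
    ("C2", "T3", "M1", 0),
    ("C2", "T4", "M8", 1),
    ("C3", "T5", "M1", 0),
    ("C3", "T6", "M7", 1),
    ("C3", "T7", "M2", 0),
]


def shared_merchants(link_rows, min_customers=2):
    """Return merchants used by at least min_customers fraud-affected customers."""
    # Customers with at least one fraudulent transaction.
    affected = {c for c, _t, _m, fraud in link_rows if fraud == 1}
    customers_by_merchant = defaultdict(set)
    for customer, _t, merchant, _fraud in link_rows:
        if customer in affected:
            customers_by_merchant[merchant].add(customer)
    return {
        merchant: sorted(customers)
        for merchant, customers in customers_by_merchant.items()
        if len(customers) >= min_customers
    }


suspects = shared_merchants(links)
print(suspects)
```

As in the graph, a merchant that appears in the result is a candidate for further investigation, not a confirmed source of fraud.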

Computer Grid

This sample represents a mock physical layout of computer circuitry that is undergoing operational testing. This sample is useful for exploring fixed position and hexagonal network graphs. The sample can also be used to demonstrate the power of local selection.
This sample uses the following files:
Project File: ComputerGrid.nvw
Link Data Set: gridlink4.sas7bdat
Node Data Set: gridnodes.sas7bdat
Position File: gridnodes.sas7bdat
The following table summarizes the variables in the link data set:
Link Variables for the Computer Grid Sample
| Variable | Description |
| --- | --- |
| From | Serves as the From variable that originates the link. |
| To | Serves as the To variable that terminates the link. |
| Failures | Provides the number of times that the connection failed. |
| Tests | Provides the number of times that the connection was tested. |
| Weight3 | Indicates an arbitrary variable that is available for the purpose of exploring the data. |
| Pcnt_fail | Provides the failure rate, expressed as a percentage (the number of failures divided by the number of tests). |
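As a worked example of the Pcnt_fail definition, the following Python sketch computes a failure percentage from a failure count and a test count. The 0 to 100 scale is an assumption inferred from the exploration steps for this sample, which treat a value of six as six percent. The `pcnt_fail` function and its handling of untested connections are hypothetical.

```python
def pcnt_fail(failures, tests):
    """Return failures as a percentage of tests, or None if nothing was tested."""
    if tests == 0:
        return None  # assumption: an untested connection has no failure rate
    return 100.0 * failures / tests


print(pcnt_fail(3, 40))  # 7.5
```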
The following table summarizes the variables in the node data set:
Node Variables for the Computer Grid Sample
| Variable | Description |
| --- | --- |
| Node | Serves as the Node ID variable that identifies all computers in the data. This variable also determines the text that appears when you apply a label to nodes in a network graph. |
| Weight; Weight2; Weight3 | Indicates arbitrary variables that are available for the purpose of exploring data that has multiple variables. |
| X | Provides the X coordinate for a fixed position network graph. |
| Y | Provides the Y coordinate for a fixed position network graph. |
Here are suggestions for exploring this data:
  1. Open the project. The project loads the two data sets and the graphs that are associated with the project, including a histogram and a fixed position network graph.
  2. In the histogram of failure percentages, select the bars with an X coordinate greater than six. The network graph changes to display only those links that correspond to a significant failure rate (more than six percent).
  3. Explore multiple points of potential failure by using local selection mode:
    1. Change to local selection mode. See Configure Local Selection Mode.
    2. Change both tables to the observer-union role. The network graph already has the observer-union role.
    3. In the histogram of tests, select the first two bars.
    4. In the histogram of failure percentages, select the bars with an X coordinate greater than six.
    The network graph now shows which links warrant closer scrutiny either because they have high failure percentages or because they have not received sufficient testing.
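The effect of combining the two selections can be pictured as a set union. In the Python sketch below, a link is flagged if it falls in either selection. The sample rows are invented, and `LOW_TEST_LIMIT` is a hypothetical cutoff that stands in for the range covered by the first two bars of the tests histogram, which depends on how the histogram is binned.

```python
# Hypothetical link rows: (from_node, to_node, failures, tests).
links = [
    ("A", "B", 1, 50),
    ("B", "C", 4, 50),
    ("C", "D", 0, 5),
    ("D", "E", 1, 8),
]

LOW_TEST_LIMIT = 10  # assumption: stands in for the first two histogram bars
FAIL_LIMIT = 6.0     # percent, matching the selection in the steps above

# Selection 1: links that have received few tests.
low_tests = {(src, dst) for src, dst, _fails, tests in links if tests < LOW_TEST_LIMIT}

# Selection 2: links with a failure percentage above the limit.
high_fail = {
    (src, dst)
    for src, dst, fails, tests in links
    if tests > 0 and 100.0 * fails / tests > FAIL_LIMIT
}

# The union keeps a link that appears in either selection.
flagged = low_tests | high_fail
print(sorted(flagged))
```

Here the link from A to B is the only one left out, because it is both well tested and below the failure limit.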

Board of Directors

This sample enables you to investigate relationships among boards of directors for Fortune 100 companies. In particular, you can detect directors who belong to several boards.
This sample uses the following files:
Project File: BoardOfDirectors.nvw
Link Data Set: bdlinks.sas7bdat
Node Data Set: bdnodes.sas7bdat
The following table summarizes the variables in the link data set:
Link Variables for the Board of Directors Sample
| Variable | Description |
| --- | --- |
| Individual | Serves as the From variable that originates the link. |
| Corporation | Serves as the To variable that terminates the link. |
| Board Chairman | Indicates whether the individual is (1) or is not (0) the board chairman of the corresponding corporation. |
| CEO | Indicates whether the individual is (1) or is not (0) the chief executive officer of the corresponding corporation. |
| Number of Boards | Provides the number of boards of which the individual is a member. This variable also determines link colors. |
| Board Size | Provides the size of the board. |
The following table summarizes the variables in the node data set:
Node Variables for the Board of Directors Sample
| Variable | Description |
| --- | --- |
| Category | Classifies corporations into categories such as retail, energy, and manufacturing. For individuals, this variable identifies the node as an individual and also indicates which individuals are chief executive officers. |
| Name | Serves as the Node ID variable that identifies all individuals and corporations. |
| Full Name | Provides the first and last name for individuals and the full company name for corporations. |
| Type | Indicates whether the node is an individual (I) or a corporation (C). This variable also determines the shapes of node markers. |
| Number of Boards | Provides the number of boards of which the individual is a member. For a corporation, this variable gives the size of the corporation's board. |
| Revenue | Provides the revenues for corporations. For individuals, this value is ignored. |
Here are suggestions for exploring this data:
  1. Open the project. The project loads the two data sets and opens associated graphs, including a hierarchical network graph.
  2. Create a histogram based on link data, and select NUMBER OF BOARDS as the X variable.
  3. In the histogram, select the bars with an X coordinate greater than or equal to four. The network graph changes to display only those links that correspond to multiple boards (four or more).
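The selection in the last step amounts to a filter on the number of boards per individual. The Python sketch below shows one way that filter could be expressed, counting boards directly from the links rather than reading the Number of Boards variable. The names and link rows are invented for illustration, and `MIN_BOARDS` is a hypothetical constant that mirrors the threshold of four used above.

```python
from collections import defaultdict

# Hypothetical link rows: (individual, corporation).
links = [
    ("Ann", "CorpA"),
    ("Ann", "CorpB"),
    ("Ann", "CorpC"),
    ("Ann", "CorpD"),
    ("Bob", "CorpA"),
    ("Bob", "CorpB"),
    ("Cy", "CorpC"),
]

MIN_BOARDS = 4  # mirrors the histogram selection of four or more boards

# Count the distinct corporations linked to each individual.
boards = defaultdict(set)
for person, corporation in links:
    boards[person].add(corporation)
number_of_boards = {person: len(corps) for person, corps in boards.items()}

# Keep only the links of individuals who sit on MIN_BOARDS or more boards.
selected = [
    (person, corporation)
    for person, corporation in links
    if number_of_boards[person] >= MIN_BOARDS
]
print(selected)
```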

Web Path Data

This sample enables you to investigate Web path data. You can explore frequently visited sites and frequently followed links.
This sample uses the following files:
Project File: WebPathData.nvw
Link Data Set: webpath_links.sas7bdat
Node Data Set: webpath_nodes.sas7bdat
The following table summarizes the variables in the link data set:
Link Variables for the Web Path Sample
| Variable | Description |
| --- | --- |
| ID1 | Serves as the From variable that originates the link. |
| ID2 | Serves as the To variable that terminates the link. |
| LinkID | Represents the ID for a link. |
| Count | Indicates the number of times the particular path (ID1 to ID2) is followed. This variable acts as a weight for the links. |
The following table summarizes the variables in the node data set:
Node Variables for the Web Path Sample
| Variable | Description |
| --- | --- |
| ID | Provides a unique ID for each Web page. This variable serves as the Node ID variable in the Edit Data Attributes dialog box. |
| Count | Represents the number of hits for the particular Web page. |
| Value | Represents the actual Web page name. |
For an example use case, see Example Use Case: Web Path Data.
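To show how the link and node variables for this sample fit together, the following Python sketch ranks links by their Count weight and resolves ID1 and ID2 to page names through the node data. The rows, page names, and the `top_paths` function are hypothetical and serve only to illustrate the structure of the two data sets.

```python
# Hypothetical node rows: ID -> (Value, Count), that is, page name and hits.
nodes = {
    1: ("/home", 120),
    2: ("/products", 80),
    3: ("/checkout", 30),
}

# Hypothetical link rows: (ID1, ID2, LinkID, Count).
links = [
    (1, 2, "L1", 70),
    (2, 3, "L2", 25),
    (1, 3, "L3", 5),
    (2, 1, "L4", 40),
]


def top_paths(link_rows, node_table, n=2):
    """Return the n most frequently followed links as (from_page, to_page, count)."""
    ranked = sorted(link_rows, key=lambda row: row[3], reverse=True)
    return [
        (node_table[id1][0], node_table[id2][0], count)
        for id1, id2, _link_id, count in ranked[:n]
    ]


print(top_paths(links, nodes))
```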