About Data Sources

This section describes the format and structure of network data and provides some examples of real-world network data. This section also explains how you can use SAS/GRAPH Network Visualization Workshop to explore non-network relational data.

Network Data

SAS/GRAPH Network Visualization Workshop uses SAS data sets to define a network. In simple terms, a network is a system of interconnected items, and network data is the information that describes such a system.
Network data consists of two types of data:
Network Data

Data Type    What the Data Contains
Node         Information about the items that are connected
Link         Information about the connections between nodes

Many real-world problems can be represented by using a collection of nodes and links. Common examples include supply chains, Web sites, database schemas, communications networks, and software module dependencies. For a supply chain, the nodes might represent manufacturing plants, warehouses, and customer locations. The links might represent the flow of goods or products between the locations. For a communications network, the nodes might be switches, routers, and other hardware devices, with attributes such as capacity, device type, traffic volume, and number of dropped packets. The links might represent the transmission facilities or media that connect the nodes, with attributes such as failure rates, error rates, and traffic volume.
To create a network graph, SAS/GRAPH Network Visualization Workshop requires two SAS data sets: a link data set and a node data set. Together these data sets constitute the network data. Each link connects two nodes, though a node can have multiple connecting links. There can be thousands of links connecting thousands of nodes.
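As a rough, tool-neutral illustration of the two tables described above, the following Python sketch models a tiny supply chain. The column names (`id`, `type`, `from`, `to`, `units`) and all values are hypothetical and are not names that the Workshop requires. The helper counts how many links touch each node, which shows that every link has exactly two endpoints while a node can have several links.

```python
from collections import Counter

# Hypothetical node table: one row per node, keyed by an identifier column.
nodes = [
    {"id": "PLANT_A", "type": "plant"},
    {"id": "WH_1", "type": "warehouse"},
    {"id": "CUST_X", "type": "customer"},
    {"id": "CUST_Y", "type": "customer"},
]

# Hypothetical link table: one row per link, each connecting exactly two nodes.
links = [
    {"from": "PLANT_A", "to": "WH_1", "units": 500},
    {"from": "WH_1", "to": "CUST_X", "units": 200},
    {"from": "WH_1", "to": "CUST_Y", "units": 300},
]


def links_per_node(link_rows):
    """Count how many links touch each node, as either endpoint."""
    counts = Counter()
    for link in link_rows:
        counts[link["from"]] += 1
        counts[link["to"]] += 1
    return dict(counts)


degree = links_per_node(links)
print(degree)
```

In this made-up data, the warehouse is an endpoint of three links, while each of the other nodes is an endpoint of one.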
SAS/GRAPH Network Visualization Workshop also enables you to create statistical graphs that are based on either node data or link data. For more information, see Non-Network Data.

Node Data

A node data set defines the nodes in a network. In this data set, each row represents one node in the network. At a minimum, the node data set must contain a node identifier variable. You specify the name of this variable when you set data attributes. For details, see Specify Data Attributes.
SAS/GRAPH Network Visualization Workshop can automatically create a minimal node data set from the link data set that you provide. However, you might want to provide your own node data set. To be truly useful, the node data set should also include one or more variables for attributes that describe the nodes. These variables can be useful in several ways:
  • When you set data attributes, you can specify one of these variables for node colors and the same or a different variable for node shapes. You can also specify a variable whose values supply the text that appears when you apply labels to a network graph.
  • You can use these variables in conjunction with statistical, non-network graphs to filter data.
  • You can use these variables in data tables to sort the data.
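To make the idea of a minimal node data set concrete, the sketch below derives one from link rows by collecting every distinct identifier that appears as an endpoint. This is only an assumption about what such a data set would contain (an identifier column and nothing else); it does not describe how the Workshop builds it internally. The function name and the column names `from`, `to`, and `id` are hypothetical.

```python
def minimal_node_table(link_rows, from_var="from", to_var="to", id_var="id"):
    """Return one row per distinct node identifier found in the link rows.

    Identifiers are kept in the order in which they are first seen.
    """
    seen = {}
    for link in link_rows:
        for role in (from_var, to_var):
            # A dict keeps insertion order, so it doubles as an ordered set.
            seen.setdefault(link[role], None)
    return [{id_var: node_id} for node_id in seen]


example_links = [
    {"from": "A", "to": "B"},
    {"from": "B", "to": "C"},
    {"from": "A", "to": "C"},
]
derived_nodes = minimal_node_table(example_links)
print(derived_nodes)
```

A table derived this way carries no descriptive attributes, which is why supplying your own node data set with extra variables is often worthwhile.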

Link Data

A link data set defines the links in the network. In this data set, each row represents one link between two nodes in the network.
The data set must contain at least two variables to identify the link. The values of these variables must be node identifiers; that is, each value must exist in the node identifier column of the corresponding node data set. One variable serves the From role and the other serves the To role. The From variable lists the node at which the link originates, and the To variable lists the node at which the link terminates. You specify the names of these variables when you set data attributes. For details, see Specify Data Attributes.
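The requirement that every From and To value match a node identifier can be checked before the data is loaded. The following Python sketch is one possible pre-check, not a feature of the Workshop; the function name and the column names `id`, `from`, and `to` are hypothetical.

```python
def find_dangling_links(node_rows, link_rows,
                        id_var="id", from_var="from", to_var="to"):
    """List (row index, role, value) for each endpoint missing from the nodes."""
    known_ids = {row[id_var] for row in node_rows}
    problems = []
    for index, link in enumerate(link_rows):
        for role in (from_var, to_var):
            if link[role] not in known_ids:
                problems.append((index, role, link[role]))
    return problems


check_nodes = [{"id": "A"}, {"id": "B"}]
check_links = [
    {"from": "A", "to": "B"},   # both endpoints are known
    {"from": "B", "to": "C"},   # "C" is not in the node table
]
dangling = find_dangling_links(check_nodes, check_links)
print(dangling)
```

An empty result means that every link endpoint refers to an existing node.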
Other variables in the link data set can be used to store attributes related to the links. These variables can be useful in several ways:
  • When you set data attributes, you can specify one of these variables to use for link colors.
  • You can use these variables in conjunction with statistical, non-network graphs to filter data.
  • You can use these variables in data tables to sort the data.

Non-Network Data

In addition to using SAS/GRAPH Network Visualization Workshop to explore network data, you can also explore standard relational data that does not constitute a network. The data is based on a node or a link data set.
You can investigate relational data in the following ways:
  • explore and manipulate data in the data table
  • create one or more statistical graphs based on node or link data
  • use a combination of statistical graphics with the data table to selectively view and filter your data
  • use different selection modes on your data table and statistical graphics to graphically subset data at multiple levels