This example uses Zachary’s Karate Club data (Zachary 1977), which describes social network friendships between 34 members of a karate club at a U.S. university in the 1970s. This is one of the standard publicly available data sets for testing community detection algorithms. It contains 34 nodes and 78 links. The graph is shown in Figure 1.144.
Figure 1.144: Zachary’s Karate Club Graph
The graph can be represented using the following links data set LinkSetIn:
data LinkSetIn;
   input from to weight @@;
   datalines;
0 9 1  0 10 1  0 14 1  0 15 1  0 16 1  0 19 1
0 20 1  0 21 1  0 23 1  0 24 1  0 27 1  0 28 1
0 29 1  0 30 1  0 31 1  0 32 1  0 33 1  2 1 1
3 1 1  3 2 1  4 1 1  4 2 1  4 3 1  5 1 1
6 1 1  7 1 1  7 5 1  7 6 1  8 1 1  8 2 1
8 3 1  8 4 1  9 1 1  9 3 1  10 3 1  11 1 1
11 5 1  11 6 1  12 1 1  13 1 1  13 4 1  14 1 1
14 2 1  14 3 1  14 4 1  17 6 1  17 7 1  18 1 1
18 2 1  20 1 1  20 2 1  22 1 1  22 2 1  26 24 1
26 25 1  28 3 1  28 24 1  28 25 1  29 3 1  30 24 1
30 27 1  31 2 1  31 9 1  32 1 1  32 25 1  32 26 1
32 29 1  33 3 1  33 9 1  33 15 1  33 16 1  33 19 1
33 21 1  33 23 1  33 24 1  33 30 1  33 31 1  33 32 1
;
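As a quick sanity check on the input, the following sketch counts the links and the distinct nodes in LinkSetIn. It is not part of the original example. The data set name LinkEnds and the names ends, i, node, n_links, and n_nodes are hypothetical and chosen only for illustration.

```sas
/* Stack both endpoints of every link into one column named node. */
/* LinkEnds, ends, i, and node are hypothetical names.            */
data LinkEnds;
   set LinkSetIn;
   array ends{2} from to;
   do i = 1 to 2;
      node = ends{i};
      output;
   end;
   keep node;
run;

/* Keep a single row per distinct node identifier. */
proc sort data=LinkEnds nodupkey;
   by node;
run;

/* Count observations: links in the input, distinct nodes after the sort. */
proc sql;
   select count(*) as n_links from LinkSetIn;
   select count(*) as n_nodes from LinkEnds;
quit;
```

If the data were entered correctly, the two counts should agree with the 78 links and 34 nodes stated in the description of the data.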
The following statements use the RESOLUTION_LIST= option to run community detection on the Karate Club data at two resolution levels, 1.0 and 0.5. For more information about resolution levels, see the section Resolution List.
proc optgraph
   data_links            = LinkSetIn
   out_nodes             = NodeSetOut
   graph_internal_format = thin;
   community
      resolution_list    = 1.0 0.5
      out_level          = CommLevelOut
      out_community      = CommOut
      out_overlap        = CommOverlapOut
      out_comm_links     = CommLinksOut;
run;
The data set NodeSetOut contains the community identifier of each node. It is shown in Output 1.7.1.
Output 1.7.1: Community Nodes Output
Column community_1 contains the community identifier of each node when the resolution value is 1.0; column community_2 contains the community identifier of each node when the resolution value is 0.5. Different node colors represent different communities in Figure 1.145 and Figure 1.146. As the figures show, the four communities found at resolution 1.0 merge into two communities at resolution 0.5.
Figure 1.145: Karate Club Communities (Resolution = 1.0)
Figure 1.146: Karate Club Communities (Resolution = 0.5)
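One way to see how the communities at the two resolution levels relate, without relying on the figures, is to cross-tabulate the two identifier columns. The following is a minimal sketch, not part of the original example. It assumes that NodeSetOut contains the columns community_1 and community_2 exactly as named in the text above.

```sas
/* Cross-tabulate community membership at resolution 1.0 (rows)    */
/* against membership at resolution 0.5 (columns). Each cell gives */
/* the number of nodes that share that pair of identifiers.        */
proc freq data=NodeSetOut;
   tables community_1 * community_2 / norow nocol nopercent;
run;
```

A row with a single nonzero cell indicates a community at resolution 1.0 that is absorbed whole into one community at resolution 0.5. A row with several nonzero cells indicates a community whose nodes are split between the coarser communities.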
The data set CommLevelOut contains the number of communities and the corresponding modularity values found at each resolution level. It is shown in Output 1.7.2.
Output 1.7.2: Community Level Summary Output
The data set CommOut contains the number of nodes in each community. It is shown in Output 1.7.3.
Output 1.7.3: Community Number of Nodes Output
The data set CommOverlapOut contains the intensity of each node that belongs to multiple communities. It is shown in Output 1.7.4. Note that only the communities in the last resolution level (the smallest resolution value) are output in this data set. In this example, Node 0 belongs to two communities, with 82.4% of its links (14 of 17) connecting to Community 0 and 17.6% (3 of 17) connecting to Community 1.
Output 1.7.4: Community Overlap Output
node | community | intensity |
---|---|---|
0 | 0 | 0.82353 |
0 | 1 | 0.17647 |
9 | 0 | 0.60000 |
9 | 1 | 0.40000 |
10 | 0 | 0.50000 |
10 | 1 | 0.50000 |
14 | 0 | 0.20000 |
14 | 1 | 0.80000 |
15 | 0 | 1.00000 |
16 | 0 | 1.00000 |
19 | 0 | 1.00000 |
20 | 0 | 0.33333 |
20 | 1 | 0.66667 |
21 | 0 | 1.00000 |
23 | 0 | 1.00000 |
24 | 0 | 1.00000 |
27 | 0 | 1.00000 |
28 | 0 | 0.75000 |
28 | 1 | 0.25000 |
29 | 0 | 0.66667 |
29 | 1 | 0.33333 |
30 | 0 | 1.00000 |
31 | 0 | 0.75000 |
31 | 1 | 0.25000 |
32 | 0 | 0.83333 |
32 | 1 | 0.16667 |
33 | 0 | 0.91667 |
33 | 1 | 0.08333 |
2 | 0 | 0.11111 |
2 | 1 | 0.88889 |
1 | 0 | 0.12500 |
1 | 1 | 0.87500 |
3 | 0 | 0.40000 |
3 | 1 | 0.60000 |
4 | 1 | 1.00000 |
5 | 1 | 1.00000 |
6 | 1 | 1.00000 |
7 | 1 | 1.00000 |
8 | 1 | 1.00000 |
11 | 1 | 1.00000 |
12 | 1 | 1.00000 |
13 | 1 | 1.00000 |
17 | 1 | 1.00000 |
18 | 1 | 1.00000 |
22 | 1 | 1.00000 |
26 | 0 | 1.00000 |
25 | 0 | 1.00000 |
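To isolate the nodes whose links are spread over more than one community, the overlap data set can be summarized by node. The following is a minimal sketch, not part of the original example. It assumes that CommOverlapOut has the columns node, community, and intensity shown in Output 1.7.4. The names OverlapNodes, n_communities, max_intensity, and total_intensity are hypothetical.

```sas
proc sql;
   /* One row per node that has links to more than one community. */
   /* OverlapNodes and the computed column names are hypothetical. */
   create table OverlapNodes as
   select node,
          count(*)       as n_communities,
          max(intensity) as max_intensity,
          sum(intensity) as total_intensity
   from CommOverlapOut
   group by node
   having count(*) > 1
   order by max_intensity;
quit;

/* List the overlapping nodes, least decisive membership first. */
proc print data=OverlapNodes noobs;
run;
```

For the values in Output 1.7.4, this should keep the 13 nodes that appear in two rows, with Node 10 (intensity 0.5 in each community) listed first. The total_intensity column is included as a consistency check: the intensities of a node should sum to approximately 1.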
The data set CommLinksOut shows how the communities are interconnected. It is shown in Output 1.7.5. In this example, when the resolution value is 1, the link weight between Communities 0 and 1 is 7, and the link weight between Communities 1 and 2 is 4.
Output 1.7.5: Community Links Output