The OPTGRAPH Procedure

Example 1.13 Transitive Closure for Identification of Circular Dependencies in a Bug Tracking System

Most software bug tracking systems have some notion of duplicate bugs in which one bug is declared to be the same as another bug. If bug A is considered a duplicate (DUP) of bug B, then a fix for B would also fix A. You can represent the DUPs in a bug tracking system as a directed graph where you add a link $A \rightarrow B$ if A is a DUP of B.

The bug tracking system needs to check for two situations when users declare a bug to be a DUP. The first situation is called a circular dependence. Consider bugs A, B, C, and D in the tracking system. The first user declares that A is a DUP of B and that C is a DUP of D. Then, a second user declares that B is a DUP of C, and a third user declares that D is a DUP of A. You now have a circular dependence, and no primary bug is defined on which the development team should focus. You can easily see this circular dependence in the graph representation, because $A \rightarrow B \rightarrow C \rightarrow D \rightarrow A$. Finding such circular dependencies can be done using cycle detection, which is described in the section Cycle. However, the second situation that needs to be checked is more general. If a user declares that A is a DUP of B and another user declares that B is a DUP of C, this chain of duplicates is already an issue. The bug tracking system needs to provide one primary bug to which the rest of the bugs are duplicated. The existence of these chains can be identified by calculating the transitive closure of the directed graph that is defined by the DUP links.

Given the original directed graph G (defined by the DUP links) and its transitive closure $G^ T$, any link in $G^ T$ that is not in G exists because of some chain that is present in G.

Consider the following data that define some duplicated bugs (called defects) in a small sample of the bug tracking system:

data DefectLinks;
   input defectId $ linkedDefect $ linkType $ when datetime16.;
   format when datetime16.;
   datalines;
D0096978 S0711218 DUPTO 20OCT10:00:00:00
S0152674 S0153280 DUPTO 30MAY02:00:00:00
S0153280 S0153307 DUPTO 30MAY02:00:00:00
S0153307 S0152674 DUPTO 30MAY02:00:00:00
S0162973 S0162978 DUPTO 29NOV10:16:13:16
S0162978 S0165405 DUPTO 29NOV10:16:13:16
S0325026 S0575748 DUPTO 01JUN10:00:00:00
S0347945 S0346582 DUPTO 03MAR06:00:00:00
S0350596 S0346582 DUPTO 21MAR06:00:00:00
S0539744 S0643230 DUPTO 10MAY10:00:00:00
S0575748 S0643230 DUPTO 15JUN10:00:00:00
S0629984 S0643230 DUPTO 01JUN10:00:00:00
;

The following statements calculate cycles in addition to the transitive closure of the graph G that is defined by the duplicated defects in DefectLinks. The output data set Cycles contains any circular dependencies, and the data set TransClosure contains the transitive closure $G^ T$. To identify the chains, you can use PROC SQL to identify the links in $G^ T$ that are not in G.

proc optgraph
   loglevel        = moderate
   graph_direction = directed
   data_links      = DefectLinks;
   data_links_var
      from         = defectId
      to           = linkedDefect;
   cycle
      out          = Cycles
      mode         = all_cycles;
   transitive_closure
      out          = TransClosure;
run;
%put &_OPTGRAPH_;
%put &_OPTGRAPH_CYCLE_;
%put &_OPTGRAPH_TRANSCL_;

proc sql;
   create table Chains as
   select defectId, linkedDefect from TransClosure
      except
   select defectId, linkedDefect from DefectLinks;
quit;

The progress of the procedure is shown in Output 1.13.1.

Output 1.13.1: PROC OPTGRAPH Log: Transitive Closure for Identification of Circular Dependencies in a Bug Tracking System

NOTE: ------------------------------------------------------------------------------------------
NOTE: ------------------------------------------------------------------------------------------
NOTE: Running OPTGRAPH version 14.1.                                                            
NOTE: ------------------------------------------------------------------------------------------
NOTE: ------------------------------------------------------------------------------------------
NOTE: The OPTGRAPH procedure is executing in single-machine mode.                               
NOTE: ------------------------------------------------------------------------------------------
NOTE: ------------------------------------------------------------------------------------------
NOTE: Reading the links data set.                                                               
NOTE: There were 12 observations read from the data set WORK.DEFECTLINKS.                       
NOTE: Data input used 0.01 (cpu: 0.00) seconds.                                                 
NOTE: Building the input graph storage used 0.00 (cpu: 0.00) seconds.                           
NOTE: The input graph storage is using 0.3 MBs of memory.                                       
NOTE: The number of nodes in the input graph is 16.                                             
NOTE: The number of links in the input graph is 12.                                             
NOTE: ------------------------------------------------------------------------------------------
NOTE: ------------------------------------------------------------------------------------------
NOTE: Processing cycle detection.                                                               
NOTE: The graph has 1 cycle.                                                                    
NOTE: Processing cycle detection used 0.00 (cpu: 0.00) seconds.                                 
NOTE: ------------------------------------------------------------------------------------------
NOTE: ------------------------------------------------------------------------------------------
NOTE: Processing the transitive closure.                                                        
NOTE: Processing the transitive closure used 0.00 (cpu: 0.00) seconds.                          
NOTE: ------------------------------------------------------------------------------------------
NOTE: ------------------------------------------------------------------------------------------
NOTE: Creating cycle data set output.                                                           
NOTE: Creating transitive closure data set output.                                              
NOTE: Data output used 0.00 (cpu: 0.00) seconds.                                                
NOTE: ------------------------------------------------------------------------------------------
NOTE: ------------------------------------------------------------------------------------------
NOTE: The data set WORK.CYCLES has 4 observations and 3 variables.                              
NOTE: The data set WORK.TRANSCLOSURE has 20 observations and 2 variables.                       
STATUS=OK  CYCLE=OK  TRANSCL=OK                                                                 
STATUS=OK  NUM_CYCLES=1  CPU_TIME=0.00  REAL_TIME=0.00                                          
STATUS=OK  CPU_TIME=0.00  REAL_TIME=0.00                                                        
NOTE: Table WORK.CHAINS created, with 8 rows and 2 columns.                                     



The data set Cycles contains one case of a circular dependence in which the DUPs start and end at S0152674.

Output 1.13.2: Cycle in Bug Tracking System

cycle order node
1 1 S0152674
1 2 S0153280
1 3 S0153307
1 4 S0152674



The data set Chains contains the chains in the bug tracking system that come from the links in $G^ T$ that are not in G.

Output 1.13.3: Chains in Bug Tracking System

defectId linkedDefect
S0152674 S0152674
S0152674 S0153307
S0153280 S0152674
S0153280 S0153280
S0153307 S0153280
S0153307 S0153307
S0162973 S0165405
S0325026 S0643230