White Paper

Resource Standard Metrics™

Source Code Baseline Differential Metrics

 

 

 

 

 

 

 

 

Document Version 1.2

RSM Version 6.63

Date: 04/30/2005

© 2006 M Squared Technologies™ LLC

 

 


Table of Contents

Introduction

Windows Quick Start Step By Step

Linux/Unix Quick Start Step By Step

Baseline Metrics Differentials

The Baseline Tree

New Files

Removed Files

Modified versus Differential Files

Baseline Size Metrics

LOC

eLOC

lLOC

Size Differential

Equal Files (Size Metric Analysis)

Baseline Change Metrics

Equal Files (Code Differential Analysis)

Code Differentials

New and Removed Files

Source Code Differential Analysis

Code Differentials Requirements

RSM Differential Algorithm

Differential Files

Differentials of Characterized Files

Detailed View Using the RSM Differential Algorithm

Case Study

Concept of Operation

Work Files

Work File Locations

Baseline Size Change Work Files

Baseline Source Change Work Files

The dir command yields the following contents of the metrics directory

Extracting Work Files – Baseline Differentials

Work File Extraction – Size Metrics Differential

Work File Extraction – Code Change Metrics Differential

Comparing the Size versus Code Change Differentials

Work File RSM Configuration Settings

Other Helpful RSM Configuration Settings for Baseline Processing

RSM Report Options

Report Output Format

Example HTML output

Example CSV Output for Spreadsheet Import

Work File Glossary

Work Productivity Estimates

Work File Deterministic Mode

Showing Source File Differences

Reference

Contact Information

 


Introduction

A software product baseline is a set of source code files organized in a hierarchical tree of folders which represent the product components.  Each product release is derived from a specific baseline.  Software product managers are interested in the "QOWP", or "Quantity of Work Performed", by their team.  The quantity of source lines of code (SLOC) is a metric that can be directly measured from the product baseline.  This is not so of Function Points (FP).  Function points are useful as a forward predictor in software estimation, but the metric cannot be objectively measured from the physical product.  RSM is a tool which can measure source lines of code and derive function point metrics from SLOC for a software baseline.  The difference in these metrics between baselines is called source code differential metrics.

 

When a product is released, the set of files within these folders comprise a product.  This set becomes a version controlled baseline.  Version control systems allow for the identification of the files in a baseline and they provide the ability to view a specific baseline version.  There is a historical or older baseline and a current or newer baseline. Product managers need the ability to assess the source code metrics between these two baselines.  This task is not practical using a manual process because of the volume of folders, files and lines. 

 

Resource Standard Metrics has been an industry leading metrics tool since 1996.  This product has the ability to automatically perform source code baseline differentials.  There are two modes for baseline differentials: size and change.

 

A size differential addresses the question: how much larger is the newer baseline than the older baseline, without regard for individual line changes?  This metric is useful to quantify the additional growth of a product baseline or when assessing the quantity of work performed against the labor hours expended.

 

A change differential of the baseline source code addresses the changes made between the baselines.  This includes changes to existing code, new code and deleted code.

 

Versions of Resource Standard Metrics (RSM) prior to version 6.40 performed a size metric differential.  Each file was assessed for its LOC metrics and a net differential was computed between baselines.  RSM version 6.40 adds source code change metric differentials, and RSM 6.60 enhances the code differential algorithm for speed and accuracy.

 

This white paper explores these two baseline differential modes when using Resource Standard Metrics.  It presents the technology, processes and reports provided by Resource Standard Metrics.

 

Windows Quick Start Step By Step

 

Linux/Unix Quick Start Step By Step

Baseline Metrics Differentials

This paper will use two baselines, an older baseline and a newer baseline.  The example baseline is composed of a single file so that the reader can follow how RSM processes the baseline differentials.

 

RSM can process an unlimited number of files in a baseline.  The number of files processed by RSM is limited only by the hardware resources of the machine.  RSM is designed to be extremely efficient with these resources, so a ten thousand file baseline containing a million lines of code can be processed on a Pentium III class machine with 256 megabytes of memory.  If change differential metrics are processed, the machine also needs available disk space at least as large as the processed baseline.

The Baseline Tree

A baseline is a tree of folders or directories which contain files that change over time.  It is assumed that the “top” directory or root of the tree is a constant.  When processing baseline differentials it is very important that RSM know this location as this becomes the start point for file discovery in the recursive descent of the baseline tree. 

 

The following is an example of a baseline tree.

 

/project

  +- src

       +- file1.cpp

       +- file1.h

 

If the newer baseline has a different root (“/project_new”), then RSM will assume the subdirectories and files under that root are different, because their literal path name (“/project_new/src/file1.cpp”) is different from that of the older baseline (“/project/src/file1.cpp”).

 

However, if you select the “src” directory as the starting root of your source tree then the file path/name becomes identical and RSM will assume the files are the same and perform differentials rather than declare new or deleted files.

 

Choosing the proper baseline root for metrics differentials

 

/project

  +- src

       +- file1.cpp  =>  File: /project/src/file1.cpp

       +- file1.h

 

/project_new

  +- src

       +- file1.cpp  =>  File: /project_new/src/file1.cpp

       +- file1.h

 

/project/src/file1.cpp != /project_new/src/file1.cpp

 

However, if you choose the ./src folder as the root, the two files can be compared because their names match:

 

 

  +- src

       +- file1.cpp  =>  File: src/file1.cpp

       +- file1.h

 

 

  +- src

       +- file1.cpp  =>  File: src/file1.cpp

       +- file1.h


src/file1.cpp == src/file1.cpp
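The effect of the chosen root on file matching can be sketched in a few lines of Python.  This is only an illustration of the naming rule described above, not RSM's implementation; the function names are hypothetical, and the file lists are the ones from the example tree.

```python
from pathlib import PurePosixPath

def relative_names(root, files):
    """Express each file path relative to the chosen baseline root."""
    base = PurePosixPath(root)
    return {str(PurePosixPath(f).relative_to(base)) for f in files}

def comparable_files(old_root, old_files, new_root, new_files):
    """Names present in both baselines once each path is relative to its root."""
    return relative_names(old_root, old_files) & relative_names(new_root, new_files)

old_files = ["/project/src/file1.cpp", "/project/src/file1.h"]
new_files = ["/project_new/src/file1.cpp", "/project_new/src/file1.h"]

# Literal path names never match because the roots differ.
literal_matches = set(old_files) & set(new_files)

# Names taken relative to each baseline root do match.
matches = comparable_files("/project", old_files, "/project_new", new_files)

print(sorted(literal_matches))  # []
print(sorted(matches))          # ['src/file1.cpp', 'src/file1.h']
```

With literal names every file would be reported as new or removed; with root-relative names the same files become candidates for a differential.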

New Files

New files are those files that exist in the newer baseline but do not exist in the historical or older baseline.  These files comprise the new modules that have been added to the system.  New system features can usually be traced directly to these files.  All metrics in these files become a positive addition to the overall baseline differential metrics.

Removed Files

“Rem” or Removed files are those files that are present in the historical baseline and not in the current baseline.  These files typically represent the modules that have been deprecated from one release to another.  Removed files can be either a positive or negative addition to the overall baseline metrics.  Some baseline engineers feel that removal of code is just as important as the addition of code and choose to make these files a positive impact.  This choice is made by modifying the “rsm_workdiff.cfg” file.  The contents of the configuration file are self explanatory.  This file turns on and off metrics within the metrics report and makes select metrics positive or negative in their impact to the baseline.  These settings only affect the metrics for files that have been removed from the current baseline.
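The arithmetic behind the positive or negative choice can be sketched as follows.  The function and its `removed_is_positive` flag are hypothetical stand-ins for the behavior described above; they are not the actual setting names in “rsm_workdiff.cfg”, and the LOC figures are invented for illustration.

```python
def baseline_loc_delta(new_file_loc, removed_file_loc, removed_is_positive=False):
    """Combine LOC from new and removed files into one baseline differential.

    removed_is_positive is a hypothetical flag: when True, removed code
    is counted as work performed; when False, it is counted as shrinkage.
    """
    sign = 1 if removed_is_positive else -1
    return sum(new_file_loc) + sign * sum(removed_file_loc)

new_files = [120, 80]   # LOC of two new files (invented values)
removed_files = [50]    # LOC of one removed file (invented value)

net_growth = baseline_loc_delta(new_files, removed_files)
work_performed = baseline_loc_delta(new_files, removed_files, removed_is_positive=True)

print(net_growth)      # 150
print(work_performed)  # 250
```

The first figure answers how much the baseline grew; the second treats removal as effort and so answers how much code was touched.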

Modified versus Differential Files

Files that exist in both baselines can be either “Mod” or “Dif”.  “Mod” stands for size modified and “Dif” stands for internal code differentials.  These modes are selected when processing a baseline for metrics differentials.  Whether a file is considered equal in both baselines therefore depends on the baseline analysis mode.  These modes are described in detail in the following paragraphs and are the focus of this white paper.

Baseline Size Metrics

Resource Standard Metrics Version 3.0 implemented baseline differentials based upon a size metric analysis.  This technique analyzes the LOC metrics for a file and compares the LOC metrics between the historical baseline and current baseline.  This technique is suitable when a baseline engineer is interested in the size change between two baselines.  The following metrics are defined as they are implemented in RSM.

LOC

A line of code is any line that contains source code.  This code is not a comment or white space.  It is possible for a physical line to have both source code and a comment.  In this case the physical line counts as both a line of code and a comment line.  LOC can include many lines that exist only for formatting and not for the intrinsic nature of the program.  This issue is addressed with the effective lines of code metric.

eLOC

The effective line of code (eLOC) metric was introduced by M Squared Technologies.  This metric best represents the true magnitude of the quantity of work performed when implementing a software program.  An effective line of code is any source line which is not a standalone brace or parenthesis.  The following example illustrates this concept.

 

       Source code line              LOC   eLOC  lLOC  Comment  Blank

       --------------------------------------------------------------

       if (x<10)   // test  range     x      x            x

       {                              x

         // update y coordinate                           x

                                                                  x

          y = x + 1;                  x      x     x

       }                              x

       --------------------------------------------------------------

 

This example illustrates the possible overestimation of the quantity of work performed if the LOC metric is used and a possible underestimation if the lLOC metric is used.  eLOC effectively eliminates the effect of style and includes the work required to create symbols, names and classes that may not be captured by a code statement.

lLOC

lLOC, or logical lines of code, is defined as the count of code statements that end in a semicolon, where a “for” loop counts as one logical line of code.
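The definitions above can be checked against the six-line example with a small Python sketch.  It is a deliberately simplified classifier written for this illustration, not RSM's parser: it assumes only “//” comments, does not handle block comments or “//” inside string literals, and does not apply the special “for” loop rule.

```python
def classify(line):
    """Return the metric columns that one physical line contributes to."""
    text = line.strip()
    if not text:
        return {"Blank"}
    code, marker, _comment = text.partition("//")
    code = code.strip()
    kinds = set()
    if marker:
        kinds.add("Comment")
    if code:
        kinds.add("LOC")
        if code not in {"{", "}", "(", ")"}:
            kinds.add("eLOC")   # not a standalone brace or parenthesis
        if code.endswith(";"):
            kinds.add("lLOC")   # statement terminated by a semicolon
    return kinds

def count_metrics(lines):
    """Total each metric over a list of physical lines."""
    totals = {"LOC": 0, "eLOC": 0, "lLOC": 0, "Comment": 0, "Blank": 0}
    for line in lines:
        for kind in classify(line):
            totals[kind] += 1
    return totals

example = [
    "if (x<10)   // test  range",
    "{",
    "  // update y coordinate",
    "",
    "   y = x + 1;",
    "}",
]
totals = count_metrics(example)
print(totals)
# {'LOC': 4, 'eLOC': 2, 'lLOC': 1, 'Comment': 2, 'Blank': 1}
```

The totals match the columns of the example table: four LOC, two eLOC and one lLOC for the same fragment, which is the spread between overestimation and underestimation described above.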

Size Differential

When a file has different LOC, eLOC, lLOC, Comment or Blank line metrics, it is considered “Mod”, or size modified, from the historical baseline.  This type of differential indicates how a baseline has grown without regard to changes to the code.  “Mod” is probably not the best indicator for this metric, but it is retained for backward compatibility with all other versions of RSM.

Equal Files (Size Metric Analysis)

A file is considered equal (Equ) under the size metric analysis when all of the file LOC metrics are equal.  This mode does not consider the changes to an individual line of code within the file, just the number of lines within the file.  If the file is larger or smaller, then it is “Mod”, or size modified; if it is the same size, then it is “Equ”, or equal.
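A minimal Python sketch of the size comparison follows.  The function names and the metric values are assumptions made for illustration; only the “Equ” and “Mod” indicators and the five metric names come from the text above.

```python
METRICS = ("LOC", "eLOC", "lLOC", "Comment", "Blank")

def size_status(old, new):
    """'Equ' when every size metric matches, otherwise 'Mod'."""
    return "Equ" if all(old[m] == new[m] for m in METRICS) else "Mod"

def size_delta(old, new):
    """Net change per metric, newer baseline minus older baseline."""
    return {m: new[m] - old[m] for m in METRICS}

historical = {"LOC": 4, "eLOC": 2, "lLOC": 1, "Comment": 2, "Blank": 1}
# A statement was rewritten in place, so the counts did not change.
edited_same_size = dict(historical)
# Two lines of code were added, one of them a logical statement.
grown = {"LOC": 6, "eLOC": 3, "lLOC": 2, "Comment": 2, "Blank": 1}

print(size_status(historical, edited_same_size))  # Equ
print(size_status(historical, grown))             # Mod
print(size_delta(historical, grown))
```

The first case shows the limitation of size analysis: a file whose lines were edited without changing any count is still reported as equal.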

Baseline Change Metrics

Resource Standard Metrics version 6.40 and later enables a code characterization and code differential analysis mode to determine the code differential metrics between baselines.  This differential provides the magnitude of code changes and is activated when using the –wd RSM option.

Equal Files (Code Differential Analysis)

An equal file under the code differential analysis mode indicates that every line in the historical file matches every line in the current baseline file. 

Code Differentials

Lines that exist in the historical baseline but not in the current baseline file are considered a differential.  This differential can be added to, or ignored in, the differential file metric with a setting in the RSM configuration file.  Lines that exist in the current baseline but not in the historical baseline file are also considered a differential.  Lines that are the same between the files are considered equal in the differential metric.

New and Removed Files

These files contribute to the baseline metrics in the same manner as in the baseline size metrics analysis.  Removed files can be either a negative or positive impact depending on the RSM configuration file setting.  New files will always contribute positively.

 

Source Code Differential Analysis

The following section describes the technology and theory of RSM code differential analysis.

Many techniques for performing code differentials were considered.  The following requirements drove the design for the RSM code differential algorithm.

Code Differentials Requirements

 

RSM creates work files for the baseline so that a baseline can be characterized as of a specific date and its metrics can be derived at any time in the future.  Once the characterization is complete, the original source file is no longer needed.  The differential algorithm must characterize a file to meet this scenario.

 

A characterized source code file must be smaller than the original source file.  This conserves hardware resources.

 

The two step process of characterization and differentiation spreads the processing load across two points in time thus mitigating the time for extremely large baselines.

 

The determination of code change is very subjective.  The Gnu diff algorithm is a widely used code differential algorithm.  It matches lines as they exist in the file.  If a line in the historical baseline matches a line in the current baseline file, there is no differential.  If a line is not equal and no match is found in the historical file, the line is flagged as a differential.  When a match in source code is found between the historical file and the current file, the files are realigned at that point and the matched lines are treated as equal.  New lines in the current file are flagged as a diff, with space inserted into the historical file.

 

This behavior can be readily seen with the graphical differential tool called WinMerge (http://sourceforge.net/projects/winmerge).  The following figure shows our baseline files ‘diffed’ using WinMerge by the Gnu diff algorithm.  It is important to note the changes that were made between the two files and to assess the accuracy of the Gnu diff algorithm.

 

Changes to the example baseline files:

1.       The function Ball::get_number, present in the historical file, was removed from the current file.

2.       The function Ball::some_newfunc was added to the current file.

 

The Gnu differential algorithm did not correctly identify the removal of the Ball::get_number function.  It did not match the lines of the function Ball::reset as a block of equal lines, and thus it did not identify the addition of the new function.

 

Code differential metrics must identify lines that exist in the historical baseline but not in the current baseline.  The algorithm must match up equal lines when they exist and identify lines that exist in the present baseline and not in the historical baseline.  A line differential is any line of code in these baselines that meets these criteria.

 

The RSM differential algorithm must perform differential analysis against these criteria.  The algorithm must be capable of ignoring white space differences between two source lines.  The algorithm must be capable of performing line differentials independent of blank line differences between the files.

 

The differential algorithm must integrate with RSM so that LOC, eLOC, lLOC, Comment and Blank differences can be identified.

 

M Squared Technologies developed its own proprietary source line characterization algorithm and differential algorithm to meet all these requirements.  M Squared Technologies claims copyright on this body of work.

 

Figure 1: Gnu Diff Algorithm as implemented in WinMerge

RSM Differential Algorithm

The RSM differential algorithm characterizes a source code file into a separate differential file.  These differential files are stored in a directory co-located with the resulting RSM work file.  The RSM work file is composed of all the file names and size metrics for the differential baseline files processed.

Differential Files

RSM processes a file, parses each line for tokens and metrics, and then creates a unique signature that reflects each character and its position in the line.  The line is identified as a line of code (L), a non-effective line of code (N), a logical line of code (G), a comment line (C) or a blank line (B).  The L, N or G indicator takes precedence over the comment (C) indicator: if a line of code also has a comment, it is identified as a line of code.  Each line is written to the differential file as the line number from the original source code file, the indicator of the line type and the signature of the line.
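The shape of a characterized file can be sketched in Python as one record per source line.  The RSM signature algorithm is proprietary and is not reproduced here: the `signature` function below is a hypothetical stand-in (a position-weighted sum of character codes that skips white space, which is not collision-free), and the type rules are a simplified reading of the indicators described above that assumes only “//” comments.

```python
def line_type(line):
    """Return the indicator L, N, G, C or B for one source line."""
    text = line.strip()
    if not text:
        return "B"
    code = text.partition("//")[0].strip()
    if not code:
        return "C"              # comment only
    if code in {"{", "}", "(", ")"}:
        return "N"              # non-effective line of code
    if code.endswith(";"):
        return "G"              # logical line of code
    return "L"                  # any other line of code

def signature(line):
    """Hypothetical stand-in signature: character codes weighted by position."""
    chars = [c for c in line if not c.isspace()]
    return sum(position * ord(c) for position, c in enumerate(chars, start=1))

def characterize(lines):
    """Return (line number, type indicator, signature) records."""
    return [(number, line_type(line), signature(line))
            for number, line in enumerate(lines, start=1)]

source = [
    "void",
    "Ball::set_number(int n)",
    "{",
    "  // set the number of the ball",
    "  number = n;",
    "}",
    "",
]
records = characterize(source)
for record in records:
    print(record)
```

Because the records carry no source text, a characterized file in this form can be compared later without access to the original file, which is the scenario the requirements describe.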

Differentials of Characterized Files

Each baseline set of files is captured in an RSM work file.  RSM uses two work files, each of which contains the list of files that were characterized for metrics.  When RSM extracts two work files and code differential metrics mode is selected, a differential is performed between each characterized source file.

 

RSM matches the type and signature for each line.  When a mismatch is identified, a block matching algorithm looks forward in the characterized source files.  If a matching block is identified, the differentials are logged and processing continues at the matched block of code.
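A forward-looking block match of this kind can be sketched in Python.  This is an illustrative stand-in written under stated assumptions, not the proprietary RSM algorithm: records are (type, signature) pairs, a block is a run of `block` consecutive equal records, the nearest such run is preferred, and the “Rem” and “Add” labels are chosen for the sketch.

```python
def find_block(hist, pres, i, j, block=2):
    """Nearest (a, b) at or after (i, j) where `block` consecutive records match."""
    best = None
    for a in range(i, len(hist) - block + 1):
        for b in range(j, len(pres) - block + 1):
            if hist[a:a + block] == pres[b:b + block]:
                cost = (a - i) + (b - j)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
                break  # the first b found is the nearest one for this a
    return None if best is None else (best[1], best[2])

def differential(hist, pres, block=2):
    """Return (status, historical line, present line) tuples, 1-based."""
    out, i, j = [], 0, 0
    while i < len(hist) and j < len(pres):
        if hist[i] == pres[j]:
            out.append(("Equ", i + 1, j + 1))
            i, j = i + 1, j + 1
            continue
        match = find_block(hist, pres, i, j, block)
        a, b = match if match else (len(hist), len(pres))
        out.extend(("Rem", k + 1, None) for k in range(i, a))
        out.extend(("Add", None, k + 1) for k in range(j, b))
        i, j = a, b
    out.extend(("Rem", k + 1, None) for k in range(i, len(hist)))
    out.extend(("Add", None, k + 1) for k in range(j, len(pres)))
    return out

# Invented records: historical lines 4-6 were removed, present lines 6-7 added.
hist = [("C", 1), ("L", 2), ("N", 3), ("L", 4), ("G", 5), ("N", 6), ("L", 7), ("G", 8)]
pres = [("C", 1), ("L", 2), ("N", 3), ("L", 7), ("G", 8), ("L", 9), ("G", 10)]

result = differential(hist, pres)
for row in result:
    print(row)
```

Requiring a run of two or more equal records before realigning keeps a single coincidental match, such as a lone brace, from being mistaken for the point where the files agree again.  One consequence in this sketch is that an isolated equal line at the very end of a file is reported as a differential.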

 

The RSM configuration file includes options that control the operation of both the characterization process and the differential process.  RSM provides a method to analyze how the differential algorithm works with your source files.  These options will be covered in the following case studies.

 

The following figure illustrates how the RSM differential algorithm processes the example differential file.  Compare these results with the WinMerge results.

 

One can see the correct identification of the function removal from the historical baseline, highlighted in red.  Equal lines are shown in green and added lines to the current or present baseline are shown in blue.  This illustration was created by using the RSM report (-ws) to show the code differentials between the baseline files.

 

The –ws report is shown below with the RSM configuration file set not to show equal lines in this report.  RSM characterizes the source code and does not retain the original source line, therefore a diff showing actual source code is not available.  This report displays the affected line number in the original source file, a minus sign for a historical baseline removal, a plus sign for a present baseline addition and the differential indicator.  This information is used to generate the metrics change reports.

 
  File: ball.cpp
  Historical                  Present
  ------------------------------------
  Line Number    Diff  Type   Line Number
           10     -     L                
           11     -     L                
           12     -     N                
           13     -     C                
           14     -     G                
           15     -     N                
           17     -     C                
                  +     L     18         
                  +     L     19         
                  +     N     20         
                  +     C     23         
                  +     G     24         
                  +     N     25         
                  +     C     27         
                  +     L     28         
                  +     G     34         
           32     -     G
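As an illustration of how rows like these can feed change metrics, the following Python sketch tallies the report above by sign and line type.  The rows are transcribed from the report; how RSM itself aggregates them is not shown in this paper, so the tally is an assumption about one reasonable reading.

```python
from collections import Counter

# (historical line, sign, type, present line), transcribed from the report above
rows = [
    (10, "-", "L", None), (11, "-", "L", None), (12, "-", "N", None),
    (13, "-", "C", None), (14, "-", "G", None), (15, "-", "N", None),
    (17, "-", "C", None),
    (None, "+", "L", 18), (None, "+", "L", 19), (None, "+", "N", 20),
    (None, "+", "C", 23), (None, "+", "G", 24), (None, "+", "N", 25),
    (None, "+", "C", 27), (None, "+", "L", 28), (None, "+", "G", 34),
    (32, "-", "G", None),
]

# Count differential lines per type, separately for removals and additions.
removed = Counter(kind for _, sign, kind, _ in rows if sign == "-")
added = Counter(kind for _, sign, kind, _ in rows if sign == "+")

print("removed:", dict(removed))  # removed: {'L': 2, 'N': 2, 'C': 2, 'G': 2}
print("added:  ", dict(added))    # added:   {'L': 3, 'N': 2, 'C': 2, 'G': 2}
```

Eight historical lines and nine present lines are flagged for ball.cpp; whether the removed lines count toward the file's differential depends on the configuration setting described earlier.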

 

 

The following detailed view with the relative source code line illustrates the improvements of the RSM differential technology over the Gnu diff algorithm.  This figure was created manually by aligning the source code to the –ws report for the associated file between the two baselines.


Detailed View Using the RSM Differential Algorithm

 

Left columns: Historical File – ball.cpp.  Right columns: Present File – ball.cpp.

Ln | Source                                              | Type |               Code | Diff |               Code | Type | Source                                              | Ln
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1 | //------------------------------------------------- | C    | 186.72770105182500 | Equ  | 186.72770105182500 | C    | //------------------------------------------------- |  1
 2 | void                                                | L    | 157.32142857142800 | Equ  | 157.32142857142800 | L    | void                                                |  2
 3 | Ball::set_number(int n)                             | L    | 258.69119441475900 | Equ  | 258.69119441475900 | L    | Ball::set_number(int n)                             |  3
 4 | {                                                   | N    |  89.00000000000000 | Equ  |  89.00000000000000 | N    | {                                                   |  4
 5 |   // set the number of the ball                     | C    | 228.91944213409300 | Equ  | 228.91944213409300 | C    |   // set the number of the ball                     |  5
 6 |   number = n;                                       | G    | 180.85475635475600 | Equ  | 180.85475635475600 | G    |   number = n;                                       |  6
 7 | }                                                   | N    |  89.66666666666660 | Equ  |  89.66666666666660 | N    | }                                                   |  7
 8 |                                                     | B    |   0.00000000000000 | Equ  |   0.00000000000000 | B    |                                                     |  8
 9 | //------------------------------------------------- | C    | 186.72770105182500 | Equ  | 186.72770105182500 | C    | //------------------------------------------------- |  9
10 | int                                                 | L    | 133.69999999999900 | Rem  |                    |      |                                                     |
11 | Ball::get_number(void) const                        | L    | 276.95656062359600 | Rem  |                    |      |                                                     |
12 | {                                                   | N    |  89.00000000000000 | Rem  |                    |      |                                                     |
13 |   // return the number of the ball                  | C    | 242.90830374056700 | Rem  |                    |      |                                                     |
14 |   return(number);                                   | G    | 215.68659644878600 | Rem  |                    |      |                                                     |
15 | }                                                   | N    |  89.66666666666660 | Rem  |                    |      |                                                     |
16 |                                                     | B    |   0.00000000000000 | Rem  |                    |      |                                                     |
17 | //------------------------------------------------- | C    | 186.72770105182500 | Rem  |                    |      |                                                     |
18 | void                                                | L    | 152.75000000000000 | Equ  | 152.75000000000000 | L    | void                                                | 10
19 | Ball::reset(void)                                   | L    | 231.49790040111200 | Equ  | 231.49790040111200 | L    | Ball::reset(void)                                   | 11
20 | {                                                   | N    |  89.00000000000000 | Equ  |  89.00000000000000 | N    | {                                                   | 12
21 |   // return the picked state of the ball to false   | C    | 274.16039735386100 | Equ  | 274.16039735386100 | C    |   // return the picked state of the ball to false   | 13
22 |   picked = false;                                   | G    | 201.74416983944800 | Equ  | 201.74416983944800 | G    |   picked = false;                                   | 14
23 | }                                                   | N    |  89.66666666666660 | Equ  |  89.66666666666660 | N    | }                                                   | 15
24 |                                                     | B    |   0.00000000000000 | Equ  |   0.00000000000000 | B    |                                                     | 16
25 | //------------------------------------------------- | C    | 186.72770105182500 | Equ  | 186.72770105182500 | C    | //------------------------------------------------- | 17
   |                                                     |      |                    | Mod  | 152.75000000000000 | L    | void                                                | 18
   |                                                     |      |                    | Mod  | 266.21080012347100 | L    | Ball::some_newfunc(void)                            | 19
   |                                                     |      |                    | Mod  |  89.00000000000000 | N    | {                                                   | 20

 

 

 

 

Mod

0.00000000000000

B