Let the Data Speak

By Mike Vorster, Contributing Editor | September 28, 2010

Mike Vorster

David H. Burrows Professor of Construction Engineering and Management at Virginia Tech. See Construction Equipment. com for full archives of "Equipment Executive."

The linear trend line has been added to the chart using the "add trend line" routines. It is calculated and plotted by Excel using standard but complex routines to produce a "best-fit" representation of the data.

A best-fit trend line is free to curve downward so that it can better represent the data. The green line shows the result of adding a polynomial trend line.

Equipment managers tend to be data rich but information poor. They have lots of facts and figures carefully lined up in a multitude of reports, but seldom have the concise, clear information needed to make a decision and move forward. Many may even have reached the point of diminishing returns: too much data, too much analysis. and too little understanding of the underlying message behind the data. They need a way to understand what the facts and figures are trying to tell them.

There is a well-established technique that relies on some complex algebra, but normal spreadsheet tools can crunch the numbers and present results in a format that is easy to understand and easy to use. The technique has numerous names, but we will simply call it "the trend-line process." The following four steps illustrate how it works.

1. Define the study

In most cases, managers are concerned about the relationship between two factors and about how things grow or change over time. They worry about undercarriage wear or fuel consumption as a function of hours worked, or about residual market value as a function of the age of the machine in years. The first step, therefore, is to determine what relationship to study and what data to support the analysis. For our example, let's study equipment utilization and understand the relationship between machine age in years and the total number of hours worked. Age in years is the independent variable — it marches on regardless — and age in hours is the dependant variable — it depends on the age in years.

2. Define the population and collect the data

Utilization varies by machine type, geographical location, and many other factors, which means that the relationship between age in years and age in hours cannot be accurately defined for all types of machines in all applications. There will be too much scatter in the results, and it will be impossible to come up with anything other than meaningless averages.

This step in the process gives the study a meaningful, specific focus by defining a population or group of machines with similar characteristics. It is not a simple thing to do. Too much breadth in the population produces data that are too scattered to be of value, and an extremely select population is too narrow. Dividing the fleet into suitable categories and classes helps, so we will use for our example all excavators in the 80,000- to 110,000-pound class.

Defining the study and the population of machines to use makes it possible to achieve uniformity and consistency in the data. The accompanying table gives the age in years and the total hours worked for 10 excavators in the stipulated class.

3. Plot the data and insert a trend line

The data in the table are the facts. Substantial experience indicates an average utilization of 1,719 hours per year, which is a valid number. The average does not, however, give additional insights regarding variability in the data or underlying trends that may be hidden. It uses the data but does not look below the surface.

Chart 1 shows a plot of the data using an Excel scatter plot with hours worked plotted on the vertical, or Y axis, and the age in years plotted on the horizontal, or X axis. A blue linear trend line passing through the origin has been added to the chart using the "add trend line" routines built into Excel. This line is calculated and plotted by Excel using standard but complex routines to produce a straight line that is a "best-fit" representation of the plotted values.

The numbers in the top right hand corner of the chart give two important insights. First, the equation of the best-fit linear trend line is y = 1672.8x. This means the number of hours worked (Y) increases by 1,672.8 hours for each year (x). This is the annual utilization which, according to the trend line, is 1,672.8 hours per year and not 1,719 as given by the average. Second the "R2" value for the best-fit line is 0.8671. This is a standard statistical measure which indicates that the best-fit linear trend line represents 86 percent of the variability in the data.

Both the average and the best-fit linear trend lines miss an important piece of information that only becomes apparent when you look carefully at the plotted data and notice that utilization clearly declines as the machine ages.

The only way to quantify the change in utilization is to add a best-fit trend line that is free to curve downward so that it can better represent the data. The green line in Chart 2 shows the result of adding a polynomial (or "free to curve") trend line. It is clearly a better fit to the data. The "R2" value has improved from 0.8671 to .9513, and the line now represents more than 95 percent of the variability in the data. The curved trend line has been influenced by the lower utilization of the older machines and provides an important piece of additional information.

4. Use the trend line

The trend-line process enables managers to understand the data and use it to its fullest extent. In this example, there is a small but not critical difference of 46 hours per year (1,719 — 1,672.8) between the average utilization and the slope of the best-fit linear trend line. There is, however, a real difference between these methods of converting data into information and the curved trend line shown in Chart 2. Utilization is not constant at 1,700 hours per year. As can be seen from the green line, the data clearly tell us that annual utilization is 2,000 hours for the first three years; 1,330 hours for the next three; and about 750 hours for the next two years.

These values, taken directly from the graph or calculated by substituting in the slightly more complex polynomial equation given in Chart 2, will dramatically change costs, budgets and expectations for the excavators in question. They clearly show how the trend-line process enables managers to permit the data to tell its full story.

The example has focused on the relationship between age in years and hours worked. Similar nonlinear relationships exist in many areas of equipment management and similar mistakes are easily made by assuming that the future is a linear extrapolation of the past. The growth in repair parts and labor or the decline in residual market value with age are excellent examples where nonlinear trend lines can be used to let the data tell its full story.