Updated Holt-Winters function

Due to the complexity of the Holt-Winters function I’ve changed the way it works in the JavaScript Library.

Now instead of passing arguments to the function you chain functions to the Holt-Winters object.

So to give a concrete example, let’s take the monthly UK fish landing data from the ONS. I’ve stitched data from 2008 to 2015 together and filtered on just the amount landed in the UK by both UK and foreign vessels.

To perform the Holt-Winters exponential smoothing we first create a foreHoltWinter() object. Then we pass the parameters and data to the object via some methods/functions.

var fhw = ssci.fore.holtWinter()
                    .data(data)
                    .period(12)
                    .level(.14)
                    .trend(0)
                    .season(0);

After this the function needs to be called:

fhw.call();

To get the output you can then use:

fhw.output();

This will return an array of forecast points. To forecast further you use the forecast(value) function where value is the number of time periods ahead that you wish to forecast for. More detail is on the page for the function itself, including the other functions that can be called to control more aspects of the analysis.

Putting this all together, here is an example using the data specified previously. I’ve added the forecast data on this chart and predicted forwards for 12 months.






SurveyScience JavaScript Library

I’ve been converting some of the functions from the Excel add-in to Javascript functions within a library. It’s finally ready for release and can be found here.

It contains functions to:

  • Smooth data
  • Deseasonalise data
  • Perform least squares regression
  • Exponential smoothing
  • Create auto-correlation and partial auto-correlation plots
  • Market research functions
  • Utility functions to modify arrays

There’s still some work to do on it but it can be used as is – go to the above link for more details.

I originally started it to created some smoothed lines for some D3 charts.

Hopefully it will prove useful.

Excel Autocorrelation

In my Excel add-in I’ve inadvertently been using a different formula to calculate auto-correlation than that used by everyone else. I’d been calculating the correlation as if the mean and variance weren’t stationary i.e. without the simplification of the auto-correlation only depending on the lag.

So I had:
auto-covariance = E{[X(t1) – m(t1)][X(t2) – m(t2)]}

whereas R etc. has:
auto-covariance = E{[X(t) – m][X(t+l) – m]}

where X(t1) = the value of the function at time 1,
X(t2) = the value of the function at time 2,
m(t1) = the mean at time t1,
m = the overall mean,
l = the lag

I’ve now amended the AutoCor function to use the simplified formula. The code is below but the add-in won’t be updated until I’ve tracked through the changes for all the other functions.

Public Function AutoCor(dataRange As Range, Optional lag As Variant, Optional diff As Variant) As Variant
'@desc;Function to calculate the auto-correlation of a series of data. An array function
'@param;Data range;Range;The range holding the data series
'@param;Lag;Variant;The lag to calculate the auto-correlation for. Can be a range for the array formula.
'@param;Difference;Variant;The number of times to difference the data
'@return;Variant;The auto-correlations

Dim x() As Double           'Array to hold data values
Dim lags() As Integer       'Array to hold lag values
Dim sx As Double
Dim sy As Double
Dim s1 As Double
Dim s2 As Double
Dim s3 As Double
Dim i, k As Integer         'Loop variables
Dim outputRows As Long
Dim outputCols As Long
Dim output() As Variant     'Temporary output array
Dim vertOutput As Boolean
Dim vertInput As Boolean
Dim vertLag As Boolean
Dim numLags As Integer
Dim a As Integer
Dim b As Integer
Dim dataSize As Integer

'How many lags are there to calculate for?
With Application.Caller
outputRows = .Rows.Count
outputCols = .Columns.Count
End With
ReDim output(1 To outputRows, 1 To outputCols)

'Vertical or horizontal output?
If outputRows > outputCols Then
vertOutput = True
numLags = outputRows
'Set all output() to "#N/A"
For i = 1 To outputRows
output(i, 1) = CVErr(xlErrNA)
Next
Else
vertOutput = False
numLags = outputCols
'Set all output() to "#N/A"
For i = 1 To outputCols
output(1, i) = CVErr(xlErrNA)
Next
End If

'Vertical or horizontal input?
If dataRange.Rows.Count > dataRange.Columns.Count Then
vertInput = True
a = 1
b = 0
dataSize = dataRange.Rows.Count - 1
Else
vertInput = False
a = 0
b = 1
dataSize = dataRange.Columns.Count - 1
End If
ReDim x(dataSize)

'Check that lag is there
If IsMissing(lag) Then
ReDim lags(numLags)
'Populate lags
For i = 1 To numLags
lags(i) = i
Next
ElseIf TypeName(lag) = "Range" Then
'Horizontal or vertical lag
If lag.Rows.Count > lag.Columns.Count Then
vertLag = True
'Check that lag range matches output range in size
If lag.Rows.Count <> numLags Then numLags = lag.Rows.Count
ReDim lags(numLags)
'Populate lags
For i = 1 To numLags
lags(i) = lag(i, 1).Value
Next
Else
vertLag = False
'Check that lag range matches output range in size
If lag.Columns.Count <> numLags Then numLags = lag.Columns.Count
ReDim lags(numLags)
'Populate lags
For i = 1 To numLags
lags(i) = lag(1, i).Value
Next
End If
Else
'Should I check for array/single values?
ReDim lags(1)
'Need to check if integer
lags(1) = lag
'Set numLags to 1 just in case
numLags = 1
End If

'Check that diff is there
If IsMissing(diff) Then
diff = 0
End If

'Fill data array x()
If diff > 0 Then
For i = 0 To dataSize - 1
If vertInput Then
x(i) = dataRange(i + 1, 1).Value - dataRange(i + 2, 1).Value
Else
x(i) = dataRange(1, i + 1).Value - dataRange(1, i + 2).Value
End If
Next
Else
For i = 0 To dataSize
If vertInput Then
x(i) = dataRange(i + 1, 1).Value
Else
x(i) = dataRange(1, i + 1).Value
End If
Next
End If

If diff > 1 Then
For k = 1 To diff
For i = 0 To dataSize - 1 - k
x(i) = x(i) - x(i + 1)
Next
Next
End If

'Calculate autocorrelation
For i = 1 To numLags
sx = 0
s1 = 0
s2 = 0

For k = 0 To dataSize - diff
sx = x(k) + sx
Next
sx = sx / (dataSize + 1 - diff)

For k = 0 To dataSize - diff
If k < (dataSize + 1 - lags(i) - diff) Then
s1 = s1 + (x(k) - sx) * (x(k + lags(i)) - sx)
End If

s2 = s2 + (x(k) - sx) ^ 2
Next
If vertOutput Then
output(i, 1) = s1 / s2
Else
output(1, i) = s1 / s2
End If
Next

AutoCor = output

End Function

And here’s the output for an example data series:
autocor

A Warm December

It’s been a warm December here in the UK. I thought I’d have a look at the data from the Met Office to see quite how warm it’s been. It’s also interesting to see how the temperature has changed since the monthly statistics began in 1910.

First I downloaded the data for UK mean temperatures and had a look in Excel.

Below is chart of mean temp for the month of December from 1910 onwards. dec-mean-temp

As you see, 2015 is easily the warmest. It’s roughly 3-4 degrees Celsius warmer than the average. In fact, calculating the average (3.99 Celsius) and the standard deviation (1.45 Celsius) it’s roughly 2.7 times the standard deviation higher than the average. If it were distributed normally 99.6% of the months should be colder. I’ve no idea what the expected distribution is but it’s probably not normal. In any case, it was abnormally warm.

For comparison, I’ve plotted the averages and standard deviations for each month below.
avg-mean-temp-month

Interestingly the spread of temperatures is higher for the winter months.