Change to Kernel Smoother

Introduction

I recently changed a whole load of the functions in the ssci JavaScript library. One of these changes was to the way the kernel smoothing function works.

The previous function was fairly slow and didn’t scale particularly well. In fact, its main loop suggests that it scales as O(n^2): for every point it calculates a weight at every other point.
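To make that concrete, here is a minimal sketch of that kind of smoother. It is not the actual ssci code: the names (naiveKernelSmooth, scale) and the assumption that the data is an array of [x, y] pairs are mine, purely for illustration.

function naiveKernelSmooth(data, scale) {
    // Gaussian weight of a point at x relative to the centre point
    function gaussian(x, centre) {
        var u = (x - centre) / scale;
        return Math.exp(-0.5 * u * u);
    }

    return data.map(function (point) {
        var totalWeight = 0;
        var weightedSum = 0;
        // Visiting every point for every outer point is what makes this O(n^2)
        data.forEach(function (other) {
            var w = gaussian(other[0], point[0]);
            totalWeight += w;
            weightedSum += w * other[1];
        });
        return [point[0], weightedSum / totalWeight];
    });
}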

I therefore decided to change the function. Instead of looping over every point and then calculating the weight at every other point, it now works like this:

  • It loops through every point, as the previous function does
  • For point n it calculates the weight at the central point (i.e. point n itself)
  • It then loops through every point lower (in the array) than n, calculating the weight at each one. As soon as the weight at a point, as a fraction of the weight at the central point, falls below a certain threshold, the loop ends.
  • It then loops through every point higher (in the array) than n in the same way, again stopping once the weight falls below the threshold relative to the central point.

The default setting of the threshold is 0.001 (i.e. 0.1%). The way the function operates does, however, mean that two assumptions have been made (a sketch of the whole approach follows the list below):

  • The data in the array has already been ordered by its x-coordinate.
  • The kernel function being used must decrease monotonically as you move away from the central point, so that once the weight drops below the threshold it stays below it.
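Putting the steps and assumptions together, here is a rough sketch of the approach in plain JavaScript. It reflects my reading of the steps above rather than the actual ssci implementation: the names (earlyExitKernelSmooth, diff), the Gaussian weight and the [x, y] data format are assumptions made for illustration.

function earlyExitKernelSmooth(data, scale, diff) {
    diff = diff || 0.001; // default threshold of 0.1%

    // Gaussian weight of a point at x relative to the centre point
    function gaussian(x, centre) {
        var u = (x - centre) / scale;
        return Math.exp(-0.5 * u * u);
    }

    return data.map(function (point, n) {
        // Weight at the central point itself
        var centralWeight = gaussian(data[n][0], point[0]);
        var totalWeight = centralWeight;
        var weightedSum = centralWeight * data[n][1];

        // Walk down the array, stopping once the relative weight drops below the threshold
        for (var i = n - 1; i >= 0; i--) {
            var w = gaussian(data[i][0], point[0]);
            if (w / centralWeight < diff) break;
            totalWeight += w;
            weightedSum += w * data[i][1];
        }

        // Walk up the array in the same way
        for (var k = n + 1; k < data.length; k++) {
            var v = gaussian(data[k][0], point[0]);
            if (v / centralWeight < diff) break;
            totalWeight += v;
            weightedSum += v * data[k][1];
        }

        return [point[0], weightedSum / totalWeight];
    });
}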

Example

Issue #43 of D3 shape gives an example of some data that has been smoothed. It’s the observed rate of tweets for Eric Fischer’s Twitter feed over the period from January 8, 2015 through November 30, 2015. I’ve taken the data and charted it below.

Performance

I’ve put together some figures detailing the performance of the two routines on my computer. To time the functions and see how they scale, I used the following code:

var iterations = 10;

for (var j = 0; j < 10; j++) {
    // Take a progressively smaller slice of the full data set
    var temp2 = temp1.slice(j * temp1.length / 10);

    // Time the original kernel smoother
    console.time('Function #1');
    for (var i = 0; i < iterations; i++) {
        ssci.smoothKernel("Gaussian", temp2, 2000000);
    }
    console.timeEnd('Function #1');

    // Time the new, thresholded kernel smoother
    console.time('Function #2');
    for (var i = 0; i < iterations; i++) {
        var data4 = ssci.smoothKernel2()
            .scale(2000000)
            .kernel("Gaussian")
            .diff(0.001)
            .data(temp2);

        data4();
    }
    console.timeEnd('Function #2');
}

In the code above, temp1 is the array holding the data points. The code takes that data, chops it into progressively smaller arrays and runs each function ten times on every slice.

This leads to the following times (in milliseconds) per run, i.e. the totals divided by ten (#1 is the original smoothKernel function, #2 the new smoothKernel2):

Rows   #1 (ms)   #2 (ms)   #1/#2
 144        10        10    0.94
 288        38        26    1.45
 432        84        42    1.99
 576       149        58    2.55
 720       234        76    3.08
 864       336        91    3.69
1008       457       107    4.26
1152       597       124    4.82
1296       756       140    5.41
1440       949       160    5.95

Let’s plot these numbers.

The new function is anything from roughly the same speed to six times faster, and the gap widens as the amount of data grows, as you would expect when comparing against an O(n^2) routine. It appears to scale roughly linearly.

Of course, there is also the question of what this change does to the accuracy of the smoothing. That may be a slightly odd thing to say, given that no smoothing algorithm can really be called accurate, so what I’m actually looking at is the difference between the outputs of the two functions.
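For anyone who wants to reproduce that comparison, something along the following lines would do it. I’m assuming that both ssci.smoothKernel and the function built by ssci.smoothKernel2 return arrays of [x, y] points in the same order; the timing code above doesn’t actually show what they return, so treat this purely as a sketch.

// Smooth the same data with both functions (assumed to return [x, y] arrays)
var oldOutput = ssci.smoothKernel("Gaussian", temp2, 2000000);

var newSmoother = ssci.smoothKernel2()
    .scale(2000000)
    .kernel("Gaussian")
    .diff(0.001)
    .data(temp2);
var newOutput = newSmoother();

// Percentage difference between the two smoothed y-values at each point
var percentDiffs = oldOutput.map(function (point, i) {
    return 100 * Math.abs(point[1] - newOutput[i][1]) / point[1];
});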

As you can see from the graph, the differences in this example are of the order of 0.004%, with a maximum of 0.03%. I think we can live with that.

The new function can be found at www.surveyscience.co.uk.