Rim Weighting Question

I recently had a question about rim weighting and how to set the values for the maximum iterations and the upper and lower weight caps.

I’ve reproduced my answer, though I’ve adjusted it slightly:

Maximum Iterations

  • The value to set here will depend largely on how many rims you have, how small the cells are and how close the actuals are to the targets. The only way to tell for sure is to see what difference it makes to the weights when you run the program again with one more iteration. If it makes no difference to the weights then you’re ok to leave it as it is. Non-convergence in this case will be down to either the rims having conflicting targets (i.e. one rim causes the weights to go up and another causes them to go down) or the weight cap bringing the weights back down (or up).
  • In terms of an actual value, 25 is generally ok for small number of rims (of the order of 5-20). However I’ve seen weighting schemes that required more than 200 iterations to converge. These had hundreds of interlaced rims.
  • Potentially I could add a metric to the program to check for a minimum weight change so that the program ends if all weights change by less than this figure. It would, of course, affect performance though and is not a trivial change.

Upper Weight Cap

  • A good starting point for this figure is to divide the actual proportions (or base sizes) by the targets for each cell and look at the largest. So, if for example, you had 20 percent males in the sample but the target was 45 percent and this was the biggest difference, then the biggest initial weight would be 0.45/0.2=>2.25. Given the way the algorithm works, it will not stay at that but it should be of that order. It will depend on the other rims.
  • One consequence of lowering the upper weight cap is that it will reduce the WEFF – the weighting efficiency. A higher WEFF means that you will have lower precision in your estimates i.e. it increases the standard error. However lowering the weight cap can also increase the number of iterations and also potentially lead to non-convergence.
  • I’d set a value that allows the procedure to converge and gives a reasonable WEFF. Generally a value of 5 or 6 is fine for proportional targets and a multiple of 5 or 6 above the total base size divided by the total number of panellists for base size targets (e.g. if there are 1000 panellists and a total base size of 4500, then set a value of 4.5*5=22.5).
  • A WEFF above 1.5 – 1.6 is high and is an indication of poor representation within the panel.

Lower Weight Cap

  • I’d leave this at 0 unless the WEFF needs to be lowered. A good indication of problems with the targets or with the panel is whether all the weights drop to near zero.

So, the basic answer to how to set them is that it depends on any lack of convergence and how high the WEFF goes. The above should give some indication of where to set them though.

Thanks to Bryan for the question.

Excel Autocorrelation

In my Excel add-in I’ve inadvertently been using a different formula to calculate auto-correlation than that used by everyone else. I’d been calculating the correlation as if the mean and variance weren’t stationary i.e. without the simplification of the auto-correlation only depending on the lag.

So I had:
auto-covariance = E{[X(t1) – m(t1)][X(t2) – m(t2)]}

whereas R etc. has:
auto-covariance = E{[X(t) – m][X(t+l) – m]}

where X(t1) = the value of the function at time 1,
X(t2) = the value of the function at time 2,
m(t1) = the mean at time t1,
m = the overall mean,
l = the lag

I’ve now amended the AutoCor function to use the simplified formula. The code is below but the add-in won’t be updated until I’ve tracked through the changes for all the other functions.

Public Function AutoCor(dataRange As Range, Optional lag As Variant, Optional diff As Variant) As Variant
'@desc;Function to calculate the auto-correlation of a series of data. An array function
'@param;Data range;Range;The range holding the data series
'@param;Lag;Variant;The lag to calculate the auto-correlation for. Can be a range for the array formula.
'@param;Difference;Variant;The number of times to difference the data
'@return;Variant;The auto-correlations

Dim x() As Double           'Array to hold data values
Dim lags() As Integer       'Array to hold lag values
Dim sx As Double
Dim sy As Double
Dim s1 As Double
Dim s2 As Double
Dim s3 As Double
Dim i, k As Integer         'Loop variables
Dim outputRows As Long
Dim outputCols As Long
Dim output() As Variant     'Temporary output array
Dim vertOutput As Boolean
Dim vertInput As Boolean
Dim vertLag As Boolean
Dim numLags As Integer
Dim a As Integer
Dim b As Integer
Dim dataSize As Integer

'How many lags are there to calculate for?
With Application.Caller
outputRows = .Rows.Count
outputCols = .Columns.Count
End With
ReDim output(1 To outputRows, 1 To outputCols)

'Vertical or horizontal output?
If outputRows > outputCols Then
vertOutput = True
numLags = outputRows
'Set all output() to "#N/A"
For i = 1 To outputRows
output(i, 1) = CVErr(xlErrNA)
Next
Else
vertOutput = False
numLags = outputCols
'Set all output() to "#N/A"
For i = 1 To outputCols
output(1, i) = CVErr(xlErrNA)
Next
End If

'Vertical or horizontal input?
If dataRange.Rows.Count > dataRange.Columns.Count Then
vertInput = True
a = 1
b = 0
dataSize = dataRange.Rows.Count - 1
Else
vertInput = False
a = 0
b = 1
dataSize = dataRange.Columns.Count - 1
End If
ReDim x(dataSize)

'Check that lag is there
If IsMissing(lag) Then
ReDim lags(numLags)
'Populate lags
For i = 1 To numLags
lags(i) = i
Next
ElseIf TypeName(lag) = "Range" Then
'Horizontal or vertical lag
If lag.Rows.Count > lag.Columns.Count Then
vertLag = True
'Check that lag range matches output range in size
If lag.Rows.Count <> numLags Then numLags = lag.Rows.Count
ReDim lags(numLags)
'Populate lags
For i = 1 To numLags
lags(i) = lag(i, 1).Value
Next
Else
vertLag = False
'Check that lag range matches output range in size
If lag.Columns.Count <> numLags Then numLags = lag.Columns.Count
ReDim lags(numLags)
'Populate lags
For i = 1 To numLags
lags(i) = lag(1, i).Value
Next
End If
Else
'Should I check for array/single values?
ReDim lags(1)
'Need to check if integer
lags(1) = lag
'Set numLags to 1 just in case
numLags = 1
End If

'Check that diff is there
If IsMissing(diff) Then
diff = 0
End If

'Fill data array x()
If diff > 0 Then
For i = 0 To dataSize - 1
If vertInput Then
x(i) = dataRange(i + 1, 1).Value - dataRange(i + 2, 1).Value
Else
x(i) = dataRange(1, i + 1).Value - dataRange(1, i + 2).Value
End If
Next
Else
For i = 0 To dataSize
If vertInput Then
x(i) = dataRange(i + 1, 1).Value
Else
x(i) = dataRange(1, i + 1).Value
End If
Next
End If

If diff > 1 Then
For k = 1 To diff
For i = 0 To dataSize - 1 - k
x(i) = x(i) - x(i + 1)
Next
Next
End If

'Calculate autocorrelation
For i = 1 To numLags
sx = 0
s1 = 0
s2 = 0

For k = 0 To dataSize - diff
sx = x(k) + sx
Next
sx = sx / (dataSize + 1 - diff)

For k = 0 To dataSize - diff
If k < (dataSize + 1 - lags(i) - diff) Then
s1 = s1 + (x(k) - sx) * (x(k + lags(i)) - sx)
End If

s2 = s2 + (x(k) - sx) ^ 2
Next
If vertOutput Then
output(i, 1) = s1 / s2
Else
output(1, i) = s1 / s2
End If
Next

AutoCor = output

End Function

And here’s the output for an example data series:
autocor