Wednesday, July 23, 2008

Calculate a median on a group in SSRS

Today i needed to do something that turned out to be impossible to do in the conventional manner. I was doing a report on some items that are measured in days, each row in the report was actually a group, and i needed to calculate a median for that group. Step by step, here is how i did it.

First, here is the report layout i started with:

as you can see i have two row groups, with the detail row of the second group hidden (because i don't want to list all the details, there are hundreds of them).

Initially this seemed quite simple - for each detail row i would call a custom function which would add the value to an array, and then in the group footer i could just call another custom function called CalculateMedian() to get the median of the group. But the problem is, when i called CalculateMedian(), my array was empty, so i had no values to calculate with. I double checked, and yes the AddToMedian() function was being called for every detail row, so wtf? After some digging around and some Googling, i found one or two blog posts that mention group header and footer expressions get evaluated before the group detail expressions - this meant i was trying to calculate the median before the values had even been added to the array! What the???!!!! Who the fuck came up with that brilliant design, and why? I can understand processing group headers before the details (for sorting puroses, etc), but surely footers should be done last? I mean, for 99.9% of cases, footers will contain aggregate or summary information, right? (I don't care about the other 0.1% of cases and whatever they are trying to put in the footer, if they want to do non-standard stuff then they should be prepared to feel the pain of getting it to work).

So, i couldn't use the group footer to show my summary information, a different approach was required. To achieve my median calculation, three things need to happen:
- at the start of each category group the array used to accumulate the values needed to be emptied.
- i needed to iterate the detail rows of each category group, as there is no other way of passing the group values through to a custom function.
- at the end of each category group i need to calculate and display the median value.

To begin with, i replaced the table that had the division and category row groups on it with a list container for the division group (call it the DivisionList). Above that i put a rectangle with a textbox for each column title, this replicates what was in the heading of the table:

After that, i nest another list inside the DivisionList container, this has a grouping set to group by category and is called the CategoryList:

Note that immediately inside each list container i have a textbox which shows the name of the grouped item. The CategoryList is now going to perform the same function as each row did in the original table, so i add a few more textboxes to hold the values for that "row":

Note that i have already added the expressions, the important one is the median textbox, which has the expression:


So this leaves the big question: i have the required grouping set up, and my aggregate/summary expressions in place, but how am i going to achieve objectives 1 and 2 from above, which was to reset the array holding the values, and to iterate rows in the category group so that i can place the individual values into the array the median is calculated from?

Hmmm, how can we iterate the values? I know - let's bring back the table, just give it one column:

if you are paying attention, you will see that it is inside the CategoryList container, with all the textboxes. Check out the expression, as the values get placed into the table rows they also get lodged in the array that is used for the median calculation. That's objective #2 resolved, now for objective #1, resetting the array. It's time for sly trick #1, using the knowledge that the table header expressions are processed before the table detail expressions, we can do this:

see what i did? I used an expression in the table header to reset the array. I also set the visibility of the table to hidden - i don't want this to show on the rendered report, remember? Sweet, now i have just one little detail left to take care of - how can i ensure that the table rows are filled before i call CalculateMedian() from my median textbox?

To achieve this, i employ sly trick #2: reports are rendered top to bottom, left to right. I've not seen this documented anywhere, it's just what i've observed, i've depended upon that behaviour in the past and it hasn't let me down yet. So, make the table 0.1cm wide, and make it the same height as the textboxes that are in the CategoryList container:

see that orange kind of spot on the report there? I added some colour to the table just so you could see what i've done. Now that it is resized, i move it to the left and above of the category label (IOW move it to location 0,0 of the CategoryList container, look for the orange spot). This ensures that it gets filled/rendered before the rest of the textboxes in that list container:

And that's all. Now, i just adjust the height of my list containers, and i run the report. This is what the final result looks like:

As you can see, the final result is almost the same as if i had used a table, but with this method i get to calculate the median on grouped rows - what a mission that was :(

For those of you who need it, here is the code for accumulating the median values and calculating them. It's pretty sandard stuff, nothing too exciting.

Dim Public Shared MedianArray(0) As Integer

Public Function ResetMedian()
ReDim MedianArray(0)
End Function

Public Function AddToMedian(fieldValue As Integer)
Dim i As Integer
i = UBound(MedianArray) + 1
ReDim Preserve MedianArray(i)
MedianArray(i) = fieldValue
AddToMedian = fieldValue
End Function

Public Function CalculateMedian() as String
Dim arraySize as Integer
Dim ii as Integer
Dim jj As Integer
Dim itemMoved As Boolean
Dim temp As Integer

'sort it & calculate it
arraySize = UBound(MedianArray)
If arraySize = 1 Then
CalculateMedian = CStr( MedianArray(0) )
Exit Function
Else If arraySize > 1 Then
For ii = 0 To arraySize - 1
itemMoved = false
For jj = LBound(MedianArray) To UBound(MedianArray) - 1
If MedianArray(jj) > MedianArray(jj + 1)
temp = MedianArray(jj)
MedianArray(jj) = MedianArray(jj + 1)
MedianArray(jj + 1) = temp
itemMoved = True
End If
If itemMoved = False Then Exit For

'calculate it
If arraySize Mod 2 = 0 Then
'average the two middle values
CalculateMedian = CStr( (MedianArray(arraySize / 2) + MedianArray((arraySize / 2) + 1)) / 2)
'get the middle value
CalculateMedian = CStr( MedianArray(Floor((arraySize / 2)) + 1) )
End If
End If
End Function

keywords: SSRS, SSRS2005, median, row group, calculate, custom aggregate