Solving the 'Events in Progress' Problem in DAX, Part Two: Aggregating Dates and Multiselect

Star InactiveStar InactiveStar InactiveStar InactiveStar Inactive
 

Reposted from Chris Webb's blog with the author's permission.

You may recall that by the end of last week’s post on solving the ‘events in progress’ problem in DAX, I’d managed to get the right values out of PowerPivot at the Date level with good performance. BUT when, for example, I selected a Calendar Year I got no values out, and I hadn’t even thought of how to handle situations such as when Date was on filter and the user had selected three different, non-consecutive dates. Well, that was last week when I was a DAX newbie – this week, well, I’m slightly less of a newbie and I’ve got the solution.

First, let’s restate the problem a bit more clearly. What I want is this: when I have a date on columns or rows, I want my calculated measure to display the number of rows in the FactInternetSales table that have an OrderDate less than or equal to that date, and have a ShipDate that is greater than or equal to that date. This way I see the number of rows in the table where something has been ordered but not yet shipped. When I select a year, I want to see the number of rows where that OrderDate-ShipDate range intersects with the dates in that year. So if I selected the year 2002, I want all the rows where either the OrderDate is in 2002, or the ShipDate is in 2002, or when the OrderDate is before 2002 and the ShipDate is after 2002. Finally if I make a multiselect when Date is on a filter, for example if I multiselect the 1st January 2002, the 5th of January 2002 and the 10th January 2002, I want to count the number of rows where the OrderDate-ShipDate range includes any of those three dates.

I spent several hours thinking how I could do this in DAX using just the columns I had and I came up with the following formula, which gave me the results I was expecting… when it ever returned, because it was horrendously slow:

=CALCULATE(
COUNTROWS('FactInternetSales')
, FILTER('FactInternetSales'
, COUNTROWS(
FILTER(
VALUES('DimDate'[DateKey]),
('FactInternetSales'[OrderDateKey]<='DimDate'[DateKey])
&&
('FactInternetSales'[ShipDateKey]>='DimDate'[DateKey])
))>0))

It’s using the Calculate function to evaluate the expression CountRows(FactInternetSales); what it’s trying to do in the outermost filter is to filter all of FactInternetSales, and find all the rows where there is at least one Date in the current selection that is between the OrderDateKey and the ShipDateKey. But, as we learned last week, doing a filter on all the columns in a table is very inefficient, and what we actually want to do is to filter on just the columns we’re interested in: OrderDateKey and ShipDateKey. However what I found was that I really needed all of the distinct combinations of OrderDateKey and ShipDateKey, but I could only get either all of the columns in a table using the table name, or the distinct set of values in a single column using the Values() function. Therefore I couldn’t actually filter on the distinct combinations of OrderDateKey and ShipDateKey where OrderDateKey>=DateKey>=ShipDateKey. This seems like a limitation of DAX from what I can see, though it might turn out that there is a way of doing it – if there is I will of course blog about it.

Anyway, the solution was to create a new column that did contain the distinct combinations of OrderDateKey and ShipDateKey. I went into the PowerPivot designer and created a new column called OrderShip on FactInternetSales with the following expression:

=CONCATENATE('FactInternetSales'[OrderDateKey],'FactInternetSales'[ShipDateKey])

As you can see, it simply concatenates the OrderDateKey and ShipDateKey values into a 16 character string:

image

I was then able to use that column in a filter as follows, extracting the OrderDateKey and the ShipDateKey from the string using the Left and Right functions, and turning them back into numbers using the Value function:

=Calculate(
CountRows(FactInternetSales)
, FILTER(
VALUES('FactInternetSales'[OrderShip])
, COUNTROWS(
FILTER(
VALUES('DimDate'[DateKey]),
('DimDate'[DateKey]>=VALUE(LEFT('FactInternetSales'[OrderShip], 8)))
&&
('DimDate'[DateKey]<=VALUE(RIGHT('FactInternetSales'[OrderShip],8)))
))>0))

The logic is the same as with the previous expression, with the all-important exception that the outer Filter only works on the distinct values of OrderShip. It works and it’s fast, so I’m relatively happy although I would like to find a more elegant solution.

It not only gives the correct values for individual dates:

image

but also for Years (and other aggregations of Dates – also notice how the sum of the years is greater than the same as the Grand Total, because of course an order can be in progress in two different years and so needs to be counted in each):

image

and also for multiselect (where again the value is not the sum of the values of the three individual dates):

image

image

I think this illustrates one area where DAX is superior to MDX. MDX is notoriously bad at handling multiselect, but as Marius explained to me in DAX multiselect support is built-in: the Values() function is rather like a CurrentSet function, returning the set of all selected (‘existing’ in MDX terms) values in a column in a particular context.

So, another problem solved… time to think up the next challenge!


chris-webb

Chris has been working with Microsoft BI tools since he started using beta 3 of OLAP Services back in the late 90s. Since then he has worked with Analysis Services in a number of roles (including three years spent with Microsoft Consulting Services) and he is now an independent consultant specialising in complex MDX, Analysis Services cube design and Analysis Services query performance problems. His company website can be found at http://www.crossjoin.co.uk and his blog can be found at http://cwebbbi.spaces.live.com/


Tags: dax, excel, date

 

2007-2015 VidasSoft Systems Inc.