SQL Server 2008R2 Self Service BI - Gemini (PowerPivot) - how it performs with bigger tables

Gemini was released about a week ago, and I have been playing with it almost every evening since. I am still trying to understand what it is, how it works, what its limitations are, how to use DAX, and so on. I can say that I like what I see so far. Kasper de Jonge and Chris Webb have already posted their initial Gemini reviews. In this post I will share my experience with Gemini so far.

First of all, I want to point out that for Gemini tests all you need is Excel 2010 and the Gemini add-in file (it is just about 30MB). You do not need SQL Server 2008R2. I am pointing this out because, while setting up my testing environment, I installed SQL Server 2008R2 and only later realized that it is not required. It is quite amazing that this 30MB add-in contains such powerful software: the Gemini front-end and the in-memory Analysis Services server. Of course, for an enterprise-level installation you will need SQL Server 2008R2 and SharePoint 2010, but such a setup is not part of my current tests.

After my initial tests with a few thousand records, I finally decided to run tests on bigger tables. I took the AdventureWorksDW database and expanded the FactInternetSales table to 22mln records by simply duplicating the existing records. The table grew to 3.2GB. My goal was to see how well Gemini can handle millions of rows. I ran my tests on my laptop with 3GB of RAM and an Intel Duo 2.5GHz CPU. On the same laptop I have my SQL Server 2008 database with the expanded data, Excel 2010 and the Gemini add-in. I limited SQL Server memory to 500MB.
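
For anyone who wants to repeat this setup, a rough T-SQL sketch of the expansion and the memory cap could look like the following. FactInternetSalesBig is just an illustrative name for the staging copy; the copy has no primary key, so the duplicate rows are allowed, and the exact script I used may have differed:

-- Copy the original fact table into a plain staging table (no keys, no constraints)
SELECT *
INTO dbo.FactInternetSalesBig
FROM dbo.FactInternetSales;

-- Double the row count on each pass until the copy holds roughly 22mln rows
WHILE (SELECT COUNT_BIG(*) FROM dbo.FactInternetSalesBig) < 22000000
    INSERT INTO dbo.FactInternetSalesBig
    SELECT * FROM dbo.FactInternetSalesBig;

-- Cap SQL Server at 500MB so Gemini gets most of the laptop's 3GB of RAM
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 500;
RECONFIGURE;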

My first tests to load 22mln records into the Gemini add-in failed. After loading over 21mln records I got an error message:

Memory error: Allocation failure : Not enough storage is available to process this command. .
Server: The operation has been cancelled.
Server: The operation has been cancelled.

I tried stopping some services and closing other applications, then repeated the same table load into Gemini, but at the very end I always got the same error message. Finally I started to reduce the number of records in my FactInternetSales table, and that helped. I found that on my laptop I was able to load just about 20mln rows. Considering that this laptop also runs SQL Server, I can live with this limitation.
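
If you hit the same limit, one alternative to trimming the source table is to import from a query instead of the whole table (the table import dialog has an option to write your own SQL query). A minimal sketch, reusing the illustrative staging table name from above:

-- Cap the number of imported rows without touching the source table
SELECT TOP (20000000) *
FROM dbo.FactInternetSalesBig;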

It took about 5 minutes to load 20mln rows into Gemini, and as I repeated the FactInternetSales table load multiple times, I saw a consistent Gemini data load speed of about 4mln records per minute. Of course, load time depends on how many fields there are in the table, machine power, etc. If your SQL Server is on the network, your load speed will depend on the network speed too. But this gives you an idea: when working with Gemini and bigger data sets, it will take time to get the data into Gemini. This comes as no surprise; I am just impressed with the speed. During the load I was watching Task Manager and noticed that Excel was using 80-90% of the CPU time and SQL Server was using 10-20%. Total CPU usage during the data load into Gemini was almost always 100%.

Data loading was the most time-consuming operation. As soon as I had 20mln rows in Gemini, I wanted to see whether sorting and filtering really work as fast as I had seen in demos. I had always doubted that Gemini could work with millions of rows that fast, but I was not disappointed: filtering on any field, and sorting, completed in under a second. I tried multiple filters, sorting, then filters again, but Gemini responded so fast that I had to watch the displayed record count in the bottom left corner just to confirm that the operation had actually executed. I was very impressed. This will please many Gemini users, and I expect to hear many wows during demos.

Then I loaded a few dimension tables and manually added relationships to the fact table. After that I created a pivot table and started to browse the data. Again, I was blown away by Gemini's speed: millions of records on my laptop, and the response to any query comes back in milliseconds. How do they do that? I still cannot believe I am getting this power from a 30MB add-in.

After that I hit the "Save" button. It took about 40 seconds for Excel to save the new file. This file contains all of the data (!) and metadata from all tables, plus the pivot table definitions. The saved Excel file was 72MB. Just to remind you: I loaded a 3.2GB SQL Server table into Gemini, which compressed and saved that data in a 72MB Excel file. I probably got such great compression because my table contained a lot of duplicate data. I noticed that saving the Excel file after any change was always on the slow side with this amount of data; it took about 20-40 seconds.

Opening the Excel file took just a few seconds, and the pivot table took a few more seconds to return results for a changed query. Opening the Gemini add-in with the saved 20mln rows took just a few seconds too.

Overall I am very impressed with Gemini and how well it handled a fact table with 20mln rows on my laptop.

Part 2

Yesterday I posted about my tests working with Gemini and bigger tables. I realized, and Chris Webb also suggested, that my method of generating new records by simply duplicating them probably affected my results. So I ran more tests with different data.

I generated a new table by adding random numbers to the existing numeric fields (dimension keys and amounts). From the new table I was able to load just about 17mln rows into Gemini; I got the same memory error message when I tried to load more records. My load speed was about 3.5mln rows per minute. Saving the Excel workbook took 75 seconds this time, and the xlsx file was much larger: 448MB. While working with this data set, Excel was using 740MB of RAM. This time opening the existing Excel workbook took 85 seconds.
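
A rough T-SQL sketch of the kind of randomization I mean is below. The column names come from FactInternetSales, but the offset ranges are only illustrative, and CHECKSUM(NEWID()) is used because RAND() would return the same value for every row in a set-based statement:

-- Build a new fact table where keys and amounts get a per-row random offset,
-- so the column values are no longer straight duplicates of each other
SELECT
    OrderDateKey,
    ProductKey  + ABS(CHECKSUM(NEWID())) % 100            AS ProductKey,
    CustomerKey + ABS(CHECKSUM(NEWID())) % 1000           AS CustomerKey,
    OrderQuantity,
    SalesAmount + (ABS(CHECKSUM(NEWID())) % 1000) / 100.0 AS SalesAmount
INTO dbo.FactInternetSalesRandom
FROM dbo.FactInternetSalesBig;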

Although some operations were slower with the random data, I was still able to confirm that after loading the data, all filtering and sorting operations were very fast and all pivot queries returned results almost instantly. So, duplicated data or not, if you are able to fit it into memory, Gemini will handle it with amazing speed.

During my tests I realized that the amount of data you will be able to load into Gemini depends entirely on your data. During my initial data "randomization" attempt I did not round my numeric results, so I had numbers like 1.234567890. With such data I was able to load just 4mln rows into Gemini, and the Excel workbook was about 580MB. After rounding the same fields I was able to load 4 times (!) more data: 17mln rows. So when you build your Gemini models, make sure that for bigger fact tables you load just the fields that are necessary for your analysis, and round your numeric values. There are no miracles: the more distinct values a column holds, the more memory it needs, and you should minimize that as much as possible.
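
One simple way to do this is to round in the source query, before the data ever reaches Gemini. A sketch, continuing with the illustrative table name from the earlier example:

-- Import only the columns needed for the analysis, with the measure rounded
-- to 2 decimals; fewer distinct values means far better compression
SELECT
    OrderDateKey,
    ProductKey,
    CustomerKey,
    OrderQuantity,
    ROUND(SalesAmount, 2) AS SalesAmount
FROM dbo.FactInternetSalesRandom;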

I am still learning Gemini, and I am still impressed with the results.

Tags: performance, dax, load, excel

 
