PowerShell Script - Calculate how much space each table uses in the PowerPivot workbook

User Rating: 3 / 5

Star ActiveStar ActiveStar ActiveStar InactiveStar Inactive
 

In the post "How can I find how much space each table uses in the PowerPivot workbook" I explained how PowerPivot creates temporary folder/file structure when PowerPivot workbook is opened. I wrote simple PowerShell script that reviews this temporary PowerPivot folder and reports how much space each table is using in the PowerPivot workbook. You can run this script after you open your PowerPivot workbook and made some interaction with PowerPivot data. Interaction with PowerPivot data can be made when you visit PowerPivot window or when you refresh Pivot table that uses data from PowerPivot.

Here is my PowerShell script:

# ################################################################
# Author: Vidas Matelis, http://powerpivot-info.com
# Explanation how this script works and latest version of this script:
http://powerpivot-info.com/post/367-q-how-can-i-find-how-much-space-each-table-uses-in-powerpivot
# ################################################################
# To run this script you need first to open PowerPivot workbook and go to PowerPivot window.
# This will force Powerivot to create <temp\Vertipaq*> folder with all data in subfolders.
# Then execute this script.
# This script iterates through these subfolders and reports how much space each table is using.
# Example how to run:
# PowerShell c:\PS\vertipaq.ps1 -ShowColumnSizes "Yes"
# ################################################################

param([string]$ShowColumnSizes = "No"); # Shows column sizes when parameter value is "Yes"

cls;

$tmpFolder = $Env:Temp #"C:\Users\VMatelis\AppData\Local\Temp"
if (-not [System.IO.Directory]::Exists($tmpFolder)) {"Folder does not exists: " + $tmpFolder; break;}

# From temp folder get all subfolders that start with "vertipaq".
# Note: normally you should see one subfolder per open PowerPivot workbook
#       but you might also see "leftover" vertipaq subfolders, that should be ignored
$folders = (Get-ChildItem -path $tmpFolder -Filter "vertipaq*" | where-object {$_.PSIsContainer} | sort CreationTime -Descending)
if (-not $folders) { # Vertipaq folders not found
Write-Output ("*** No PowerPivot folders found. Aborting script. Temp folder: '" + $tmpFolder+"'.");
break;
}

foreach ($subFolders in $folders)  # Go through all subfolders starting with vertipaq*. Each represents separate workbook
{
# ############## Block to calculate folder size = workbook data size
[uint64]$size=0;
Get-ChildItem -path $subFolders.FullName -Recurse | % {$size=$size+$_.Length}; #Get size of folder
# ##############

# Print Vertipaq folder (= PowerPivot workbook) name, date created and size.
Write-Output ("Folder : " + $subFolders.Name + ". Created: " + $subFolders.CreationTime + ". DB Size: " + ([int]($size/1024)).ToString("#,###") + " KB")

# *.db subfolder is where all table info is stored.
$dbFolder = Get-ChildItem $subFolders.FullName -Filter "*.db" | Where-Object {$_.PSIsContainer} | Select-Object -First 1; #Get *.db folder (just one)
if (-not $dbFolder) { Write-warning "*** no DB folder found."; continue;}
$tblFolders = Get-ChildItem $dbFolder.FullName -Filter "*.dim" | where {$_.Name -like "*.dim"} # get all *.dim folders
foreach ($table in $tblFolders)
{
if (-not $table) {continue;}

# ############## Block to calculate subfolder size
[uint64]$size=0;
Get-ChildItem -path $table.FullName -Recurse | % {$size=$size+$_.Length};
# ##############

# Get Guid from file name - between 0 position and first "."
$tblGuid = $table.name.substring(0,$table.Name.IndexOf("."));
# Find matching XML file. <GUID>.<Version1>.dim matches <GUID>.<Version2>.dim.xml
$xmlFile = Get-ChildItem $dbFolder.FullName -Filter ($tblGuid + "*.dim.xml") | Select-Object -First 1; # Should return just 1 file
if (-not $xmlFile) {Write-Warning ("*** Matching table xml file not found. Table: " + $table.FullName); continue;}
# Reading xml file and find value for: <Load><ObjectDefinition><Dimension><Name>...
$tblName = "";
[xml]$file = Get-Content $xmlFile.FullName;
$tblName = $file.Load.ObjectDefinition.Dimension.Name;

Write-Output (" Table : " + $tblName.PadRight(30) + ", " + ([int]($size/1024)).ToString("#,###").PadLeft(10) + " KB")

if ($ShowColumnSizes -eq "Yes")
{
# Deal with columns in each table
# Column file example: H$2841eb0c-6ea4-4b19-89b8-1a23cc7d3fe7$Product.0.tbl.xml, pattern is *$*$*.tbl.xml
# Exclude *.Index.<version>.idf files as they are used for relationships
$ColFiles = Get-ChildItem $table.FullName -Filter "*$*$*.tbl.xml" ;
foreach ($colFile in $colFiles)
{
# Column name is after last "$" before first "."
$ColName = $colFile.Name.substring($colFile.Name.LastIndexOf("$")+1, $colFile.Name.indexOf(".") - $colFile.Name.LastIndexOf("$")-1)

[uint64]$col_size = 0; # Total size of the column. Two file naming patterns, include both
# *<TableGuid>*$ColumnName.*
Get-ChildItem -path $table.FullName -filter ("*"+$tblGuid+"*$"+$ColName+".*") | % {$col_size=$col_size+$_.Length;};
# *<TableGuid>.$ColumnName.*
Get-ChildItem -path $table.FullName -filter ("*"+$tblGuid+"."+$ColName+".*")  | % {$col_size=$col_size+$_.Length;};

if ($ColFile.Name.Substring(0,2) -eq "R$")
{
Write-Output("   Relationship: " + $ColName.PadRight(40) + ", " + ([int]($col_size/1024)).ToString("#,###").PadLeft(10) + " KB");
}
else
{
Write-Output("         Column: " + $ColName.PadRight(40) + ", " + ([int]($col_size/1024)).ToString("#,###").PadLeft(10) + " KB");
}
} # foreach ($colFile in $colFiles)

# calculate size of extra "helper" files - <Guid>.1.tbl.xml and also file info.*.xml
[uint64]$tbl_extra_size = 0;
Get-ChildItem -path $table.FullName -filter ($tblGuid+"*"+".tbl.xml") | % {$tbl_extra_size=$tbl_extra_size+$_.Length};
Get-ChildItem -path $table.FullName -filter "info*.xml" | % {$tbl_extra_size=$tbl_extra_size+$_.Length};
Write-Output("   Tbl metadata: ".PadRight(57) + ", " + ([int]($tbl_extra_size/1024)).ToString("#,###").PadLeft(10) + " KB");
}
} # foreach ($table in $tblFolders)
""
""
} # foreach ($subFolders in $folders)

You can download this script from here. (Please change downloaded files extension from .txt to .ps1)

Here is sample command how you would execute this script:

PowerShell c:\PS\vertipaq.ps1

Here is sample output of execution results (No column sizes included):

Folder : VertiPaq_E28ED0E5A4FB40D986F3. Created: 05/09/2010 19:30:44. DB Size: 366,144 KB
Table : DimSalesTerritory             ,        273 KB
Table : DimStore                      ,        426 KB
Table : DimProduct                    ,        908 KB
Table : DimDate                       ,        726 KB
Table : FactExchangeRate              ,        188 KB
Table : DimGeography                  ,        172 KB
Table : FactSales                     ,    123,536 KB
Table : FactInventory                 ,    238,368 KB
Table : DimPromotion                  ,        218 KB
Table : DimCurrency                   ,        111 KB
Table : DimProductCategory            ,        104 KB
Table : DimProductSubcategory         ,        136 KB

Here is sample output (just first few lines) of execution results when column sizes are included (command: PowerShell c:\PS\vertipaq.ps1 -ShowColumnSizes "Yes"):

Folder : VertiPaq_E28ED0E5A4FB40D986F3. Created: 05/09/2010 19:30:44. DB Size: 366,144 KB
Table : DimSalesTerritory             ,        273 KB
Column: EndDate                                 ,         10 KB
Column: ETLLoadID                               ,         12 KB
Column: GeographyKey                            ,         11 KB
Column: LoadDate                                ,         12 KB
Column: SalesTerritoryCountry                   ,         10 KB
Column: SalesTerritoryGroup                     ,          9 KB
Column: SalesTerritoryKey                       ,         11 KB
Column: SalesTerritoryLabel                     ,         14 KB
Column: SalesTerritoryLevel                     ,         14 KB
Column: SalesTerritoryManager                   ,         11 KB
Column: SalesTerritoryName                      ,         15 KB
Column: SalesTerritoryRegion                    ,         11 KB
Column: StartDate                               ,         12 KB
Column: Status                                  ,         13 KB
Column: UpdateDate                              ,         12 KB
Relationship: 55c7a64a-fbc3-4706-ba2c-9cf670476dd6    ,          4 KB
Tbl metadata:                                         ,         92 KB
Table : DimStore                      ,        426 KB
Column: AddressLine1                            ,         20 KB
Column: AddressLine2                            ,         20 KB
Column: CloseDate                               ,          9 KB

If you are working with bigger data sets, I would strongly suggest that you check sizes of columns and confirm that larger columns really need to be included into your PowerPivot workbook. While working on this script, I used ContosoRetailDW database data. For my test I loaded into PowerPivot workbook fact tables FactSales and FactInvenory and all related dimension tables. Size of the PowerPivot workbook was 366MB and that is very close to the limit of how much data you can load into 32bit PowerPivot. But after I run this script, I realized that 2 biggest columns in my PowerPivot workbook where taking over 60% of space. These columns where FactSales.SalesKey (87MB) and FactInventory.InventoryKey (183MB). As for my tests I did not really needed these columns, I deleted them from my PowerPivot workbook thus reducing its size from 366MB to 95MB.

 

Some known issues:

  • Before running PowerShell scripts you will need to setup Execution Policy. Please read this great article about options you have. If you choose to sign scripts, make sure you do this for all scripts you find on our website.
  • You might get following error message when you try to run "Set-ExecutionPolicy" command on Windows Vista:

Set-ExecutionPolicy : Access to the registry key 'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\PowerShell\1\ShellIds\Microsoft
.PowerShell' is denied.
At line:1 char:20
+ set-executionpolicy  <<<< unrestricted

To fix this start your powershell environment using "Run As Administrator" option.

  • You might see that this script reports about PowerPivot workbooks that currently are not opened. This could happen when you had PowerPivot workbooks that were closed inproperly (example crashed). Inproperly closed PowerPivot workbooks leave their temporary structure in the temporary folder and this PowerShell scripts reports based on temporary folder data.

Tags: internals, excel, script

 

2007-2015 VidasSoft Systems Inc.