Handling Large Volumes of Data in Power BI
Data fuels today’s businesses, and Microsoft’s Power BI helps you make sense of that data. Power BI is a suite of business analytics tools for analyzing data and sharing insights. There are two licensing options for Power BI: Power BI Pro and Power BI Premium.
Among the differences between the two options, data storage is a major factor – depending on your data requirements, you can choose the option that fits.
With a Power BI Pro license, you can upload up to 10 GB of data to the Power BI cloud service. With a Power BI Premium license, you can also store BI assets on-premises, and you get a 50 GB cap on dataset size and up to 100 TB of data storage. Power BI Pro is a good fit if you are a heavy business analytics user who regularly creates and consumes dashboards, datasets, and reports. Power BI Premium is the better choice for a large enterprise where many people across the business need to work with the data and view reports and dashboards.
Power BI challenges in handling large data volume
The following considerations and limitations apply to all data sources used in the Power BI service. These are the challenges and limitations specific to data handling and storage:
- Dataset size limit – there is a 1 GB limit for each dataset in the Power BI service.
- Row limit – the maximum number of rows in a dataset (when not using DirectQuery) is 2 billion, with three of those rows reserved (a usable maximum of 1,999,999,997 rows); when using DirectQuery, the maximum number of rows returned by a query is 1 million.
- Column limit – the maximum number of columns allowed in a dataset, across all tables, is 16,000. This applies both to the Power BI service and to datasets used in Power BI Desktop. Power BI maintains an internal row-number column for each table in the dataset, so the effective maximum is 16,000 minus one column per table.
- File size limit – Power BI Premium supports uploads of Power BI Desktop (.pbix) files up to 10 GB in size. Once uploaded, a dataset can be refreshed to up to 12 GB in size.
Techniques for handling large data volumes
Power BI import models are loaded with data that is compressed and optimized, and then stored to disk. Source data loaded into memory commonly sees around 10x compression, so it is reasonable to expect 10 GB of source data to compress to about 1 GB. When persisted to disk, an additional 20% reduction can be achieved.
Although this compression provides some optimization on its own, you should still strive to minimize the data loaded into your models. Especially when handling large data volumes, it is important to optimize how data is loaded into the data model and stored.
There are several techniques you can use to improve data handling and the responsiveness of your Power BI reports. Some of these are outlined below:
- Optimize rows/filter source data – Import only the rows you need for your analysis. This keeps just the required data in memory and therefore uses it optimally. For example, you can set a date filter to import only transactions from the last two years rather than the entire sales history.
- Optimize columns – Remove all columns that are not relevant to your analysis, such as primary keys not used in relationships, columns that can be calculated from other columns, or description columns that are not needed.
- Decrease granularity/group by and summarize – Detailed datasets contain many rows of data at a granular level; the finer the granularity, the more rows you will have. So keep datasets less granular and use grouping where possible to make the data more concise. For instance, if you are analysing monthly or yearly trends, you could group your data by month so that the granularity is reduced.
- Optimize column data types – Reduce the cardinality of all columns stored in large tables, such as fact tables. To do this, round numbers to remove unneeded decimals; round times to remove milliseconds/seconds; split high-cardinality text columns into two or more parts; split DateTime columns into separate date and time columns; and so on. Also, avoid calculated columns, since they consume memory, and make sure every column has the correct data type.
- Disable load – When you import data from a source, you apply transformations such as merging and appending queries. As a result, you may end up with queries that are used only as intermediate transformation steps. By default, all queries from Query Editor are loaded into the memory of the Power BI model, so it is crucial to disable load for any query that is not required in the final model.
- Disable Auto Date/Time – Power BI automatically creates a hidden, built-in date table for each date field in the model to support time-intelligence DAX functions. These tables consume memory, and there is no flexibility to add custom columns to them. To remove all hidden date tables from your model, in Power BI Desktop select File / Options and settings / Options / Data Load and untick Auto Date/Time.
- Transform data at the right place – Most data transformations generally take place in Query Editor in Power BI Desktop. Query Editor is a powerful and user-friendly tool that keeps track of all applied transformation steps, which is useful for traceability and future maintenance. However, you may obtain better performance by applying transformations directly at the source database. For example, grouping your sales data by month in the transactional database may add to the source query’s execution time, but only the grouped data will be sent over the network to Power BI.
- Consider using DirectQuery or a composite model – Import data to Power BI wherever possible; however, if your goals cannot be met by importing data, consider DirectQuery. In DirectQuery mode you don’t import the data – queries go directly to the data source, so there are no data-volume limits on the Power BI side. However, report performance is typically slower and not all functionality is available. You can also choose a mixed (composite) model, storing some tables in import mode and others in DirectQuery.
- Move calculations to the back end – Think carefully about how you can move calculations to the back end as much as possible. For instance, create new fields in the data source so that Power BI has less calculation work to do.
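As a sketch of the row- and column-optimization techniques above, here is a hypothetical Power Query (M) query that imports only the last two years of a Sales table and keeps only the columns the report needs. The server, database, table, and column names are illustrative:

```m
let
    // Connect to a hypothetical SQL Server source
    Source = Sql.Database("sql-server-name", "SalesDb"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],

    // Optimize rows: keep only the last two years of transactions
    TwoYearsAgo = Date.AddYears(Date.From(DateTime.LocalNow()), -2),
    RecentSales = Table.SelectRows(Sales, each [OrderDate] >= TwoYearsAgo),

    // Optimize columns: keep only the columns the analysis needs
    Trimmed = Table.SelectColumns(RecentSales, {"OrderDate", "ProductKey", "SalesAmount"})
in
    Trimmed
```

Where the connector supports query folding, these steps are translated into a single SQL statement, so the filtering happens at the database rather than in Power BI.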
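The decrease-granularity technique can be sketched the same way. This hypothetical M query (illustrative names again) rolls daily transactions up to a monthly grain with Table.Group:

```m
let
    Source = Sql.Database("sql-server-name", "SalesDb"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],

    // Derive a month-start column, then summarize to monthly grain
    WithMonth = Table.AddColumn(Sales, "OrderMonth",
        each Date.StartOfMonth(Date.From([OrderDate])), type date),
    Monthly = Table.Group(WithMonth, {"OrderMonth", "ProductKey"},
        {{"TotalSales", each List.Sum([SalesAmount]), type number}})
in
    Monthly
```

The grouped result has one row per product per month instead of one row per transaction, which can shrink a fact table dramatically.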
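Likewise, the cardinality reductions described above (splitting DateTime columns, rounding numbers) can be sketched in M; the table and column names here are hypothetical:

```m
let
    Source = Sql.Database("sql-server-name", "SalesDb"),
    Events = Source{[Schema = "dbo", Item = "Events"]}[Data],

    // Split a DateTime column into separate date and time columns
    WithDate = Table.AddColumn(Events, "EventDate", each DateTime.Date([EventTimestamp]), type date),
    WithTime = Table.AddColumn(WithDate, "EventTime", each DateTime.Time([EventTimestamp]), type time),
    Split = Table.RemoveColumns(WithTime, {"EventTimestamp"}),

    // Round amounts to two decimals to lower cardinality
    Rounded = Table.TransformColumns(Split, {{"Amount", each Number.Round(_, 2), type number}})
in
    Rounded
```

Two low-cardinality columns (dates repeat, times repeat) compress far better than one high-cardinality DateTime column in which almost every value is unique.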
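Finally, for the transform-at-the-right-place technique, a grouping can be pushed to the source database explicitly with a native query; the SQL below and all object names are illustrative, and note that steps after a native query may not fold:

```m
let
    Source = Sql.Database("sql-server-name", "SalesDb"),
    // The database performs the grouping; only summarized rows travel to Power BI
    Monthly = Value.NativeQuery(Source,
        "SELECT DATEFROMPARTS(YEAR(OrderDate), MONTH(OrderDate), 1) AS OrderMonth,
                ProductKey, SUM(SalesAmount) AS TotalSales
         FROM dbo.Sales
         GROUP BY DATEFROMPARTS(YEAR(OrderDate), MONTH(OrderDate), 1), ProductKey")
in
    Monthly
```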
Remember, memory is the most precious asset in Power BI. The techniques presented in this post reduce the memory footprint, which has a direct impact on the performance of your reports and dashboards.