How to Improve SQL Server Bulk Data Import Performance?
I am working on a project in which we are migrating Oracle databases to SQL Server for SAP applications. We are using the SAP migration tools rather than SSMA for Oracle. The databases were very large, so we ran into performance issues during the data load. Here I will explain how to avoid such issues and speed up SQL Server bulk data import.
Let me first give some background on how the SAP migration tool works. The tool captures all related information along with the data and creates an export file, which must be copied to the target server. We then perform the bulk data import with the same tool. The import runs locally on the target server, not over the network.
When we ran the SAP migration tools to capture everything, a dump file was created, which we copied to the SQL Server box. We had approximately 900 GB of data to load into the SQL Server database. With standard settings, the bulk data import took around 14 hours. That was far too long, so we decided to work on reducing the total import time.
We made changes on both the application side and the SQL Server database side. After these changes, the data load time dropped to a little over 6 hours. Here I explain each setting that we configured on the database side just for this bulk data import.
NOTE: Make sure to revert these changes, as per your application/vendor best practices, after the bulk data import and before resuming OLTP/normal operations. The values here are chosen with the data load only in mind.
It is important to understand the factors that affect bulk data import performance. To speed up the data load, look into the configuration options below from a DBA standpoint.
Autogrow Settings
Autogrow settings automate SQL Server database file growth. This is a very useful property, but it can degrade bulk import or data load performance. The linked article explains the SQL Server autogrow setting in detail.
The autogrow increment must be large enough to avoid performance penalties. If you leave it at the default or at a small value during a bulk import, the database engine has to grow the data file every time it fills up, until the maximum size limit is reached. Each additional autogrow operation consumes resources such as IO and CPU and adds to the time needed to complete the data load.
The exact value for this setting should be based on data growth. Do not leave autogrow set as a percentage; use a fixed value in MB instead. A general rule of thumb is to set the autogrow increment to about one-eighth of the file size. The linked article explains how to manage transaction log files during a huge data load.
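Before changing anything, it helps to see how each file is currently sized and whether it grows by a percentage or a fixed amount. The query below is a minimal sketch against the `sys.database_files` catalog view; run it in the context of the database you are about to load.

```sql
-- Show current size and autogrow increment for every file of the current database.
-- size and growth are stored in 8 KB pages unless is_percent_growth = 1.
SELECT name AS logical_name,
       type_desc,
       CAST(size AS bigint) * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(20)) + ' %'
            ELSE CAST(CAST(growth AS bigint) * 8 / 1024 AS varchar(20)) + ' MB'
       END AS autogrow_increment
FROM sys.database_files;
```

Any row that shows a percentage, or an increment of only a few MB, is a candidate for a larger fixed value before the load.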
I would suggest pre-sizing the database to the size you expect it to reach, so that autogrow operations are avoided altogether. If you know the database will be 900 GB after the data load, set it to 900 GB before starting the load. This improves bulk import performance. Make sure to enable IFI (Instant File Initialization) for your SQL Server instance to reduce the time SQL Server takes to allocate that space.
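Pre-sizing can be sketched with `ALTER DATABASE ... MODIFY FILE`. The database name `SAPDB`, the logical file name `SAPDB_data1`, and the 8192 MB increment are placeholders for illustration only; a real SAP database usually has several data files, in which case the target size would be split across them.

```sql
-- Hypothetical names: replace SAPDB and SAPDB_data1 with your own database
-- and logical file name (see sys.database_files).

-- Grow the data file up front so the load does not trigger autogrow.
-- The new size must be larger than the current file size.
ALTER DATABASE [SAPDB]
MODIFY FILE (NAME = N'SAPDB_data1', SIZE = 900GB);

-- Use a fixed increment in MB rather than a percentage as a safety net.
ALTER DATABASE [SAPDB]
MODIFY FILE (NAME = N'SAPDB_data1', FILEGROWTH = 8192MB);
```

The two properties are changed in separate statements, which keeps each change easy to verify and to revert on its own.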
Instant File Initialization
As suggested above, turn IFI (Instant File Initialization) ON to speed up the autogrow process. With a considerably large autogrow increment, each growth takes some time because the newly allocated disk space is zeroed out. IFI skips that step and lets you create very large data files quickly. The linked article covers IFI in more detail, including how it speeds up autogrow and data load operations and how to enable it for your SQL Server instance.
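One way to check whether IFI is already active is to query `sys.dm_server_services`. This is a sketch that assumes a reasonably recent build: the `instant_file_initialization_enabled` column is only present from SQL Server 2016 SP1 (and 2012 SP4) onward, so the query fails on older versions.

```sql
-- 'Y' in the last column means the database engine service account
-- holds the "Perform volume maintenance tasks" privilege, i.e. IFI is on.
SELECT servicename,
       service_account,
       instant_file_initialization_enabled
FROM sys.dm_server_services;
```

If the value is 'N', grant the privilege to the service account and restart the SQL Server service before pre-sizing the files.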
AutoShrink
AutoShrink is a very costly operation that should always be OFF, even during normal daily operations. If it is ON, turn it OFF immediately. Make sure your disks have enough free space for the bulk operation or data load. Repeatedly shrinking and growing data files causes file-system-level fragmentation, which slows performance, and it consumes a lot of CPU and IO. The linked article explains why you should always keep Auto_Shrink off.
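Checking and disabling the option takes two statements. The database name `SAPDB` is a placeholder, not something taken from the migration described here.

```sql
-- Hypothetical database name: replace SAPDB with your own.

-- 1 in is_auto_shrink_on means AutoShrink is currently enabled.
SELECT name, is_auto_shrink_on
FROM sys.databases
WHERE name = N'SAPDB';

-- Turn AutoShrink off; this setting should stay off after the load as well.
ALTER DATABASE [SAPDB] SET AUTO_SHRINK OFF;
```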
Recovery Model
Change the recovery model to SIMPLE or BULK_LOGGED to reduce logging. The simple recovery model minimally logs most bulk operations. Under the full recovery model, all row-insert operations performed during a bulk import are fully logged in the transaction log.
For large data imports, this can cause the transaction log to fill rapidly. For bulk-import operations, minimal logging is more efficient than full logging and reduces the possibility that a bulk-import operation will fill the log space. To minimally log a bulk-import operation on a database that normally uses the full recovery model, you can first switch the database to the bulk-logged recovery model. After bulk importing the data, switch the recovery model back to the full recovery model.
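The switch to bulk-logged and back could look like the sketch below. The database name `SAPDB` and the backup path are placeholders. The closing log backup is an addition of mine rather than a step from the migration above; it assumes a full backup already exists and is there so that the log chain covers the minimally logged work.

```sql
-- Hypothetical database name and backup path: adjust both.

-- Before the load: allow minimal logging of bulk operations.
ALTER DATABASE [SAPDB] SET RECOVERY BULK_LOGGED;

-- ... run the bulk data import here ...

-- After the load: return to full logging.
ALTER DATABASE [SAPDB] SET RECOVERY FULL;

-- Assumption: a full backup exists, so a log backup can be taken now.
BACKUP LOG [SAPDB] TO DISK = N'X:\Backup\SAPDB_after_load.trn';

-- Confirm the recovery model that is now in effect.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'SAPDB';
```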
Auto_Update_Statistics
Auto_Update_Statistics is a very useful property that should be ON during normal business operations, but I would suggest turning it OFF during a huge data load or bulk import. Updating statistics takes resources from the system, which can hurt performance during a bulk data load. It is important to keep statistics up to date for the optimizer, so turn the setting back ON once the data load or bulk inserts have completed.
Auto_Create_Statistics
Auto_Create_Statistics should also be OFF during the load, because creating statistics takes resources from the system and can degrade overall bulk import performance. Make sure to enable it again after the data load or bulk insert operation.
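Both statistics options are database-level settings, so they can be toggled around the load as sketched here. `SAPDB` is a placeholder name, and the final `sp_updatestats` call is my own suggestion for refreshing statistics once the data is in place, not a step taken from the migration above.

```sql
-- Hypothetical database name: replace SAPDB with your own.

-- Before the load: stop automatic statistics maintenance.
ALTER DATABASE [SAPDB] SET AUTO_UPDATE_STATISTICS OFF;
ALTER DATABASE [SAPDB] SET AUTO_CREATE_STATISTICS OFF;

-- ... run the bulk data import here ...

-- After the load: re-enable both options.
ALTER DATABASE [SAPDB] SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE [SAPDB] SET AUTO_UPDATE_STATISTICS ON;

-- Assumption: refresh statistics once, now that the tables are populated.
USE [SAPDB];
EXEC sp_updatestats;
```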
Parallelism – MAXDOP
SAP suggests setting MAXDOP to 1 during normal business operations, so that was the value in place during the first data load, which took around 14 hours. For the subsequent load, we assigned half of the CPU cores to parallel processing. I believe MAXDOP played a significant role in reducing the total data load time. Make sure to revert this change to its original value as per your application best practices.
You can change the MAXDOP setting either with the sp_configure command or through the GUI in SSMS, from the SQL Server instance properties window.
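With `sp_configure`, the change and its later reversal might look like this. The value 8 is purely illustrative and assumes a 16-core server; substitute half of your own core count.

```sql
-- 'max degree of parallelism' is an advanced option, so expose those first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Show the current value before changing it.
EXEC sp_configure 'max degree of parallelism';

-- Assumption: 16 cores, so 8 = half of them, used for the data load only.
EXEC sp_configure 'max degree of parallelism', 8;
RECONFIGURE;

-- After the load: return to the value SAP suggests for normal operations.
EXEC sp_configure 'max degree of parallelism', 1;
RECONFIGURE;
```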
Batch Size
Use an optimum batch size to increase bulk data import performance. Importing a large set of data as a single batch can be problematic, so bcp and BULK INSERT let you import data in a series of batches, each smaller than the data file. Each batch is imported and logged in a separate transaction, and once that transaction is committed, its rows are permanently in the table. If the operation fails, only the rows from the current batch are rolled back, and you can resume the import at the beginning of the failed batch rather than at the beginning of the data file.
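The SAP migration tool handles batching internally, so the following only illustrates the general idea with `BULK INSERT`. The table name, file path, terminators, and batch size of 100,000 rows are all made up for the example and would need tuning for a real load.

```sql
-- Hypothetical table, file path and delimiters: adjust to your data file.
BULK INSERT dbo.TargetTable
FROM 'D:\Load\target_table.dat'
WITH (
    FIELDTERMINATOR = '|',
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 100000,  -- commit every 100,000 rows as one transaction
    TABLOCK                    -- table-level lock, needed for minimal logging
);
```

With bcp, the same batch size is passed with the `-b` switch.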