What is the recommended way to import large xlsx files?

Discussion in 'ASP.NET / ASP.NET Core' started by gangreen, Aug 21, 2014.

  1. I suppose I should say up front that this all works fine in my development environment but doesn't behave correctly in production. I opened a support ticket thinking that my issue might be caused by a DASP memory restriction triggering an automatic Application Pool recycle. DASP told me that this does not look like a memory issue, and after a good effort on their part I have since been referred to the forum.

    Anyway, I am trying to import a ~24MB xlsx flat file into a SQL Server temp table so that I can run a stored procedure on that temp table.

    The flat file has 119 columns, and right now happens to have 68865 rows of data (but that can and will increase over time).

    Currently, I am uploading the file to a temporary folder. This upload is successful every time. I am then obtaining the path to the file and passing it into the method below as a string.

    Using the method as written below, I can import up to 4761 rows successfully. At 4762+ rows an error is thrown: System.InvalidOperationException: Column 'MyNonNullableDatabaseColumn' does not allow DBNull.Value. Row 4762 does not have a null/empty value in that column and it is the same data type as the rest of the values in that column (except the header).

    I have tried modifying the select statement below from "Select * from [Final$]" to "Select * from [Final$A1:DO11]" to see if I could batch import the flat file. This works for up to 50776 rows, but fails at 50777+ rows with a different error than the one above: System.Data.OleDb.OleDbException (0x80004005): This table contains cells that are outside the range of cells defined in this spreadsheet.

    I have tried several different things to try to get this to work in addition to this current path, but rather than list all my dead ends I was hoping someone would be able to point me in the right direction.

    public IValidationState ImportFlatFile(string filePath)
    {
        String strConnection = ConfigurationManager.ConnectionStrings["NameOfConnection"].ConnectionString;

        // Clear out the staging table before the import
        using (var context = new DataContext(strConnection))
        {
            context.ExecuteCommand("DELETE FROM MY_TEMP_TABLE");
        }

        String excelConnString = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0\"", filePath);

        // Create connection to the Excel workbook
        using (OleDbConnection excelConnection = new OleDbConnection(excelConnString))
        {
            // Create OleDbCommand to fetch data from Excel
            using (OleDbCommand cmd = new OleDbCommand("Select * from [Final$]", excelConnection))
            {
                excelConnection.Open();
                using (OleDbDataReader dReader = cmd.ExecuteReader())
                {
                    using (SqlBulkCopy sqlBulk = new SqlBulkCopy(strConnection))
                    {
                        sqlBulk.BulkCopyTimeout = 60;
                        // Give your destination table name
                        sqlBulk.DestinationTableName = "MY_TEMP_TABLE";
                        sqlBulk.WriteToServer(dReader);
                    }
                }
            }
        }

        // Run the stored procedure against the freshly loaded temp table
        using (var dbContext = new FmssDataEntities())
        {
            dbContext.usp_ImportProjects();
        }

        return _validationState;
    }
     
  2. RayH (DiscountASP.NET Staff)

    You could try SQL Server Management Studio's Import/Export Wizard.
     
  3. This needs to be done fairly regularly by non-technical users, so I was really hoping to do this via a web interface. Having users do imports and execute stored procedures via SSMS seems risky to me (unless I set up a bunch of permissions). But that is definitely an option.

    On another note, I don't want to create a mapping file for it...but do you know if it is possible to execute the BCP utility from .NET in order to import the file?
     
  4. RayH (DiscountASP.NET Staff)

    BCP is not supported as far as I know. Have you looked at this on StackOverflow? It might be a problem with the mapping.
     
  5. martino (DiscountASP.NET Staff)

    Last edited: Aug 21, 2014
  6. Here's an observation related to the use of SqlBulkCopy and the first reported exception: "System.InvalidOperationException: Column 'MyNonNullableDatabaseColumn' does not allow DBNull.Value". This exception indicates that the data as modelled in the source spreadsheet doesn't match the schema requirements of the target database table.

    As I'm sure you're aware, the nice thing about SqlBulkCopy is that it's very fast, but it is unfortunately unforgiving of these kinds of conditions and has quite limited error-handling facilities. You might want to continue to use SqlBulkCopy for performance, but if that's the case, and we assume the quality of the source spreadsheet data is likely to remain questionable going forward, IMO you're going to need to harden the method you're using to read the data from the spreadsheet and "massage" it into shape before performing the SqlBulkCopy operation.

    One way you might do this is to populate a System.Data.DataTable rather than using the OleDbDataReader directly, perform some sort of cleanse operation on the data in the populated DataTable, and then perform the SqlBulkCopy operation with the cleansed DataTable.
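
    Here's a rough, untested sketch of what I mean. It assumes the spreadsheet data is already in a DataTable called "table" (however you choose to populate it), and the cleanse rule shown is only a placeholder; strConnection, MY_TEMP_TABLE and the column name are borrowed from your original post:

    // Placeholder cleanse: substitute a default for DBNull in columns the
    // destination table doesn't allow to be null.
    foreach (DataRow row in table.Rows)
    {
        if (row["MyNonNullableDatabaseColumn"] == DBNull.Value)
            row["MyNonNullableDatabaseColumn"] = string.Empty;
    }

    using (var sqlBulk = new SqlBulkCopy(strConnection))
    {
        sqlBulk.BulkCopyTimeout = 60;
        sqlBulk.DestinationTableName = "MY_TEMP_TABLE";
        sqlBulk.WriteToServer(table); // WriteToServer accepts a DataTable as well as a reader
    }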

    Another option is to pass the entire populated, uncleansed System.Data.DataTable to a SPROC as a parameter of type TABLE, where that parameter is a user defined table type whose schema matches the data modelled in the source spreadsheet, and then perform the cleanse on the data inside the SPROC.
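
    Again only a sketch: the type and SPROC names (dbo.ImportRowType, dbo.usp_CleanseAndImport) are made up for illustration, and you'd need to create both on the server first:

    // Assumes something like the following already exists on the server:
    //   CREATE TYPE dbo.ImportRowType AS TABLE (/* columns matching the sheet */)
    //   CREATE PROCEDURE dbo.usp_CleanseAndImport @rows dbo.ImportRowType READONLY AS ...
    using (var conn = new SqlConnection(strConnection))
    using (var cmd = new SqlCommand("dbo.usp_CleanseAndImport", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        SqlParameter p = cmd.Parameters.AddWithValue("@rows", table);
        p.SqlDbType = SqlDbType.Structured; // pass the DataTable as a table-valued parameter
        p.TypeName = "dbo.ImportRowType";
        conn.Open();
        cmd.ExecuteNonQuery();
    }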
     
    RayH likes this.
  7. Thanks for all the suggestions. I've responded to each below.

    I have one untested idea based on information I learned from martino's post (described below). I'll post back once I try it out. Also, I need a nudge on how to process my xlsx file into a DataTable before trying CrystalCMS's suggestions (see below).

    @RayH – It appears that the OP was having issues mapping a System.Data.DataTable to the SQL Server destination table. His solution was to use a DataAdapter to map between the two: http://msdn.microsoft.com/en-us/library/system.data.common.dataadapter.tablemappings(v=vs.110).aspx Since I'm not using a DataTable (at least not yet…), this doesn't directly apply. Also, his code failed on the first row; mine fails based on the size of the uploaded file.

    However, as I've speculated below, you may very well be right and this could fundamentally be a mapping issue – except the mapping is more or less out of my hands (it depends on registry settings and/or OleDb driver versions). That might account for why this succeeds in development and not in production.

    @martino – I followed the directions in your first link. I created a brand new Excel Workbook and pasted the data from the original source into the new Workbook. Surprisingly, this did improve the total number of records I am able to process using the method from my original post. I went from being able to process 4760 records to over 10,000 before failure! One thing I noted along the way is that the xlsx file size changes drastically:

    1. The original source xlsx flat file I receive has ~99,000 records and it is ~14MB

    2. After I scrub the flat file using VBA, the records decrease to ~69,000, but the file size jumps to ~24MB

    3. Pasting the records into a brand new Workbook produces a ~10MB file

    I am able to take the comparatively small ~10MB file, delete all but 10,000 rows, and process it successfully, but with the full file the outcome is ultimately the same – system resource exceeded.

    As for the second link, it sounds like the invalid-range error was caused by the OP attempting to read/write a column outside the range of the originally imported spreadsheet. My source data has all the columns, and the only time I receive this particular error is when I am trying to process a subset of a much larger set. However, by combining what I learned from your first link with the technique that produces the error in the second link (selecting a smaller range of the total sheet), I wonder how much data I can have in a single sheet before an error is thrown.

    @CrystalCMS – based on what I learned above, I think you are on the right track – the quality of the data is questionable. Your comment about SqlBulkCopy being unforgiving is probably true; unfortunately, my code fails before it ever gets that far.

    However, I think you are on to something when you talk about the forgiveness of a method – specifically, I'm thinking of the OleDb driver.

    I have found forum posts somewhat related to my issue where the cause was different registry settings/entries on the production server vs. the development environment. There is apparently some data type "sniffing" performed on the Excel workbook based on these settings. This leads me to wonder whether that type-detection process is "glitchy" in a certain version of the driver, so that a data type is sometimes chosen incorrectly (ultimately a mapping issue…) and, when there is a data type mismatch, the data becomes DBNull.Value. This theory may be far-fetched, and I don't know how to test it anyway, but I wanted to throw it out there.
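
    For anyone curious, the registry values I've seen mentioned (paths quoted from memory for the ACE 12.0 provider, so treat them as approximate) are:

    HKLM\SOFTWARE\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
    HKLM\SOFTWARE\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\ImportMixedTypes

    TypeGuessRows supposedly controls how many rows the driver samples when guessing each column's data type (reportedly 8 by default), and ImportMixedTypes controls what the driver does when a column mixes types. If these differ between my development machine and the server, that could explain the difference in behavior.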

    Regarding the use of System.Data.DataTable: is there a preferred method for parsing the data from an xlsx file into a DataTable? I tried two methods I found on the internet, without success, before going down this rabbit hole.
     
  8. The DataTable.Load method takes an IDataReader instance, and your OleDbDataReader implements that interface, so you already have one of those. I've not tried it myself because I don't do anything with OleDb, but it looks like the conversion will be a single line of code added to what you already have: http://msdn.microsoft.com/en-us/library/7x8ccbsb(v=vs.110).aspx
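
    In other words, something like this (untested on my end):

    var table = new DataTable();
    table.Load(dReader); // dReader is the OleDbDataReader from your existing using block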
     
  9. Oh, that sucks. Opening the using block that creates the OleDbDataReader is the line of code that causes the exception.

    Btw, I tried using the new workbook instance I referred to in my last post and selecting a subset of the total rows in the select statement, but it won't work on a file that size.
     
  10. Well, guys, with your help, I think I figured it out.

    The main issue appears to have been with the Excel connection string. I changed it:
    From: String excelConnString = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0\"", filePath);
    To: String excelConnString = String.Format("Provider=Microsoft.ACE.OLEDB.12.0; Data Source={0};Extended Properties='Excel 12.0 Xml; HDR=YES; IMEX=1';", filePath);

    I had changed the extended properties several times throughout this process, but I don't think I ever tried this exact version (which happens to be the second connection string on http://www.connectionstrings.com/excel/).

    I think this goes back to my earlier OleDb driver musings. For whatever reason the extra hints/properties (HDR=YES treats the first row as column headers; IMEX=1 tells the driver to read mixed-type columns as text rather than guessing a single type) are not needed in my development environment, but they are on DASP.

    I feel it is important to note that the connection string alone wouldn't have solved the problem. I tried to import the original, scrubbed 24MB flat file and I believe it caused my site's application pool to be recycled (unconfirmed, but I had to log in again after it failed). I then used the knowledge gained from @martino's post about reducing the file size without reducing the amount of data and, voila, it works... at least until I have to process more data in a single request. At that point, maybe I'll just use some VBA to split the data into several sheets and batch import each sheet one at a time, roughly as sketched below.
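
    If I do go that route, here's the rough (untested) shape I have in mind – enumerating the worksheets with GetOleDbSchemaTable and bulk copying each one, reusing the names from my original post:

    using (var excelConnection = new OleDbConnection(excelConnString))
    {
        excelConnection.Open();

        // One row per worksheet; sheet names come back with a trailing "$", e.g. "Final1$"
        DataTable sheets = excelConnection.GetOleDbSchemaTable(
            OleDbSchemaGuid.Tables, new object[] { null, null, null, "TABLE" });

        foreach (DataRow sheet in sheets.Rows)
        {
            string sheetName = sheet["TABLE_NAME"].ToString();
            using (var cmd = new OleDbCommand("SELECT * FROM [" + sheetName + "]", excelConnection))
            using (OleDbDataReader dReader = cmd.ExecuteReader())
            using (var sqlBulk = new SqlBulkCopy(strConnection))
            {
                sqlBulk.BulkCopyTimeout = 60;
                sqlBulk.DestinationTableName = "MY_TEMP_TABLE";
                sqlBulk.WriteToServer(dReader);
            }
        }
    }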

    Thanks for all the help. I’ve been banging my head against the wall on this all week.
     
    RayH and martino like this.
  11. martino (DiscountASP.NET Staff)

    Very cool, thank you for posting what worked for you. I'm sure it will help someone else out :)
     
