Need a solution for importing daily data

Discussion in 'Databases' started by matthewshar, May 18, 2009.

Thread Status:
Threads that have been inactive for 5 years or longer are closed to further replies. Please start a new thread.
  1. I have a SQL 2005 database, and every day I need to import about 100,000 records from a flat file. The most obvious solution would be to use DTS or BCP via a stored proc, etc.; however, DASP doesn't support any bulk insert methods.

    I tried creating a web page, called daily by the scheduled task tool, that used ADO.NET to open the file, iterate through the records, and do the inserts one at a time; but the 100,000+ calls to the DB server are far too slow, as one would expect. I need to get this working in an automated fashion without having to run any scripts or applications on my client machine. I have a thought as to a hack that might work, but before I invest time in testing it out, I wanted to see if anyone has found a good solution for this problem yet. My idea is:

    1. Perhaps make one call to the server, pass the entire flat file to a stored proc as a single varchar(max), and then parse the file in a cursor and do the inserts one row at a time. At least all the processing would take place on the DB server and I'd avoid the 100K+ round-trip calls between servers (rough sketch at the end of this post). The file has about 125 fields in each row, of varying data types, so it would be a real pain to write the code that properly parses and inserts in T-SQL. I thought of writing a managed-code stored proc, since SQL 2005+ supports that, but wouldn't you know it - DASP does not allow that either. It can still be done in T-SQL, as painful as that would be to write.

    It looks like no matter what, DASP will force me to insert the data one row at a time - I'm just looking for the fastest way to do that, and perhaps someone can point me to existing code snippets I can reuse for this so I don't have to reinvent the wheel.
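
    Roughly what I have in mind for the single-call version - just a sketch; the proc and the one-column staging table (RawListings) are made up, and the real thing would still have to split each line into its 125 fields afterwards:

        -- Sketch only: dump each raw line of the uploaded file into a staging table
        -- so the client makes a single call instead of 100K+ round trips.
        CREATE TABLE RawListings (RawLine varchar(max) NOT NULL)
        GO
        CREATE PROCEDURE ImportListingFile
            @FileData varchar(max)
        AS
        BEGIN
            SET NOCOUNT ON
            DECLARE @pos bigint, @line varchar(max)

            WHILE LEN(@FileData) > 0
            BEGIN
                -- find the next CRLF; the last line may not have one
                SET @pos = CHARINDEX(CHAR(13) + CHAR(10), @FileData)
                IF @pos = 0
                BEGIN
                    SET @line = @FileData
                    SET @FileData = ''
                END
                ELSE
                BEGIN
                    SET @line = LEFT(@FileData, @pos - 1)
                    SET @FileData = SUBSTRING(@FileData, @pos + 2, LEN(@FileData))
                END

                IF LEN(@line) > 0
                    INSERT INTO RawListings (RawLine) VALUES (@line)
            END
        END

    (Chopping a huge varchar(max) down like this could turn out to be slow in itself, which is part of what I want to test before committing to it.)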
     
  2. Hi,
    This is rather incredible...
    I do not have a solution for you, but let's look at this with some typical brainstorming:
    Why is there a need to pump 100,000 flat-file records daily? Can that part change?
    I used to transfer PeopleSoft SQL records, and even for a company that large, 100,000 daily would be a shock.
    Just brain-storming with ya...
    Mark
     
  3. I'm afraid that I can't help you with a solution to your problem, but I can explain why no shared host (that I know of) allows bulk inserts.

    Your 100,000 record insert/update - even as a single bulk insert - is going to have a temporary negative effect on server performance. Now imagine a couple dozen other users on the database server doing the same thing. And just to make it interesting, let's say that one of them doesn't exactly know what they are doing (I know, unbelievable, but it happens ;)), and causes a loop or some other catastrophic snag. You get the picture.

    As a daily occurrence on your own database it could be bearable - you have a period of high resource use on the db server, but you expect it and deal with it. But on a shared server it's not only potentially disastrous, but there are also a couple hundred other users on that server whose performance is going to be degraded during your insert. And some of them are going to notice even a temporary slowdown and be very unhappy about it.

    Shared hosting necessarily requires some limitations, and bulk database operations are among the first things to be sacrificed to assure smooth operation for all the database server users.
     
  4. Reason For 100,000 Record Daily Import

    The data that I'll be importing daily is obtained from my local real estate MLS about properties for sale. Every day, listings expire, new listings are added, prices and other information about the properties change, etc. The MLS provides me with FTP access to a zipped file that contains a single flat file with all of the data in unnormalized form. My plan is to load it exactly as they give it to me into a table each day and then run an advanced query that takes that data and updates my properly designed relational database with the latest changes.

    Worse yet, what they give me in the flat file is the latest information for all active properties - not just the ones that had changes; so it's up to me to compare it to what I've got and determine which records need to be deleted, added, or updated accordingly. I have no control over how they provide the data or how the MLS's database is set up.

    I'm just trying to work with what they give me and turn it into something more carefully thought out, so that I can use it to build a better house-hunting experience for visitors to my real estate web site.

    Once I get their raw data into SQL Server, I know how to normalize the records into my tables efficiently, using set operations wherever possible (sketched at the end of this post). Hopefully that explains why I have an unavoidable need to import this much data every day.

    I'm experimenting with the concept of having the ASP.NET page that is called by the scheduled task process break the file into chunks and pass each chunk into a stored procedure as a varchar(max), then use a recursive call to a UDF to split out the records and fields and perform the insert for each record - thereby reducing the need for so many round trips to the DB. So far, a crude proof-of-concept test suggests that it might just work.
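
    For what it's worth, the set-based reconciliation I keep referring to would look roughly like this once the raw file has been parsed into a typed staging table. The table and column names here (StagedListings, Listings, MLSNumber, ListPrice, Status) are just placeholders, and since SQL 2005 has no MERGE it comes out as three statements:

        -- Placeholder schema: StagedListings holds today's parsed feed,
        -- Listings is my normalized table, MLSNumber is the natural key.
        BEGIN TRAN

        -- listings that dropped out of the feed are gone
        DELETE l
        FROM Listings l
        WHERE NOT EXISTS (SELECT 1 FROM StagedListings s WHERE s.MLSNumber = l.MLSNumber)

        -- listings present in both places get refreshed
        UPDATE l
        SET l.ListPrice = s.ListPrice,
            l.Status    = s.Status
        FROM Listings l
        JOIN StagedListings s ON s.MLSNumber = l.MLSNumber

        -- brand-new listings get added
        INSERT INTO Listings (MLSNumber, ListPrice, Status)
        SELECT s.MLSNumber, s.ListPrice, s.Status
        FROM StagedListings s
        WHERE NOT EXISTS (SELECT 1 FROM Listings l WHERE l.MLSNumber = s.MLSNumber)

        COMMIT TRAN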
     
  5. i had a friend who had to execute thousands of sql statements against a mysql web site. the statements were the create, insert, etc. statements to recreate the db schema and data from scratch.

    his problem was that the host would not let him execute or upload a file as large as it was.

    so i whipped up a quick app that would let him break it up into xxxkb files and it worked flawlessly (i think all sql statements were on one line). it also renamed files with an incremental number and each file had a few hundred insert statements in it.

    if you want it, it's yours, just pm me (i'll have to dig for it), but it will solve your problem so that you can execute smaller bunches of statements at a time instead of having to execute one huge ass statement all at once and time out.
     
  6. Thanks,
    After further research, it seems that some other hosting companies do allow SSIS packages to be run, as long as they are reviewed and scheduled by one of their operators. I may have to switch to a different hosting company, as SSIS is the right way to do this sort of thing. I'm building the package and getting it all ready. Then I'll sign up for an account and test it out on the other host. Once it all works, I'll switch my production DB over.
     
  7. cool, well i found the application and it's yours if you want it, plus the vb.net code.
     
  8. I'm not so sure it will help. It would still require me to run a separate insert and DB call for each record I want to import. Breaking the file into chunks doesn't reduce the number of calls to the database; it would simply spread the calls out over a longer period, perhaps avoiding the timeout, but they would still put a crazy and unnecessary load on the server and take forever to finish. Plus, doing it all within a transaction is more of a challenge, since the transaction would have to stay open for the duration of all the calls.
     
  9. well i'm not 100% certain about this, but i 'think' executing many statements at one time is quicker than running them individually. meaning that if you have 100,000 sql statements in one file and you split those 100,000 commands into 1,000 files, each file will execute its statements faster than if you ran them one at a time.

    i think there are settings in sql server that control this. but i've had to do this before, and i believe there was a noticeable reduction in time when executing a multi-statement file compared to running the statements one at a time.

    what i think goes on in the background is that when you connect and execute the statements, you're utilizing one data pipe to the db.

    if you use multiple files and an automated method of splitting the file like the software does, you could automate adding any statements needed for transactional processing or other optimizations (rough example at the end of this post).

    anyhow, just a thought.
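
    just to make this concrete, each of the split files would basically be one batch along these lines (table name and values are made up), so the server runs a few hundred inserts per execute instead of one call per row:

        -- made-up table and values; picture a few hundred of these per file
        BEGIN TRAN

        INSERT INTO PropertyImport (MLSNumber, ListPrice, Status) VALUES ('100001', 250000, 'Active')
        INSERT INTO PropertyImport (MLSNumber, ListPrice, Status) VALUES ('100002', 319900, 'Active')
        INSERT INTO PropertyImport (MLSNumber, ListPrice, Status) VALUES ('100003', 189500, 'Pending')
        -- ...and so on...

        COMMIT TRAN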
     