Best way to upload over 200,000 files?

Discussion in 'General troubleshooting' started by Gurduloo, Dec 23, 2010.

  1. I usually use FileZilla to upload content to my site and it works fine. I'm now looking at adding a Bing Maps mashup, which will entail uploading over 200,000 tiny .png files (each one is very small - about 2 KB). Based on loading the first 5,000 of these files, this could be a very time-consuming process with FileZilla - it took close to an hour to do that first batch. It seems like the ideal thing would be to put them all in a zip file, upload the zip file and then extract it on the server - but as far as I know, I can't do that.

    Anyone have ideas how I could accomplish this any easier?
     
  2. You could actually zip / FTP / unzip on the server, but you'll have to develop a little app to do the unzipping. Check out SharpZipLib - it works on the DASP web servers. I know this because I use it for programmatically zipping up database .bak files on the server.
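    If it helps, here's a rough sketch of the kind of unzip page I mean - a minimal example, assuming you've dropped the SharpZipLib assembly (ICSharpCode.SharpZipLib.dll) into your /bin folder. The page name and paths are made up, so adjust them to your site layout:

        // unzip.aspx.cs - minimal sketch; paths and page name are examples only.
        using System;
        using ICSharpCode.SharpZipLib.Zip;

        public partial class Unzip : System.Web.UI.Page
        {
            protected void Page_Load(object sender, EventArgs e)
            {
                // The archive you FTP'd up, and the folder to expand it into.
                string zipPath   = Server.MapPath("~/uploads/tiles.zip");
                string targetDir = Server.MapPath("~/tiles");

                // A null file filter extracts every entry and recreates the
                // folder structure stored in the zip.
                FastZip fastZip = new FastZip();
                fastZip.ExtractZip(zipPath, targetDir, null);

                Response.Write("Extracted " + zipPath + " to " + targetDir);
            }
        }

    Once the extraction is done you'd want to delete or password-protect the page, since anyone who can reach it can run it.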
     
  3. mjp

    200,000 files? I hope those are broken up into multiple directories. Your performance can really suffer when a directory has too many files. How many is "too many" is hard to say; there are a lot of variables. But a directory or folder with 200,000 files - even small ones - is definitely pushing the limits.
     
  4. Believe it or not, they're all in the same folder. Right now I have about 65,000 of the files on my local machine and performance is fine. I can't browse the folder with Windows Explorer or anything, but IIS is serving them out just fine. In theory the type of map service I'm making could have many millions of these files, but I'm trying to keep things "small" and "manageable".
     
  5. mjp

    My only concern would be I/O issues when that folder is under load. File size isn't as much of a problem as the server resources you could end up needing to serve up large numbers of small files. You know - I say that without really knowing what you're doing; I'm just imagining that you're rendering maps out of thousands of different "pieces." But I could be way off in left field, I don't know.

    What does your CPU usage look like locally when you run the app that assembles all these bits (if indeed that's what it's doing)?
     
  6. It shouldn't be a problem at all. I'm storing thousands and thousands of these files, but rendering a single map should only use up to 16 of them. And they're such tiny files that it's nothing for the server to send out 16 at a time. The reason there are so many files to store is that the map covers the entire USA at many different zoom levels, but a user only looks at a small piece of the overall dataset at once. On my local machine, the CPU usage doesn't even register when I run the map. I'm also not expecting very many users (though more would be nice).
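    For context, here's roughly how a tile request maps to one of those files. The quadkey calculation is the standard one from Microsoft's "Bing Maps Tile System" article; the file path at the end is only an illustration of one way the tiles might be named, not necessarily how mine are stored:

        // Standard Bing Maps quadkey calculation (per the Bing Maps Tile System docs).
        using System;
        using System.Text;

        static class TileNaming
        {
            public static string TileXYToQuadKey(int tileX, int tileY, int levelOfDetail)
            {
                StringBuilder quadKey = new StringBuilder();
                for (int i = levelOfDetail; i > 0; i--)
                {
                    char digit = '0';
                    int mask = 1 << (i - 1);
                    if ((tileX & mask) != 0) digit++;
                    if ((tileY & mask) != 0) { digit++; digit++; }
                    quadKey.Append(digit);
                }
                return quadKey.ToString();
            }

            static void Main()
            {
                // One screenful only touches a handful of adjacent tiles at a single
                // zoom level, which is why serving ~16 tiny files per view is cheap
                // even when the full tile set is enormous.
                string quadKey = TileXYToQuadKey(617, 1493, 12);
                Console.WriteLine("tiles/" + quadKey + ".png");  // tiles/021223121203.png
            }
        }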
     
  7. Interesting

    This is an interesting thread; I like it when an app pushes the server's limits.... do keep us posted as to how it works out.

    I did have an ASP.NET app a while ago that 'went exponential' and started consuming too many server resources, so it was automatically throttled by DASP's servers. I ended up rewriting it so it was no longer so resource-hungry, and everything worked out.

    PJ
     
  8. I side with mjp on this one. Assuming the file system on the server is NTFS, the theoretical maximum number of files in a single folder is about 4.3 billion (2^32 - 1). I would not recommend anyone attempt to get anywhere near that unless they're doing it on a cannon-fodder, dispensable test box just for fun and experimentation.

    In the not-so-distant past I witnessed a Win2008 file server and its NTFS file system grind to a complete halt in a production environment because enterprise business systems had been allowed (due to bad code, and not mine for once ;-)) to create and dump new files into a single folder on the file server, unchecked, for years.

    The folder contained >5M small files, and the file system had become inaccessible and effectively cut off from Windows Explorer and the shell. To fix the mess we had to develop a clean-up application to repair the file system by moving the files into a year / month / day / project type folder structure. I learnt that it definitely makes sense for applications to protect the file system and the general reliability of the OS by spreading the load of files across sub-folders.

    In this particular scenario it sounds like the application is working, but the upload is slow, which to some degree will be down to file system load on the server. Maybe the design needs attention if it depends on all the files being in a single folder - I can't help thinking there's a danger this is a future sysadmin nightmare waiting to happen for someone.
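    To make the clean-up idea concrete, the tool we wrote boiled down to something like this (the paths and the two-character bucket rule here are only illustrative - ours bucketed by year / month / day / project):

        // Rough sketch of bucketing a flat folder into sub-folders.
        using System;
        using System.IO;

        class SpreadFiles
        {
            static void Main()
            {
                string sourceDir  = @"D:\data\tiles-flat";  // the one giant folder
                string targetRoot = @"D:\data\tiles";       // the bucketed layout

                // GetFiles snapshots the listing up front, so it's safe to move
                // files while looping over the resulting array.
                foreach (string path in Directory.GetFiles(sourceDir, "*.png"))
                {
                    string name = Path.GetFileName(path);

                    // Bucket by the first two characters of the file name so no
                    // single folder ends up holding hundreds of thousands of entries.
                    string bucket = name.Length >= 2 ? name.Substring(0, 2) : name;
                    string targetDir = Path.Combine(targetRoot, bucket);

                    Directory.CreateDirectory(targetDir);   // no-op if it already exists
                    File.Move(path, Path.Combine(targetDir, name));
                }
            }
        }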
     
