Another one that will seem to many like “programming 101”, but warrants talking about anyway.
Here’s a helpful tip for you. Do not allow any folder on your Windows server(s) to contain 800,000 files.
In the last two weeks we’ve been attempting to support a customer who is having a very difficult time with her website, which runs on a Windows server. Her difficulties arise from several longstanding coding and management issues, but the major disaster right now is that the core functionality of the site has broken. One key reason that it has broken is that much of it revolves around a directory (folder) that has been allowed to grow without bounds for 10 years or more. At this point, this one directory contains over 800,000 image files. Yes, eight hundred thousand files in one folder.
A directory containing 800,000 files is going to be a management problem no matter what platform you’re dealing with. For Windows, this is particularly sticky, as the folder long ago became completely untouchable from within the Windows GUI. You can’t open it. You can’t scroll through the list of files. You can’t rename the folder. You can’t drag it to the recycle bin. You can’t move it. Copying the folder and all the files contained within took over 36 hours running xcopy on the command line (since you can’t drag and drop it). In short, when a folder gets this large your options become severely limited.
The customer’s current website consultants/developers have been unable to manipulate the folder using ColdFusion scripts. When they try, a server with four CPU cores and 4 GB of RAM running one site is brought to its knees. At one point these guys asked if we had an 8-core, 16 GB server just hanging around that they could use to run the scripts they wrote to clean up the folder from hell. Sure thing. We generally keep $12,000 servers hanging around waiting for just such a request. Oh, and I should also mention that they needed it to be fully outfitted with Windows Server 2003, SQL Server 2005 and ColdFusion Enterprise. That’s another few thousand dollars’ worth of software needed to solve the problem.
Of course, there’s more than one way to skin a cat, especially if one is an expert in cat skinning. I’m happy to report that our customer maintains a relationship with the person who originally wrote her site 10 years ago, while he was dabbling and learning how to code. Luckily he was willing to help, and managed to put together a C# application that chopped the folder into smaller, more manageable folders organized by file name. There are still folders in the mix with 60,000 files in them, but that’s a far cry from 800,000, and it makes things at least workable, albeit slowly.
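For the curious, here is roughly what "chopping a folder up by file name" can look like. To be clear, this is not the developer's C# application, which I haven't reproduced here. It's a minimal Python sketch of the same general idea, and the function names, the two-character prefix and the separate destination folder are all my own assumptions for illustration.

```python
import os
import shutil


def bucket_name(filename: str, prefix_len: int = 2) -> str:
    """Pick a subfolder name from the first few letters/digits of a file name.

    Hypothetical scheme: lowercase the name, drop the extension and any
    punctuation, and pad short names with underscores.
    """
    stem = os.path.splitext(filename)[0].lower()
    cleaned = "".join(ch for ch in stem if ch.isalnum())
    return (cleaned[:prefix_len] or "_").ljust(prefix_len, "_")


def split_folder(src: str, dest: str, prefix_len: int = 2) -> int:
    """Move every file in src into dest/<bucket>/ and return the count moved."""
    # Snapshot the file names first so we are not moving files out of a
    # directory while still iterating over it.
    with os.scandir(src) as entries:
        names = [entry.name for entry in entries if entry.is_file()]

    moved = 0
    for name in names:
        target_dir = os.path.join(dest, bucket_name(name, prefix_len))
        os.makedirs(target_dir, exist_ok=True)
        shutil.move(os.path.join(src, name), os.path.join(target_dir, name))
        moved += 1
    return moved
```

Bucketing by name prefix is easy to reason about, but the buckets are only as even as the file names are, which is presumably how you end up with a few folders that still hold 60,000 files. Bucketing on a hash of the name would likely spread things more evenly, at the cost of folder names that mean nothing to a human.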
This happens to be a real estate site, where ads for home sales are posted on a daily basis. The problem is that the site was somewhat poorly coded from the start (dumping all image files into one folder), and that the site included no file management routines designed to allow the end-user (who has no knowledge of how to run a server) to purge old ads and delete old files on a regular basis. The original designer, who by his own admission knew very little about web application development back then, never imagined in his wildest dreams that the site he cobbled together from “Cold Fusion for Dummies” would still be in use 10 years later. He never considered the possibility that his customer was too technically challenged to actually think about performing basic server management tasks. On the customer’s side, she never considered the idea that a website might need some basic level of attention to run properly, much the same way you change the oil in your car on a regular basis or take your dog to the vet every year for a checkup.
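The missing piece was never anything exotic. A purge routine can be very small. The Python sketch below is hypothetical and is not code from the site in question: it assumes that a file's last-modified time is a good enough stand-in for the age of the ad, which a real site would want to confirm against its database before deleting anything. The function names and the dry-run default are my own choices.

```python
import os
import time

SECONDS_PER_DAY = 86400


def stale_files(folder, max_age_days, now=None):
    """Yield paths of files in folder last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file() and entry.stat().st_mtime < cutoff:
                yield entry.path


def purge(folder, max_age_days, dry_run=True, now=None):
    """Return the stale files; delete them only when dry_run is False."""
    # Build the full list before deleting so the directory is not being
    # modified while it is still being scanned.
    victims = list(stale_files(folder, max_age_days, now))
    if not dry_run:
        for path in victims:
            os.remove(path)
    return victims
```

Run on a schedule with something like purge(folder, 365, dry_run=False), a routine along these lines keeps a folder from growing without bounds. Defaulting to a dry run means the first thing anyone sees is a list of what would be deleted, not an empty folder.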
The moral of the story is that management matters. RAM and disk space, though cheap, are not limitless. It pays to spend some time planning how all the data your site generates is going to be organized, stored and managed over the years. If you’ve done a good job, odds are that the site will be used for many years, so be sure that you take into account the impact of time, and the clue level of your end users, on your applications.