So, real quick, since I just got the idea to actually post this little story: I was writing a little script that dealt with ids I'm using for a project. I knew there would be a lot of them to begin with, but I wanted a flat file, not a database, because I figured, "Hey, I'm dealing with a lot, why pay the overhead of doing a bunch of queries for every id?" Among other things, this script couldn't list an id more than once. So I (stupidly) chose an array to store the ids and searched through it to check for duplicates. At first, no issues: I could process a few ids a second and get all my work done on them. Well, after I hit around 15,000 I had issues. It was now taking about 5 seconds an id. It only got worse, and worse, and worse. Around 19,000 ids I finally hit CTRL-C on the script to retool it.

I changed it over to throwing all the ids into a database, accepting a tad more overhead per id, but the per-id cost now stays roughly flat, so overall it's close to linear time instead of getting slower with every id. I had stupidly forgotten my Big-O notation classes: searching a large list of numbers takes longer than searching a very short one, and scanning the whole array once per new id adds up to quadratic time. *sigh*.
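
The post doesn't say what language the script was written in, so here's a minimal Python sketch (the function names are hypothetical) of why the array approach falls over. Checking `i not in seen` on a list scans every stored id, so each new id costs more than the last and the whole run is O(n²). An in-memory hash set is swapped in here purely for contrast: its membership test is an average O(1) hash lookup, so the total stays roughly linear. Both functions return the same ids in first-seen order.

```python
import time


def dedupe_with_list(ids):
    """Keep the first occurrence of each id using a list.

    `i not in seen` scans the list: O(n) per id, O(n^2) overall.
    """
    seen = []
    for i in ids:
        if i not in seen:
            seen.append(i)
    return seen


def dedupe_with_set(ids):
    """Same output, but membership is checked against a set.

    A set lookup is an average O(1) hash probe, so the loop is ~O(n).
    """
    seen = set()
    result = []  # preserves first-seen order, like the list version
    for i in ids:
        if i not in seen:
            seen.add(i)
            result.append(i)
    return result


if __name__ == "__main__":
    # Hypothetical workload: 5,000 distinct ids, each appearing twice.
    ids = list(range(5_000)) * 2
    for fn in (dedupe_with_list, dedupe_with_set):
        start = time.perf_counter()
        out = fn(ids)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: {len(out)} unique ids in {elapsed:.3f}s")
```

Doubling the number of ids roughly quadruples the list version's time but only roughly doubles the set version's, which matches the "worse, and worse, and worse" slowdown described above.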

Bottom line: for large amounts of data, accept a higher overhead per unit when it lowers the overall cost. Now my script runs happily all day without slowing down. I only post this so you geeks out there can laugh at me, and so I'll remember this handy little lesson later on.
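
The post doesn't name the database either. As one hedged illustration of "accept a bit more overhead per id," here's a sketch using Python's built-in `sqlite3` module, assuming the ids are integers (for string ids, `TEXT PRIMARY KEY` would play the same role). `open_id_store` and `add_id` are hypothetical names. Each insert does more work than appending to an array, but the `PRIMARY KEY` is backed by a B-tree, so the duplicate check is a logarithmic lookup rather than a scan of every stored id.

```python
import sqlite3


def open_id_store(path=":memory:"):
    """Open (or create) a table of ids.

    Assumes integer ids: INTEGER PRIMARY KEY makes `id` the table's
    B-tree key, so duplicate checks don't scan every row.
    """
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS ids (id INTEGER PRIMARY KEY)")
    return conn


def add_id(conn, id_):
    """Insert id_ if it's new.

    Returns True if the row was added, False if the id was already
    stored (INSERT OR IGNORE skips the duplicate, changing 0 rows).
    """
    cur = conn.execute("INSERT OR IGNORE INTO ids (id) VALUES (?)", (id_,))
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = open_id_store()
    for i in [42, 7, 42, 19, 7]:
        print(i, "added" if add_id(conn, i) else "duplicate, skipped")
    conn.commit()  # needed to persist when `path` is a real file
    print(conn.execute("SELECT COUNT(*) FROM ids").fetchone()[0])  # 3
```

Letting the database enforce uniqueness through the key means the script never has to search its own list of ids at all.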

(Since I do freelance programming, I feel the need to point out that I'm normally quite careful about how I design my code; I'm posting this because I can't believe I did that!)