It pays to digg — the deathmatch that wasn't

November 7, 2006

It pays to digg — the deathmatch that wasn't

Whereas spam is the leading cause of lost productivity for many people, digg is my Achilles' heel. I spend more time on digg (and slashdot and fark) than I'd like to admit, so it's nice to see it finally paying off.

Last Friday I found this gem—a local programming contest with a $10K grand prize. This was the first I'd heard of it, even though it had received quite a bit of local media coverage. So I went to mozy.com and registered for the competition, and the next morning my $10K quest began.

Round 1 took place online and began at 10 AM, with six 10-minute programming challenges. Nothing overly complex, but the time limit left little room for error. Fortunately I got all of my solutions in, and shortly after was informed that I made the cut for Round 2.

Round 2 was also online, with 4 questions spread across an hour. Here's where things got a little more complicated and the technical difficulties began. One of the tasks required you to open a socket to the mozy.com server and read some data. Seems like everyone tried connecting at once, and a mini-DDOS ensued. In the end, I was able to connect, read the data and produce the correct output, but others weren't so lucky. At this point, I thought I had done pretty well, but I wasn't sure if I had made the final cut. Much to my surprise, I got an email shortly thereafter informing me I was one of 8 finalists.

We gathered at the mozy.com offices at 4:00 PM for the Ultimate Showdown of Ultimate Destiny. I was nervous—being a lapsed undergrad, I was going up against college faculty and professionals with far more experience and knowledge than me.

I should also point out that I had done all of the exercises in PHP, even the one with the O(2^n) algorithm (shudder). Everyone seemed a bit surprised that a PHP programmer had made it this far, and I can't say I blame them. PHP is looked down upon by many—it lacks the elegance of Java and the speed of C, and the language itself looks something like the mutant spawn of C++, Perl and Retarded. You might call it the short bus of programming languages, in large part due to the caliber of the average PHP programmer. Nevertheless, good programmers are rarely constrained by languages, and PHP is no exception. It kicks ass if you know how to use it.

Our final task was to write a server that would accept 10,000 connections. Each connection would send 9K of data - over 2000 4-byte integers. We would have to then sort the 20+ million integers and return the 100th, (n-100)th, and (n/2)th ones from the list. Whoever could do this the fastest would win the $10,000 prize.

And so it began. Sockets are easy, and there is only so much optimization you can do there, so I saved that for last. My main concern was making the sort as fast as possible. Rather than try to sort all 20 million numbers, I stored them in a 4-level tree—one level for each byte, meaning each node had at most 256 children. Memory was not a concern, only speed—I didn't actually sort anything unless I had to. That means I sorted at most 1024 (4*256) elements for the first lookup and 784 elements for subsequent lookups. Each node cached how many numbers it contained, making it easy to find a number in the tree once the relevant nodes had been reordered. The sorting itself incurred no real cost—the only consideration was the time to receive the numbers and populate the tree, which was constant relative to n. O(n) is nothing compared to O(n log n).

Once I had that squared away, I set up my simple server to handle the requests. I had overheard other people running into difficulties, and I soon discovered why. Windows doesn't like too many simultanous connections, and everyone hit a wall right around 300. Even with a 20 minute time extension, nobody was able to get it 100% working. Things were not looking good.

We had a quick powwow to figure out what to do. Several of us wanted to try running our programs on a Linux machine to see if they would fare any better. That helped a bit, but none of us were able to get much past a couple thousand connections—nowhere near the 10K target.

So after a brief debate, we decided there was no clear winner and the best thing to do would be to split the prize money 8 ways. Thus died the deathmatch, but everyone got something out of it in the end ($1,250 to be precise).

True, it was a bit of a letdown, but I really can't complain. The whole experience definitely piqued my interest in learning more about how TCP/IP connections are managed by the OS. It was especially interesting to see the working low-level prototype written by a mozy employee. It handled all 10K connections effortlessly, something none of our C++/Java/PHP/Ruby servers could do, due to shortcomings in our implementations and/or limitations of the OS.

I really would have liked to see how my sorting algorithm stacked up to everyone else's. I suppose there's always next year's competition. Guess I'll keep digging in the meantime.

Posted by jon at November 7, 2006 1:12 AM

Comments

Well, I am Peter Jensen, but not one of the winners. I am the proud father of Jon Jensen, and that makes me a winner too!

Posted by: Peter Jensen at November 7, 2006 8:32 PM

So what did the Mozy prototype do that no one else could figure out? It seems to me like the whole problem hinged on one piece of trivia, i.e., how to accept more connections than what the OS will allow by default.

Posted by: Dave at November 8, 2006 4:03 PM

You know, I wish I had written down what they said... IIRC it was some library that allowed low-level access. Unless I'm way off base, it let them bypass or increase the value of SOMAXCONN (the max number of simultaneous socket connections).

Even though our various prototypes specified a listen queue/backlog size of 10,000, I guess we hit the wall imposed by the kernel and any additional connections were dropped.

So I don't think they meant for it to hinge on trivia—they probably thought up the problem and their prototype just worked, so it wasn't ever a consideration.

Posted by: jon at November 8, 2006 4:49 PM
Post a comment









Remember personal info?