Hi theory-edge,
First, thanks for giving us an opportunity to chat with you all.
Since there are three of us, we'll try to give a composite answer to
your questions...
> a. can you give a general idea of what your algorithm is based on eg
> SVD etc
We can't reveal too many secrets right now with the deadline for the
Progress Prize so close, but we have implemented and tried to
incorporate all of the popular algorithms you hear mentioned on the
forums: SVD, KNN, etc. I think it's pretty safe to say that most of
the leading teams are using many, many different approaches. We try
pretty much any idea that we can think of or learn about.
> b. how much time have you spent on it? did you hear the hungarian team
> (as I recall re NYT article) estimate they have spent 8 hrs per day
> since the beginning of the contest?
That has varied for all of us individually. When we first started, we
were meeting a few nights a week to plan out the basic code just for
handling the dataset, and then coding different parts individually. As
we started getting more and more ideas, things ramped up quite a bit.
The amount of time we spend definitely comes in waves. There have been
weeks when we would spend 5 or 6 hours every day working on it. There
have also been weeks (and months) of dry spells where we wouldn't work
on Netflix at all.
> c. what is your feeling on competing against the worlds greatest phds
> on this problem & coming up 2nd?
David W: It's very cool, but I also think it's important to point out
that a lot of what we've done would have been impossible without the
many papers and textbooks published by lots of the other top
contestants. It's not like the three of us have independently invented
a super algorithm that beats the best of what has been invented so
far; rather, we've spent a lot of time reading existing papers, trying
to implement them, and eventually understanding them well enough to
tweak and optimize them for our particular implementation.
David L: Yeah, I agree pretty much with what David Weiss said. A lot
of the work published by other teams is really great, and our solution
uses a lot of it, along with several of our own secret ideas, of course.
Lester: It's also clear that some teams want to focus on a particular
algorithm or family of algorithms, to demonstrate how they stack up to
other existing machine learning approaches on this massive data set.
We have no such single-algorithm loyalty -- we love 'em all.
> d. are you guys working right now? have you all graduated?
We all graduated from Princeton Univ. in June, and now...
Lester: I've just started as a grad student at UC Berkeley in the EECS
(Electrical Engineering and Computer Science) department. The contest
has encouraged me to continue exploring machine learning.
David W: I'm back at Princeton working as an RA with my neuroscience
advisor, Ken Norman. (I majored in Computer Science, but with a
certificate in Neuroscience). I'm applying to grad school now. During
my morning commute (I live in Philadelphia), I'm also working with my
older brother on his startup company (www.medforward.com).
David L: I just started a job in New York trading interest rate
derivatives and am pretty excited about that.
> e. do you have home pages anywhere?
David W: Not yet...but I'm working on one now.
Lester: Not yet -- that's a good idea though.
> f. do you think the netflix contest is winnable? any estimate on when
> it will be awarded?
Lester: I definitely think the Grand Prize threshold is attainable --
I predict that someone will reach it in another year or two.
David W: I also agree. Every time we think we've hit the ceiling,
another idea/optimization comes along, and we're back in the race
again. It could get to be a pretty agonizing crawl before the end,
though.
David L: I'll have to differ and say I don't think a 10% improvement
is attainable, but I still have hope. There is only so much
information that can be crunched out of this dataset.
> g. any complaints about the contest?
David W: Not really -- I'm just surprised that Netflix hasn't been
more communicative with the contestants.
David L: I think Netflix has been doing a great job and that the
contest is extremely well designed and well run. My only complaint is
with that "added" data they posted (the results from the KDD Cup) that
gives you a few thousand more ratings and the number of ratings each
movie received in 2006. Running all our algorithms on those additional
ratings probably won't help that much, but it's definitely a huge
pain. I don't think they should have changed the available data after
the contest started.
Lester: The contest was very well designed -- my only minor complaint
was that late release of extra data.
> h. any advice?
To do well in the contest, I think you need to read a lot of papers
and implement anything you come across. The number of ideas we have
tried that didn't help is pretty ridiculous.
> i. do you have background in this area? are you students at the top of
> your class, or average? won awards etc?
We've taken a few courses in AI, data mining, etc., but we're all
pretty new to the machine learning realm. We're learning more every
day.
David W: Aside from one or two graduate courses, we don't have any
particular background. I didn't even know the term "collaborative
filtering" when we first started working on it. However, we have all
won some sort of awards at some time or another.