COMPUTERWORLD - Nov 8 - The system that powers eHarmony's matchmaking operation relies on four products: Oracle Corp.'s database, the open-source MySQL database, an open-source data-crunching app called Hadoop, and Netezza Corp. data warehousing appliances. Joseph Essas is VP of engineering and operations at eHarmony. 236 of its 20M members get married every day, the site claims. That's just one of "hundreds of metrics" that we "deeply care about," Essas said. eHarmony uses Oracle's database software to do much of the initial matching. But for hard-core data processing, the company relies on a speedy 50-node Hadoop cluster. Speed is important, because eHarmony updates the scores of its relationship matches whenever new members sign up, as well as when existing members update their profiles.
As matchmaking algorithms advance, the computations and metrics compound and become more complex. Moore's Law is still staying nicely ahead of these advances. A new conference later this year will more directly address the science of matchmaking algorithms. I'll be announcing it on OnlinePersonalsWatch.com soon.
Posted by: Mark Brooks | Nov 09, 2011 at 06:57 AM
Cool, but use another analogy. Moore's Law is about processor density and speed, and the problem with online dating is that it's still really difficult to come up with 5 people out of a million who are good for a specific person. The problem is the algorithm itself, or lack thereof. CPU speed is not the issue anymore.
Questions vs. algorithms vs. behavioral vs. some unknown: that's the inquiry right there.
Does IntroAnalytics or any other new-ish system make a dating site an extra $50 million a year? Of course not. It's all general white papers and marketing at this time, although some companies are much farther along than others. Some promising results are shaping up, but it's still too early to tell.
We're a decade away from algorithms that work, at least. Settle in for the long haul, because while Match and OKC and eHarmony lead the charge, it's still just OK matching, even with over 1,500 attributes in some systems.
I spent time with the Match algorithm guy and a few others in the space this year. I need to get those slides and share them; what they are doing will make you gasp.
Believe me, anyone reading this doesn't have the background or capacity to grok what needs to happen with matching systems at the core level. Either you are building for Match, selling services like IntroAnalytics to tier-2 players, or talking a good game.
Most dating sites don't want to buy this stuff. Paying for the API is too expensive for most sites, and the top sites do it themselves.
And what the heck, you want to get rid of your customer faster? #businessfail
Maybe Amazon will build a dating analysis cluster like AWS/EC2 so we can all use it.
Posted by: Datinginsider | Nov 10, 2011 at 07:11 PM
2 tips:
1) Matching algorithms do not need to be complex or sophisticated, they need to be effective.
2) When you evaluate a matching algorithm for the Online Dating Industry, you FIRST need to estimate its resource consumption, that is, the computing power it requires.
Both eHarmony and Match will need to acquire a Cray, Fujitsu or Hitachi supercomputer OR an arrangement of over 1,000 high-speed servers, due to the high volume of floating-point operations needed to calculate similarity between quantized patterns using the 16PF5 test. (In this case compatibility is equal to similarity.)
eHarmony's new CEO has ordered eHarmony's team to develop a new high-precision compatibility matching algorithm based on strict personality similarity, using the NORMATIVE 16PF5 or a similar test, and to launch it on a new site before February 2012.
Match's team is testing a new compatibility matching algorithm using the 16PF5 test, to replace Chemistry, an obsolete site more than 6 years old that is based on an IPSATIVE personality model and has a low success rate and a high level of false positives.
In the "first run" you need to calculate not only the similarity of each and every woman to each man, but also men-to-men similarity, to avoid women seeing men as all the same, and women-to-women similarity, to avoid men seeing women as all the same.
How to estimate the computing power:
For N clients, you need [N * (N-1)] / 2 comparisons, nearly equal to (N * N) / 2.
For a database of 100,000 daters (50,000 men and 50,000 women) you need about 5,000,000,000 matrix comparisons.
For a database of 1,000,000 persons:
Number of comparisons = (1,000,000)^2 / 2 = 500,000,000,000, each one calculating the similarity between two persons.
If the supercomputer (or high-speed server arrangement) calculates similarity between quantized patterns as fast as 1,000,000 comparisons per second, it will require 500,000 seconds, nearly 139 hours!!! And that is only for the "first run".
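The arithmetic above can be checked with a short Python sketch. The function names are made up for illustration, and the throughput of 1,000,000 comparisons per second is simply the figure assumed in the comment, not a measured value. The sketch also shows that the three groups of the "first run" (women to men, men to men, women to women) add up to exactly N * (N-1) / 2.

```python
def pair_count(n: int) -> int:
    """Exact number of unordered pairs among n members: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def first_run_breakdown(men: int, women: int) -> dict[str, int]:
    """Comparisons needed in the "first run", split into the three groups."""
    return {
        "women_to_men": men * women,
        "men_to_men": pair_count(men),
        "women_to_women": pair_count(women),
    }


def first_run_hours(members: int, comparisons_per_second: float) -> float:
    """Hours needed to compare every pair once at the given throughput."""
    return pair_count(members) / comparisons_per_second / 3600


if __name__ == "__main__":
    groups = first_run_breakdown(men=50_000, women=50_000)
    for name, count in groups.items():
        print(f"{name:>15}: {count:,}")
    print(f"{'total':>15}: {sum(groups.values()):,}")

    # Assumed throughput from the comment: 1,000,000 comparisons per second.
    hours = first_run_hours(1_000_000, 1_000_000)
    print(f"1,000,000 members: about {hours:.1f} hours for the first run")
```

The exact counts come out slightly below the rounded (N * N) / 2 figures, which is why the comment calls them "nearly equal".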
I have challenged eHarmony and Match to offer Compatibility Distribution Curves for each and every dater, i.e. how compatible you are with the rest of the daters.
Breaking "the online dating sound barrier" is to achieve at least:
3 most compatible persons in a 100,000 persons database.
12 most compatible persons in a 1,000,000 persons database.
48 most compatible persons in a 10,000,000 persons database.
100 times better than the Compatibility Matching Algorithms used by current online dating sites!
The only way to achieve that is:
- using the 16PF5 normative personality test, available in different languages, to assess the personality of members, or a proprietary test with exactly the same traits as the 16PF5. The ensemble of the 16PF5 is 10^16 possible patterns, a big number given that the whole world population is nearly 7.0 * 10^9 (estimated OCT 2011)
- expressing compatibility with eight decimals, for example: the pattern 6.7.6.8.9.6.7.7.8.7.2.5.8.7.3.4 is 92.55033557% +/- 0.00000001% similar to the pattern 7.7.6.8.8.7.6.5.8.7.4.5.7.7.3.4
- using a quantized pattern comparison method (part of pattern recognition by cross-correlation) to calculate similarity between prospective mates.
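To make the idea of comparing two quantized patterns concrete, here is a minimal Python sketch. The comment does not give its similarity formula, so the sketch swaps in a simple stand-in: one minus the normalized mean absolute difference between scores. It also assumes each of the 16 scores lies on a 1 to 10 scale. The function names are hypothetical, and the result is not expected to reproduce the 92.55033557% figure quoted above.

```python
SCALE_MIN, SCALE_MAX = 1, 10   # assumed score range for each trait
TRAITS = 16                    # number of traits in a pattern


def parse_pattern(text: str) -> list[int]:
    """Turn a dotted pattern such as '6.7.6.8...' into a list of scores."""
    scores = [int(part) for part in text.split(".")]
    if len(scores) != TRAITS:
        raise ValueError(f"expected {TRAITS} scores, got {len(scores)}")
    if any(not SCALE_MIN <= s <= SCALE_MAX for s in scores):
        raise ValueError("scores must lie between 1 and 10")
    return scores


def similarity(a: list[int], b: list[int]) -> float:
    """Stand-in similarity: 1 minus the normalized mean absolute difference.

    Returns 1.0 for identical patterns and 0.0 when every trait differs
    by the full width of the scale. This is NOT the commenter's formula.
    """
    largest_gap = (SCALE_MAX - SCALE_MIN) * len(a)
    gap = sum(abs(x - y) for x, y in zip(a, b))
    return 1 - gap / largest_gap


if __name__ == "__main__":
    first = parse_pattern("6.7.6.8.9.6.7.7.8.7.2.5.8.7.3.4")
    second = parse_pattern("7.7.6.8.8.7.6.5.8.7.4.5.7.7.3.4")
    print(f"stand-in similarity: {similarity(first, second):.2%}")
```

Whatever formula is used, each call touches all 16 traits of both patterns, and that per-pair cost is what gets multiplied by the number of pairs in the estimate above.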
That is the only way to revolutionize the Online Dating Industry.
All other proposals are .............. NOISE
Posted by: Fernando | Nov 10, 2011 at 07:44 PM