Modern Software Experience

2010-03-01

MyHeritage Statistics

Average Size Month

Last year, I did a series of articles on average database size. The average size of GenCircles databases was almost 2.000, the average size of WebTree was close to 12.500, and the average FamilyTreeSeeker size was close to 2.500. When GENDEX shut down in 2004, the average was more than 2.500. The TNG Network had an average size close to 7.000. RootsWeb had an average of about 2.500 until it was merged with Ancestry World Tree, which lowered the average to roughly 1.000 individuals. In short, many sites have an average in the thousands.

That already made it rather remarkable that MyHeritage’s average was firmly below 100. Only social genealogy sites like Geni have an even lower average.

What was even more remarkable is that while most averages are slowly increasing, MyHeritage’s average was decreasing.

member average versus tree average

The numbers that MyHeritage likes to report in press releases are the number of profiles and the number of members, and uncritical blogs tend to just copy that information. For example, a blog post from 2009 April states that MyHeritage has 31 million registered users, and 330 million profiles. That suggests an average size of about 10 profiles per tree - but it ain’t so.

MyHeritage likes to claim the number of registered users, but it is a safe bet that the majority of registered users are inactive. Moreover, many of the users that are active never created a tree, but merely registered for access to an already existing tree. The actual number of trees is a fraction of the number of registered users. The average size of those trees is closer to fifty than to ten.

published numbers

MyHeritage publishes site statistics on their homepage. They initially published the number of members, the number of profiles and the number of trees. Nowadays, they publish the number of profiles, the number of trees and the number of images.

The removal of the number of members from the published statistics is a silent acknowledgement that it is a number that doesn’t mean much. The number of active members is more interesting anyway, and likely to be much lower.

A site statistic that competitors would like to know is the even smaller number of paying members. We’d like to know that too, as it is an indication of the number of trees larger than the free limit, but we have to do without that information.

limit

To understand why the MyHeritage average size has been decreasing, while other averages are increasing, we need to put its data into perspective.

MyHeritage has a limit for free use, and as MyHeritage lowering limit again points out, that limit has been lowered a few times. Limited Genealogy points out what the negative effects of such a limit are.

MyHeritage also has a strategy of growth through acquisition, a strategy which might not be necessary if they did not have that limit as a constant brake on their growth.

events

date        event
2007-08-22  MyHeritage adds Pearl Street Software GenCircles
2008-09-22  MyHeritage adds Kindo
2008-12-20  MyHeritage lowers free limit from 1000 to 500
2009-08-01  MyHeritage lowers free limit from 500 to 250
2010-02-03  MyHeritage adds Verwandt
2010-02-08  MyHeritage adds Zooof

MyHeritage growing through Acquisitions discusses these acquisitions. One fact that emerged from that overview is that MyHeritage has a habit of acquiring a company and transferring the data before announcing the acquisition - even when that violates terms and conditions.

We do not know when the different companies were acquired, but we do know when the acquisitions were announced. For analysis of the site statistics, we do not really care about the acquisition date anyway; we only want to know when the acquired data was integrated into the MyHeritage site.

data

date        profiles     family trees  average
2008-05-02  229.094.040  1.886.444     121,442
2008-06-22  236.240.174  2.678.292     88,206
2008-09-18  263.439.657  3.898.917     67,567
2008-11-27  285.677.463  4.883.115     58,503
2009-08-01  368.935.309  7.635.332     48,383
2009-10-04  378.231.290  7.877.093     48,017
2009-10-05  402 M        8 M           50,25
2010-02-02  434 M        9 M           48,22
2010-02-03  536 M        13 M          41,23

MyHeritage publishes numbers on their homepage, so ideally, we would have exact numbers for these dates, but for various reasons, we do not.

Old numbers for many sites can be found in the Internet Archive Wayback Machine, but few of the archived MyHeritage home pages contain those numbers. The Wayback Machine may have recent data, but it has not been indexed yet. The data that has been indexed is more than 1½ years old.

A second issue is that MyHeritage started truncating the numbers to whole millions from 2009 Oct 15 onward; with less exact numbers, it is a bit harder to spot an upward or downward trend and be sure of it. The number of profiles increases by a million every few days, but because of the truncation, the number of trees appears to be constant for months on end.
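To get a feel for how much precision the truncation costs, here is a small Python sketch. It assumes that a published figure of N M means the true count lies anywhere from N million up to just below N+1 million (always rounded down); the function name is made up for this illustration.

```python
MILLION = 1_000_000

def average_bounds(profiles_m, trees_m):
    """Return (low, high) bounds on profiles per tree when both counts
    are published truncated to whole millions (assumed: rounded down)."""
    # Smallest possible average: fewest profiles over most trees.
    low = (profiles_m * MILLION) / ((trees_m + 1) * MILLION - 1)
    # Largest possible average: most profiles over fewest trees.
    high = ((profiles_m + 1) * MILLION - 1) / (trees_m * MILLION)
    return low, high

# Figures published on 2010-02-02: 434 M profiles, 9 M family trees.
low, high = average_bounds(434, 9)
print(f"{low:.1f} .. {high:.1f}")  # 43.4 .. 48.3
```

Under that assumption, the 48,22 calculated from the truncated figures could in reality be anything between roughly 43,4 and 48,3.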

GenCircles

The numbers in the table show a dramatically decreasing average. MyHeritage had a low average when it added GenCircles. The addition of GenCircles gave its average tree size a significant one-time boost. After that boost, the average started to trend back to its natural value, and that is what we are seeing.

It is not hard to give an estimate of that natural value; we just take the same numbers as in the table above, minus the numbers for the GenCircles acquisition. The problem is that MyHeritage does not seem to have posted those numbers.

Pearl Street claimed 135 M profiles and 160.000 registered users when the site was put up for sale, an average of less than 850 profiles per tree.
Last year, GenCircles contained more than 100.000 databases with almost 200.000.000 profiles in total, thus averaging roughly 2.000 each. The disparity between those numbers is easy to explain: not all registered users uploaded a database. We would have to guestimate the number of databases those 135 M profiles were contained in. We would also have to guess how many million profiles there were at the time of the acquisition.

guestimate

date        profiles     family trees  average
2008-05-02  229.094.040  1.886.444
2008-06-22  236.240.174  2.678.292
difference  7.146.134    791.848       9,024

An easier and more direct guestimate can be obtained by looking at the difference between subsequent totals.
Despite an influx of GenCircles users with larger trees, the average size of trees submitted to MyHeritage between 2008 May 2 and June 22 was just nine profiles. That then was MyHeritage’s natural average at the time.
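The calculation behind that guestimate is simple enough to sketch in a few lines of Python. The function and variable names are just my own labels; the numbers are the two snapshots from the table above, written with underscores instead of dots.

```python
def upload_average(earlier, later):
    """Profiles added per tree added between two (profiles, trees) snapshots."""
    added_profiles = later[0] - earlier[0]
    added_trees = later[1] - earlier[1]
    return added_profiles / added_trees

may_2 = (229_094_040, 1_886_444)    # 2008-05-02: profiles, family trees
june_22 = (236_240_174, 2_678_292)  # 2008-06-22: profiles, family trees

print(round(upload_average(may_2, june_22), 2))  # 9.02
```

The same function can be applied to any two rows of the data table to get the calculated average for the period in between.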

That average is considerably lower than that for Ancestry Member Trees; Ancestry.com has claimed 1,1 billion profiles in 11 million Ancestry Member Trees, which averages to just 100 profiles per tree.

The Ancestry Member Trees average is already less than 10 % of the average of many other sites, and MyHeritage’s natural average is only 10 % of that, just ten profiles per tree?

Well, I’ve calculated averages for other periods, and they do go as high as 20 or 30, sometimes 60 or 80, but rarely above 100. The MyHeritage upload average even goes as low as just 3,4 profiles for more than five thousand trees added on one day.

note: average

The straightforward average for a period as calculated here does not exactly equal the average upload size for that period. The increase in profiles over that period includes expansion of existing trees (and perhaps a few deletions), so the actual upload average is likely to be a bit less. The calculated number is used here as a reasonable indicator of the actual upload size average.
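A minimal Python sketch of that difference, using made-up numbers purely for illustration (MyHeritage does not publish how many profiles were added to existing trees, so the 3.000 below is an assumption, not data):

```python
def period_average(added_profiles, added_trees):
    """The straightforward average: all new profiles divided by all new trees."""
    return added_profiles / added_trees

def upload_average(added_profiles, growth_of_existing_trees, added_trees):
    """Average size of newly uploaded trees only, with the profiles that
    were added to already existing trees taken out first."""
    return (added_profiles - growth_of_existing_trees) / added_trees

# Hypothetical period: 1.000 new trees and 12.000 new profiles,
# of which 3.000 profiles were additions to existing trees.
print(period_average(12_000, 1_000))         # 12.0
print(upload_average(12_000, 3_000, 1_000))  # 9.0
```

The more existing trees grow during a period, the further the calculated number overstates the real upload average.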

For several other sites that do not limit the upload size, the calculated number is of a different magnitude: not a few dozen, but a few thousand.

The basic explanation for MyHeritage’s low average is simple: most organisations welcome all trees of any size without conditions, but MyHeritage has been throwing up a pay wall for all but small trees, and has deliberately focussed on collecting lots of very small ones.

The acquisition of GenCircles boosted MyHeritage’s average way higher, but did not change MyHeritage’s growth rates, so the average immediately started dropping back down towards its natural level.

Kindo

On 2008 Sep 22, MyHeritage added Kindo. I do not know how large Kindo was at the time, but it probably wasn’t big.

The Wayback Machine does not show data for 2008 September yet. I do have some numbers for 2008 Sep 18, a few days before Kindo was added. On that day, the average tree size was down to 67,567.

1000 to 500

date        profiles     family trees  average
2008-09-18  263.439.657  3.898.917     67,567
2008-11-27  285.677.463  4.883.115     58,503
difference  22.237.806   984.198       22,595

On 2008 Dec 20, MyHeritage lowered the free limit from 1000 to 500.
It is reasonable to expect this pay barrier for small databases, this focus on tiny databases, to translate into a yet smaller average upload size than before, and thus a steeper decrease of the average size.

There aren’t many numbers available. I do have numbers for 2008 Nov 27. On that day, the average tree size was down to 58,503.

The average upload size between 2008 Sep 18 and 2008 Nov 27 is 22,595 profiles. However, that average includes the data from Kindo, so this number does not really represent the MyHeritage upload average.

500 to 250

date        profiles     family trees  average
2008-11-27  285.677.463  4.883.115     58,503
2009-08-01  368.935.309  7.635.332     48,383
difference  83.257.846   2.752.217     30,251

On 2009 Aug 1, MyHeritage lowered its free limit again, this time from 500 to 250. This increased focus on tiny databases should result in an even smaller average upload size, and thus a yet steeper decrease of the average size.

On 2009 Aug 1, the average size was down to 48,383. The average upload size between 2008 Nov 27 and 2009 Aug 1 is 30,251.

That upload average is actually more, not less, than the upload size between 2008 Sep 18 and 2008 Nov 27, when the free limit was still 1000. The influence of the Kindo data is probably not big enough to explain this. The explanation may be in improvements made to the website in response to the success of Geni.com.

date        profiles     family trees  average
2009-08-01  368.935.309  7.635.332     48,383
2010-02-02  434.000.000  9.000.000     48,222
difference  65.064.691   1.364.668     47,678

The next major influence on the MyHeritage average was the addition of all Verwandt data on 2010 Feb 3. By that time, MyHeritage had started truncating the published numbers to whole millions, so the average is less exact than before.

Between 2009 Aug 1 and 2010 Feb 2, the day before the Verwandt data was added, the average upload size is 47,678.

Again, the average upload size has increased, despite the fact that the free limit has decreased. That seems weird, but the explanation is very straightforward: those numbers ain’t right - at least not all of them.

whole millions

date        profiles     family trees  average
2009-10-04  378.231.290  7.877.093     48,017
2009-10-05  402.000.000  8.000.000     50,250
difference  23.768.710   122.907       193,632

On 2009 Oct 14, MyHeritage was still posting exact numbers.  On 2009 Oct 15, MyHeritage stopped showing exact numbers, and started showing whole millions only. The numbers are presumably truncated (always rounded down), to avoid misleading claims.

We can only guess at the reason for this change. Perhaps the MyHeritage management thinks that exact numbers are confusing to their users. Perhaps they were unhappy that it was so easy to accurately track their still decreasing average.

On 2009 Oct 14, the MyHeritage average was 48,017 and slowly decreasing from 48,5 in mid October. Yet the next day, the average had jumped up to 50,250.

There were roughly 378 million profiles and it was increasing by one million every few days, yet the next day, there suddenly were 402 million profiles.

There were 7.877.093 trees, and that count was increasing by roughly seven thousand each day, yet the next day, there were 8 million trees.

before and after

date        profiles     family trees  average
2009-08-01  368.935.309  7.635.332     48,383
2009-10-04  378.231.290  7.877.093     48,017
difference  9.295.981    241.761       38,451

date        profiles     family trees  average
2009-10-05  402.000.000  8.000.000     50,250
2010-02-02  434.000.000  9.000.000     48,222
difference  32.000.000   1.000.000     32,000

Whatever the reason for the jump, when the data shows such a strong discontinuity, it does not make much sense to treat it as if it were continuous. That’s why I calculated two other averages; the average upload size before the discontinuity and the average upload size after the discontinuity.

Because of the truncation, the average after is not very accurate, but even the average before is higher than the average for the limit 500 period.
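How inaccurate that average after the discontinuity can be is easy to show with a rough Python sketch. It assumes that every truncated figure was rounded down, so the real count may be up to 999.999 higher than published; the helper names are my own, not anything MyHeritage uses.

```python
MILLION = 1_000_000

def true_range(truncated_millions):
    """Smallest and largest true count behind a figure truncated to whole millions."""
    low = truncated_millions * MILLION
    return low, low + MILLION - 1

def upload_average_bounds(start, end):
    """Bounds on profiles added per tree added, given (profiles, trees) in millions."""
    (p0_lo, p0_hi), (t0_lo, t0_hi) = true_range(start[0]), true_range(start[1])
    (p1_lo, p1_hi), (t1_lo, t1_hi) = true_range(end[0]), true_range(end[1])
    min_profiles, max_profiles = p1_lo - p0_hi, p1_hi - p0_lo
    min_trees, max_trees = t1_lo - t0_hi, t1_hi - t0_lo
    # Guard against a zero or negative minimum number of added trees.
    return min_profiles / max_trees, max_profiles / max(min_trees, 1)

# 2009-10-05: 402 M profiles, 8 M trees; 2010-02-02: 434 M profiles, 9 M trees.
low, high = upload_average_bounds((402, 8), (434, 9))
print(f"lower bound: {low:.1f}")  # lower bound: 15.5
print(f"upper bound: {high:,.0f}")
```

With both tree counts truncated, the number of trees added could be anywhere between one and almost two million, so the truncated figures alone only tell us that the real average is at least about 15,5; they put no meaningful upper limit on it.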

lower limit effect

If the lower free upload limit results in a lower average upload size (and surely it should, because only tiny files can be uploaded for free), it does not show in these calculations.

One possible explanation is that MyHeritage’s remarkably tiny upload average is already so far below anything the upload limit can explain, that lowering the upload limit hardly matters until you lower it a lot more.

Perhaps the negative effect is there, but overwhelmed by other (positive) effects. There are many factors that influence the average upload size, and with numbers so small, it is not hard for other factors to dominate. Or perhaps it is better to say that it is hard for other factors to not dominate.

Then again, maybe there is no explanation needed at all. I have noticed - for the MyHeritage site as well as other sites - that, even when there are thousands of uploads a day, the calculated upload average still tends to fluctuate strongly. In statistical speak: there may be an average, but the standard deviation is large. It will be hard to notice the effect of lowering the upload limit if that effect is smaller than the already existing standard deviation.
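To illustrate the point, here is a small Python sketch. The list is not a real sample; it simply reuses the kind of period averages mentioned earlier in this article (3,4, nine, 20, 30, 60 and 80) as stand-ins, and the size of the limit effect is a pure assumption.

```python
from statistics import mean, stdev

# Illustrative stand-ins, not a real sample (decimal points, not commas).
period_averages = [3.4, 9.0, 20.0, 30.0, 60.0, 80.0]

average = mean(period_averages)
spread = stdev(period_averages)  # sample standard deviation

# Hypothetical effect of a lower limit: a drop of 5 profiles per tree.
assumed_effect = 5.0

print(f"average {average:.1f}, standard deviation {spread:.1f}")
print("effect hidden in the noise:", assumed_effect < spread)
```

For these stand-in values the standard deviation comes out at roughly 30, about as large as the average itself, so a shift of a few profiles per tree would be very hard to notice.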

24 million

date        profiles     family trees  average
2009-10-04  378.231.290  7.877.093     48,017
2009-10-05  402.000.000  8.000.000     50,250
difference  23.768.710   122.907       193,632

When MyHeritage started truncating the numbers, those numbers were not very accurate anymore, but the first set of truncated numbers was rather surprising.

According to its own published numbers, MyHeritage suddenly added close to 24 million profiles in more than a hundred thousand trees, with an uncharacteristically high average of close to 200 profiles per tree.

Some possible explanations for this sudden jump come to mind. One is that MyHeritage fouled up and some of these numbers are plain wrong. Another is that the difference represents an as-yet unannounced acquisition.

links