Modern Software Experience

2011-05-17

site search plus

Ancestry.com Search

Ancestry.com naturally offers a site search that lets you search through the many databases it offers. Many users are still unhappy about the quality of the so-called new search that Ancestry introduced in 2008, but Ancestry is forging forward and has just introduced a new search feature: web search.

The announcement on the Ancestry.com blog notes that web search started in Ancestry Labs, the testing ground they announced late last year.

Ancestry used to offer site search, now they offer site search plus. It used to be a site search engine, now it is a web search engine as well.

Ancestry.com Web Search

For the Ancestry.com user, Web Search isn't a separate search engine, but a new feature of their already existing search engine. It used to be that their search engine would show you hits from Ancestry.com databases, and it still does. The difference is that it now shows hits from other databases, not owned by Ancestry.com, as well, even if those are on another site.

International Biographical Collection

Although Ancestry.com more or less announced Web Search last year, when they introduced Ancestry Labs, and immediately stated some clear principles for it, the formal introduction of Web Search has been received with mixed emotions.

In August of 2007, hot on the heels of the introduction of the spectacularly poor Family Tree Maker 2008, Ancestry.com decided to shoot itself in its other foot as well by introducing the International Biographical Collection (IBC).
This euphemistically named collection was nothing short of copyright theft. Ancestry.com had collected content from the web and was selling access to this collection; Ancestry.com was charging their users for copyrighted content they did not own at all. Ancestry.com did not have any affiliation with the content owners, and like a good thief, Ancestry.com had not even bothered to inform the owners that they were selling access to the stolen content. Ancestry.com had often copied the copyright notice that made it clear the content wasn't theirs to sell access to…

When you tried to view any records in the collection, even when you merely tried to follow the hard to find link to the original site, you were prompted to sign up and become a paid subscriber.
There was a huge outcry. Several web site owners threatened to take legal steps, and Ancestry management hurriedly backpedaled. Within days, they had pulled the collection down.

hesitant

All this would just be some past mistake irrelevant to Web Search, if Ancestry.com had not tried to excuse their actions with an attempt at revisionist history, specifically trying to spread the notion that the collection was really just a search engine…
Now that Ancestry.com has introduced a search engine, that has come back to haunt them; web masters are hesitant to have their content indexed by Ancestry.com.

search engine

Ancestry.com Web Search is a search engine. That their Web Search does not appear to have a public front-end of its own, that you can search its index only through Ancestry.com's site search, does not change that.

Web Search is a product of Ancestry.com's Web Crawl Team. The Web Crawl Team surely has some private Web Search interface, which they use for development and testing, that shows Web Search records only, and it would not be hard to offer the same or a similar interface to users.
That Ancestry is not offering such a separate Web Search interface, but integrating Web Search into their existing search, is not odd at all. Their site search is their all-in-one search, and it would be odd to keep the Web Search results out of it. Ancestry.com wants to offer its users just one search interface, and not bother them with multiple interfaces. This makes perfect user-friendly sense.
What is odd is that there is no option to restrict your searches to either Ancestry.com records or web records. That should be added as soon as possible.

web records

Web Search contains web records, with emphasis on records; Web Search is about indexing record collections. There are two obvious reasons for that. One is that records fit Ancestry.com's existing site search well. The other is that Ancestry is, ahem, somewhat hesitant about indexing genealogy sites or blogs. That does not mean they aren't considering the possibility, just that they are being careful, and decided to start with public record collections. Indexing blogs may or may not happen later.

principles

Back in October of last year, when Ancestry.com introduced Ancestry Labs and pre-announced Web Search, they already stated several principles for Web Search:

Across the web, the number of sites that are transcribing and publishing historical records is growing all of the time. Many of these are freely available. Person View helps you find links to sites that contain records matching your search.

However, in providing access to these, it’s very important to us that we are respectful to the publishers of these websites. We’d like to be completely transparent about how we intend to do this:

  • Our goal is to make it easy for our users to find websites that have records they may be interested in, and to make it easy for them to visit these websites.
  • To do this we will build an index of essential information in the record (e.g. the website link, the matching name, date, place), and make this available to our users through our search tools.
  • We will always strive to follow web industry standards for website crawling permissions. For example, some sites contain a robots.txt file telling search engines (such as Google) not to crawl that site.
  • We will put in place processes to remove the content from our search index if the website/content owner requests, with the goal of doing this as quickly as possible. We will clearly publish how to contact our team to do this (our contact us page has more details).
  • We may allow our users to save a reference to the record to their family trees, but whenever this information is later presented, we intend to give proper attribution, with a clear reference or link to the site from which the index data came.

The principles stated in the recent Web Search announcement are practically the same:

Across the web, the number of sites that are transcribing and publishing historical records is growing all of the time. Many of these are freely available. Person View helps you find links to sites that contain records matching your search.

However, in providing access to these, it’s very important to us that we are respectful to the publishers of these websites. We’d like to be completely transparent about how we intend to do this:

  • Our goal is to make it easy for our users to find websites that have records they may be interested in, and to make it easy for them to visit these websites.
  • To do this we will build an index of essential information in the record (e.g. the website link, the matching name, date, place), and make this available to our users through our search tools.
  • We will always strive to follow web industry standards for website crawling permissions. For example, some sites contain a robots.txt file telling search engines (such as Google) not to crawl that site.
  • We will put in place processes to remove the content from our search index if the website/content owner requests, with the goal of doing this as quickly as possible. We will clearly publish how to contact our team to do this (our contact us page has more details).
  • We may allow our users to save a reference to the record to their family trees, but whenever this information is later presented, we intend to give proper attribution, with a clear reference or link to the site from which the index data came.

The most important difference between these two quotes is that the recent blog post does not link to a generic contact page, but to the Web Search information page.

current web content

There isn't much content in Web Search yet. Ancestry started with several RootsWeb databases, which aren't owned by Ancestry.com, but are on the rootsweb.ancestry.com domain, and has just added Allen County, Indiana Deaths 1870-1920. That is the first database outside the ancestry.com domain.
You can search that database using Ancestry search, or immediately follow the link to the search page on the Allen County Public Library site itself.

Ancestry Web Search

Within Ancestry Search, web databases are prefixed with Web: to distinguish them from Ancestry.com databases. The screenshot shows the Web: Allen County, Indiana Deaths 1870-1920 database in Ancestry Search. Notice that the search is available without logging into Ancestry.com. From this page you can either continue searching on Ancestry.com, using the search fields on the left, or follow the link on the right to the collection on its own site.

Tip: There is no page that lists all the databases available through Web Search, but you do not need it either. You can find out what web databases are available right now from the Ancestry Card Catalog. Just type web: as the title to find all titles starting with web:.

Web Search bot

Some basic information on Web Search is provided on the Web Search information page. That page tells you that Ancestry.com's web spider respects the robots.txt protocol, but neither the blog post nor the Web Search information page tells you the name of their bot. The collection bot used to create the IBC was called MyFamilyBot. The search bot for Web Search is called ancestrybot.

Knowing the bot's name allows you to control the bot's behaviour through robots.txt commands. You could, for example, tell it that almost all directories are off-limits, the one exception being the directory containing your records collection.
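
As a rough illustration of such a rule set, here is a minimal sketch that checks a robots.txt fragment with Python's standard urllib.robotparser module. The directory name /records/ and the host example.org are hypothetical, and whether ancestrybot honours the Allow directive is an assumption; only the bot name comes from the text above. The Allow line is listed first because this particular parser applies the first rule that matches.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ancestrybot may crawl /records/ only.
# The /records/ directory name is an assumption made for illustration.
ROBOTS_TXT = """\
User-agent: ancestrybot
Allow: /records/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ancestrybot"):
    """Return True if the given agent may fetch the path on the example host."""
    return parser.can_fetch(agent, "http://example.org" + path)


if __name__ == "__main__":
    print(allowed("/records/deaths.html"))         # True: inside the records directory
    print(allowed("/blog/post.html"))              # False: everything else is off-limits
    print(allowed("/blog/post.html", "otherbot"))  # True: these rules name ancestrybot only
```

Other crawlers are unaffected by this fragment, because it contains no rules for them; a separate User-agent block would be needed to restrict those.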

free

Ancestry is adding the Web Search feature to better serve its current subscribers, but you do not have to be a subscriber to use it.
The best news about Ancestry Web Search is that it is really, truly free. You do not need to become a subscriber, you do not even need to register for an account and log in. The current Ancestry Search does not allow you to focus on web records, and there aren't many yet, but when you do find some, you can always access those records for free.

The major limitation of Web Search is that it indexes free record collections only. It isn't a web search, but a free web search; not a search of the web, but a search restricted to the free web only. Ancestry apparently isn't too eager to have its own site search direct you to a pay-to-access collection hosted by some competitor.
