When it comes to the modern Internet, we tend to take our primary website for granted – and by primary, I mean Google. You know, the Google that you use to google just about everything, multiple times each day, sometimes without realizing it. You rely on Google to deliver the most relevant results for your information-seeking needs, whether that is a highly detailed article on SEO, a phone number, restaurants close to you, or the most recent news on a subject. You expect Google to understand your query, understand the content it has indexed, and show you exactly what you’re seeking.

That’s a tall order. Taller still if you think about Google, a machine, having to “understand” the content it has indexed.

Yes, Google has made remarkable achievements in language processing: it can isolate key phrases, types of content, and blocks of content (like dates or names). But, to be frank, Google doesn’t always “see” the structure of the content and treat it ideally. It might not identify that string of numbers as a phone number, or that block of content as an address. It may not recognize that list as the set of instructions for a recipe.

That’s where Schema comes in – by providing definition and disambiguation.

Schema is the way a web developer can help Google better understand your content by providing extra definition within the page. Schema can turn a block of text, such as:

<div>
Joe Somethingorother
1234 Main Street
Somewhereville, ST 12345-9876
phone: (123) 123-4567
fax: (123) 123-9876
</div>

into markup that tells Google “this is an address block and these sections mean these things”:

<address itemscope itemtype="http://schema.org/PostalAddress">
Joe Somethingorother
<span itemprop="streetAddress">1234 Main Street</span>
<span itemprop="addressLocality">Somewhereville</span>, <span itemprop="addressRegion">ST</span> <span itemprop="postalCode">12345-9876</span>
phone: <span itemprop="telephone">(123) 123-4567</span>
fax: <span itemprop="faxNumber">(123) 123-9876</span>
</address>

The extra markup that you see around each bit of information in the address example is Schema at work: the itemtype attribute names the kind of thing being described, and each itemprop attribute labels one element as a property of that type. Google’s algorithms can look at a huge blob of text and, seeing this block marked up, extract this location information and make use of it.

This extra definition also serves a second function: disambiguation.

Let’s take the following examples:

  • The company is headquartered in Rome
  • The company is headquartered in Rome, New York
  • The company is headquartered in Rome, New York, and San Francisco
  • The company is headquartered in Rome and New York
  • The company is headquartered in New York (state)
  • The company is headquartered in New York (city)

Think about that Google algorithm looking at the content on the page and seeing a big blob of text. That algorithm sees “Rome” and “New York”, but as we know, those terms can refer to vastly different entities.

Where Schema helps in this matter is that it provides disambiguation – it supplies the extra definition the algorithm needs to see “Rome”, “Rome, New York”, and “Rome and New York” and understand which entity is being referenced. It helps Google know where the company is headquartered, so when someone searches for “Insurance company in Rome, NY”, the company’s website can show up if it is, in fact, headquartered in Rome, NY as opposed to Rome, Italy.
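
As a rough sketch of how that might look in markup, the two snippets below describe one company headquartered in Rome, New York and another headquartered in Rome, Italy. The company names are made-up placeholders, and nesting a PostalAddress inside an Organization through the address property is just one plausible way to express this, not the only one:

<!-- Hypothetical company, headquartered in Rome, New York -->
<div itemscope itemtype="http://schema.org/Organization">
<span itemprop="name">Example Insurance Co.</span> is headquartered in
<span itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="addressLocality">Rome</span>, <span itemprop="addressRegion">NY</span>
<meta itemprop="addressCountry" content="US">
</span>
</div>

<!-- Hypothetical company, headquartered in Rome, Italy -->
<div itemscope itemtype="http://schema.org/Organization">
<span itemprop="name">Example Assicurazioni</span> is headquartered in
<span itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="addressLocality">Rome</span>
<meta itemprop="addressCountry" content="IT">
</span>
</div>

The locality is “Rome” in both snippets; the addressRegion and addressCountry values are what tell the two apart.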

While many people think of and treat Search Engine Optimization (SEO) as a means of tricking Google into ranking their content higher, Schema is instead about educating Google. Schema is your way of clarifying your content for Google through definition and disambiguation, so that the search engine can understand it better and deliver it more effectively to the people searching for it.