Semantic Web Search - A Next Generation Internet Search Engine
This document provides a high-level overview of a next generation Internet search
engine called Semantic Web Search. It is intended for
readers who are familiar with basic Internet technologies. It presents Semantic
Web Search in the context of Internet searching on the current Web versus the
emerging Semantic Web, with an emphasis on the impact of Semantic
Web Search on the way people and computers find and use information on the Internet.
Introduction
Semantic Web Search is a new breed of Internet search engine that is used by
computers to help people gather the information they need for work or
home. Semantic Web Search represents a revolutionary improvement over search
engines on the Web today because it can provide more accurate search results
with less human intervention. By providing this service to computers, Semantic
Web Search will power a new generation of intelligent applications that increase
the productivity of people through fast and accurate information retrieval. The
enabling technology behind Semantic Web Search is a new extension to the current
Web called the Semantic Web.
The Semantic Web
The Semantic Web is a set of Web standards that provide a common framework to allow
computers to understand the meaning of information published on the Web. On the
Semantic Web, information is described in terms of well-defined vocabularies using
a simple data description language called the Resource Description Framework (RDF). Information
described using RDF (an RDF model) can be visualized as a graph of the properties of the
people, places and things it describes.
Figure 1 - On the current Web, information is described for people using unstructured text.
On the Semantic Web, information is described for computers using RDF. Both use well-defined
vocabularies to communicate meaning; however, computers can understand only the rigid
structure of RDF vocabularies.
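The graph view of an RDF model described above can be sketched in a few lines of Python. This is only an illustration: each statement is held as a (subject, property, value) triple, and the `ex:` identifiers, the sample data, and the helper names `build_graph` and `describe` are all assumptions made for the example rather than part of any RDF standard or product.

```python
from collections import defaultdict

# Each RDF statement is a (subject, property, value) triple.
# All identifiers below are hypothetical and exist only for illustration.
statements = [
    ("ex:alice", "ex:name", "Alice Example"),
    ("ex:alice", "ex:livesIn", "ex:springfield"),
    ("ex:springfield", "ex:name", "Springfield"),
]

def build_graph(triples):
    """Group triples by subject so each node lists its outgoing edges."""
    graph = defaultdict(list)
    for subject, prop, value in triples:
        graph[subject].append((prop, value))
    return dict(graph)

def describe(graph):
    """Render every edge of the graph as one readable line."""
    return [
        f"{subject} --{prop}--> {value}"
        for subject, edges in graph.items()
        for prop, value in edges
    ]

graph = build_graph(statements)
for line in describe(graph):
    print(line)
```

Because a value can itself be the subject of other statements (here `ex:springfield`), the triples link together into a graph rather than a flat list of records.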
RDF is commonly written using the popular Extensible Markup Language (XML), which
has gained widespread adoption for interchanging data between
computing systems on the Web. Although XML provides the basis for
computers to share numbers, dates, times, currencies, and blocks of text, it is
RDF that gives the data meaning. The Semantic Web builds upon current Web standards,
including core network standards such as Hypertext Transfer Protocol (HTTP), allowing
it to operate seamlessly with existing Web infrastructure.
Searching the Semantic Web
Each document on the Semantic Web contains an RDF model that can be thought of as a discrete database.
Information in one document can reference information in another, constructing a massive
RDF model that is distributed over the Internet. Semantic Web Search acts as an index into
this distributed RDF model to help computers quickly locate the document(s) that contain
the information they need.
Figure 2 - The Semantic Web is a massively distributed database on the Internet. Each RDF
document contains a small piece of a much larger RDF model that forms the Semantic Web.
Semantic Web Search indexes the locations of the documents based on the information they contain.
Semantic Web Search crawls the Semantic Web and indexes RDF documents based on the information they
contain. It creates its own distributed RDF model that describes the contents and location of all
documents on the Semantic Web. Since the information is described using RDF vocabularies that
have well-defined meaning to computers, as well as people, search conditions can be precisely
described to Semantic Web Search using these vocabularies. Semantic Web Search translates a search
condition into an index lookup into its RDF model. It returns the locations of the documents on
the Semantic Web where the information described by the search condition exists.
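The index lookup described above can be sketched as follows for a search condition such as "What is the name of the 42nd US president?". This is a minimal illustration, not the actual design of Semantic Web Search: the index layout, the `ex:` identifiers, the example.org URLs, and the function `find_documents` are assumptions. Only the name of the 42nd president is taken from this document.

```python
# Hypothetical index entries: (subject, property, value, document URL).
# Identifiers and URLs are invented for illustration.
INDEX = [
    ("ex:p42", "ex:office", "US President", "http://example.org/presidents/42.rdf"),
    ("ex:p42", "ex:ordinal", 42, "http://example.org/presidents/42.rdf"),
    ("ex:p42", "ex:name", "William Jefferson Clinton", "http://example.org/presidents/42.rdf"),
    ("ex:p43", "ex:office", "US President", "http://example.org/presidents/43.rdf"),
    ("ex:p43", "ex:ordinal", 43, "http://example.org/presidents/43.rdf"),
    ("ex:p43", "ex:name", "George Walker Bush", "http://example.org/presidents/43.rdf"),
]

def find_documents(index, conditions, wanted_property):
    """Return URLs of documents that hold `wanted_property` for every
    subject satisfying all (property, value) conditions."""
    subjects = None
    for prop, value in conditions:
        matching = {s for s, p, v, _ in index if p == prop and v == value}
        subjects = matching if subjects is None else subjects & matching
    subjects = subjects or set()
    return sorted(
        {url for s, p, _, url in index if s in subjects and p == wanted_property}
    )

# "What is the name of the 42nd US president?"
condition = [("ex:office", "US President"), ("ex:ordinal", 42)]
locations = find_documents(INDEX, condition, "ex:name")
print(locations)
```

The search returns document locations rather than the answer itself; the caller then retrieves the document and reads the name from its RDF model.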
Figure 3 - Search conditions are precisely described in terms of RDF models and vocabularies when
using Semantic Web Search. In this example RDF is used to describe the search condition
"What is the name of the 42nd US president?". Semantic Web Search returns the location of the
document on the Semantic Web (a URL) that contains the name of the 42nd US president,
"William Jefferson Clinton".
Searching the Current Web
Today's Internet search engines crawl and index documents on the current Web. Information on the
current Web is mostly described using unstructured text that is marked up using the Hypertext
Markup Language (HTML) to additionally describe its visual presentation. Current Web search
engines index these documents using a variety of techniques that are primarily based on the count
and proximity of words contained in the document. This limits search conditions to simple keyword
expressions based on the existence or absence of a word or phrase. The keyword expressions are
processed by the search engine and produce a list of documents that potentially contain the
information needed. Locating the actual information often requires a
person to interpret the meaning of each document and extract the relevant information.
Figure 4 - Keyword search conditions used with current Web search engines are often ambiguous. Search
results can contain numerous irrelevant documents requiring a person to spend a significant amount of
time to complete a search. In this example the keyword "address" is ambiguous, causing the search
engine to return both speeches and postal addresses.
In many cases Web search engines perform well, producing a sorted list of relevant documents that enables
people to quickly locate the information they need. However, there are also many cases in which the
keywords used to locate information are ambiguous, and the search engine results therefore contain a
large number of irrelevant documents. Consequently, most of the burden of finding the information
is placed on the person.
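The ambiguity of keyword search can be illustrated with a toy keyword index. This sketch assumes a simple word-to-page index and invented example.org pages; real search engines use far more elaborate ranking, so it shows only why a bare keyword such as "address" cannot distinguish a speech from a postal address.

```python
from collections import defaultdict

# Hypothetical pages, invented for illustration.
PAGES = {
    "http://example.org/gettysburg": "full text of the gettysburg address speech",
    "http://example.org/contact": "our postal address and phone number",
    "http://example.org/recipes": "simple recipes for weeknight dinners",
}

def build_keyword_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def keyword_search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    results = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(results)

keyword_index = build_keyword_index(PAGES)
print(keyword_search(keyword_index, "address"))
print(keyword_search(keyword_index, "postal address"))
```

The first query matches both the speech and the contact page. Adding the word "postal" narrows the result, but only because the person already knew which extra keyword would disambiguate the search.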
Building the Semantic Web
The Semantic Web, like the Web, is a decentralized network that allows people and organizations to
openly share information. On the Web, people use the vocabularies of their languages to describe
information. Similarly, on the Semantic Web, RDF vocabularies are used by computers to describe
information. With both, it is the popularity of vocabularies that ultimately leads to their acceptance
as a common means for sharing and understanding information.
| RDF Vocabulary | Use |
| --- | --- |
| Creative Commons (CC) | Content license descriptions |
| Dublin Core (DC) | Document metadata |
| Friend Of A Friend (FOAF) | Social networks |
| RDF Calendar | Calendar and schedule descriptions |
| RDF Site Summary (RSS) | News syndication |
| Web Ontology Language (OWL), RDF Schema (RDFS) | Data model descriptions |
Table 1 - A few examples of common RDF vocabularies (schemas) that are in use on the
Semantic Web today. With RDF, information can be described using terms from multiple vocabularies.
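As a small illustration of mixing vocabularies, the sketch below describes one document with Dublin Core terms and its author with a FOAF term. The namespace URIs are the published ones for those two vocabularies; the document URL, the person URL, the literal values, and the helper `vocabularies_used` are hypothetical.

```python
# Published namespace URIs for two vocabularies from Table 1.
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical resources, invented for illustration.
report = "http://example.org/reports/q1"
author = "http://example.org/people#alice"

statements = [
    (report, DC + "title", "Quarterly Report"),
    (report, DC + "creator", author),
    (author, FOAF + "name", "Alice Example"),
]

def vocabularies_used(triples, known_namespaces):
    """List the labels of the vocabularies whose terms appear as properties."""
    used = set()
    for _, prop, _ in triples:
        for label, namespace in known_namespaces.items():
            if prop.startswith(namespace):
                used.add(label)
    return sorted(used)

print(vocabularies_used(statements, {"DC": DC, "FOAF": FOAF}))
```

Because every property is a full URI, terms from different vocabularies can sit side by side in one model without colliding.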
The amount of information on the Semantic Web continues to grow at an increasing pace. Much of this
growth can be attributed to the popularity of new user applications and infrastructure products
that take advantage of the ease with which information can be shared on the Semantic Web.
However, the growth of the Semantic Web does not solely depend on new content being created for it.
Existing Web content can be incorporated into the Semantic Web without rewriting current Web
pages for computers using RDF. In many cases, such as an on-line catalog, the information
in the Web pages is stored in a computer database with a well-defined structure. In this case,
it is just a matter of providing a well-defined meaning to its structure using RDF in order to
publish its information on the Semantic Web. Semantic integration products such as Intellidimension
RDF Gateway are ideal for this task.
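The idea of giving an existing database structure a well-defined meaning can be sketched with Python's built-in `sqlite3` module. This is an illustrative assumption about how such a mapping might look, not a description of how RDF Gateway works: the `products` table, the example.org URIs, and the column-to-property mapping are all invented for the example.

```python
import sqlite3

# Hypothetical base URI and column-to-property mapping.
BASE = "http://example.org/catalog/"
COLUMN_TO_PROPERTY = {
    "name": BASE + "vocab#name",
    "price": BASE + "vocab#price",
}

def rows_to_triples(connection):
    """Turn each catalog row into RDF-style (subject, property, value) triples."""
    cursor = connection.execute("SELECT id, name, price FROM products ORDER BY id")
    triples = []
    for product_id, name, price in cursor:
        subject = f"{BASE}product/{product_id}"
        triples.append((subject, COLUMN_TO_PROPERTY["name"], name))
        triples.append((subject, COLUMN_TO_PROPERTY["price"], price))
    return triples

# An in-memory database stands in for the on-line catalog.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)
connection.executemany(
    "INSERT INTO products (id, name, price) VALUES (?, ?, ?)",
    [(1, "Desk lamp", 19.5), (2, "Bookshelf", 89.0)],
)

catalog_triples = rows_to_triples(connection)
for triple in catalog_triples:
    print(triple)
```

The Web pages that present the catalog to people stay as they are; only the mapping from columns to vocabulary terms is new.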
For other Web pages that are purely unstructured text (such as typical HTML pages), current search engine technology can
play an important role. A variety of techniques have been reasonably successful in allowing computers
to extract meaning from unstructured text. The information extracted from these Web pages can then be
described using RDF vocabularies and published on the Semantic Web.
Semantic Web Search acts as a catalyst for Semantic Web growth by creating a demand for
information on the Semantic Web. As new intelligent applications use Semantic Web Search
to precisely locate and gather information that improves people's daily lives, the Semantic
Web will inevitably be used to publish information for this new audience.
Intelligent Software Agents
Most people perform tasks on the current Web using a Web browser. On the Semantic Web, people rely
on intelligent software agents to perform tasks on their behalf. Since the Semantic Web is designed
for computers, intelligent software agents can be used to automate many of the routine activities of
people on the Internet, for example gathering all the latest news on a specific product's development
each morning. Intelligent software agents have the ability to acquire knowledge as they interact
with the Semantic Web, allowing them to improve their performance over time. Their knowledge is stored
as an RDF model. An RDF model is an extensible graph of interconnected pieces of information to which
new information can be seamlessly attached. A software agent's intelligence comes from its ability to
perform deductive reasoning (inference) on an RDF model. This feature of RDF allows a software agent
to infer new information from the information it already has, thereby expanding its knowledge.
Figure 5 - Intelligent software agents gather knowledge as they use the Semantic Web. Knowledge is
stored by the software agent as an RDF model. Deductive reasoning (inference) allows intelligent
agents to infer information that is not explicitly stated. In this example the new information allows the
intelligent software agent to infer that a Web browser and a word processor can each be categorized as
either a computer program or software through deductive reasoning.
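The kind of deduction described above can be sketched as a transitive closure over "is a kind of" statements. The specific facts used here are an assumption about what the example contains: a Web browser is a kind of computer program, a word processor is a kind of software, and the new information states that "computer program" and "software" are equivalent categories. The function `infer_categories` is a hypothetical helper, not a feature of any particular RDF product.

```python
def infer_categories(kind_of):
    """Return all (item, category) pairs implied by chaining 'is a kind of' pairs."""
    inferred = set(kind_of)
    changed = True
    while changed:
        changed = False
        for a, b in list(inferred):
            for c, d in list(inferred):
                # If a is a kind of b, and b is a kind of d, then a is a kind of d.
                if b == c and a != d and (a, d) not in inferred:
                    inferred.add((a, d))
                    changed = True
    return inferred

# What the agent already knows (assumed for illustration).
known = {
    ("web browser", "computer program"),
    ("word processor", "software"),
}

# New information: the two categories are equivalent, stated in both directions.
new_information = {
    ("computer program", "software"),
    ("software", "computer program"),
}

before = infer_categories(known)
after = infer_categories(known | new_information)

print(("web browser", "software") in before)
print(("web browser", "software") in after)
print(("word processor", "computer program") in after)
```

Neither new fact mentions a Web browser or a word processor, yet once it is attached to the model the agent can derive both additional categorizations.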
Not all intelligent software agents need to be complex and capable of deductive reasoning. Significant
benefit can be provided to people by the ease with which an agent can precisely locate, gather and
report information.
Intelligent software agents use Semantic Web Search to locate information on the Semantic Web that
they need to perform a task. Since both Semantic Web Search and the intelligent software agents
understand RDF models and vocabularies, they have a common method for precisely describing search
conditions. Semantic Web Search returns to the intelligent software agents the locations on the
Semantic Web where the information they are looking for can be found. The intelligent software
agents then gather the information that is described using RDF to complete their task or perform
another search.
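The search-then-gather cycle described above can be sketched as a short loop. The search service and the documents are replaced here by in-memory dictionaries, so `search`, `fetch`, `run_agent`, the `ex:` identifiers, the URLs, and the headlines are all hypothetical stand-ins chosen for illustration.

```python
# Hypothetical stand-in for Semantic Web Search: condition -> document URLs.
SEARCH_INDEX = {
    ("ex:about", "ex:productX"): [
        "http://example.org/news/1.rdf",
        "http://example.org/news/2.rdf",
    ],
}

# Hypothetical stand-in for RDF documents on the Semantic Web.
DOCUMENTS = {
    "http://example.org/news/1.rdf": [
        ("ex:story1", "ex:about", "ex:productX"),
        ("ex:story1", "ex:headline", "Product X enters beta"),
    ],
    "http://example.org/news/2.rdf": [
        ("ex:story2", "ex:about", "ex:productX"),
        ("ex:story2", "ex:headline", "Product X pricing announced"),
    ],
}

def search(condition):
    """Return the locations of documents matching a (property, value) condition."""
    return SEARCH_INDEX.get(condition, [])

def fetch(url):
    """Return the triples stored in the document at `url`."""
    return DOCUMENTS.get(url, [])

def run_agent(condition, wanted_property):
    """Locate matching documents, then gather the wanted values from each."""
    gathered = []
    for url in search(condition):
        for _, prop, value in fetch(url):
            if prop == wanted_property:
                gathered.append(value)
    return gathered

headlines = run_agent(("ex:about", "ex:productX"), "ex:headline")
print(headlines)
```

A fuller agent could feed the gathered values into a further search; this sketch stops after a single pass.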
A platform for creating these intelligent agents exists today. Intellidimension RDF Gateway is a
Web client with a built-in RDF database that is capable of deductive reasoning. Software agents
created with RDF Gateway have all the features necessary to intelligently perform tasks on the
Semantic Web.
A Final Word
Semantic Web Search makes automated and accurate information retrieval on the Internet both possible
and practical. The Semantic Web provides the enabling technology with its precise descriptions of
information that allow computers to intelligently perform tasks done by people today. The Semantic
Web is not a futuristic technology; it is here today and ready to improve the productivity of people
by revolutionizing the way computers and people work together.