Semantic Web Search A Search Engine for the Semantic Web (BETA) Powered by Intellidimension RDF Gateway
Semantic Web Search - A Next Generation Internet Search Engine
This document provides a high-level overview of a next generation Internet search engine called Semantic Web Search. It is written for readers who are familiar with basic Internet technologies. It presents Semantic Web Search in the context of Internet searching on the current Web versus the emerging Semantic Web. Emphasis is placed on the impact of Semantic Web Search on the way people and computers find and use information on the Internet.
Introduction
Semantic Web Search is a new breed of Internet search engine that is used by computers to help people gather the information they need for work or home. Semantic Web Search represents a revolutionary improvement over search engines on the Web today because it can provide more accurate search results with less human intervention. By providing this service to computers, Semantic Web Search will power a new generation of intelligent applications that increase the productivity of people through fast and accurate information retrieval. The enabling technology behind Semantic Web Search is a new extension to the current Web called the Semantic Web.
The Semantic Web
The Semantic Web is a set of Web standards that provide a common framework to allow computers to understand the meaning of information published on the Web. On the Semantic Web, information is described in terms of well-defined vocabularies using a simple data model called the Resource Description Framework (RDF). Information described using RDF (an RDF model) can be visualized as a graph of the properties of the people, places and things it describes.
Figure 1 - On the current Web information is described for people using unstructured text. On the Semantic Web information is described for computers using RDF. Both use well-defined vocabularies to communicate meaning; however, computers can only understand the rigid structure of RDF vocabularies.
RDF is commonly written in the popular Extensible Markup Language (XML), which has gained widespread adoption for interchanging data between computing systems on the Web. Although XML provides the basis for computers to share numbers, dates, times, currencies, and blocks of text, it is RDF that gives the data meaning. The Semantic Web builds upon current Web standards, including core network standards such as Hypertext Transfer Protocol (HTTP), allowing it to operate seamlessly with existing Web infrastructure.
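The graph view of an RDF model described above can be sketched in a few lines of Python. This is only an illustration: each statement is written as a (subject, predicate, object) triple, and the "ex:" names are hypothetical identifiers invented for this example rather than terms from a published vocabulary.

```python
from collections import defaultdict

# Each RDF statement is a (subject, predicate, object) triple.
# The "ex:" identifiers are hypothetical and exist only for this sketch.
triples = [
    ("ex:Clinton", "ex:name", "William Jefferson Clinton"),
    ("ex:Clinton", "ex:presidentNumber", "42"),
    ("ex:Clinton", "ex:presidentOf", "ex:UnitedStates"),
    ("ex:UnitedStates", "ex:name", "United States of America"),
]


def build_graph(statements):
    """Group triples by subject so each node lists its outgoing edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)

# Print every edge of the graph as "node --property--> value".
for node, edges in graph.items():
    for predicate, obj in edges:
        print(f"{node} --{predicate}--> {obj}")
```

Because the object of one triple ("ex:UnitedStates") is the subject of another, the statements link together into a graph rather than a flat list of records.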
Searching the Semantic Web
Each document on the Semantic Web contains an RDF model that can be thought of as a discrete database. Information in one document can reference information in another, constructing a massive RDF model that is distributed over the Internet. Semantic Web Search acts as an index into this distributed RDF model to help computers quickly locate the document(s) that contain the information they need.
Figure 2 - The Semantic Web is a massively distributed database on the Internet. Each RDF document contains a small piece of a much larger RDF model that forms the Semantic Web. Semantic Web Search indexes the locations of the documents based on the information they contain.
Semantic Web Search crawls the Semantic Web and indexes RDF documents based on the information they contain. It creates its own distributed RDF model that describes the contents and location of all documents on the Semantic Web. Since the information is described using RDF vocabularies that have well-defined meaning to computers, as well as people, search conditions can be precisely described to Semantic Web Search using these vocabularies. Semantic Web Search translates a search condition into an index lookup into its RDF model. It returns the locations of the documents on the Semantic Web where the information described by the search condition exists.
Figure 3 - Search conditions are precisely described in terms of RDF models and vocabularies when using Semantic Web Search. In this example RDF is used to describe the search condition "What is the name of the 42nd US president?". Semantic Web Search returns the location of the document on the Semantic Web (a URL) that contains the name of the 42nd US president, "William Jefferson Clinton".
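The index lookup described in this section might look roughly like the following Python sketch. It is not the implementation used by Semantic Web Search; the document URLs, the "ex:" vocabulary, and the function names build_index, search and answer are all assumptions made for illustration. The index maps each (property, value) pair to the documents that state it, and the search condition from the example above is expressed as such a pair.

```python
from collections import defaultdict

# Hypothetical RDF documents: URL -> the triples each document contains.
documents = {
    "http://example.org/presidents.rdf": [
        ("ex:Clinton", "ex:presidentNumber", "42"),
        ("ex:Clinton", "ex:name", "William Jefferson Clinton"),
    ],
    "http://example.org/speeches.rdf": [
        ("ex:Speech1", "ex:title", "Inaugural Address"),
    ],
}


def build_index(docs):
    """Map each (predicate, object) pair to the URLs of documents stating it."""
    index = defaultdict(set)
    for url, triples in docs.items():
        for _subject, predicate, obj in triples:
            index[(predicate, obj)].add(url)
    return index


def search(index, predicate, obj):
    """Return the sorted URLs of documents that contain a matching triple."""
    return sorted(index.get((predicate, obj), set()))


def answer(docs, urls, known, wanted):
    """In the located documents, find subjects matching `known`, read `wanted`."""
    results = []
    for url in urls:
        triples = docs[url]
        subjects = {s for s, p, o in triples if (p, o) == known}
        results += [o for s, p, o in triples if s in subjects and p == wanted]
    return results


index = build_index(documents)
condition = ("ex:presidentNumber", "42")
urls = search(index, *condition)
names = answer(documents, urls, condition, "ex:name")
print(urls)
print(names)
```

The search step returns only locations. Reading the name out of the located document is a separate step that the client performs, which mirrors the division of labor described in the text.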
Searching the Current Web
Today's Internet search engines crawl and index documents on the current Web. Information on the current Web is mostly described using unstructured text that is marked up using the Hypertext Markup Language (HTML) to additionally describe its visual presentation. Current Web search engines index these documents using a variety of techniques that are primarily based on the count and proximity of words contained in the document. This limits search conditions to simple keyword expressions based on the existence or absence of a word or phrase. The keyword expressions are processed by the search engine and produce a list of documents that potentially contain the information needed. The location of the actual information that is sought often requires the person to interpret the meaning of each document and extract the relevant information.
Figure 4 - Keyword search conditions used with current Web search engines are often ambiguous. Search results can contain numerous irrelevant documents, requiring a person to spend a significant amount of time to complete a search. In this example the keyword "address" is ambiguous, causing the search engine to return speeches as well as postal addresses.
In many cases Web search engines perform well, producing a sorted list of relevant documents that enables people to quickly locate the information they need. However, there are also many cases in which the keywords used to locate information are ambiguous, so the search engine results contain a large number of irrelevant documents. Consequently, most of the burden of finding the information is placed on the person.
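The ambiguity of keyword search can be reproduced with a toy inverted index. The sketch below is deliberately simplified: it tests only whether words are present and ignores the word counts and proximity that real engines also use. The page URLs and text are invented for the example.

```python
from collections import defaultdict

# Hypothetical Web pages: URL -> plain text content.
pages = {
    "http://example.org/gettysburg": "the gettysburg address is a famous speech",
    "http://example.org/contact": "our postal address is 1 main street",
    "http://example.org/weather": "rain is expected tomorrow",
}


def build_keyword_index(docs):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def keyword_search(index, query):
    """Return the sorted URLs of pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


keyword_index = build_keyword_index(pages)

# "address" matches both the speech and the contact page.
print(keyword_search(keyword_index, "address"))

# Adding a second keyword narrows the results, but the person has to guess it.
print(keyword_search(keyword_index, "postal address"))
```

The index has no way to tell which sense of "address" the person meant, so the job of disambiguating falls to the person reading the results.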
Building the Semantic Web
The Semantic Web, like the Web, is a decentralized network that allows people and organizations to openly share information. On the Web, people use the vocabularies of their languages to describe information. Similarly, on the Semantic Web, RDF vocabularies are used by computers to describe information. With both, it is the popularity of a vocabulary that ultimately leads to its acceptance as a common means for sharing and understanding information.
RDF Vocabulary                                   Use
Creative Commons (CC)                            Content license descriptions
Dublin Core (DC)                                 Document metadata
Friend Of A Friend (FOAF)                        Social networks
RDF Calendar                                     Calendar and schedule descriptions
RDF Site Summary (RSS)                           News syndication
Web Ontology Language (OWL), RDF Schema (RDFS)   Data model descriptions
Table 1 - A few examples of common RDF vocabularies (schemas) that are in use on the Semantic Web today. With RDF, information can be described using terms from multiple vocabularies.
The amount of information on the Semantic Web continues to grow at an increasing pace. Much of this growth can be attributed to the popularity of new user applications and infrastructure products that take advantage of the ease with which information can be shared on the Semantic Web.
However, the growth of the Semantic Web does not solely depend on new content being created for it. Existing Web content can be incorporated into the Semantic Web without rewriting current Web pages for computers using RDF. In many cases, for example an on-line catalog, the information in the Web pages is stored in a computer database with a well-defined structure. In this case, it is just a matter of providing a well-defined meaning to its structure using RDF in order to publish its information on the Semantic Web. Semantic integration products such as Intellidimension RDF Gateway are ideal for this task.
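As a rough illustration of giving meaning to an existing database structure, the following Python sketch turns rows of a hypothetical catalog table into RDF-style triples. It does not show how RDF Gateway performs this task; the table contents, the base URI, the "ex:" predicates and the function name rows_to_triples are all assumptions made for the example.

```python
# Hypothetical rows from an on-line catalog table.
rows = [
    {"sku": "A100", "title": "Desk Lamp", "price": "24.99"},
    {"sku": "B200", "title": "Office Chair", "price": "149.00"},
]

# Assumed mapping from column names to vocabulary terms. This mapping is
# what gives the existing table structure a well-defined meaning.
COLUMN_TO_PREDICATE = {"title": "ex:title", "price": "ex:price"}


def rows_to_triples(records, key_column, base_uri, mapping):
    """Describe each row as triples whose subject is built from its key."""
    triples = []
    for record in records:
        subject = base_uri + record[key_column]
        for column, predicate in mapping.items():
            if column in record:
                triples.append((subject, predicate, record[column]))
    return triples


catalog_triples = rows_to_triples(
    rows, "sku", "http://example.org/catalog/", COLUMN_TO_PREDICATE
)
for triple in catalog_triples:
    print(triple)
```

The Web pages generated from the table do not need to change. Only the column-to-vocabulary mapping has to be written.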
For other Web pages that are purely unstructured text (like HTML), current search engine technology can play an important role. A variety of techniques have been reasonably successful in allowing computers to extract meaning from unstructured text. The information extracted from these Web pages can then be described using RDF vocabularies and published on the Semantic Web.
Semantic Web Search acts as a catalyst for Semantic Web growth by creating a demand for information on the Semantic Web. As new intelligent applications use Semantic Web Search to precisely locate and gather information that improves people's daily lives, the Semantic Web will inevitably be used to publish information for this new audience.
Intelligent Software Agents
Most people perform tasks on the current Web using a Web browser. On the Semantic Web people rely on intelligent software agents to perform tasks on their behalf. Since the Semantic Web is designed for computers, intelligent software agents can be used to automate many of the routine activities of people on the Internet, for example gathering all the latest news on a specific product development each morning. Intelligent software agents have the ability to acquire knowledge as they interact with the Semantic Web, allowing them to improve their performance over time. Their knowledge is stored as an RDF model. An RDF model is an extensible graph of interconnected pieces of information to which new information can be seamlessly attached. A software agent's intelligence comes from its ability to perform deductive reasoning (inference) on an RDF model. This feature of RDF allows a software agent to infer new information from the information it already has, thereby expanding its knowledge.
Figure 5 - Intelligent software agents gather knowledge as they use the Semantic Web. Knowledge is stored by the software agent as an RDF model. Deductive reasoning (inference) allows intelligent agents to infer information where none exists. In this example the new information allows the intelligent software agent to infer that a Web browser and word processor can be categorized as either a computer program or software through deductive reasoning.
Not all intelligent software agents need to be complex and capable of deductive reasoning. Significant benefit can be provided to people by the ease with which an agent can precisely locate, gather and report information.
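A minimal sketch of this kind of deductive reasoning follows. The exact facts shown in the figure are not given in the text, so the sketch assumes the agent already knows that a Web browser is a computer program and that a word processor is software, and then learns that the two categories are equivalent. Equivalence is modeled here as a subclass relation in both directions, and the only inference rule is transitivity.

```python
def infer_subclasses(facts):
    """Apply the rule "A is a B and B is a C implies A is a C" until no new
    facts appear (simple forward chaining to a fixed point)."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for a, b in list(inferred):
            for c, d in list(inferred):
                if b == c and a != d and (a, d) not in inferred:
                    inferred.add((a, d))
                    changed = True
    return inferred


# Assumed starting knowledge: (subclass, superclass) pairs.
known = {
    ("WebBrowser", "ComputerProgram"),
    ("WordProcessor", "Software"),
}

# Assumed new information: the two categories mean the same thing,
# expressed as a subclass relation in each direction.
new_information = {
    ("ComputerProgram", "Software"),
    ("Software", "ComputerProgram"),
}

closure = infer_subclasses(known | new_information)
print(("WebBrowser", "Software") in closure)
print(("WordProcessor", "ComputerProgram") in closure)
```

Before the new information arrives, neither cross-category fact can be derived. Afterward, both follow without ever being stated explicitly in any document.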
Intelligent software agents use Semantic Web Search to locate information on the Semantic Web that they need to perform a task. Since both Semantic Web Search and the intelligent software agents understand RDF models and vocabularies they have a common method for precisely describing search conditions. Semantic Web Search returns to the intelligent software agents the locations on the Semantic Web where the information they are looking for can be found. The intelligent software agents then gather the information that is described using RDF to complete their task or perform another search.
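The search, gather and search-again cycle described above might be sketched as follows. Everything here is a stand-in: WEB plays the role of the Semantic Web, search imitates the service by scanning documents directly, and the "ex:" vocabulary and document contents are invented. The hypothetical agent gathers news titles about a product and follows "ex:related" links to decide what to search for next.

```python
# Hypothetical Semantic Web: URL -> triples contained in that document.
WEB = {
    "http://example.org/news.rdf": [
        ("ex:Item1", "ex:about", "ex:ProductX"),
        ("ex:Item1", "ex:title", "ProductX beta released"),
        ("ex:ProductX", "ex:related", "ex:ProductY"),
    ],
    "http://example.org/blog.rdf": [
        ("ex:Item2", "ex:about", "ex:ProductY"),
        ("ex:Item2", "ex:title", "ProductY roadmap"),
    ],
    "http://example.org/other.rdf": [
        ("ex:Item3", "ex:about", "ex:ProductZ"),
        ("ex:Item3", "ex:title", "Unrelated announcement"),
    ],
}


def search(predicate, obj):
    """Stand-in for the search service: URLs of documents with a match."""
    return sorted(
        url
        for url, triples in WEB.items()
        if any(p == predicate and o == obj for _s, p, o in triples)
    )


def fetch(url):
    """Stand-in for retrieving and parsing an RDF document."""
    return WEB[url]


def gather_titles(topic, max_searches=5):
    """Search for a topic, gather matching titles, then search related topics."""
    pending, seen, titles = [topic], set(), []
    while pending and len(seen) < max_searches:
        current = pending.pop(0)
        if current in seen:
            continue
        seen.add(current)
        for url in search("ex:about", current):
            triples = fetch(url)
            items = {s for s, p, o in triples if p == "ex:about" and o == current}
            for s, p, o in triples:
                if s in items and p == "ex:title":
                    titles.append(o)
                elif s == current and p == "ex:related":
                    pending.append(o)  # queue another search
    return titles


print(gather_titles("ex:ProductX"))
```

The max_searches limit is a design choice for the sketch so that a chain of related topics cannot keep the agent searching indefinitely.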
A platform for creating these intelligent agents exists today. Intellidimension RDF Gateway is a Web client with a built-in RDF database that is capable of deductive reasoning. Software agents created with RDF Gateway have all the features necessary to intelligently perform tasks on the Semantic Web.
A Final Word
Semantic Web Search makes automated and accurate information retrieval on the Internet both possible and practical. The Semantic Web provides the enabling technology with its precise descriptions of information that allow computers to intelligently perform tasks done by people today. The Semantic Web is not a futuristic technology; it is here today and ready to improve the productivity of people by revolutionizing the way computers and people work together.
Copyright 2004-2007 Intellidimension