Prudens Information Resources for the Internet


The Semantic Web

Knowledge Representation & Search

A Prudens E-Report

Why a Semantic Web? | Knowledge Representation Problems | The Role of Ontologies
Machine Readable Ontologies | Semantic Web Services | The Semantic Grid
Appropriate Use of the Semantic Web | Conclusions



Why a Semantic Web?

The semantic web1 addresses the current shortcomings of the web as well as its limitations for future development. It emphasizes the correct interpretation and analysis of data stored throughout the Internet. The semantic web will allow human-language commands to control web operations while lessening the need for human intervention. Web interfaces will understand users' intentions and tastes, while correctly interpreting, analyzing and communicating data with other machines. In short, the semantic web will be a web whose data machines can interpret, freeing humans for other tasks.


Smart Data

If one downloaded the raw data stored by a database application, it would be difficult to interpret - to know what it represented or how large it was. The data can be interpreted only through the database program. In the semantic web, and in other knowledge representation applications, information about the data (by definition, metadata) is stored with the data itself, which is why such data is sometimes called smart data. Because information about the data travels with it, the data can be used correctly by any program with access to it, and it can be interpreted by a machine (e.g., a computer program) as well as by a human.
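
For illustration, the fragment below is a minimal sketch of the idea, assuming the third-party rdflib package is available; the URIs and field names are invented for the example. It contrasts an opaque database row with the same values carried as "smart data", where each value is tied to a property URI that any program can look up.

    # A minimal sketch of "smart data": the record carries its own metadata.
    # Assumes the third-party rdflib package; the URIs are illustrative only.
    from rdflib import Graph

    raw_row = "4021,Jane Doe,1972-03-14"   # opaque without the database program

    smart_record = """
    @prefix ex: <http://example.org/schema#> .
    <http://example.org/patient/4021>
        ex:name      "Jane Doe" ;
        ex:birthDate "1972-03-14" .
    """

    g = Graph()
    g.parse(data=smart_record, format="turtle")

    # Any program with access to the graph can ask what each value means.
    for subject, prop, value in g:
        print(f"{prop} = {value}")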

For a machine to understand precisely what a user means, a machine-readable list of related terms - an ontology - must be examined for the correct meaning. One approach is to train the browser, or other input device, over time as the user builds up a profile of interpretations and preferences. Another is to refer to an existing, detailed definition of the term. In either case the data is linked to its detailed explanation, and the browser must then be able to communicate these definitions to other machines that need to understand them.

Using a language such as Standard Generalized Markup Language (SGML) or Extensible Markup Language (XML), communicating machines recognize a common terminology as defined in a Document Type Definition (DTD) file or an XML Schema, which is shared in real time between the machines. However, many DTDs and schemas aren't specific enough for anything beyond routine tasks.
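
As a rough sketch of the idea, assuming the third-party lxml package and an invented "order" vocabulary, the receiving machine below checks an incoming XML message against the DTD both parties share. The DTD fixes which element names are recognized, but it says nothing about what an "item" actually is.

    # Sketch: two machines agree on terminology through a shared DTD.
    # Assumes the third-party lxml package; the vocabulary is invented.
    from io import StringIO
    from lxml import etree

    shared_dtd = etree.DTD(StringIO("""
    <!ELEMENT order (item+)>
    <!ELEMENT item (#PCDATA)>
    """))

    incoming = etree.XML("<order><item>one widget</item></order>")

    # The message conforms to the agreed grammar...
    print(shared_dtd.validate(incoming))   # True
    # ...but nothing in the DTD says what an "item" means.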

Requirements of the Semantic Web

In order to have a semantic web:

Meeting these requirements will enable the development of new types of applications and allow many routine tasks to be automated.

According to the developers of the semantic web, these requirements will actually fix two problems found on the current web. The first is the problem of communicating the precise meaning of terms throughout the Web; the second is the problem with metadata.

Knowledge Representation Problems on the Web

The Problem of Precise Meaning

XML allows users to add arbitrary structure to their documents but says nothing about what the structures mean; that structure is declared through a DTD or a schema. A DTD defines the data syntax used in SGML and in distributed XML applications. Because web publishers have found that DTDs can define little more than a grammatical outline, DTDs are being replaced, primarily by proprietary approaches.

XML Schema was developed by the World Wide Web Consortium (W3C) in 2001 to replace DTDs. However, schemas are proving to be complex and difficult to implement using the XML Schema Language. DTDs and schemas will eventually be replaced by a new standard in order to improve the definition and interpretation of data syntax; the Document Schema Definition Languages (DSDL) framework is currently under development to become this standard.
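
A small sketch hints at the contrast in complexity, again assuming the third-party lxml package and an invented vocabulary: the content model that a DTD states in one line, <!ELEMENT order (item+)>, takes several nested elements in the XML Schema Language, although the schema can also constrain datatypes, which a DTD cannot.

    # Sketch: the same content model as a DTD line and as an XML Schema.
    # Assumes the third-party lxml package; the vocabulary is invented.
    from lxml import etree

    # DTD form: <!ELEMENT order (item+)>
    xsd = etree.XMLSchema(etree.XML("""
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="order">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
    """))

    doc = etree.XML("<order><item>one widget</item></order>")
    print(xsd.validate(doc))   # True, and "item" is now typed as a string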

The Problem with Metadata

Metadata is encountered every day when one searches for information. The yellow pages, the card catalog at the library, and a book index are all information about data, or metadata. Metadata is obviously very useful; without it, one would have to search through every entry in the phone book, every book in the library, or every page in a book to find the desired information.

However, metadata systems are, in general, not interoperable - the metadata from one system isn't of much use in another. In the yellow pages, the search usually begins with the type of business, product or service. The card catalog search usually begins with the author, title, or subject. In the book index, one looks for specific words. Each metadata system is designed for a specific purpose and contains different types of information. This creates a problem, often unrecognized, when developing categories of information and when establishing a taxonomy of the types of information used in a metadata system.

The meaning of the term metadata may also depend on the user. Some use it to mean machine-readable information, while others use it to describe electronic resources. In libraries, metadata is used to describe both digital and non-digital resources such as books, photos, videos, and other forms of multimedia.

Machine-readable metadata systems aren't designed to be interoperable either. The metadata systems currently found on the Web are usually written as a DTD or schema, a necessary component for XML processing between sites. But these metadata systems aren't compatible with those of other sites, which are built from different DTDs or schemas. The inherent difficulty of establishing a successful metadata system is an example of the larger problem of designing a successful taxonomy: if it is to be used by others, it must be developed openly and formally in order to gain their support and participation.

Recognizing the problem, a group of information and library scientists and information users met to establish the Dublin Core Metadata Initiative, a general metadata system that describes a wide range of information and content sources on the Web. Because of its ease of use, the "Dublin Core" is being widely adopted on the web. Yet it is criticized for carrying insufficient detail for use in scientific research, in the web publishing industry, or in the archiving of electronic documents. Other metadata systems have been developed for various applications, which leads one to the conclusion that a universal metadata system is impossible.
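
A sketch of a Dublin Core description, assuming the third-party rdflib package and a made-up document URI, shows both why it is easy to adopt and why it can feel too coarse for specialized use: its fifteen broad elements, such as title, creator and date, describe almost anything, but only at a general level.

    # Sketch: a minimal Dublin Core record for a web document.
    # Assumes the third-party rdflib package; the document URI is made up.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DC

    g = Graph()
    doc = URIRef("http://example.org/reports/semantic-web")

    g.add((doc, DC.title,   Literal("The Semantic Web")))
    g.add((doc, DC.creator, Literal("J. E. Burke")))
    g.add((doc, DC.date,    Literal("2005")))
    g.add((doc, DC.subject, Literal("knowledge representation")))

    print(g.serialize(format="turtle"))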

If a metadata system about a term were exhaustive, it would form a taxonomy. For example, to find a complete set of information about the term "automobile", one could look at the:

Each of these metadata systems should contain every descriptive term about automobiles, and in theory they should be equivalent, but they will also contain a great deal of superfluous information. Therefore, to reconcile different metadata systems, one should create an ontology for each term of interest - an exhaustive, and exhausting, task if done by hand, but the sort of task that is ideal for computers.

The Role of Ontologies

Ontologies are key components of the semantic web. One part of an ontology is a taxonomy, an exhaustive list of the terms/descriptors about a topic; this part may also be called a catalog, or a schema of the terms/descriptors that make up its domain. The second part of an ontology consists of inferences about the terms/descriptors that can be drawn from:

Most ontologies are organized in hierarchical form. Each ontology is presented as factual and objective, and is a form of knowledge representation. But ontologies remain subjective in the sense that both the area of interest and the language, or perspective, of the ontology must be shared by its creators and users.
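
A small sketch, again assuming the third-party rdflib package and an invented vehicle vocabulary, shows the two parts together: a taxonomy asserted as a hierarchy of subclass statements, and an inference - that a sedan is a vehicle - drawn from the hierarchy rather than stated directly.

    # Sketch: a tiny ontology = a taxonomy plus an inference over it.
    # Assumes the third-party rdflib package; the vocabulary is invented.
    from rdflib import Graph, Namespace
    from rdflib.namespace import RDFS

    EX = Namespace("http://example.org/vehicles#")
    g = Graph()

    # Part one: the taxonomy, an explicit hierarchy of terms.
    g.add((EX.Sedan,      RDFS.subClassOf, EX.Automobile))
    g.add((EX.Automobile, RDFS.subClassOf, EX.Vehicle))

    # Part two: an inference drawn from the hierarchy. Nothing states
    # directly that a sedan is a vehicle; it follows from the two links.
    ancestors = set(g.transitive_objects(EX.Sedan, RDFS.subClassOf))
    print(EX.Vehicle in ancestors)   # True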

Machine Readable Ontologies

Ontologies are made machine readable on the web using languages such as the Resource Description Framework (RDF), the OWL Web Ontology Language (OWL), and the DAML+OIL ontology language. Each of these is both a markup language and a means of expressing structure and meaning. Even in the short period since their introduction, RDF and OWL have already been used in social networking software, such as Friend-of-a-Friend (FOAF) applications, as well as in Really Simple Syndication (RSS), and are considered forerunners in the use of semantic technology.

The basis of machine-readable ontologies is the uniqueness of the URI for each descriptor. If terms from two ontologies have the same meaning or share the same URI, the elements are combined into a single resource description, bridging the use of different XML DTDs or schemas and allowing information from different sources to be interpreted and analyzed together. Establishing a machine-readable ontological system is aided by the use of registries that record information about the elements from different data collections, including their intended use and location.
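
The sketch below illustrates the point, assuming the third-party rdflib package; the two source vocabularies and URIs are invented. Because both sources describe the same URI, merging their graphs yields a single resource description even though the surrounding vocabularies differ.

    # Sketch: two ontology fragments merge on a shared URI.
    # Assumes the third-party rdflib package; vocabularies and URIs are invented.
    from rdflib import Graph, URIRef

    source_a = """
    @prefix a: <http://example.org/catalog#> .
    <http://example.org/id/automobile> a:label "automobile" .
    """

    source_b = """
    @prefix b: <http://example.org/parts#> .
    <http://example.org/id/automobile> b:hasPart <http://example.org/id/engine> .
    """

    merged = Graph()
    merged.parse(data=source_a, format="turtle")
    merged.parse(data=source_b, format="turtle")

    # All statements about the shared URI now sit on one resource.
    car = URIRef("http://example.org/id/automobile")
    for prop, value in merged.predicate_objects(car):
        print(prop, value)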

In summary, machine-readable ontologies use markup languages to describe descriptors, terms or objects and their relationships to other descriptors, terms and objects, and:

Semantic Web Services

The underlying technologies of web services - UDDI, WSDL, and SOAP - require human interaction. The semantic approach is based on presenting services in a machine-readable format that enables software agents to automate these technologies, or future versions of them, in order to:

At some point in the future, semantic web services will use encrypted messaging and will also be able to prove that a service is trustworthy - that is, authenticated or vouched for by a trusted intermediary, as in federated identification.

Although semantic web services are not yet available, the required specifications and technologies are being developed at the W3C. In addition, private software research and development labs are beginning to develop semantic web service applications.
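
Since the real specifications are still in progress, the following is only a hypothetical sketch, assuming the third-party rdflib package and an invented service vocabulary, of the kind of discovery step an agent might automate: selecting, from machine-readable descriptions, the services that offer a requested category.

    # Hypothetical sketch: an agent selects services from machine-readable
    # descriptions. The vocabulary and URIs are invented, not a real standard.
    from rdflib import Graph

    directory = """
    @prefix svc: <http://example.org/services#> .
    <http://example.org/svc/clinic>  svc:category svc:Scheduling .
    <http://example.org/svc/florist> svc:category svc:Retail .
    """

    g = Graph()
    g.parse(data=directory, format="turtle")

    wanted = g.query("""
        PREFIX svc: <http://example.org/services#>
        SELECT ?service WHERE { ?service svc:category svc:Scheduling . }
    """)

    for row in wanted:
        print(row.service)   # the agent would then negotiate with this service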

The Semantic Grid

The use of the semantic grid will follow the successful implementation of web services in grid computing; in an analogous manner, semantic web services will be provided on a semantic grid. Semantic web services should have an initial impact in data mining, search, and translation. They will allow a greater depth of analysis in many areas, notably in bioinformatics, nanotechnology, and the personalization technologies applied in e-business. The semantic grid is likely to increase the role of agents on the grid, and will probably increase software and infrastructure requirements, although to what extent isn't clear at this time.

One underlying problem in the development of the semantic grid may be communication and cooperation: the grid is being developed by the distributed computing community, while the semantic web has evolved from the knowledge representation community - two disparate groups that rarely interact. However, a demonstration of the value proposition of semantic grid computing will drive its development, just as it has in many other technologies.2

The Appropriate Use of the Semantic Web

Although the semantic web addresses current and near-term shortcomings of the web, it may overlook factors that affect how individuals and organizations will use it. Fundamentally, the semantic web makes more personalized, or sensitive, information available on the web.


The Semantic Scheduling Problem

In a well-known example intended to show the utility of the semantic web, the schedules of three parties are updated so that they can meet. One party is a doctor's office, where it seems reasonable that schedule openings should be public knowledge listed on the web. The other two parties are private, however, and it is legitimate to ask whether their schedules should be on the web at all. There are questions as to who should have access to the schedules, or which appointment might take precedence over another. These are the types of questions that arise in large organizations, where departments use software programs to schedule appointments for managers. Typically someone - the manager or an assistant - makes scheduling decisions in real time. These are posted on an Intranet site for others to view. Sometimes meetings are rescheduled or cancelled by, or in coordination with, the manager or the assistant. How much more difficult will the manager's job become if a software agent from outside the organization is allowed to schedule appointments and make changes!
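
To make the mechanics concrete, here is a minimal sketch in plain Python, with invented calendars, of the easy part of the example: finding a slot that all three parties have free. The hard questions raised above - who may see the calendars, and which appointment takes precedence - are not answered by any such computation.

    # Sketch: find a meeting slot common to three calendars.
    # The calendars are invented; access control is the real problem.
    doctor  = {"Mon 09:00", "Mon 10:00", "Tue 14:00"}
    parent  = {"Mon 10:00", "Tue 14:00", "Wed 11:00"}
    patient = {"Tue 14:00", "Wed 11:00"}

    common = doctor & parent & patient
    print(sorted(common))   # ['Tue 14:00']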

Semantically enabled web service sites can be accessed by agents to perform a service based, perhaps indirectly, on the request of another party. How much discretion should be given to agents - in other words, how much of the decision process will be automated? A large organization could run semantic web services internally, provided the system was trusted to work "as advertised".

The full development of the semantic web, as envisioned by the W3C, depends on:

A publicly accessible semantic web will be even more enticing to criminals than the existing web.

However, the semantic web can be very useful for organizations that must deal with the public, such as:

The semantic web is more likely to be useful in controlled situations where web users are known, as in a:

The semantic web will be useful for:

Tracking customer preferences through personalization, and conducting data mining and market research, will also become more effective on the semantic web. Privacy and security are likely to benefit as well, since these activities will probably end up on a grid, which isn't open to the public.

Conclusions

The semantic web is still being developed and may evolve further before it is recognized as a valuable asset by enterprises and other organizations. Some traditional questions are relevant for the semantic web: what problem does it solve, and what value does it provide to its users? Its risks are clear: the increased availability of personal information will pose challenges to security and privacy.

In practice it isn't the semantic web itself, but semantic services, that will attract attention. Since there is a movement to develop open-source web services, first for the Internet and now for the grid, it seems natural that open semantic web services should also be developed.

Since groups of technology developers and users that don't usually interact are involved in the development of the semantic web, short-term problems may arise. But in the long run these issues may lead to innovative solutions with broader application than originally foreseen.

Finally, there will be a need for semantic web services, especially as demand for personalization services increases. Semantic web services are likely to be hosted on private webs, or grids, which restrict public participation and emphasize security.


Endnotes

1. When the definition of a term is provided in the text, the term is emphasized, usually in italics. External definitions are provided for terms shown in bold.

2. Geldof, Marije, The Semantic Grid: Will Semantic Web and Grid Go Hand in Hand? European Commission DG Information Society Unit, June 2004. This article may be found in the documents section of the resource, Semantic Grid Community Portal (http://www.semanticgrid.org/).

3. (http://www.covisint.com)


Dr. James E. Burke is a Principal in Burke Technology Services (BTS). BTS provides business assistance to startup technology companies, or organizations planning or integrating new technologies; develops and manages technology projects; performs technology evaluation and commercialization, and assists in technology-based economic development.

Related Prudens e-Reports: Grid Computing | P2P Computing | Web Services




This web site is maintained by Burke Business Services. Copyright © 2005-2008 PIRI. All rights reserved.