Prudens Information Resources for the Internet


INTERNET BASICS

A Prudens Basic Report

by James E. Burke, Ph.D.

This is a brief summary of introductory Internet concepts usually addressed in books and training courses. All of the terms in bold are explained in the text.


The Internet

The Internet consists of a global network of hundreds of thousands of data networks linked together at peering points with internetworking technology. Internetworking describes how one network may be connected to another, so that the two combine to form a larger network. The use of internetworking gave rise to the name "Internet". The Internet is used to transmit the packets of data that comprise a file from one node to any other on the expanded network.

The operation of the Internet is based on two principles: the end-to-end principle and packet switching. The end-to-end principle states that the behavior of the network should be determined by what is connected to it, rather than having the network design dictate what can be connected to it. In practice, however, since so many different types of servers and applications are connected to the Internet, it has evolved into a faster and more complex system than is needed for some uses.

Packet switching was a historical departure from circuit switching, which established a dedicated circuit between two nodes. In packet switching, packets of data are sent from the originating node in the general direction of the destination node, without knowledge of how they will be routed over the network. Indeed, some of the data packets may be sent over different routes, but they will be reassembled in the correct order at the destination node to form a web page, an email message, or a digital image.
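
As a toy illustration of reassembly (not actual TCP code), the following Python sketch shows packets that arrive out of order being put back into sequence at the destination; the message and sequence numbers are made up.

    # Toy sketch: packets tagged with sequence numbers arrive out of order
    # and are reassembled into the original message at the destination.
    arrived = [(2, b"lo, "), (0, b"He"), (3, b"web!"), (1, b"l")]   # (sequence number, data)

    message = b"".join(data for _, data in sorted(arrived))
    print(message)   # b'Hello, web!'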

Data packets are sent across the Internet via routers. A router is a device that receives a data packet and determines the next network node to which it should be forwarded. The unique feature of a router is that it can forward traffic between networks built on different underlying technologies, translating between the protocols of the networks to which it is connected.

Internet Structure

The backbone of the Internet is made up of many large interconnected networks, known as Network Service Providers (NSPs). The NSPs exchange packet traffic at peering points known as Network Access Points (NAPs) or Metropolitan Area Exchanges (MAEs). At the second level of the Internet are the Regional Internet Service Providers (Regional ISPs), which consist of smaller routers and transmission links and generally tap into an NSP. Finally, at the low end of the Internet hierarchy, users gain access to the Internet by connecting to a local ISP, although large organizations may connect at a higher level.

World Wide Web

The popularity of the Internet increased dramatically with the development of the World Wide Web (WWW), also known as the "Web". The Internet also has several other components, such as email and the File Transfer Protocol (FTP). The World Wide Web consists of servers that transmit a web page to a Web browser upon request. The browser may be located on a computer, cell phone, or other Internet device. The user may link to the Internet backbone via wireless or several types of wireline Internet access.

A Web server is a computer program that provides web pages to other computer programs known as web browsers. The web server software may be stored on a single-purpose computer, which may also be known as a web server. There are other types of servers, such as email servers and application servers. Each server has an Internet address, which allows it to receive and send web pages, email, and application requests from or to any other server connected to the Internet. Each web server hosts one or more Web sites, each consisting of a collection of Web pages, or screens, written in the Hypertext Markup Language (HTML) or the Extensible Markup Language (XML). HTML and XML are derived from the Standard Generalized Markup Language (SGML), a markup language used in the printing industry to describe the layout for the printing of electronically transmitted pages. Similarly, HTML and XML consist of markup symbols, or tags, that describe how the browser should display the content (text, graphics, etc.) of a web page.
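
As a hedged illustration of the web server concept, the sketch below uses Python's standard http.server module to return a single small HTML page to any browser that requests it; the page content and port number are invented for the example.

    # A minimal web server sketch (Python standard library only).
    # The page content and the port number are illustrative.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    PAGE = b"<html><body><h1>A very small web page</h1></body></html>"

    class PageHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Every request receives the same small HTML page.
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(PAGE)))
            self.end_headers()
            self.wfile.write(PAGE)

    if __name__ == "__main__":
        # Point a browser at http://localhost:8000/ to request the page.
        HTTPServer(("", 8000), PageHandler).serve_forever()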

Data communications on the Internet are based on protocols, computer-readable sets of rules that establish how network computers (e.g. servers) interact in order to transmit data. Examples are the Transmission Control Protocol (TCP), the Internet Protocol (IP), and the Hypertext Transfer Protocol (HTTP).

TCP/IP is a suite of protocols that defines the Internet. The TCP protocol establishes a valid connection and exchanges streams of data between the sending and receiving nodes, or servers. TCP guarantees the delivery of data packets in the order sent even though individual packets may travel by different paths over the Internet and may arrive at different times.
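
A minimal sketch of a TCP connection, using Python's standard socket module; the host name and port are illustrative only.

    # Sketch: opening (and closing) a TCP connection to a web server on port 80.
    # "example.com" is just an illustrative host name.
    import socket

    with socket.create_connection(("example.com", 80), timeout=5) as conn:
        # At this point TCP has completed its handshake with the remote server;
        # data written to the connection will be delivered reliably and in order.
        print("Connected to", conn.getpeername())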

The IP protocol defines the addressing of packets on the Internet. In Internet Protocol Version 4 (IPv4), an IP address is a 32-bit binary number that identifies the sender or receiver of the data that is sent in packets across the Internet (see the following discussion of Internet Protocol Version 6 (IPv6)). An IP address has two parts: an identifier of the sender's network and an identifier of the sender's server connected to that network. The IP address is placed on every packet. The packet is sent to the IP address that is obtained by using the domain name contained in the Uniform Resource Locator (URL) to query the Domain Name System (DNS). A typical transmission consists of packets that contain an email message or a request from a web browser asking a web server to transmit a particular web page back to the browser.
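
For example, the following sketch (using only Python's standard library, with an illustrative URL) extracts the domain name from a URL and queries DNS for the corresponding IP address.

    # Sketch: resolving the domain name in a URL to an IP address via DNS.
    import socket
    from urllib.parse import urlparse

    url = "http://www.example.com/index.html"   # illustrative URL
    domain = urlparse(url).hostname             # "www.example.com"

    # gethostbyname sends a DNS query and returns an IPv4 address string.
    ip_address = socket.gethostbyname(domain)
    print(domain, "resolves to", ip_address)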

Requests from the browser to the server are made using TCP/IP. The Web pages, or hypertext, are transmitted using Hypertext Transfer Protocol (HTTP) and are displayed on the browser using HTML or XML, which describe how the browser should display the content of the transmitted web page file. The web page may also contain a hyperlink, which appears as highlighted text within a web page. When the text is activated, or "clicked", the user is transferred to the Web site whose location is identified in the URL associated with the hyperlink. The activity of transferring from one web site to another is called "web surfing".
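
The sketch below issues such a request with Python's standard urllib module; the URL is illustrative, and a real browser would render the returned HTML rather than print it.

    # Sketch: a browser-style HTTP GET request for a web page.
    from urllib.request import urlopen

    with urlopen("http://www.example.com/") as response:   # illustrative URL
        html = response.read().decode("utf-8", errors="replace")

    # The response body is HTML markup; a browser would render it and follow
    # whichever hyperlinks (href attributes) the user clicks.
    print(html[:200])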

The Internet and Data Networking

In 1983 (around the time that TCP/IP was being deployed), the International Organization for Standardization (ISO) developed the Open Systems Interconnection (OSI) model of network layers, along with accompanying standards, to allow operators and vendors to link together their proprietary networks. The OSI model allows hardware and software systems to connect, exchange data, and support networked applications and other intelligent communications across networks.

Each layer of the OSI model controls the designated operations (layer name) and transfers data to the layers immediately above or below it:

Layer 7 - Application
Layer 6 - Presentation
Layer 5 - Session
Layer 4 - Transport
Layer 3 - Network
Layer 2 - Data Link
Layer 1 - Physical

There are tradeoffs in the speed of operations between the relatively "intelligent" but slow operations in the higher layers (i.e. layers 6 and 7) and the "dumb" but fast operations in the lower layers (i.e. layers 2 and 3).

The Internet will run on any type of physical network (layer 1) independent of the network architecture, such as Ethernet or ATM (Asynchronous Transfer Mode), in layer 2. IP (layer 3) allows packets of data to be sent between the hosts (identified by IP address) on the Internet, and TCP (layer 4) checks that the data has arrived correctly; if not, it requests that the data be resent. It also checks for other errors, puts the data in the right sequence, and passes the data to an application protocol, such as HTTP, in a higher layer, all based on the information in the header of the data packet. In any case, the OSI model is becoming less applicable to the Internet due to advances in hardware technology (e.g. routers and switches at the lower layers, and servers at the higher layers) that combine the functions of several layers.

Other Network Concerns

Version 6 of the Internet Protocol (IPv6) is slowly replacing version 4 (there is no version 5), which is currently in use on the Internet. Since IPv6 uses a 128-bit address, it will provide distinct addresses for two to the 128th power, or about 340 undecillion (340 followed by 36 zeros!), Internet devices, versus the 4.3 billion addresses available with IPv4. IPv6 will support improved services in voice over IP (VoIP), security, and other service-related features such as quality of service on the Internet.
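
Python's standard ipaddress module makes the difference in address size concrete; the two addresses below are standard documentation examples, not addresses discussed in this report.

    # Sketch: comparing IPv4 (32-bit) and IPv6 (128-bit) addresses.
    import ipaddress

    v4 = ipaddress.ip_address("192.0.2.1")      # an IPv4 documentation address
    v6 = ipaddress.ip_address("2001:db8::1")    # an IPv6 documentation address

    print(v4.version, v4.max_prefixlen)   # 4 32
    print(v6.version, v6.max_prefixlen)   # 6 128
    print(2 ** 32)    # roughly 4.3 billion possible IPv4 addresses
    print(2 ** 128)   # roughly 3.4 x 10**38 possible IPv6 addresses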

Network administrators are striving to enforce policies concerning the operation of networks. For example, improvements in "quality of service" may one day see the Internet handling email like a small-package delivery service, with high-priority messages traveling faster to the destination but incurring a surcharge, while low-priority traffic moves slowly but at no charge. Incredibly, all email on the Internet currently travels with the same priority. If you use your email program to set a high priority on a particular email message, it might attract someone's attention at the destination, but it won't get there any faster than any other email!
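
For instance, many mail programs mark priority with a header such as X-Priority, which affects how the message is displayed but not how quickly it travels; the sketch below builds such a message with Python's standard email package (the addresses are made up).

    # Sketch: an email flagged "high priority" -- the flag influences how mail
    # clients display the message, not how fast the Internet delivers it.
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["From"] = "sender@example.com"        # illustrative addresses
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Quarterly report"
    msg["X-Priority"] = "1"                   # common convention: 1 = highest priority
    msg.set_content("The report is attached.")

    print(msg)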

The Internet and Congestion

As in any network, there are several ways to reduce congestion on the Internet. The first is traffic management, which is implemented through intelligent routing (i.e. routers with artificial intelligence capabilities) and quality of service. The second approach to reducing congestion is to increase capacity. This has been largely accomplished on the Internet backbone with the installation of a fiber optic infrastructure, to the point of significant over-capacity.

Yet there is still congestion on the Internet. One problem is that Internet traffic comes in bursts rather than the steady drone of voice traffic, and therefore requires a larger capacity to provide an acceptable level of service.

Another problem is management-related. At peering points, where Internet traffic is shifted between backbone operators and toward its destination, there is still significant congestion. The individual networks offload traffic not destined for a site on their own network at the peering points. Since not every individual network is connected to every peering point, the traffic that is dumped from one tends to circulate, or churn, until it eventually finds its destination network. Needless to say, this churning degrades the overall performance of the Internet. The problem may not be correctable, since network owners have an incentive to invest in the performance of their own networks, but not in the overall performance of the Internet.

As the number of routing destinations grows, the routing tables (lists of paths that a router can use to forward a data packet toward a distant destination) grow ever larger. Searching these tables, even with a high-speed computer, takes longer as the Internet increases in size, further slowing the overall speed of Internet operations.
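
A routing table lookup can be sketched as a longest-prefix match over network prefixes, as below; the table and next-hop names are invented, and real routers rely on specialized data structures and hardware to keep this search fast as the tables grow.

    # Sketch: longest-prefix-match lookup in a tiny, made-up routing table.
    import ipaddress

    # prefix -> next hop; real backbone tables hold hundreds of thousands of routes.
    ROUTES = {
        "0.0.0.0/0":         "isp-gateway",   # default route
        "203.0.113.0/24":    "router-A",
        "198.51.100.0/24":   "router-B",
        "198.51.100.128/25": "router-C",      # more specific than router-B's route
    }

    def next_hop(destination):
        dest = ipaddress.ip_address(destination)
        best_len, best_hop = -1, None
        for prefix, hop in ROUTES.items():
            net = ipaddress.ip_network(prefix)
            # The most specific (longest) matching prefix wins.
            if dest in net and net.prefixlen > best_len:
                best_len, best_hop = net.prefixlen, hop
        return best_hop

    print(next_hop("198.51.100.200"))   # router-C
    print(next_hop("8.8.8.8"))          # isp-gateway (default route)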

These and other congestion problems give rise to calls for the movement of content to the "edge" of the network, that is, for content to be more widely dispersed. Content could be moved further down the Internet hierarchy to regional networks, Internet service providers (ISPs), and organizational networks. This can be done with proxy servers and caching.

Proxy Servers

Large web sites that attract a substantial number of visitors, such as Yahoo, will place proxy web servers at different locations to fulfill requests from browsers. The proxy server, or parts of its content, is updated from the original server on a periodic basis. ISPs also operate proxy servers, for example, to reduce traffic into and out of their sites in order to reduce costs and improve performance. A proxy server can also serve as a filter to protect the origin server from virus attacks. Companies that cannot afford to own and maintain proxy servers can purchase the service from Internet infrastructure service companies.

Internet Caching

Caching is the use of temporary storage to speed up processing in servers and data networks. Internet caching refers to the interim storage of frequently accessed web pages. Frequently requested web pages may be stored in a "cache" where they are readily available to users, either on a site (e.g. on the Intranet of a large corporation) or strategically placed within the network to reduce traffic on critical links.

On-site caching may be used to make the operation of a frequently accessed web site more efficient and faster. It is often used in site management with "load balancing", which spreads the Internet traffic load among the local web servers of large sites by sending page requests to the server with the most available capacity. If the requested page is in the cache, it is immediately sent to the requesting browser.
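
A hedged sketch of the load-balancing idea: among a (made-up) group of local web servers, the next page request goes to the server with the most spare capacity.

    # Sketch: "least loaded" load balancing across local web servers.
    # Server names and load figures are invented for illustration.
    servers = {"web-1": 0.72, "web-2": 0.35, "web-3": 0.58}   # fraction of capacity in use

    def choose_server(loads):
        # Send the next page request to the server with the most spare capacity.
        return min(loads, key=loads.get)

    print(choose_server(servers))   # web-2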

Network caching is used to distribute content over the Internet. Caching allows the storage of frequently used web pages on remote servers at strategic locations, such as ISPs. Using rules regarding the time sensitivity of the data on the original server, the network caches are updated on a regular basis. Sometimes only the parts of web pages that don't change reside on the local servers, and the changing content is updated periodically.
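
The sketch below shows the basic idea as a small time-to-live (TTL) cache; the class name, the TTL value, and the fetch function are all invented for illustration.

    # Sketch: a tiny time-to-live (TTL) cache for web pages, standing in for the
    # caches that ISPs and other networks place close to users.
    import time

    class PageCache:
        def __init__(self, ttl_seconds=300.0):
            self.ttl = ttl_seconds
            self.store = {}                    # url -> (time fetched, page)

        def get(self, url, fetch_from_origin):
            entry = self.store.get(url)
            if entry and time.time() - entry[0] < self.ttl:
                return entry[1]                # fresh copy: serve from the cache
            page = fetch_from_origin(url)      # stale or missing: go to the origin server
            self.store[url] = (time.time(), page)
            return page

    cache = PageCache(ttl_seconds=60)
    page = cache.get("http://www.example.com/", lambda url: "<html>...</html>")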

Bypassing the Backbone

Satellites can be used to send content from producers directly to ISPs, thus bypassing the Internet backbone. This promotes the streaming of video and audio services known as satellite webcasting, which is expected to grow, although the success of the service will depend on the speed of the signal from the ISP to the home or business user.

The Use of Web Sites

Nearly every organization and many individuals have web sites. The corporate world has grown to depend on them. Companies such as Amazon, eBay, Google, and Yahoo couldn't exist without web sites. Intel, for example, manages 6,000 internal Web sites from four international locations and sells about $1 billion in computer chips each month over the Web. Many commodities markets are conducted exclusively over the web.

An interesting feature of the Web is that some web sites are designed to attract the public and encourage them to purchase products, while others are designed to exclude the public, in particular corporate spies and online criminals. The issues for publicly accessible sites include web site design, which should be attractive and "user-friendly". Another potential issue in attracting users is scaling, the ability of a Web site to accommodate a rapid increase in visitors. This is accomplished, again, through web site design and with a technical strategy to rapidly increase the capacity of the site when needed.

Website Access

Regarding web site security, many organizations have established Intranets, Internet-based networks within an organization where access is restricted. This was originally accomplished with password security, but directory-based identity management is now taking hold. This approach focuses on establishing the identity of each online user, by accounting for access through a particular password-protected computer (browser), and increasingly through biometric means such as fingerprint and retinal scans.

The Intranet connects to the Internet through network security software and hardware known as a firewall. Over the Internet, buyers and sellers link their Intranets to form exclusive networks that extend beyond corporate boundaries and include other enterprises. While corporate partners have access through a firewall to some information on another company's web site, they are isolated by yet other firewalls from sections of the Intranet to which they don't have access.

When considering corporate web sites it is clear that much of the Web isn't accessible to the surfing public. This is also true regarding military and other government sites, and this exclusion will grow as grids, a form of private Internet, become more popular. With this approach, users will eventually become members of buying communities and other types of online communities, each of which will have its own grid. Users will be accepted and known on their own community grid, but will not be allowed on others without participating in an identity authentication process, where it is established that their online identity matches their offline identity. When the identity of a user established by one community is accepted by another community, based on the trust between communities, the user is said to have a federated identity in the second community.

There is also a technical reason that limits access to much of the data available on the Web. The terms deep web, invisible web, and hidden web refer to web pages generated dynamically, or on the fly, from data stored in publicly accessible databases. The dynamic construction of these web pages means that they are not accessible to bots (network robots or spiders: software programs that continually search the web for content) searching for HTML pages, and they are not indexed by the typical search engine. The deep web is estimated to be approximately 100 times larger than the static, or indexable, web of 11.5 billion stored HTML pages. Although the 100,000 "deep" web sites are visited more frequently than the "surface" web, they are not well known to the web surfing public. The deep web refers only to public databases, and does not include the huge databases of large private companies or of sensitive government materials. It also does not include data that is shared on the Internet with FTP, a protocol that is less popular than HTTP but is still used to transfer documents and other types of content.
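
As a rough illustration of what a bot does, the sketch below fetches one static HTML page and collects the hyperlinks it contains, using only Python's standard library (the start page is illustrative); pages that exist only as responses to database queries never appear as links like these, which is part of why they go unindexed.

    # Sketch: the core of a crawling bot -- fetch a static HTML page and
    # collect the links (href attributes) found in it.
    from html.parser import HTMLParser
    from urllib.request import urlopen

    class LinkCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.links.extend(value for name, value in attrs if name == "href")

    collector = LinkCollector()
    with urlopen("http://www.example.com/") as response:   # illustrative start page
        collector.feed(response.read().decode("utf-8", errors="replace"))

    print(collector.links)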

A final limitation to finding data is that its meaning may not be clear, or that it is masked by tens of thousands, even millions, of duplicate, spurious, and nonsensical results from each search. This problem is being addressed by improving the specific meaning, or semantics, of the search through the introduction of the semantic web, and through the improved use of knowledge representation in general. The use of XML to encode, or tag, web site data is a first step in this direction and lays the groundwork for the emergence of the semantic web.




Dr. James E. Burke is a Principal in Burke Technology Services (BTS). BTS provides assistance to businesses and other organizations planning or integrating new technologies; develops and manages technology projects; performs technology evaluation and commercialization, and assists in technology-based economic development.



This web site is maintained by Burke Technology Services. Copyright © 2005-2008 PIRI. All rights reserved.