The Architecture of the World Wide Web
Min Song
IS, NJIT
Internet Architecture
 Today’s Internet
    Thousands of networks
    Connected by legal agreements and commercial contracts
    Uses the TCP/IP protocol suite
 Internet service providers (ISPs)
    Provide most individual users with access to the Internet
    Dialup connections
       Modems and conventional phone lines
    xDSL and cable modems provide broadband access
Packet Switching
 Most modern Wide Area Network (WAN) protocols, including TCP/IP, X.25, and Frame Relay, are based on packet-switching technologies.
 Packet switching is more efficient and robust for data that can withstand some delays in transmission, such as e-mail messages and Web pages.
 Circuit switching: normal telephone service is based on a circuit-switching technology
    A dedicated line is allocated for transmission between two parties.
    Ideal when data must be transmitted quickly and must arrive in the same order in which it's sent.
    Suited to real-time data, such as live audio and video.
Use of Packets
Internet Protocols: TCP/IP
 Communications protocol suite
    Packet-switched protocol
       No dedicated end-to-end circuit is required
       Each message is broken down into small pieces called packets
       Packets may be routed to the destination over different paths
    Transmission Control Protocol (TCP)
       Breaks messages into packets
       Numbers packets in order
       Reorders packets at the destination
    Internet Protocol (IP)
       Routes packets to the proper destination
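The division of labor described for TCP (break a message into packets, number them, reorder them at the destination) can be sketched in a few lines of Python. This is only an illustration of the idea, not how a real TCP stack works: the function names `to_packets` and `reassemble`, the 8-byte packet size, and the use of a shuffle to imitate packets arriving over different paths are all assumptions made for the example.

```python
import random

def to_packets(message: bytes, size: int) -> list:
    """Break a message into (sequence_number, payload) packets."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]

def reassemble(packets: list) -> bytes:
    """Put packets back in sequence order and join their payloads."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)

message = b"Packets may take different paths to the destination."
packets = to_packets(message, size=8)

# Imitate out-of-order arrival (packets routed over different paths).
random.Random(0).shuffle(packets)

restored = reassemble(packets)
print(restored == message)
```

Because every packet carries its own sequence number, the receiver does not care in which order the network delivers them.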
Domain Names
 Every computer connected to the Internet must have a unique IP address
    IPv4 address format is xxx.xxx.xxx.xxx, where each xxx is a number between 0 and 255
 How do we know that 207.46.245.222 is Microsoft?
 Domain Name System (DNS)
    A database of Internet names
    DNS servers convert Internet names to IP addresses
    Top-level domains
 Ping: tests whether a particular host is reachable across an IP network.
 Tcpdump: sniffs network packets so that statistical analysis can be run on the resulting dumps.
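As a rough illustration of the address format and of name-to-address conversion, the sketch below uses Python's standard `ipaddress` and `socket` modules. The helper names `is_valid_ipv4` and `resolve` are made up for this example, and `resolve` depends on whatever DNS resolver the local machine is configured to use, so its result will vary.

```python
import ipaddress
import socket
from typing import Optional

def is_valid_ipv4(text: str) -> bool:
    """True if text is a dotted-quad address with each part in 0..255."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False

def resolve(name: str) -> Optional[str]:
    """Ask the system resolver (DNS) for an IPv4 address, or None on failure."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None

print(is_valid_ipv4("207.46.245.222"))
print(is_valid_ipv4("207.46.245.256"))
```

Calling `resolve("www.microsoft.com")` on a networked machine would return one of that name's current addresses, which need not match the address quoted on the slide.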
The World Wide Web
 Collection of hyperlinked computer files on the Internet
 Client-server application
    Web servers
    Web browsers as clients
 WWW standards
    Hypertext Markup Language (HTML)
       Current standard for writing Web pages
       An application of SGML designed specifically for Web pages
       Tags in HTML instruct the client browser how to format and display the Web page content
    Hypertext Transfer Protocol (HTTP)
       Protocol for transferring requests and responses between Web server and client
    Extensible Markup Language (XML)
       A meta-markup language
       Gives meaning to the data enclosed within XML tags
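To make "gives meaning to the data enclosed within XML tags" concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`course`, `title`, `instructor`) are hypothetical and chosen only for the example; XML itself does not define them.

```python
import xml.etree.ElementTree as ET

# A small document whose tag names describe what each value means.
doc = """<course>
  <title>The Architecture of the World Wide Web</title>
  <instructor>Min Song</instructor>
</course>"""

root = ET.fromstring(doc)

# Each child element becomes a named field; the tag supplies the meaning.
record = {child.tag: child.text for child in root}
print(record["instructor"])
```

Unlike HTML tags, which say how to display content, these tags say what the content is, so a program can pick out the instructor without guessing from layout.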
Static versus Dynamic Web Pages
 HTML and XML only display and exchange data
    No interactivity; no processing of data
 Scripting languages
    Provide basic interactivity
       Rollovers
       Crawling text
    JavaScript
    VBScript
 Full-featured Web programming
    Java
    Client-side scripting or browser-side scripting
    Applets
    J2EE
 Common Gateway Interface (CGI)
    Allows passing of data between a static HTML page and a computer program
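The CGI idea can be sketched as a small Python program. Under CGI the Web server passes request data to the program through environment variables such as `QUERY_STRING`, and the program writes a header block, a blank line, and the page to standard output. The function name `handle_request` and the `name` form field are assumptions made for this illustration.

```python
import os
from html import escape
from urllib.parse import parse_qs

def handle_request(environ) -> str:
    """Build a CGI response from the variables the Web server provides."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical form field; fall back to a default value.
    name = query.get("name", ["world"])[0]
    body = f"<TITLE>Hello</TITLE>\nHello, {escape(name)}!\n"
    # CGI output: header lines, a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body

if __name__ == "__main__":
    # When run by a Web server, os.environ carries the request data.
    print(handle_request(os.environ), end="")
```

A static HTML form whose action points at this program would cause the server to run it once per submission, which is what makes the resulting page dynamic.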
Searching the WWW
 Most data on the Internet is part of the WWW
 Search engines – large databases that index WWW content
 Building the search engine database
    Submit a site to the search engine administrator for listing
    Spiders
       Google
       Yahoo
    Metatags
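What a spider extracts from a page can be sketched with Python's standard `html.parser`. The example below collects hyperlinks (the pages to visit next) and metatags (keywords and descriptions offered for indexing) from an in-memory page. The class name `PageIndexer` and the sample page are invented for illustration; a real spider would also fetch pages over HTTP and honor site policies.

```python
from html.parser import HTMLParser

class PageIndexer(HTMLParser):
    """Collect link targets and named metatags from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

# Hypothetical page used only for this example.
page = """<html><head>
<meta name="keywords" content="internet, web, http">
<meta name="description" content="Lecture notes on Web architecture">
</head><body>
<a href="/lectures/http.html">HTTP</a>
<a href="/lectures/xml.html">XML</a>
</body></html>"""

indexer = PageIndexer()
indexer.feed(page)
print(indexer.links)
print(indexer.meta["keywords"])
```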
Hypertext Transfer Protocol
 A protocol (syntax and semantics) for transferring representations of resources
    usually across the Internet using TCP
 Design goals
    speed (stateless, cacheable, few round trips)
    simplicity
    extensibility
    data (payload) independence
 A true network-based API
HTTP/0.9 (pre-1993)
 Absolute Simplicity
    Request:
       GET /url-path
    Response:
       <TITLE>Hello World</TITLE>
       Hello World
 No Extensibility
    only one method (GET)
    no request modifiers
    no response metadata
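The HTTP/0.9 exchange is small enough to model in a few lines. The sketch below is an in-memory illustration only: `http09_request`, `http09_respond`, and the `documents` table are hypothetical names, and the text returned for an unknown path is an assumption rather than anything the protocol prescribes.

```python
def http09_request(path: str) -> bytes:
    """An HTTP/0.9 request is a single line: the method GET and a path."""
    return f"GET {path}\r\n".encode("ascii")

def http09_respond(request: bytes, documents: dict) -> bytes:
    """Return the bare document: no status line and no headers."""
    method, _, path = request.decode("ascii").strip().partition(" ")
    if method != "GET":
        return b""  # HTTP/0.9 has no other method
    # Assumed fallback page for paths that are not in the table.
    return documents.get(path, b"<TITLE>Not found</TITLE>\nNot found\n")

documents = {"/url-path": b"<TITLE>Hello World</TITLE>\nHello World\n"}
reply = http09_respond(http09_request("/url-path"), documents)
print(reply.decode("ascii"), end="")
```

With no response metadata, the client cannot tell a real document from an error page, or HTML from any other media type, which is the extensibility gap noted above.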
HTTP/1.0 (1993-present)
 Simple and (mostly) Extensible
    Request:
       GET /Test/hello.html HTTP/1.0
       Accept: text/html
       User-Agent: GET/5 libwww-perl/0.40
    Response:
       HTTP/1.0 200 OK
       Date: Fri, 12 Jan 1996 01:02:49 GMT
       Server: Apache/1.0.5
       Content-type: text/html
       Content-length: 38
       Last-modified: Wed, 10 Jan 1996 01:

       <TITLE>Hello</TITLE>
       Hello out there!
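HTTP/1.0 adds a status line and header fields in front of the document. A minimal sketch of how a client might split such a response into its parts is shown below. The function name `parse_response` is an assumption, the sample omits the Date and Last-modified fields, and the body is written with bare line feeds so that its length matches the Content-length of 38 in the example.

```python
def parse_response(raw: bytes):
    """Split an HTTP/1.0 response into status line, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body

raw = (b"HTTP/1.0 200 OK\r\n"
       b"Server: Apache/1.0.5\r\n"
       b"Content-type: text/html\r\n"
       b"Content-length: 38\r\n"
       b"\r\n"
       b"<TITLE>Hello</TITLE>\nHello out there!\n")

version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers["content-type"], len(body))
```

Header names are lower-cased on the way in because HTTP field names are case-insensitive.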
HTTP/1.0 Deficiencies
 No complete specification until end of '94
 No minimum standard for compliance
 Poor network behavior
    one request per connection
    no reliable transfer of dynamic content
    no control over response caching
    failed to anticipate proxies and gateways
    created huge demand for vanity addresses
    misuse/misunderstanding of MIME
HTTP/1.1
 Culmination of two years' work, RFC 2068
    with Henrik Frystyk, Jim Gettys, Jeff Mogul
    designed at UCI and W3C; expanded in IETF
 Improved Reliability
    chunked transfer of dynamic content
    recognition of proxy and gateway requirements
    explicit cacheability of responses
 Improved Network Behavior
    persistent connections
    virtual hosts (many names, one address)
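Chunked transfer lets a server send dynamically generated content without knowing its total length in advance: each chunk is prefixed with its size in hexadecimal, and a zero-size chunk marks the end. The sketch below is a simplified illustration with hypothetical function names; it ignores trailer fields and does no error handling for malformed input.

```python
def encode_chunked(pieces) -> bytes:
    """Frame each non-empty piece as: hex size, CRLF, data, CRLF."""
    out = bytearray()
    for piece in pieces:
        if piece:
            out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n\r\n"  # zero-size chunk ends the body
    return bytes(out)

def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from a chunked message (trailers ignored)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # Anything after ";" on the size line is a chunk extension.
        size_field = data[pos:line_end].split(b";", 1)[0]
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)

pieces = [b"<TITLE>Hello</TITLE>\n", b"Hello out there!\n"]
wire = encode_chunked(pieces)
print(decode_chunked(wire) == b"".join(pieces))
```

Because the end of the body is marked explicitly, the connection can stay open for the next request, which is what makes chunking a natural partner for persistent connections.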
HTTP/1.1 (1997-????)
 Less Simple, More Extensible, but Compatible
    Request:
       GET /Test/hello.html HTTP/1.1
       Host: kiwi.ics.uci.edu:8080
       User-Agent: GET/7 libwww-perl/5.40
    Response:
       HTTP/1.1 200 OK
       Date: Fri, 07 Jan 1997 15:40:09 GMT
       Server: Apache/1.2b6
       Content-type: text/html
       Transfer-Encoding: chunked
       Etag: "a797cd-465af"
       Cache-control: max-age=3600
       Vary: Accept-Language
       …
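The `Cache-control: max-age=3600` field in the response is an instance of explicit cacheability: the server states how long a cache may reuse the response. A minimal sketch of the freshness check follows. The function names are hypothetical, and the sketch looks only at `max-age`, ignoring the other directives and the Date/Age arithmetic a real cache performs.

```python
def max_age(cache_control: str):
    """Return the max-age value in seconds, or None if it is absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None

def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """A stored response is fresh while its age is below max-age."""
    limit = max_age(cache_control)
    return limit is not None and age_seconds < limit

print(is_fresh("max-age=3600", 1800))
print(is_fresh("max-age=3600", 7200))
```

Once the response is no longer fresh, the cache can revalidate it with the server using the `Etag` value instead of transferring the whole document again.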
HTTP/1.x Deficiencies
 MIME is too verbose (overhead per message)
 Control mixed with metadata
 Metadata restricted to header or trailer
 Fixed request/response ordering can block progress
 Incurs frequent round-trip delays due to connection establishment
HTTP/2.x
 Tokenized transfer of common fields
    reducing bandwidth usage, latency
    removal of MIME syntax limitations
    self-descriptive for extensions
 Multiplexing control, data, metadata streams
    reducing desire for multiple connections
    enabling multi-protocol connections
    per-stream priority or credit mechanism
 Layered streams for meta-metadata, encryption...
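The multiplexing idea can be illustrated with a toy model: several logical streams are cut into frames tagged with a stream identifier and interleaved on one connection, so a long transfer does not block a short one. The frame layout, the round-robin schedule, and all names below are assumptions made for this sketch; they do not reproduce the wire format of any actual protocol.

```python
from collections import defaultdict
from itertools import zip_longest

def frames(stream_id, payload: bytes, size: int) -> list:
    """Cut one stream into (stream_id, chunk) frames."""
    return [(stream_id, payload[i:i + size])
            for i in range(0, len(payload), size)]

def multiplex(streams: dict, size: int) -> list:
    """Interleave frames from all streams in round-robin order."""
    per_stream = [frames(sid, data, size) for sid, data in streams.items()]
    wire = []
    for round_of_frames in zip_longest(*per_stream):
        wire.extend(f for f in round_of_frames if f is not None)
    return wire

def demultiplex(wire: list) -> dict:
    """Rebuild each stream from the frames that carry its identifier."""
    out = defaultdict(bytes)
    for stream_id, chunk in wire:
        out[stream_id] += chunk
    return dict(out)

# Hypothetical control, data, and metadata streams on one connection.
streams = {1: b"control", 2: b"a much longer data stream", 3: b"meta"}
wire = multiplex(streams, size=4)
print(demultiplex(wire) == streams)
```

A per-stream priority or credit mechanism would replace the plain round-robin loop with a scheduler that decides which stream sends its next frame.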
XML to the rescue?
 “X” for extensible:
    self-descriptive syntax
    semantics by reference (doctype, namespaces)
    rendering by reference (style sheets)
 An XML representation is an object turned inside-out, with behavior-by-reference
 However, network application performance will demand standards for domain-specific doctypes and style sheets
Future Work
 Dynamic application architectures
 Architectural analysis and performance bounds
 Impact of future network architectures (ATM)
 Balancing secure transfer with firewall visibility
 Protocol for manipulating resource mappings
 HTTP-NG (W3C/Xerox PARC)
 rHTTP (UCI)