Saturday, January 1, 2000

Deconstructing a URL

The Uniform Resource Locator, or URL, is what makes the Web possible. It gives every page, image, and file on the Web its own globally unique address. The official specification for URLs is fairly complicated, but don't be discouraged: the most important part to understand is the hostname.

We start with a simple example:

http://www.example.com/test_page.html

“http” is called the scheme, and the “://” that follows separates it from the rest of the URL. The scheme tells the browser how to talk to the server (in this case with “HyperText Transfer Protocol” or HTTP) and what sort of reply to expect. A scheme is required in a URL, but most browsers will add it for you. Try typing “google.com” in your browser and it will be expanded into a full URL such as “http://www.google.com”.

In our example, everything between the “://” and the next forward slash (“/”) is called the hostname. It is made up of sections of letters and numbers separated by periods: the domain name (“example.com”) and, optionally, one or more sub-domains (“www”). The hostname identifies the server on the Internet that will handle our request. The domain name is required in a URL.

Everything from that forward slash onward is called the path, which tells the server where to find the file we're looking for. In our example, we tell it to look for a file named “test_page.html”. The path is optional in a URL. If it is not specified, the server will use its default path.
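For readers who like to see things in code, here is a small sketch that pulls the same three pieces out of our simple example using Python's standard urllib.parse module. The split of the hostname into domain name and sub-domains is my own simplification: it assumes the domain name is always the last two sections, which holds for “example.com” but not for every address on the Internet.

```python
from urllib.parse import urlsplit

url = "http://www.example.com/test_page.html"
parts = urlsplit(url)

scheme = parts.scheme      # the part before "://"
hostname = parts.hostname  # the server that handles the request
path = parts.path          # where the server should look for the file

# Simplifying assumption: the domain name is the last two sections of
# the hostname, and anything in front of it is a sub-domain.
labels = hostname.split(".")
domain_name = ".".join(labels[-2:])
subdomains = labels[:-2]

print(scheme, hostname, path, domain_name, subdomains)
```

Note that the parser reports the path with its leading forward slash included.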

Now a more complicated example:

http://www.example.com/workers/directory/lookup.php?first_name=john&last_name=doe#birthday

(Note: this example is one long string with no spaces in it, but the browser wraps it into two lines so it can be shown without running into our sidebar.)  This example has a longer path (“workers/directory/lookup.php”). As before, this tells the server what file is being requested.

The question mark (“?”) allows us to add additional information called the query string. It is made up of simple pairs in the form of “name=value”, and multiple pairs are separated by ampersands (“&”). The query string is optional in a URL.

The hash (“#”) specifies a fragment, or a part of the page we are interested in. In this example, “#birthday” says that we're interested in looking up John Doe's birthday. Fragments are optional in a URL.
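The same standard Python module can take the longer example apart. This is only a sketch of one way to do it. Note that parse_qs returns a list of values for each name, because a name is allowed to appear more than once in a query string.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://www.example.com/workers/directory/lookup.php"
       "?first_name=john&last_name=doe#birthday")
parts = urlsplit(url)

path = parts.path          # everything up to the question mark
query = parts.query        # the text between "?" and "#"
fragment = parts.fragment  # the text after "#"

# Break the query string into its name=value pairs.
params = parse_qs(query)

print(path)
print(params)
print(fragment)
```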

Query strings are often used to request dynamic information. In our example, we want a Web page for every worker in our company, but writing each page by hand would be prohibitively time-consuming. “lookup.php” is a program, or script, that runs on our server and looks up worker information in a database, allowing a single script to display information about everyone in our company. The query string tells the script who we’re interested in.
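To make the idea concrete, here is a rough sketch of what a script like “lookup.php” does, written in Python rather than PHP. Everything in it is made up for illustration: the WORKERS table stands in for a real database, and the lookup function and the record it returns are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical stand-in for the company database; the record is invented.
WORKERS = {
    ("john", "doe"): {"name": "John Doe", "department": "Accounting"},
}

def lookup(url):
    """Return the worker record named in the URL's query string, or None."""
    params = parse_qs(urlsplit(url).query)
    # parse_qs gives a list of values per name; take the first, if any.
    first = params.get("first_name", [""])[0].lower()
    last = params.get("last_name", [""])[0].lower()
    return WORKERS.get((first, last))

base = "http://www.example.com/workers/directory/lookup.php"
found = lookup(base + "?first_name=john&last_name=doe")
missing = lookup(base + "?first_name=jane&last_name=roe")
print(found, missing)
```

The same function answers for any worker in the table, which is the point: one script, many pages.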

A fragment points to a defined spot on a page. It allows us to tell the browser to scroll down to the relevant information when a reader follows a link. Rather than say “follow this link and scroll down to page 24” we can link straight to the spot and say “as noted in the document’s footnotes.”
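As a small illustration, this sketch adds a fragment to a link and then strips it off again with Python's standard library. The anchor name “footnotes” is an assumption for the example. It only works if the destination page actually defines an anchor point with that name.

```python
from urllib.parse import urlsplit, urldefrag

page = "http://www.example.com/test_page.html"

# Build a link that points at a specific spot on the page.
# "footnotes" is a hypothetical anchor name.
link = urlsplit(page)._replace(fragment="footnotes").geturl()

# Going the other way: separate the page address from the fragment.
base, fragment = urldefrag(link)

print(link)
print(base, fragment)
```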

There are several ways to create anchor points – the destination of a fragment – but blog entries are generally not long enough to necessitate creating them.  Bloggers most often find themselves using fragments in outbound links.

1 comment:

Sherry said...

Thank you, Mike. I always just quit looking when it reached the ? but now the rest makes more sense.