How the Web Works

The information in this article won’t be particularly deep or technical, and highly web-savvy purists may find that some points are simplified to the point that they are not completely accurate. The point of these articles is to help people begin to understand things, not to prepare them for writing or responding to RFCs.

With today’s tools, it’s entirely possible for a web designer, even with some coding experience in various web programming languages, to create an entire web site without really understanding what’s going on behind the scenes. This is especially true for people with a background in a non-web programming language like Java or C++ who are getting started with creating web sites. If this describes you, and you’d like to know more about how the Web really works, but were afraid to ask, this article is for you.

I created a surprising number of successful web sites before I really understood what I was doing. I learned to use HTML and CSS to create attractive (at least to me) web sites, but I had no real understanding of how the pages of those sites made it from the server to the user’s screen.

The URL

Since you got to this page, you probably know what a URL is. URL stands for “Uniform Resource Locator” and it’s really just an address like the address for where you live. Like street addresses, each URL specifies a unique location on the Web.

The first part of the URL is a “protocol” (http://, https://, ftp://, etc.), which specifies how the resource should be fetched. For our purposes, we’ll assume that it’s http://, for a standard, non-secure web page. Next comes the name of a “host” (domain), which identifies the web server that has the page. The host may begin with an optional subdomain (the www in www.example.com, for instance). After the host comes an optional port number (preceded by a colon), and finally, the full path to the “file” on the host (directories and file name). There are two slashes after the protocol to distinguish it from the slashes in the path. There may also be a reference to a named anchor at the end of the URL (preceded by a #), which points to a specific location within the page.
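To see those parts separated out, here is a minimal sketch using PHP’s built-in parse_url() function. The URL is hypothetical (example.com is a reserved test domain), and the port and anchor are there only so every part shows up.

```php
<?php
// Hypothetical URL, used only to show the parts; example.com is a reserved test domain.
$url = 'http://www.example.com:8080/docs/page.html#section2';

// parse_url() returns an associative array of the parts it finds.
$parts = parse_url($url);

echo $parts['scheme'], "\n";   // the protocol
echo $parts['host'], "\n";     // the host, including the subdomain
echo $parts['port'], "\n";     // the optional port number
echo $parts['path'], "\n";     // the path to the file
echo $parts['fragment'], "\n"; // the named anchor (without the #)
```

Parts that are missing from a URL are simply left out of the array, so real code should check for them before using them.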

There’s a nice graphic for this here.

Imagine that you could communicate with a friend by snail mail using USPS, FedEx, UPS, etc., all from the same mailbox. On each letter, you’d have to specify the protocol (which service to use), the central processing center for mail where your friend lives (the city and/or zip code), and the person’s street address (the path). You’d also include a return address. That’s really almost all there is to a URL.

The protocol and host are required for a legitimate URL. The path, if missing, is usually assumed to be ‘index’, so if the path part is missing or contains the name of a directory rather than a file, most hosts will look for an ‘index.htm,’ ‘index.html,’ or ‘index.php’ file. On Apache servers, the order of that search can be set in the .htaccess file and you can also set the name of the default index file there. If there is no index file, the server may show a list of *all* the files in the directory, unless told not to in the .htaccess file or in the main server configuration.

There are several things relating to the index file that you can do to make your site more secure. You can put an empty index.html file in every directory, you can tell the server not to show the file list, and you can change the name of the default file the server looks for from “index” to something long and arbitrary. If you’re paranoid, you can do all three.
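As a rough illustration, the two .htaccess settings mentioned above might look like this on an Apache server. This is a sketch that assumes your host runs Apache and allows these directives in .htaccess files; other servers use different configuration.

```apache
# Files to look for, in order, when the URL names a directory.
DirectoryIndex index.php index.html index.htm

# Don't show a list of files when a directory has no index file.
Options -Indexes
```

To use an arbitrary default file name, you would list that name in the DirectoryIndex line instead.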

URLs can also contain arbitrary information at the end in a “query” section. When a URL contains a question mark after the path, everything after the question mark is the query. The query can contain almost anything, but usually it’s in the form of several variables and their values.


http://bobsguides.com/search.php?term=modx&topic=plugins

In the example above, http:// is the protocol, bobsguides.com is the host, search.php is the path to the requested page and there are two variables in the query: term has the value “modx” and topic has the value “plugins”. The separate variables are separated by the & character. Note that this is not a real URL. I just made it up.
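Here is a short sketch of how PHP itself can split that made-up URL into its query variables, using the built-in parse_url() and parse_str() functions.

```php
<?php
// The made-up URL from the example above.
$url = 'http://bobsguides.com/search.php?term=modx&topic=plugins';

// Pull out just the query section (everything after the question mark).
$query = parse_url($url, PHP_URL_QUERY);

// parse_str() splits the query on & and = and fills $vars with the results.
parse_str($query, $vars);

foreach ($vars as $name => $value) {
    echo $name, ' = ', $value, "\n";
}
```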

The Request

When your browser processes a URL, it basically sends out a message, technically called a “request,” into the ether that contains the URL and some other information, some of which is optional. All the processing points between you and the specified server pass the request in the server’s direction until it gets there. Two identical requests made one after the other could take completely different routes to their destinations. For the purposes of this article, we’ll assume that all just happens by magic, since you rarely have to know the details. The request also carries the address of *your* computer, so the target server knows where to send the response.
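To make the “message” less abstract, here is a minimal sketch that builds the text of a bare-bones HTTP GET request for the made-up search URL. The buildGetRequest() function is a hypothetical helper written for this article; a real browser sends many more headers. Notice that the URL is split up: the host goes in its own header and the rest goes on the first line.

```php
<?php
// Builds the text of a minimal HTTP/1.1 GET request for a URL.
// A real browser adds more headers (User-Agent, Accept, cookies, and so on).
function buildGetRequest(string $url): string
{
    $parts = parse_url($url);
    $target = $parts['path'] ?? '/';
    if (isset($parts['query'])) {
        $target .= '?' . $parts['query'];
    }

    return "GET {$target} HTTP/1.1\r\n"
        . "Host: {$parts['host']}\r\n"
        . "Connection: close\r\n"
        . "\r\n"; // a blank line marks the end of the headers
}

echo buildGetRequest('http://bobsguides.com/search.php?term=modx&topic=plugins');
```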

If you’re submitting a form, the process is exactly the same except that the message also contains information about the fields you’ve selected or filled out on the form. The process is essentially the same whether the server is posting to itself or to a server on the other side of the world.
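As a sketch of what that extra information looks like, the code below builds the text of a form submission sent with the “post” method. The field names are borrowed from the made-up search example, and the request is only printed, not sent.

```php
<?php
// Hypothetical form fields; the names are borrowed from the search example.
$fields = ['term' => 'modx', 'topic' => 'plugins'];

// http_build_query() URL-encodes the fields the same way a browser encodes a form.
$body = http_build_query($fields);

$request = "POST /search.php HTTP/1.1\r\n"
    . "Host: bobsguides.com\r\n"
    . "Content-Type: application/x-www-form-urlencoded\r\n"
    . 'Content-Length: ' . strlen($body) . "\r\n"
    . "\r\n"
    . $body; // the form data travels after the headers, not in the URL

echo $request, "\n";
```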

Any time that a new web page is displayed in your browser, you can assume that a request has been sent and a response has been returned from the target server.

Client vs. Server

You may already be familiar with these terms, but in case you’re not, here’s a brief bit of information on them. In a typical exchange of information on the web, the web browser is the client and the computer (or collection of computers) at the host that contains the information is the server. The browser sends a message that says, “Hi, I’d like to be one of your clients.” The server does a little checking to make sure you’re not blocked and the information is available and says, “Sure, why not.”

The Web Server

The server at your web host (or your own machine if you have a localhost install) has one main job. When a user requests a page (by clicking on a link, or typing in a URL), a request is sent to the target server. If the request is an http:// request, the server looks for the requested page and (assuming that the requester is not blocked for some reason and the page is available) sends it back to the user’s browser. In the very early days of the Web, that was the whole story, and it’s still the most important part. In those days, the pages were HTML files on the server’s drive and were essentially just read from the disk and sent directly to the user’s browser for display.

These days, the process is a little more complicated, but still fairly basic. If there is an .htaccess file (or other routing file) on the site, the server will use the rules in that file to alter the page request before looking for the requested resource (or telling a banned user to get lost). The resource requested in the URL may be a physical file on the server, but in the case of a Content Management System (CMS) like MODX or WordPress, it may not be a file at all. In many CMS platforms, the HTML code for the requested page may be stored in a database. In addition, the returned page may contain images, CSS code, and JavaScript code in addition to HTML.

On modern servers, something else may happen before the page is returned. If the requested page has a file extension other than .htm or .html (e.g., .php, .aspx, .cfm, .rhtml, etc.), the server will recognize it as needing further processing. If the server has an installed engine for processing the particular extension, it will pass the content of the requested page to the engine, then return the result it gets back from the engine to the user’s browser. The engine most often converts the page’s code to HTML and returns it to the server, which sends it on to the client that requested it.
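Here is a small sketch of what “converting the page’s code to HTML” means for a .php page. The title and list items are hypothetical stand-ins for data that might come from a database.

```php
<?php
// Hypothetical page data; on a real site this might come from a database.
$title = 'Plugins';
$items = ['Gallery', 'Search', 'Contact Form'];

// The PHP engine runs this code on the server and produces plain HTML.
$html = '<h1>' . htmlspecialchars($title) . "</h1>\n<ul>\n";
foreach ($items as $item) {
    $html .= '  <li>' . htmlspecialchars($item) . "</li>\n";
}
$html .= "</ul>\n";

// Only this HTML is sent to the browser; the PHP source never leaves the server.
echo $html;
```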

As we saw above, the returned page may contain images, CSS, or JavaScript, but although it can be, that code isn’t always *in* the returned page. Often, just the URL of the CSS or JS is returned in the page script, inside a tag that identifies what it is. In other words, the server is telling the user’s browser, “it’s here — get it yourself.” In that case, the browser sends another request to the included URL. Every request takes time because it involves a round-trip between the client and the server identified in the URL, which is why it speeds things up a lot to combine multiple JS or CSS files into a single file and combine images into a single “sprite image” — it cuts down the number of requests that have to be made before the page can be rendered in the browser.
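To get a feel for how many extra requests a page triggers, here is a rough sketch that lists the CSS, JS, and image URLs in a chunk of HTML. The findAssetUrls() function and the sample page are hypothetical, and the regular expression is a shortcut for demonstration, not a substitute for a real HTML parser.

```php
<?php
// Rough sketch: list the URLs a browser would have to request separately.
// A regular expression is not a real HTML parser; it is enough for this demo.
function findAssetUrls(string $html): array
{
    $pattern = '/<(?:link|script|img)\b[^>]*?\b(?:href|src)=["\']([^"\']+)["\']/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

$page = <<<HTML
<link rel="stylesheet" href="/css/site.css">
<script src="/js/app.js"></script>
<img src="/images/logo.png" alt="Logo">
<a href="/about.html">About</a>
HTML;

$assets = findAssetUrls($page);
echo count($assets), " extra requests\n";
print_r($assets);
```

The ordinary link to /about.html is not counted, since the browser only requests it if the user clicks it.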

Another way to speed things up is to have the pages compressed before being sent. That reduces the amount of raw data that has to be transferred in the response. The compression is done by a compression engine on the server and is usually triggered by a directive in the .htaccess file. If the server sees such a directive, it knows that just before returning the response, it should submit the response to the compression engine and return what it gets back to the user’s browser, along with information that tells the browser that the information is compressed and what compression method has been used. The browser tells the server in its request which compression methods it can handle, so the server only compresses responses that the browser will be able to decompress.
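The idea can be sketched in a few lines of PHP. This is only an illustration, since on most sites the web server handles compression rather than PHP code. The compressIfSupported() function is hypothetical, and it assumes PHP’s zlib extension is available for gzencode().

```php
<?php
// Sketch of the idea only; on most sites the web server does this, not PHP code.
// Requires PHP's zlib extension for gzencode().
function compressIfSupported(string $body, string $acceptEncoding): array
{
    // Simplified check: a full parser would also honor quality values like "gzip;q=0".
    if (stripos($acceptEncoding, 'gzip') === false) {
        return ['headers' => [], 'body' => $body];
    }

    return [
        'headers' => ['Content-Encoding: gzip'], // tells the browser how to decompress
        'body' => gzencode($body),
    ];
}

$page = str_repeat('<p>Hello, world.</p>', 200);
$response = compressIfSupported($page, 'gzip, deflate');

echo strlen($page), ' bytes before, ', strlen($response['body']), " bytes after\n";
```

The second argument stands in for the Accept-Encoding header that the browser sends with its request.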

Server Variables

In PHP, information about the request (and some other things) is available in the $_SERVER array. Some of it comes from information in the request itself and some from information the server already knows. The name of the server itself, for example, is in the $_SERVER['SERVER_NAME'] variable. The $_SERVER['HTTP_HOST'] variable contains the host name the browser asked for, taken from the request’s Host header. The $_SERVER['REMOTE_ADDR'] variable contains the IP address of the machine that sent the request. There are many $_SERVER variables, but not all of them are reliable. Some, particularly those taken from request headers, can be spoofed by a clever programmer.
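Because some of these values may be missing (when a script runs from the command line, for example), it is safer to read them with a fallback. Here is a minimal sketch; describeRequest() is a hypothetical helper, not a built-in function.

```php
<?php
// Collects a few request details, with fallbacks for values that may be missing
// (for example, when the script is run from the command line).
function describeRequest(array $server): array
{
    return [
        'server_name' => $server['SERVER_NAME'] ?? 'unknown',
        'requested_host' => $server['HTTP_HOST'] ?? 'unknown', // from the Host header
        'client_ip' => $server['REMOTE_ADDR'] ?? 'unknown',
    ];
}

print_r(describeRequest($_SERVER));
```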

For a complete list of the $_SERVER variables, see the $_SERVER entry in the PHP manual. You can also see the $_SERVER variables and their values by putting this code on a page and viewing it. (In the case of MODX, you’d put the code in a snippet and put a tag for the snippet in the page):

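<?php
echo '<h3>Server Variables</h3>';
echo '<pre>';
// print_r() with true as the second argument returns the text instead of printing it;
// htmlspecialchars() keeps any < or & in the values from being read as HTML.
echo htmlspecialchars(print_r($_SERVER, true));
echo '</pre>';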

The $_REQUEST Array

Part of the information in a request the server receives is converted to a PHP array called $_REQUEST. The $_REQUEST array often contains information from a form the user has submitted and is processed by the PHP of the requested page, but it can be used for anything.

The simplest part of the $_REQUEST array is the $_GET array, which is constructed from the URL itself. Earlier, we saw an example of a URL with a query section at the end containing two variables. If a form contains those two variables and the form’s method is “get”, the browser will automatically add a question mark onto the end of the URL and then tack on those two variables and their values. When the server processes that request, it will automatically put those two variables into the $_GET array:

$_GET = array(
    'term' => 'modx',
    'topic' => 'plugins'
);

In PHP code on the requested page, you might see those variables extracted from the $_GET array like this:


$term = $_GET['term'];
$topic = $_GET['topic'];
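In practice, a variable may be missing from the URL, so reading it directly can trigger a warning. Here is a slightly more defensive sketch; getParam() is a hypothetical helper and the default values are arbitrary.

```php
<?php
// Returns a request value as a string, or a default when it is absent.
function getParam(array $source, string $key, string $default = ''): string
{
    $value = $source[$key] ?? $default;
    // A query like ?term[]=x produces an array, so fall back to the default.
    return is_string($value) ? $value : $default;
}

$term = getParam($_GET, 'term');
$topic = getParam($_GET, 'topic', 'all');

// Escape user-supplied values before putting them into HTML.
echo 'Searching for: ', htmlspecialchars($term), ' in ', htmlspecialchars($topic), "\n";
```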

The $_POST array is exactly the same except that the information is sent inside the request rather than in the URL. Using the $_POST array (by setting a form’s method to “post,” for example) is generally considered safer than using $_GET since it keeps the information out of the URL. It’s easy enough, though, to use tools that will display the $_POST data as it’s sent. It’s just a little less convenient than simply typing values into the browser’s address line, which is what you do to create $_GET variables.
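Code that handles a form often needs to know which of the two arrays to read. One way to decide, sketched below, is to check the request method in $_SERVER['REQUEST_METHOD']. The submittedFields() function is a hypothetical helper.

```php
<?php
// Picks the array that holds the submitted fields, based on the request method.
function submittedFields(array $server, array $get, array $post): array
{
    $method = strtoupper($server['REQUEST_METHOD'] ?? 'GET');
    return $method === 'POST' ? $post : $get;
}

$fields = submittedFields($_SERVER, $_GET, $_POST);
echo count($fields), " field(s) submitted\n";
```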

The third member of the $_REQUEST array is the $_COOKIE array, which contains any cookie information stored on the user’s machine related to the current web site. (Whether cookies actually show up in $_REQUEST depends on the request_order setting in php.ini; many installations include only the $_GET and $_POST data.) If a server’s response to a request contains cookies, the user’s browser will store them (unless they’ve turned cookies off). The next time the user visits a page at the same site, the browser will send any stored cookies in the request. Cookies allow information to persist across visits. They can allow users to be permanently logged in, for example, or store their preferences.
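Here is a minimal sketch of storing a preference in a cookie and reading it back on a later visit. The cookie name “theme,” the default value, and the 30-day lifetime are all made up for the example.

```php
<?php
// Reads a preference from the cookies the browser sent, with a default.
function preferredTheme(array $cookies): string
{
    return $cookies['theme'] ?? 'light';
}

$theme = preferredTheme($_COOKIE);

// Ask the browser to store the preference for 30 days.
// Cookies travel in response headers, so this must run before any output.
if (!headers_sent()) {
    setcookie('theme', $theme, time() + 30 * 24 * 60 * 60, '/');
}

echo 'Theme: ', $theme, "\n";
```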

One last point about the $_REQUEST array for coders. You can get $_GET, $_POST, or $_COOKIE data from the $_REQUEST array (since they are its three members), but the arrays are separate entities. In other words, if you modify the $_GET, $_POST, or $_COOKIE array yourself, the real $_REQUEST array will not change and vice versa.
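A quick way to convince yourself of this is the sketch below. It fills the arrays by hand to simulate a request, which you would not do on a live page, where PHP fills them itself.

```php
<?php
// Simulate an incoming request; on a live page PHP fills these arrays itself.
$_GET = ['term' => 'modx'];
$_REQUEST = $_GET; // PHP arrays are copied by value, so this is a separate copy

// Changing one array does not touch the other.
$_GET['term'] = 'changed';

echo $_GET['term'], "\n";     // the new value
echo $_REQUEST['term'], "\n"; // still the original value
```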

Back to the Browser (client)

We’ve talked about how servers respond to requests, but what about when the response arrives back at the user’s browser? Before displaying the page, the browser makes further requests for any URLs contained in the response to get any required images, JS, or CSS. It plugs those into the page, uses the CSS as a guide, and renders the page’s HTML for viewing. It also executes any JS code it encounters and alters the page accordingly.

Unfortunately, different browsers have different ideas about how to interpret and render the data they receive, so cross-browser testing is still necessary in many cases. I hope to live to see the day when every browser renders the same code in the same way, but I’m not counting on it.

How This Relates to MODX

In MODX, requests from a browser are tied to specific Resources. When a request comes in, the MODX index.php file looks for a Resource in the database that corresponds to the one named in the request. If it doesn’t find one, it returns the content of the MODX error (page not found) page. If it does find one, it checks to see if the user is allowed to see it, and if so, it does all its MODX magic to render the page (e.g., getting the template, getting the page content and other fields from the database, processing tags, etc.), then sends it back to the browser.
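The flow can be pictured with the simplified sketch below. This is not MODX code: the handleRequest() function, the in-memory $resources array, the group names, and the [[content]] placeholder are all hypothetical stand-ins, and the “Not authorized” result is an assumption about what a site might do with a user who lacks permission.

```php
<?php
// NOT MODX code: a hypothetical, in-memory stand-in for the flow described above.
function handleRequest(array $resources, string $alias, array $userGroups): string
{
    // 1. Look for a Resource that matches the request.
    if (!isset($resources[$alias])) {
        return 'Page not found'; // stands in for the error page
    }
    $resource = $resources[$alias];

    // 2. Check whether this user may see it (an empty list means public).
    $allowed = $resource['groups'] === []
        || array_intersect($resource['groups'], $userGroups) !== [];
    if (!$allowed) {
        return 'Not authorized'; // assumption: stands in for whatever the site does here
    }

    // 3. Render: drop the content into the template.
    return str_replace('[[content]]', $resource['content'], $resource['template']);
}

$resources = [
    'about' => ['groups' => [], 'template' => '<main>[[content]]</main>', 'content' => 'About us'],
    'members' => ['groups' => ['staff'], 'template' => '<main>[[content]]</main>', 'content' => 'Members only'],
];

echo handleRequest($resources, 'about', []), "\n";
```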

Summing Up

The key to understanding how the web works is the request. Web browsers make requests, and servers respond to them. The request contains lots of information and the server uses that information to tailor the response. The process is relatively simple, though, and understanding it can help you bring your web skills to the next level.


For more information on how to use MODX to create a web site, see my web site Bob’s Guides, or better yet, buy my book: MODX: The Official Guide.

Author Spotlight

Bob Ray

I am the author of MODX: The Official Guide and over 30 MODX add-on components. I host Bob's Guides, a source of valuable information for MODX users, and I've been very active in the MODX Forums with over 14,000 posts.
