Tuesday, October 28, 2014

Internet Basics - HTML (HyperText Markup Language)

In the prior post, we explained how your computer finds an IP address for a website through DNS resolution.  After that, the browser connects to the web server and loads the web page.  We will save the explanation of how that connection is set up for a later post.  In this one, we explain HyperText Markup Language (HTML), which is used to create web pages and tells browsers how to display them.

HTML is a language that will give a browser instructions on how to construct and display a web page.  You can view the HTML for any web page by choosing the view page source option on your browser.  What you will see is basically a text file that is written in HTML.  

Tags are a key element of HTML.  They mark parts of the page so that the browser knows how to display them on your screen.  For example, the tag <b> marks an area that should be made bold.  Most tags come in pairs: an opening tag and a closing tag, which is the same tag with a slash, such as </b>.  So you may find this in HTML: <b>Everything here is in bold</b>.
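To see how a program might notice the opening tag, the text, and the closing tag, here is a minimal sketch using Python's built-in html.parser module.  The class name TagLogger is made up for this illustration, and a real browser does far more than this.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records each opening tag, text run, and closing tag in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


logger = TagLogger()
logger.feed("<b>Everything here is in bold</b>")
logger.close()

for kind, value in logger.events:
    print(kind, value)
```

The parser reports the <b> tag, then the text inside it, then the closing </b> tag, which is roughly the information a browser needs to decide what to make bold.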

A very common tag is for loading images.  <img src="http://www.infofornyc.com/images/infofornyc_logo.png"> tells the browser to load the image located at the URL http://www.infofornyc.com/images/infofornyc_logo.png.

URL stands for Uniform Resource Locator which points to a specific "resource".  A resource could be an image, a web page, a video, and more.  A typical URL will have the following:
  • Protocol - http - the most common for web is http though there are others.
  • Domain name - www.infofornyc.com in this example.  We discussed domain in the last post.
  • Port number - for http this is assumed to be 80 unless stated otherwise, so it is usually not displayed in the web browser.
  • Path - /images/infofornyc_logo.png - everything after the domain which helps identify the location of the content (look in the images folder for the infofornyc_logo.png image)
There are some additional parts, but I'll save those for a later in-depth look at the URL.
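As a rough illustration of the parts listed above, Python's standard urllib.parse module can split the example URL for us.  The DEFAULT_PORTS table is an assumption of this sketch (80 for http as described above, plus 443 for https), not something stored in the URL itself.

```python
from urllib.parse import urlsplit

url = "http://www.infofornyc.com/images/infofornyc_logo.png"
parts = urlsplit(url)

# Assumed defaults for this sketch: used only when the URL names no port.
DEFAULT_PORTS = {"http": 80, "https": 443}
port = parts.port or DEFAULT_PORTS.get(parts.scheme)

print("Protocol:   ", parts.scheme)
print("Domain name:", parts.hostname)
print("Port number:", port)
print("Path:       ", parts.path)
```

Because the example URL does not spell out a port, parts.port is None and the sketch falls back to 80.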

HTML has a lot of different elements besides image tags and bold tags but I'll also save that for a more detailed post later.  For the novice, just remember that HTML is sent by the web server to your browser as a text file.  Your browser will read the text file and load the content to display the page on your computer.

To finish out the example flow from our last post, where www.google.com is typed into the browser, the following occurs once DNS resolution is complete.


  1. Browser connects to the www.google.com web server at the IP address provided and makes a request for http://www.google.com.
  2. The Google web server delivers an HTML page to the browser.
  3. The browser reads the HTML page and displays content and identifies additional content to load.
  4. If the HTML page has additional content to load such as images (e.g., https://www.google.com/images/srpr/logo11w.png), the browser will request the content from the web server.
  5. The web server delivers the additional content.  Steps 4 and 5 repeat until all content referenced in the HTML page is downloaded.
  6. The browser displays the full web page.
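Steps 3 and 4 above can be sketched in a few lines of Python.  This is only an illustration: SAMPLE_HTML is a made-up page, ImageFinder is a hypothetical helper name, and nothing is actually downloaded.  The sketch only works out which image URLs a browser would go on to request.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageFinder(HTMLParser):
    """Collects the full URL of every <img> tag found in a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Turn a path like /images/x.png into a full URL.
                self.image_urls.append(urljoin(self.page_url, src))


# Made-up page content for this sketch, not Google's real HTML.
SAMPLE_HTML = (
    "<b>Welcome</b>"
    '<img src="/images/srpr/logo11w.png">'
    '<img src="http://www.infofornyc.com/images/infofornyc_logo.png">'
)

finder = ImageFinder("https://www.google.com/")
finder.feed(SAMPLE_HTML)
finder.close()

for image_url in finder.image_urls:
    print("Browser would request:", image_url)
```

A real browser repeats this for every kind of referenced content, not just images.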
Congratulations!  If you have read all the posts up until now, you should have a solid foundation on understanding how the Internet works.  We'll go into some more advanced technical topics and dive deeper into some concepts already introduced.  We will likely start by explaining some of the key protocols such as IP (and how routers function), TCP, UDP, and HTTP.




Monday, October 27, 2014

Internet Basics - DNS Resolution

In this post, we will walk through the DNS resolution process.  This is the process in which a domain name is translated into an IP address so your computer knows how to connect to a web server.  This will be our first post with a diagram too!

Terminology

Let's start with some terms before diving into the diagram and process for DNS resolution.

Resolver - a computer typically set up by your Internet Service Provider (ISP, the company giving you Internet service, such as Verizon or your cable provider).  A resolver finds the IP address for the website you are trying to connect to.  Your computer or router is set up to point to a resolver's IP address.

Root Servers - these are servers designated by the Internet Assigned Numbers Authority (IANA*) that delegate each top level domain to the organization managing it.  A top level domain is the rightmost part of a hostname, such as .com, .net, .org, and so on.  You can see a listing here of the companies or organizations responsible for top level domain management.  For example, as of the date of this post, Verisign is the company that manages the .com top level domain.

The company responsible for each top level domain runs the Authoritative Name Server for that top level domain.

Authoritative Name Server - this is a Name Server that can provide the definitive answer for a particular DNS zone.  A DNS zone is the domain that a single company has been given management authority over.  In the prior example, Verisign runs and manages the Authoritative Name Server for .com.  Individual companies will typically manage their own Authoritative Name Server.  For example, Google will manage the Authoritative Name Server for Google.com.

Zone File - managing a zone means managing a zone file.  This zone file is basically a text file that typically holds multiple DNS records.  An example of a DNS record is a mapping of a domain name to an IP address.  This is called an A record, which stands for address record.
    • A simplified example: "www.google.com.      A      74.125.224.72"
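To make the idea of a record concrete, here is a small sketch that reads the simplified A record above.  It assumes exactly the three-field form shown in this post (name, the letter A, address), and the names ARecord and parse_a_record are made up for the illustration.  Real zone files carry more fields than this.

```python
from typing import NamedTuple


class ARecord(NamedTuple):
    name: str
    address: str


def parse_a_record(line: str) -> ARecord:
    """Parse a simplified 'name A address' line like the example above."""
    fields = line.split()
    if len(fields) != 3 or fields[1] != "A":
        raise ValueError(f"not a simplified A record: {line!r}")
    name, _, address = fields
    # Drop the trailing dot that marks the end of the full domain name.
    return ARecord(name.rstrip("."), address)


record = parse_a_record("www.google.com.      A      74.125.224.72")
print(record.name, "maps to", record.address)
```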
Ok, enough terminology for now.  It's a lot but hopefully you didn't get lost.  We'll solidify the concepts with the diagram below.

DNS Resolution Process

The following diagram outlines the DNS process.

[Diagram: DNS Resolution Process]
  1. Let's say you want to go to www.google.com to run a web search.  You type in www.google.com in your browser and hit enter.  
    • Your computer or your router has a setting to ask your resolver (typically one set up by your ISP) "what is the IP address for www.google.com?"
  2. Your resolver will have to look up where to find the answer**.  The resolver will be configured to ask the root DNS server "what is the IP address for www.google.com?"
  3. The root DNS server will respond that it does not know but it will tell the resolver the IP address for the .com Authoritative Name Server to ask the same question.
  4. Now the resolver asks the .com Authoritative Name Server "what is the IP address for www.google.com?"
  5. The .com Authoritative Name Server will respond that it does not know either but it will tell the resolver the IP address for the Google.com Authoritative Name Server to ask the same question.
  6. Next the resolver asks the Google.com Authoritative Name server "what is the IP address for www.google.com?"  Since it is Authoritative for Google.com, the Google.com Name Server will have a Zone file on it with an A record: "www.google.com.      A      74.125.224.72"
  7. The Google.com name server will then respond to the resolver that "the IP address for www.google.com is 74.125.224.72".
  8. Finally, the resolver will tell your browser, "the IP address for www.google.com is 74.125.224.72".
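The question-and-referral pattern in steps 2 through 7 can be imitated with a toy model.  Nothing here touches the network: TOY_SERVERS is a made-up table standing in for the root server, the .com Authoritative Name Server, and the Google.com Authoritative Name Server, and the server labels are invented for this sketch.

```python
# Each toy server either holds A records or refers the resolver elsewhere.
TOY_SERVERS = {
    "root": {"referrals": {"com.": "com-name-server"}},
    "com-name-server": {"referrals": {"google.com.": "google-name-server"}},
    "google-name-server": {"records": {"www.google.com.": "74.125.224.72"}},
}


def ask(server, hostname):
    """Ask one toy server; it answers or refers us to another server."""
    zone_data = TOY_SERVERS[server]
    ip = zone_data.get("records", {}).get(hostname)
    if ip is not None:
        return "answer", ip
    for zone, next_server in zone_data.get("referrals", {}).items():
        if hostname.endswith("." + zone):
            return "referral", next_server
    raise LookupError(f"{server} cannot help with {hostname}")


def resolve(hostname, max_hops=10):
    """Play the resolver: start at the root and follow referrals."""
    if not hostname.endswith("."):
        hostname += "."
    server, asked = "root", []
    for _ in range(max_hops):
        asked.append(server)
        kind, value = ask(server, hostname)
        if kind == "answer":
            return value, asked
        server = value
    raise LookupError("too many referrals")


ip, asked = resolve("www.google.com")
print("Servers asked:", asked)
print("Answer:", ip)
```

The resolver asks the same question three times, and only the last server, the one that is authoritative for Google.com, actually knows the answer.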
Ok, that is a lot there, and I gave you the long version of DNS resolution so you understand the whole process.  Generally it is much quicker due to caching (storing the mapping of domain name to IP address to eliminate repeating all the steps every time).  We will plan on getting more technically detailed in later posts.
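Caching can be sketched as a small table that remembers answers for a limited time.  The class DnsCache and the 300 second lifetime are assumptions made up for this illustration.  Real resolvers decide how long to keep each answer in their own way, which we will leave for a later post.

```python
import time


class DnsCache:
    """Remembers hostname-to-IP answers for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # hostname -> (ip, time at which it expires)

    def put(self, hostname, ip):
        self._entries[hostname] = (ip, self.clock() + self.ttl_seconds)

    def get(self, hostname):
        entry = self._entries.get(hostname)
        if entry is None:
            return None
        ip, expires_at = entry
        if self.clock() >= expires_at:
            # Too old: forget it so the full lookup happens again.
            del self._entries[hostname]
            return None
        return ip


cache = DnsCache(ttl_seconds=300)
cache.put("www.google.com", "74.125.224.72")
print(cache.get("www.google.com"))   # found, so no need to repeat the steps
print(cache.get("news.google.com"))  # None, so the resolver must look it up
```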

Note at this point, we haven't even contacted the Google web server to actually download the www.google.com page so you can run a search.  

Next in our series we will discuss HTML to walk through the next stage where a browser connects with a web server to load a web page.


Footnotes:
* IANA is a non-profit organization responsible for handling this.
** Typically your resolver will already have the answer stored since it is a common query.  By storing the answer for a period of time, it cuts down on the time needed for DNS resolution.

Thursday, October 23, 2014

Internet Basics - Domain Name System (DNS)

In my previous entry, we discussed IP addresses and how difficult it would be for us to memorize so many different numbers to get to our favorite websites.  The Domain Name System (DNS) was developed to make this easier for us.  With DNS, people can easily remember the names of their favorite websites while computers still communicate with each other using their assigned IP addresses.

Going back to the earlier example of mailing addresses, it can be similarly difficult to memorize the addresses (or phone numbers) of all your friends and family.  To assist with that, we now store contacts on our phones to easily look up an address or phone number by a person's name.  Though no longer as popular, there are still phonebooks and online directories where you can search for people's addresses and phone numbers by their names.

In the same way, DNS is a phone book for the Internet.  It will translate domain names into IP addresses.


Domain Names

To understand DNS, we first need to understand Domain Names.  Let's go through some terminology.

A Domain Name is a unique name to identify a website and other Internet resources.  For example google.com is a domain name.  

You will also hear the term subdomains which are just domain names that are part of a larger domain.  Some subdomains of google.com include news.google.com or translate.google.com.

A hostname is the full domain of the website you are going to.  These full names are called Fully Qualified Domain Names.

So all the examples above are domain names: google.com, news.google.com, and translate.google.com.

News.google.com and translate.google.com are subdomains of google.com.  They are also both hostnames as they bring you to the Google news site and translation site respectively.
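If it helps, the relationship between these names can be checked by splitting each name at the dots and comparing the pieces from the right.  The helper names below (labels, is_subdomain, top_level_domain) are made up for this sketch.

```python
def labels(name):
    """Split a domain name into its dot-separated pieces."""
    return name.lower().rstrip(".").split(".")


def is_subdomain(name, parent):
    """True if name is part of the larger domain parent."""
    name_labels, parent_labels = labels(name), labels(parent)
    return (
        len(name_labels) > len(parent_labels)
        and name_labels[-len(parent_labels):] == parent_labels
    )


def top_level_domain(name):
    """Return the rightmost piece of the name, such as com."""
    return labels(name)[-1]


for candidate in ["News.google.com", "translate.google.com", "google.com"]:
    print(candidate, is_subdomain(candidate, "google.com"))

print(top_level_domain("news.google.com"))
```

Comparing whole pieces rather than raw text matters: a name like notgoogle.com ends with the letters google.com but is not a subdomain of it.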

Still with me?  Comment below if you have questions but let's finish up DNS.


DNS

So DNS was set up to translate these domain names to IP addresses and acts similar to a phonebook or contacts list on your phone.  It's easier to remember your friend's name than their phone number.  In order to map these domains to IP addresses, someone must assign the address to the domain.

Each company typically manages their own "phonebook" or "contacts list".  Let's take Google as an example.  Google is assigned a set of IP addresses that they can use to put their web servers on the Internet.  Google may add or remove domains frequently so it's best if they manage the assignment of their IPs to their domain names.

The management of these assignments of IP to domain names is typically done on specialized servers called name servers.  We will go more into name servers in a later post so don't worry if you don't quite understand that part yet.


Summary

To summarize the above briefly:
  • DNS is a system to map domain names to IP addresses
  • DNS makes it easy for humans to remember web site hostnames while still allowing computers to communicate with each other using assigned IP addresses
In our next post we will go through the DNS Resolution process.  This is the actual flow of what happens when you type a hostname in your computer and how your computer translates the hostname into an IP address to connect to that website.



Wednesday, October 22, 2014

Internet Basics - the IP Address

Eventually I'll add some posts on the history and beginning of the Internet but for now, I'm going to focus on the technical components of the Internet.  We will start with IP addresses.  What is an IP address?

If you think about a street address, it is a way to provide information to allow others to find you.  Here is an example from the US Post Office web site.
Name or attention line: JANE L MILLER
Company: MILLER ASSOCIATES
Delivery address: 1960 W CHELSEA AVE STE 2006
City, state, ZIP Code: ALLENTOWN PA 18104






In the same manner, an IP address provides a way for computers to find each other to communicate.  We'll talk about different versions of IP addresses but for now, we'll focus on the original version called IPv4.  An IPv4 address is written as four decimal numbers, each in the range 0 to 255, separated by dots.
Sample IP address: 74.125.224.72
These IP addresses are used by routers* on the Internet to direct traffic to the right computers and websites.  Every computer and web site on the Internet can be reached by an IP address.  In fact, if you open your web browser and type in the sample IP address above, you will reach the Google web site (at least as of the time of this article).
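The "four numbers between 0 and 255" rule is easy to check in code.  Below is a minimal sketch: parse_ipv4 is a made-up helper for this post, while ipaddress is a module from Python's standard library that does the same validation and can also show the address as the single number computers work with.

```python
import ipaddress


def parse_ipv4(text):
    """Split a dotted address into four numbers, each from 0 to 255."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 numbers separated by dots: {text!r}")
    numbers = [int(part) for part in parts]
    if any(not 0 <= number <= 255 for number in numbers):
        raise ValueError(f"each number must be between 0 and 255: {text!r}")
    return numbers


sample = "74.125.224.72"
numbers = parse_ipv4(sample)
print(numbers)

# The standard library performs the same check and exposes the integer form.
as_integer = int(ipaddress.IPv4Address(sample))
print(as_integer)
```

An address such as 256.1.1.1 is rejected because 256 falls outside the allowed range.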

Now you have a high level understanding of how computers can communicate and find each other.  As a human though, to memorize the address of all the websites on the Internet would be enormously challenging if you had to do it by IP address.  In the next post, we'll talk about how the problem was addressed using the Domain Name System or DNS.


 * Routers are specialized devices that send data between networks.  For example, they make sure requests you make from your browser are sent to Google's web site to do a web search.  We'll provide later articles with more details.

Tuesday, October 21, 2014

Learn Internet Basics

This web site is set up to provide a basic overview of how the Internet works.  Eventually we may go into more advanced topics.  For anyone who is either curious or needs to learn this for a job, we plan to write this in a simple and straightforward manner that even a novice can pick up.

Enjoy!