Learn Internet Basics

Wednesday, March 11, 2015

IP Protocol, Routers, and Routing

If you have been reading all the posts, you are ready to get deeper into what happens behind the scenes on the Internet. We are going to dig into certain protocols starting with the Internet Protocol (IP). We will discuss how IP works to allow the transfer of information between computers on the Internet.

In our prior post, we talked about the IP address, but how does the Internet Protocol itself work? As an address system, it is used to route data to the right computers and servers. To understand IP, it helps to first understand how data is sent.

IP Packets

Data is sent across the Internet in packets. We won't get into the details until another post, but let's cover the basic concept with a mail example again. The US Postal system, for example, will only take packages and envelopes of certain sizes. The Internet is a bit stricter about its envelopes: each one can only carry a limited amount of data.

So, let's say the US Postal system only takes a single envelope with a single sheet of paper of a standard size inserted into it. If you have a lot to say or a big book to send, you will end up using a lot of envelopes to send the entire communication through. With the Internet, these "envelopes" are network packets which contain data. These are, more specifically, IP packets. When downloading a web page, multiple IP packets must be sent to deliver the entire page.

Just like the US Postal system, IP requires a "to" and "from" address to know where to send the letter and where the letter originated. As already discussed, this is the IP address for IP packets. We will get into details of the IP packet later, but just know that it contains the "to" and "from" IP addresses along with the data (limited in size) that you need to send.
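To make the envelope idea concrete, here is a minimal sketch in Python. It is not the real IP packet format: the Packet class, the 16-byte limit, and the 192.0.2.10 "from" address are all made up for illustration. The sequence number is also our own addition so the pieces can be put back in order; real IP leaves ordering to higher-level protocols such as TCP.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    source: str       # the "from" IP address
    destination: str  # the "to" IP address
    sequence: int     # illustrative only: position of this piece in the message
    data: bytes       # the payload, limited in size


def split_into_packets(message, source, destination, max_data=16):
    """Cut a message into pieces of at most max_data bytes, one per packet."""
    packets = []
    for seq, start in enumerate(range(0, len(message), max_data)):
        chunk = message[start:start + max_data]
        packets.append(Packet(source, destination, seq, chunk))
    return packets


def reassemble(packets):
    """Put the pieces back in order and join them into the original message."""
    ordered = sorted(packets, key=lambda p: p.sequence)
    return b"".join(p.data for p in ordered)


message = b"Hello from my browser to a web server far away!"
packets = split_into_packets(message, "192.0.2.10", "74.125.224.72")
for packet in packets:
    print(packet)
print(reassemble(packets) == message)  # True
```

Every packet carries both addresses, just like every envelope in the mail example, so each one can travel on its own.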

Routers

Routers are the specialized devices that send these IP packets to the right destination. They route the packet from your browser to Google's web server, for example. You can see examples of Cisco routers below (hopefully not too dated by the time you read this), though in data centers there would be a lot of cables plugged into each.

Examples of Cisco routers

Across the Internet, there are routers. For example, your Internet provider has routers within its network which in turn are connected to other routers which will eventually be connected to the router connected to Google's web server. This is how computers all over the world are connected through this web of routers.

When a router is put into the Internet, it needs to be connected to other routers, but in order to know how to route IP packets, it also needs to be configured. Each router will have a routing table. The router looks up the packet's destination IP address in this table and either sends the packet to a server it is directly connected to or forwards it to the next router to get closer to the final destination. The following is a simple diagram.

Routers on the Internet forward data from your browser to destination servers and vice versa. Grey arrows represent possible "next router" that packets can be forwarded to. Green arrow shows the path that routers actually chose based on their routing table (note there are typically many more router "hops" between your browser and a web server).
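For the more hands-on reader, a routing table lookup can be sketched in a few lines of Python. The table entries and router names below are invented for illustration, and we assume the most specific matching entry wins (often called longest-prefix matching); real routers keep far larger tables and build them in ways we have not covered yet.

```python
import ipaddress

# Hypothetical routing table: (destination network, what to do with the packet).
ROUTING_TABLE = [
    (ipaddress.ip_network("74.125.224.0/24"), "deliver directly"),
    (ipaddress.ip_network("74.125.0.0/16"), "forward to router B"),
    (ipaddress.ip_network("0.0.0.0/0"), "forward to router A"),  # default route
]


def next_hop(destination):
    """Return the action for the most specific network containing destination."""
    address = ipaddress.ip_address(destination)
    matches = [(net, action) for net, action in ROUTING_TABLE if address in net]
    # The default route matches every address, so matches is never empty.
    best_network, action = max(matches, key=lambda m: m[0].prefixlen)
    return action


print(next_hop("74.125.224.72"))  # deliver directly
print(next_hop("74.125.10.1"))    # forward to router B
print(next_hop("8.8.8.8"))        # forward to router A
```

The last entry plays the role of "I don't know this address, so pass it along to the next router".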

Summary

Data is sent across the Internet in IP packets, each carrying a source and a destination IP address. Routers use those addresses along with their routing tables to send each packet to the proper destination server. For those who are more technical, I know a lot of detail is missing, which we will dig into as we get more sophisticated.

In our next posts, we will likely cover the TCP protocol.

Tuesday, October 28, 2014

Internet Basics - HTML (HyperText Markup Language)

In the prior post, we explained how your computer finds an IP address for a website through DNS resolution. After that, the browser connects to the web server and loads the web page. We will save the whole process of explaining how the connection is set up for a later post. In this one, we explain HyperText Markup Language (HTML), which is used to create web pages and tells your browser how to display them.

HTML is a language that will give a browser instructions on how to construct and display a web page. You can view the HTML for any web page by choosing the view page source option on your browser. What you will see is basically a text file that is written in HTML.

A key element of HTML is the tag. Tags mark certain parts of the HTML page so the browser knows how to display them on your screen. For example, the tag <b> marks an area that should be made bold. Most tags have an opening tag and a closing tag, which is the same tag with a slash: </b>. So you may find this in HTML: <b>Everything here is in bold</b>.

A very common tag is for loading images. <img src="http://www.infofornyc.com/images/infofornyc_logo.png"> tells the browser to load the image located at the URL http://www.infofornyc.com/images/infofornyc_logo.png.
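If you want to see tags the way a program sees them, here is a small sketch using Python's built-in html.parser module. The TagLogger class is our own illustrative name; a real browser does far more than this, but the idea of walking through opening tags, text, and closing tags is the same.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every opening tag, closing tag, and piece of text it sees."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, e.g. [("src", "http://...")]
        self.events.append(("open", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))


parser = TagLogger()
parser.feed(
    '<b>Everything here is in bold</b>'
    '<img src="http://www.infofornyc.com/images/infofornyc_logo.png">'
)
for event in parser.events:
    print(event)
```

Notice that the bold tag produces both an "open" and a "close" event, while the image tag only produces an "open" event with its src attribute attached.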

URL stands for Uniform Resource Locator which points to a specific "resource". A resource could be an image, a web page, a video, and more. A typical URL will have the following:

Protocol - http - the most common for web is http though there are others.
Domain name - www.infofornyc.com in this example. We discussed domain in the last post.
Port number - this is generally assumed to be 80 and usually not explicitly displayed on the web browser for http
Path - /images/infofornyc_logo.png - everything after the domain which helps identify the location of the content (look in the images folder for the infofornyc_logo.png image)

There are some additional parts, but I'll save that for a later in-depth look at the URL.
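As a quick illustration, Python's standard library can split a URL into these same parts. The DEFAULT_PORTS table is our own helper for this sketch; it simply records the usual ports for http and https.

```python
from urllib.parse import urlsplit

url = "http://www.infofornyc.com/images/infofornyc_logo.png"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.hostname)  # www.infofornyc.com
print(parts.path)      # /images/infofornyc_logo.png

# No port is written in this URL, so parts.port is None and we fall back
# to the usual default for the protocol.
DEFAULT_PORTS = {"http": 80, "https": 443}
port = parts.port or DEFAULT_PORTS[parts.scheme]
print(port)            # 80
```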

HTML has a lot of different elements besides image tags and bold tags but I'll also save that for a more detailed post later. For the novice, just remember that HTML is sent by the web server to your browser as a text file. Your browser will read the text file and load the content to display the page on your computer.

To finish out the example flow from our last post where www.google.com is placed into the browser, once DNS resolution is complete, the below occurs.

1. The browser connects to the www.google.com web server at the IP address provided and makes a request for http://www.google.com.

2. The Google web server delivers an HTML page to the browser.

3. The browser reads the HTML page, displays content, and identifies additional content to load.

4. If the HTML page has additional content to load such as images (e.g., https://www.google.com/images/srpr/logo11w.png), the browser will request the content from the web server.

5. The web server delivers the additional content. Steps 4 and 5 repeat until all content referenced in the HTML page is downloaded.

6. The browser displays the full web page.
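The flow above can be sketched in Python without touching the network. Everything here is a stand-in: FAKE_SERVER is a dictionary pretending to be a web server, www.example.com is a placeholder site, and the regular expression is a shortcut where a real browser would use a full HTML parser.

```python
import re
from urllib.parse import urljoin

# Hypothetical stand-in for a web server: URL -> the content it would deliver.
FAKE_SERVER = {
    "http://www.example.com/":
        '<b>Welcome</b><img src="/images/logo.png"><img src="/images/photo.png">',
    "http://www.example.com/images/logo.png": b"<logo bytes>",
    "http://www.example.com/images/photo.png": b"<photo bytes>",
}


def request(url):
    """Pretend to ask the web server for a URL."""
    return FAKE_SERVER[url]


def load_page(page_url):
    # Request the page and receive the HTML as text.
    html = request(page_url)
    # Read the HTML and identify additional content (here, only images).
    image_paths = re.findall(r'<img\s+src="([^"]+)"', html)
    # Request each extra resource until everything referenced is downloaded.
    downloaded = {}
    for path in image_paths:
        resource_url = urljoin(page_url, path)
        downloaded[resource_url] = request(resource_url)
    # With the HTML and all resources in hand, the page can be displayed.
    return html, downloaded


html, resources = load_page("http://www.example.com/")
print(len(resources))  # 2
for url in resources:
    print(url)
```

One page request turned into three requests in total, which is why a single web page usually means many trips to the web server.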

Congratulations! If you have read all the posts up until now, you should have a solid foundation on understanding how the Internet works. We'll go into some more advanced technical topics and dive deeper into some concepts already introduced. We will likely start by explaining some of the key protocols such as IP (and how routers function), TCP, UDP, and HTTP.

Monday, October 27, 2014

Internet Basics - DNS Resolution

In this post, we will walk through the DNS resolution process. This is the process in which a domain name is translated into an IP address so your computer knows how to connect to a web server. This will be our first post with a diagram too!

Terminology

Let's start with some terms before diving into the diagram and process for DNS resolution.

Resolver - these are computers typically set up by your Internet Service Provider (ISP is the company giving you Internet service like Verizon or your Cable provider). Resolvers will find out the IP address for the website you are trying to connect with. Your computer or router is set up to point to a resolver's IP address.

Root Servers - these are servers that are designated by the Internet Assigned Numbers Authority (IANA*) to assign management of the top level domains. A top level domain is the rightmost part of a hostname such as .com, .net, .org, and so on. You can see a listing here of the companies or organizations responsible for top level domain management. For example, as of the date of this post, Verisign is the company that manages the .com top level domain.

The company responsible for each top level domain runs the Authoritative Name Server for that top level domain.

Authoritative Name Server - this is a Name Server that can provide the definitive answer for a particular DNS zone. A DNS zone is the domain that a single company has been given management authority over. In the prior example, Verisign runs and manages the Authoritative Name Server for .com. Individual companies will typically manage their own Authoritative Name Server. For example, Google will manage the Authoritative Name Server for Google.com.

Zone File - what it means to manage a zone is to manage a zone file. This Zone file is basically a text file that typically has multiple DNS records on it. An example of a DNS record is a mapping of a domain name to an IP Address. This is called an A record which stands for address record.

A simplified example: "www.google.com. A 74.125.224.72"
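To show how little magic there is in a record like this, here is a minimal Python sketch that reads the simplified format above. Real zone files carry more fields and record types than this, so treat parse_a_record as an illustrative helper rather than a real zone file parser.

```python
def parse_a_record(line):
    """Parse a simplified 'name A address' line into a (name, address) pair."""
    name, record_type, address = line.split()
    if record_type != "A":
        raise ValueError(f"expected an A record, got {record_type}")
    return name, address


zone_file = """
www.google.com. A 74.125.224.72
"""

# Build a lookup table from every non-empty line in the zone file.
records = dict(
    parse_a_record(line) for line in zone_file.splitlines() if line.strip()
)
print(records["www.google.com."])  # 74.125.224.72
```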

Ok, enough terminology for now. It's a lot but hopefully you didn't get lost. We'll solidify the concepts with the diagram below.

DNS Resolution Process

The following diagram outlines the DNS process.

Let's say you want to go to www.google.com to run a web search. You type in www.google.com in your browser and hit enter.

Your computer or your router has a setting to ask your resolver (typically one set up by your ISP) "what is the IP address for www.google.com?"

Your resolver will have to look up where to find the answer**. The resolver will be configured to ask the root DNS server "what is the IP address for www.google.com?"
The root DNS server will respond that it does not know but it will tell the resolver the IP address for the .com Authoritative Name Server to ask the same question.
Now the resolver asks the .com Authoritative Name Server "what is the IP address for www.google.com?"
The .com Authoritative Name Server will respond that it does not know either but it will tell the resolver the IP address for the Google.com Authoritative Name Server to ask the same question.
Next the resolver asks the Google.com Authoritative Name server "what is the IP address for www.google.com?" Since it is Authoritative for Google.com, the Google.com Name Server will have a Zone file on it with an A record: "www.google.com. A 74.125.224.72"
The Google.com name server will then respond to the resolver that "the IP address for www.google.com is 74.125.224.72".
Finally, the resolver will tell your browser, "the IP address for www.google.com is 74.125.224.72".
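The same chain of questions can be sketched in Python. The three "servers" below are plain dictionaries we made up for illustration, and the names root, com, and google.com are just labels; no real DNS traffic is involved.

```python
# Hypothetical name servers. Each one knows some answers ("records") and
# knows which other server to point the resolver to ("referrals").
SERVERS = {
    "root": {"referrals": {"com.": "com"}, "records": {}},
    "com": {"referrals": {"google.com.": "google.com"}, "records": {}},
    "google.com": {
        "referrals": {},
        "records": {"www.google.com.": "74.125.224.72"},
    },
}


def ask(server_name, hostname):
    """Ask one server; return ("answer", ip) or ("referral", next_server)."""
    server = SERVERS[server_name]
    if hostname in server["records"]:
        return "answer", server["records"][hostname]
    for zone, next_server in server["referrals"].items():
        if hostname == zone or hostname.endswith("." + zone):
            return "referral", next_server
    raise LookupError(f"{server_name} cannot help with {hostname}")


def resolve(hostname):
    """Start at the root and follow referrals until a server gives the answer."""
    server_name = "root"
    path = []
    while True:
        path.append(server_name)
        kind, value = ask(server_name, hostname)
        if kind == "answer":
            return value, path
        server_name = value  # "I don't know, but ask this server instead"


ip, path = resolve("www.google.com.")
print(ip)    # 74.125.224.72
print(path)  # ['root', 'com', 'google.com']
```

The path shows the resolver asking the same question three times, getting two referrals and finally one answer.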

Ok, that is a lot there, and I gave you the long version of DNS resolution so you understand the whole process. Generally it is much quicker due to caching (storing the mapping of domain name to IP address to eliminate repeating all the steps every time). We will plan on getting more technically detailed in later posts.
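Here is a rough Python sketch of that caching idea. CachingResolver and slow_lookup are names invented for this example, and the 300-second lifetime is an arbitrary choice; real resolvers decide how long to keep each answer in their own way.

```python
import time


class CachingResolver:
    """Remember answers for a while so the full lookup is not repeated."""

    def __init__(self, lookup, ttl_seconds=300):
        self.lookup = lookup            # function that does the full resolution
        self.ttl_seconds = ttl_seconds  # how long a stored answer stays valid
        self.cache = {}                 # hostname -> (ip, expiry time)
        self.full_lookups = 0

    def resolve(self, hostname):
        entry = self.cache.get(hostname)
        if entry and entry[1] > time.monotonic():
            return entry[0]             # still fresh: answer from the cache
        self.full_lookups += 1
        ip = self.lookup(hostname)
        self.cache[hostname] = (ip, time.monotonic() + self.ttl_seconds)
        return ip


def slow_lookup(hostname):
    # Stand-in for the full walk through the root, .com, and Google.com servers.
    return {"www.google.com": "74.125.224.72"}[hostname]


resolver = CachingResolver(slow_lookup)
print(resolver.resolve("www.google.com"))  # 74.125.224.72
print(resolver.resolve("www.google.com"))  # 74.125.224.72
print(resolver.full_lookups)               # 1
```

The second request never reaches slow_lookup, which is the whole point of caching.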

Note at this point, we haven't even contacted the Google web server to actually download the www.google.com page so you can run a search.

Next in our series we will discuss HTML to walk through the next stage where a browser connects with a web server to load a web page.

Footnotes:

* IANA is a non-profit organization responsible for handling this to be

** Typically your resolver will have the answer stored since it is a common query. By storing the answer for a period of time, it cuts down on the time needed to do DNS resolution.

Thursday, October 23, 2014

Internet Basics - Domain Name System (DNS)

In my previous entry, we discussed IP Addresses and how difficult it would be for us to memorize so many different numbers to get to our favorite websites. The Domain Name System (DNS) was developed to make this easier for us. With DNS, people can easily remember the names of their favorite websites while computers still communicate with each other using their assigned IP addresses.

Back to a previous example using mailing addresses, it can be similarly difficult to memorize the addresses (or phone numbers) of all your friends and family. To assist with that, we now store contacts on our phones to easily look up the address or phone number by people's names. Though no longer very popular, there are still phonebooks and online directories where you can search for people's addresses and phone numbers by their names.

In the same way, DNS is a phone book for the Internet. It will translate domain names into IP addresses.

Domain Names

To understand DNS, we first need to understand Domain Names. Let's go through some terminology.

A Domain Name is a unique name to identify a website and other Internet resources. For example google.com is a domain name.

You will also hear the term subdomains which are just domain names that are part of a larger domain. Some subdomains of google.com include news.google.com or translate.google.com.

A hostname is the full domain of the website you are going to. These full names are called Fully Qualified Domain Names.

So all the examples above are domain names: google.com, news.google.com, and translate.google.com.

News.google.com and translate.google.com are subdomains of google.com. They are also both hostnames as they bring you to the Google news site and translation site respectively.
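If it helps, the subdomain relationship can be checked with a few lines of Python. The is_subdomain function is our own illustrative helper; it compares names label by label (the pieces between the dots) so that a lookalike such as notgoogle.com is not mistaken for part of google.com.

```python
def is_subdomain(name, parent):
    """True if name is a subdomain of parent (a domain is not its own subdomain)."""
    name_labels = name.lower().rstrip(".").split(".")
    parent_labels = parent.lower().rstrip(".").split(".")
    return (
        len(name_labels) > len(parent_labels)
        and name_labels[-len(parent_labels):] == parent_labels
    )


print(is_subdomain("news.google.com", "google.com"))       # True
print(is_subdomain("translate.google.com", "google.com"))  # True
print(is_subdomain("notgoogle.com", "google.com"))         # False
print(is_subdomain("google.com", "google.com"))            # False
```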

Still with me? Comment below if you have questions but let's finish up DNS.

DNS

So DNS was set up to translate these domain names to IP addresses and acts similar to a phonebook or contacts list on your phone. It's easier to remember your friend's name than their phone number. In order to map these domains to IP addresses, someone must assign the address to the domain.

Each company typically manages its own "phonebook" or "contacts list". Let's take Google as an example. Google is assigned a set of IP addresses that it can use to put its web servers on the Internet. Google may add or remove domains frequently, so it's best if they manage the assignment of their IPs to their domain names.

The management of these assignments of IP to domain names is typically done on specialized servers called name servers. We will go more into name servers in a later post so don't worry if you don't quite understand that part yet.

Summary

To summarize the above briefly:

DNS is a system to map domain names to IP addresses
DNS makes it easy for humans to remember web site hostnames but still allows computers to communicate with each other using assigned IP addresses

In our next post we will go through the DNS Resolution process. This is the actual flow of what happens when you type a hostname in your computer and how your computer translates the hostname into an IP address to connect to that website.

Wednesday, October 22, 2014

Internet Basics - the IP Address

Eventually I'll add some posts on the history and beginning of the Internet but for now, I'm going to focus on the technical components of the Internet. We will start with IP addresses. What is an IP address?

If you think about a street address, it is a way to provide information to allow others to find you. Here is an example from the US Post Office web site.

Name or attention line: JANE L MILLER

Company: MILLER ASSOCIATES

Delivery address: 1960 W CHELSEA AVE STE 2006

City, state, ZIP Code: ALLENTOWN PA 18104

In the same manner, an IP address provides a way for computers to find each other to communicate. We'll talk about different versions of IP addresses, but for now we'll focus on the original version called IPv4. An IPv4 address is written as four decimal numbers, each in the range 0 to 255, separated by dots.

Sample IP address: 74.125.224.72
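For readers who like to tinker, here is a short Python sketch that checks whether a piece of text follows that format. The parse_ipv4 function is our own illustrative helper; Python's built-in ipaddress module, shown at the end, performs a similar check for you.

```python
import ipaddress


def parse_ipv4(text):
    """Split a dotted IPv4 address into its four numbers, each 0 to 255."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("an IPv4 address has exactly four numbers")
    numbers = [int(part) for part in parts]
    for number in numbers:
        if not 0 <= number <= 255:
            raise ValueError(f"{number} is outside the range 0 to 255")
    return numbers


print(parse_ipv4("74.125.224.72"))  # [74, 125, 224, 72]

# The standard library can validate the same address.
print(ipaddress.ip_address("74.125.224.72").version)  # 4
```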

These IP addresses are used by routers* on the Internet to direct traffic to the right computer and websites. Every computer and web site on the Internet can be reached by an IP address. In fact, if you go to the web browser and type the sample IP address above, that will reach the Google web site (at least as of the time of this article).

Now you have a high level understanding of how computers can communicate and find each other. As a human though, to memorize the address of all the websites on the Internet would be enormously challenging if you had to do it by IP address. In the next post, we'll talk about how the problem was addressed using the Domain Name System or DNS.

* Routers are specialized devices that send data between networks. For example, they make sure requests you make from your browser are sent to Google's web site to do a web search. We'll provide later articles with more details.

Tuesday, October 21, 2014