Caveman's Blog

My commitment to learning.

Load balancing a web application

Load balancing (LB) is the technique of distributing a given load as evenly as possible across the load bearers of a system. The goal is to let the system scale as the load increases, improving the performance of the system as a whole. The technique is used in many areas, including telecommunications, web servers, database servers, avionics, the shipping industry and power grids.

Introduction

When a user requests a web page, the browser first asks a DNS server for the IP address of the web server and then connects to the web server at that address to send the request. As the web server receives more and more such requests, its load increases. The first response is usually to boost the server itself by adding more RAM and more processing power, which improves response time. This kind of scaling is limited, however, so alternative methods of improving performance came into existence.

DNS [2] round robin was one of the early strategies employed for load balancing web servers. It relies on the fact that several IP addresses can be assigned to one host name, so traffic for that host name can be distributed across multiple IP addresses (computers). Caching of IP addresses by DNS servers limits how well DNS round robin distributes traffic. For example, when the computer behind a cached IP address goes down, DNS keeps handing out that address and has no way of routing requests around the failed machine, so the load distribution becomes ineffective. This shortcoming led to the evolution of a more effective and scalable solution: Server Load Balancing (SLB). Server load balancing is especially important because of the unpredictable nature of web traffic (the number of requests).
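
The caching limitation can be illustrated with a small simulation. This is only a toy model, not real DNS: the classes `RoundRobinDNS` and `CachingResolver` and the 10.0.0.x addresses are made up for illustration, and the cache here never expires, whereas real resolvers honour a time-to-live.

```python
from itertools import cycle


class RoundRobinDNS:
    """Toy DNS server (hypothetical) that rotates through the addresses of one host name."""

    def __init__(self, addresses):
        self._rotation = cycle(addresses)

    def resolve(self):
        # Each query gets the next address in the rotation.
        return next(self._rotation)


class CachingResolver:
    """Toy resolver (hypothetical) that caches the first answer and never refreshes it."""

    def __init__(self, dns):
        self._dns = dns
        self._cached = None

    def resolve(self):
        if self._cached is None:
            self._cached = self._dns.resolve()
        return self._cached


dns = RoundRobinDNS(["10.0.0.1", "10.0.0.2", "10.0.0.3"])

# Uncached queries are spread across all three machines.
direct_answers = [dns.resolve() for _ in range(4)]

# Behind a cache, every query returns the same machine.
resolver = CachingResolver(dns)
cached_answers = [resolver.resolve() for _ in range(3)]

# If that one machine goes down, every cached lookup points at a dead server.
servers_down = {cached_answers[0]}
failed_lookups = [ip for ip in cached_answers if ip in servers_down]
```

Clients behind the cache all land on a single address, so the rotation on the DNS side no longer spreads their load, and nothing in DNS notices when that address stops answering.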

Server Load Balancing (SLB)

High availability and scalability are the most important criteria to keep in mind when designing an enterprise (web) application. Fortunately, SLB provides both, catering to the needs of an ever increasing server load. Typically multiple web servers host a website so that the load can be spread across them instead of swamping a single server. A web farm is like one large virtual computer in which the load balancer acts as a controller: it knows which processing unit (web server) should be given a pending client request, and it promptly sends the response back to that client. It is a multi-server environment in which we might, for example, have a server in each state of the US. When the load on one server exceeds its configured capacity, the other servers step in to bear the brunt.
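
As a rough sketch of the "other servers step in" idea, the controller below hands each request to the least utilised server that still has room. The `WebServer` class, the `dispatch` function and the notion of capacity as a count of concurrent requests are all assumptions made for this example; real load balancers measure load in many different ways.

```python
from dataclasses import dataclass


@dataclass
class WebServer:
    name: str
    capacity: int      # assumed unit: maximum number of concurrent requests
    active: int = 0    # requests currently being processed

    def has_room(self):
        return self.active < self.capacity


def dispatch(servers):
    """Assign one request to the least utilised server that is below capacity."""
    candidates = [s for s in servers if s.has_room()]
    if not candidates:
        raise RuntimeError("all servers are at capacity")
    chosen = min(candidates, key=lambda s: s.active / s.capacity)
    chosen.active += 1
    return chosen


farm = [WebServer("web-a", capacity=2), WebServer("web-b", capacity=2)]
assigned = [dispatch(farm).name for _ in range(4)]
```

Once `web-a` reaches its configured capacity it drops out of the candidate list, and `web-b` takes the remaining requests until it is full as well.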

Load Balancer

A load balancer distributes the load according to one of the models listed below:
1. Round robin (all servers share the load equally)
2. NLB, Network Load Balancing (economical)
3. HLB, Hardware Load Balancing (expensive, but can scale up to 8192 servers)
4. Hybrid (of 2 and 3)
5. CLB (Component Load Balancing)
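
The first model in the list, round robin, is simple enough to sketch in a few lines. The `RoundRobinBalancer` class and the server names are hypothetical, and the `mark_down` / `mark_up` health tracking is an addition for illustration, showing how a balancer can skip a failed server, which plain DNS round robin cannot do.

```python
class RoundRobinBalancer:
    """Hands out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0

    def mark_down(self, server):
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def pick(self):
        # Try each server at most once, starting from the rotation position.
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["web-a", "web-b", "web-c"])
first_picks = [balancer.pick() for _ in range(4)]

balancer.mark_down("web-b")
picks_after_failure = [balancer.pick() for _ in range(3)]
```

Every server gets the same number of requests over a full rotation, regardless of how busy it is, which is why round robin works best when the servers and the requests are roughly uniform.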

State Management

One of the shortcomings of HTTP is that it is a stateless protocol. It works in a disconnected fashion: once the server has processed a request and sent the response to the client, it does not retain the identity of that client. Hence the need for a mechanism that keeps track of the client’s identity and the client-specific data, also called state. Choosing the most suitable state management mechanism is one of the most challenging parts of setting up a web farm. In this environment, state can be stored in one or more of the following three places:

1. State server – a single point of failure
2. Database – additional overhead in processing each web request
3. Web server – unreliable, because the load balancer may send the next request to a different server

I personally like a hybrid approach in which the session is stored both in the database and on the web server. Here is why this might be a prudent approach: when a request is processed, the web server first checks its own cache for the session state, and if the state is not found there it hits the database to re-establish the state for that request. This way you get the best of both worlds: the session is saved reliably and can be accessed quickly and efficiently. Any changes to the state during the processing of a request are persisted to the database. Of course, storing the encrypted session id in a cookie is necessary to identify the state across the client’s round trips to the web server.
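
The hybrid approach described above might look roughly like this. It is a minimal sketch, not a production design: `HybridSessionStore` is a hypothetical name, a plain dictionary stands in for the shared database, and the session id `"abc123"` is made up.

```python
class HybridSessionStore:
    """Per-server session cache backed by a shared database (a dict stands in for it)."""

    def __init__(self, database):
        self._database = database   # shared by every web server in the farm
        self._cache = {}            # local to this web server

    def load(self, session_id):
        # Check the local cache first, then fall back to the database.
        state = self._cache.get(session_id)
        if state is None:
            state = self._database.get(session_id)
            if state is not None:
                state = dict(state)              # shallow copy for the local cache
                self._cache[session_id] = state
        return state

    def save(self, session_id, state):
        # Write through: update the local cache and persist to the database.
        self._cache[session_id] = dict(state)
        self._database[session_id] = dict(state)


shared_db = {}
server_a = HybridSessionStore(shared_db)
server_b = HybridSessionStore(shared_db)

# The first request lands on server A, which saves the session.
server_a.save("abc123", {"cart": ["book"]})

# The next request lands on server B, which re-establishes the state from the database.
restored = server_b.load("abc123")
```

One thing this sketch leaves out is cache invalidation: if server A later changes the session, server B keeps serving its older cached copy, so a real setup would need some way of expiring or refreshing cached entries.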

Note: A web garden is different from a web farm in that a web garden is a multi-processor setup on a single server, unlike the multi-server setup described above.

References:

1. Wikipedia – Load Balancing
2. Domain Name Server – DNS
