On a grid, there are
certain services that always need to be available and accessible to
clients. These services are vital to the applications running on the
grid and to the grid's ability to process SAS jobs. Examples include:
- Platform Grid Management Service
- web application tier components
Configuring a grid that
provides high availability for these services requires these components:
- Failover hosts for the machines that run critical applications.
  Using multiple machines for critical functions eliminates a single
  point of failure for the grid.
- A way to monitor the high-availability applications on the grid and
  to automatically restart a failed application on the same host or,
  if needed, on a failover host.
- A method to let the client know to connect to the failover host
  instead of the regular host. This can be done through software (DNS
  resolution) or hardware (a hardware load balancer), but only one of
  the two methods is used.
In normal operations,
the following sequence takes place:
- The client determines that it needs to access a service on a machine
  in the grid.
- The client sends a query to the corporate DNS server. The DNS server
  looks up the address for the machine and returns that information to
  the client.
- The client uses the address to connect to the machine and use the
  application.
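
The normal sequence above can be sketched from the client's side. The following Python example is only an illustration and is not part of the SAS or Platform software: the function name connect_to_service and the timeout value are assumptions, and the lookup is delegated to the operating system resolver, which in turn queries whatever DNS server the client machine is configured to use.

```python
import socket


def connect_to_service(hostname, port, timeout=5.0):
    """Resolve a service host name, then connect to the returned address.

    Hypothetical helper: mirrors the "query DNS, then connect" steps.
    """
    # Ask the resolver (and, through it, the DNS server) for the addresses
    # registered for this host name.
    candidates = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)

    last_error = None
    for family, socktype, proto, _canonname, sockaddr in candidates:
        sock = None
        try:
            # Use the returned address to connect to the machine.
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            last_error = exc
            if sock is not None:
                sock.close()

    raise ConnectionError(f"could not connect to {hostname}:{port}") from last_error
```

Because the client works only with the host name, it does not need to know which physical machine answers. That property is what the DNS-based failover method described later relies on.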
To provide business
continuity for the application, a failover host must be provided for
the critical services running in the grid environment. The failover host
provides an alternative location for running the critical services and
ensures that they remain available to the applications on the grid. In
addition, both the main and failover machines must have access to a shared
file server. This ensures that the services have access to the data required
for operation, regardless of which machine is running them.
To provide business
continuity for the application, the failover capability must also
be automatic. EGO is configured to monitor any number of critical
services running on the grid. If it detects that the application has
failed or that the machine running it has gone down, it is configured
to start the application on the failover server automatically, which
enables applications to continue running on the grid.
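
The monitoring and restart behavior described above can be modeled in a few lines. The sketch below is a simplified illustration of the decision logic only. It does not use or represent the EGO interface, and the names Host, Service, and monitor_once are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    up: bool = True


@dataclass
class Service:
    name: str
    main: Host
    failover: Host
    running_on: Optional[Host] = None
    process_alive: bool = False

    def start_on(self, host):
        # Stand-in for actually launching the application on a host.
        self.running_on = host
        self.process_alive = True


def monitor_once(service):
    """Run one monitoring pass and return a description of the action taken."""
    host = service.running_on

    # Nothing to do while the host is up and the application is running.
    if host is not None and host.up and service.process_alive:
        return "healthy"

    # The application failed but its host is still up: restart it in place.
    if host is not None and host.up:
        service.start_on(host)
        return f"restarted on {host.name}"

    # The host is down (or the service was never started): use the first
    # available machine, preferring the main host over the failover host.
    for candidate in (service.main, service.failover):
        if candidate.up:
            service.start_on(candidate)
            return f"started on {candidate.name}"

    service.running_on = None
    service.process_alive = False
    return "no host available"
```

A real monitor would run such a pass repeatedly and would detect failures through health checks rather than through flags set by hand. The flags are used here only to keep the example self-contained.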
However, once the application
has started on the failover server, the client must have a way to
know which server is running the application. There are two methods
for accomplishing this:
- Using a hardware load balancer. The load balancer serves as an
  intermediary between the client and the services running on the grid,
  which decouples the grid operation from the physical structure of the
  grid. When the client wants to connect to the service, it connects to
  the load balancer, which then directs the request to the machine that
  is running the service. The load balancer knows the addresses of both
  the main and failover machines, so it passes the request on to
  whichever machine is running the services. During normal operation,
  the request goes to the main machine. When failover occurs, EGO starts
  the services on the failover host, and the load balancer forwards
  connections to it (because it is now the host running the services).
- Using DNS resolution. Once EGO starts the application on the failover
  server, it sends the address of the failover machine to the corporate
  DNS server. The entry for the application is updated in the DNS
  server, so the next time a client requests a connection to the
  application, the DNS server returns the address of the failover
  machine.
If you do not want EGO to update the corporate DNS directly, you can
configure the corporate DNS server to always forward requests for the
machine's address to EGO, which then provides the IP address. When EGO
starts the application on the failover machine, it returns the address
of that machine instead.
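
The difference between the two methods can be illustrated with a small model. The sketch below is a toy simulation, not a configuration for any real load balancer or DNS server. The classes LoadBalancer and DnsServer, the host name gridsvc.example.com, and the addresses (taken from the 192.0.2.0/24 documentation range) are all assumptions made for the example.

```python
MAIN = "192.0.2.11"        # hypothetical address of the main host
FAILOVER = "192.0.2.12"    # hypothetical address of the failover host
SERVICE_NAME = "gridsvc.example.com"  # hypothetical service host name


class LoadBalancer:
    """Toy load balancer that knows both the main and failover addresses."""

    def __init__(self, main_address, failover_address):
        self.backends = [main_address, failover_address]

    def route(self, is_serving):
        # Forward the request to whichever known machine runs the service.
        for address in self.backends:
            if is_serving(address):
                return address
        raise LookupError("service is not running on a known host")


class DnsServer:
    """Toy DNS server holding one address per host name."""

    def __init__(self, records):
        self.records = dict(records)

    def update(self, name, address):
        self.records[name] = address

    def resolve(self, name):
        return self.records[name]


balancer = LoadBalancer(MAIN, FAILOVER)
dns = DnsServer({SERVICE_NAME: MAIN})

# Normal operation: both methods lead the client to the main host.
before_lb = balancer.route(lambda address: address in {MAIN})
before_dns = dns.resolve(SERVICE_NAME)

# Failover: the service now runs on the failover host.
# Load balancer method: nothing changes for the client, because the
# balancer finds the running machine by itself.
after_lb = balancer.route(lambda address: address in {FAILOVER})
# DNS method: the DNS entry must be updated (the role EGO plays in the text).
dns.update(SERVICE_NAME, FAILOVER)
after_dns = dns.resolve(SERVICE_NAME)
```

In the load balancer case the client always connects to the same address and the redirection happens behind it. In the DNS case the client's next lookup returns a different address, so the record must be changed when failover occurs.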
The choice of whether
to use a load balancer or a DNS solution depends on your organization’s
policies. Using DNS resolution saves you from having to purchase
an additional piece of hardware (the load balancer). However, your organization’s
policies might prohibit an outside DNS (EGO) from changing the corporate DNS,
or might prohibit DNS requests from being forwarded to an outside
DNS. If this is the case, the hardware load balancer provides a high-availability
solution.