How to Configure a Domino Cluster
Ted Hardenburgh
E-mail and calendaring — for many companies, these applications are
the lifeblood of the enterprise. Users expect these and other
applications to be always available, like water from a tap or electricity
from the nearest wall outlet. Fortunately, Domino server clustering
provides high availability and load balancing so our users can get to
their e-mail and other critical applications 24 hours a day, 7 days a
week.
Ted Hardenburgh is the senior Notes and Domino Administrator for Apria
Healthcare in Lake Forest, CA. Ted is a Principal CLP in R5
Administration and a CLP in R5 Development. He has been involved with
Lotus Notes and Domino since release 4. His recent projects include
iNotes Web Access and LearningSpace implementations. He can be reached
at [email protected]. (complete bio appears on page 26)
For many Domino administrators, however, the word clustering
suggests a long, arduous process involving endless tinkering with
hardware and software to get the system up and running. It’s clear from
conversations with my fellow administrators that there is often some
trepidation prior to creating a cluster.
This article’s mission is to dispel any confusion or concern that you
may have about clustering. It provides the basic information that you
need to take advantage of Domino clustering — covering what
clustering is, what’s required to set up clustering in a Domino
environment, and the tasks and databases that are used in Domino
clustering. Then it guides you through the implementation of a Domino
cluster, step-by-step, from initial software installation to final
configuration.
Along the way, I’ll provide some insight on how to identify the
databases that you need to enable for cluster support, as well as those
databases that might not need to be enabled. You will learn all about
failover: how it works and doesn’t work, how to set up mail-routing
failover, and what you need to know about failover of calendaring and
scheduling lookups. You’ll also discover how load balancing works and
what you can do to configure it properly. Finally, I’ll introduce you to
No portion of this publication may be reproduced without written consent.
THE VIEW September/October 2002
some tools that you can use for monitoring and
managing Domino clusters in an R5 environment.
By the time you’re finished reading this article,
you should be ready to implement Domino clustering
in your own environment.
What Is Domino Clustering?
A Domino cluster is a group of two to six servers
containing database replicas that are kept in near real-time synchronization across the clustered servers; the
servers act in concert to provide failover and load
balancing support. (For an explanation of failover
support in Domino clusters, see the sidebar on page 7;
for an explanation of load balancing, see the sidebar
on page 9.)
If you’re familiar with Domino’s replication task,
which keeps Notes databases up-to-date on a scheduled or on-demand basis, you can think of Domino
clustering as replication on steroids. Technically
speaking, Domino clustering is event-driven replication, which means that when an event occurs in a
cluster-enabled database, that event is queued for
replication to the other replicas of the database in the
cluster.
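The idea of event-driven replication can be illustrated with a small Python sketch. This is a toy in-memory model, not Domino's implementation: the class `ToyCluster`, its methods, and the server and document names are all hypothetical, and real cluster replication handles much more (retries, conflicts, per-destination queues).

```python
from collections import deque


class ToyCluster:
    """Toy model: each server holds one replica, stored as a dict of documents."""

    def __init__(self, server_names):
        self.replicas = {name: {} for name in server_names}
        self.pending = deque()  # change events queued for the other replicas

    def save(self, server, doc_id, body):
        """Save a document on one server and queue the change as an event."""
        self.replicas[server][doc_id] = body
        self.pending.append((server, doc_id, body))

    def push_pending(self):
        """Deliver every queued event to all other replicas; return the count delivered."""
        delivered = 0
        while self.pending:
            origin, doc_id, body = self.pending.popleft()
            for name, docs in self.replicas.items():
                if name != origin:
                    docs[doc_id] = body
                    delivered += 1
        return delivered


cluster = ToyCluster(["ServerA", "ServerB", "ServerC"])
cluster.save("ServerA", "memo-1", "Quarterly report")
delivered = cluster.push_pending()
print(delivered, cluster.replicas["ServerB"])
```

The point of the sketch is the trigger: the push is caused by the save itself rather than by a schedule, which is what separates cluster replication from standard replication.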
Event-driven replication is what makes high
availability and failover possible. In a Domino server
cluster, if one server becomes unavailable for any
reason, another server in the cluster will pick up the
load for it, and do so in a way that is, for the most
part, seamless to the Notes client user.1 A user who
is accessing a database on one server in the cluster
(Server A) does not notice when requests to that
server are redirected to another server in the cluster
(Server B), because the replica on Server B is synchronized with the replica on Server A. However,
there are some events that prevent failover from
happening seamlessly. (See the Failover sidebar on
page 7.)
1 On the client side, Domino clustering only supports requests from
Notes clients. Web clients cannot use the services provided by
clustering. To provide clustering support to Web clients, use Internet
Cluster Manager (ICM).
In most settings, Domino clustering is used to
provide high availability, failover support, and load
balancing for messaging, though many installations
use it to provide support to other applications. For
example, the primary use of clustering where I work
has been for e-mail, but we have also implemented
clustering to support an application whose users
require 24-hour availability.
Clustering was first released to the user community as part of the Domino Server R4.5 and R4.6
Advanced Services. Now it is part of the Domino
Enterprise Server of R5, which requires a different
license from the one used for a standard Domino Mail
or Application Server.
For an in-depth look at Domino clustering technology, including how it provides failover while
scaling to accommodate growing workloads and
numbers of users, see “Flexing Domino’s Clustering
Muscles,” by James Grigsby, THE VIEW (September/
October 1997).
Operating System (OS) Clustering
Compared with Domino Clustering
Many operating systems support clustering, either
natively or through third-party support. The most
commonly used clustering support may be that provided by Windows NT and Windows 2000. With
Windows NT 4, a cluster can have two nodes or
servers, but they can’t both be up at the same time.
This is called an active-passive cluster. This type of
cluster shares the same disk structure and is only
useful for failover. Load balancing is not supported.
With Windows 2000 Advanced Server, the ability to
have active-active clusters has been added, so each of
the two servers that are up and running can compensate for the other server’s workload in addition to its
own workload; however, they still use the same
shared disk structure. One reason to cluster servers is
www.eVIEW.com
©2002 THE VIEW. All rights reserved.
to avoid single points of failure. With a shared disk structure,
one such point remains: while the server may be protected against
failure, the disk structure remains exposed to a failure of a Redundant
Array of Inexpensive Drives (RAID) adapter. Also,
both types of clusters require dedicated network links
between the node servers. While this is not necessarily a disadvantage,
it is an additional hardware requirement.
Domino clustering is done at the application
level, not at the operating-system level. A Domino
cluster can have up to six nodes. Domino clusters are
of the active-active type, so they can support both
failover and load balancing. They are operating-system agnostic, meaning that the nodes in the cluster
can have different operating systems. Domino clusters also do not require a dedicated connection between cluster nodes; you can even have a Domino
cluster that is connected over a wide area network
(WAN), if desired.
In choosing whether to use OS clustering or Domino clustering, you will need to consider your company’s environment.
It is possible and sometimes desirable to run a Domino cluster on top of an operating-system cluster, but that topic is outside the scope of this article. For an article that discusses this kind of clustering, see “From High Availability to Disaster Tolerance: Layering Domino Clusters on Wolfpack,” by Peter Eggimann, THE VIEW (March/April 1998).
Why Cluster Your Domino Servers?
There are several reasons why you might want to take advantage of clustering:
• High availability: If you have users who require 24-hour-a-day access to their application, you can provide it using clustered servers (barring a disaster that affects all the servers in the cluster, of course). You can put replicas of the application across multiple servers in the cluster, so that if a user’s home server becomes unavailable, the user’s Notes client will be automatically redirected to another node in the cluster. For example, clustering can be a vehicle for meeting a Service Level Agreement that mandates 24-hour availability for e-mail.
• Scalability: Domino clustering enables you to tune servers to support a certain number of users, so your application performance doesn’t degrade as demand for it increases. You can think of load balancing as planned failover.
• Disaster preparedness: Domino clustering can play an important part in a company’s disaster recovery plan. A cluster node can be located in an off-site location connected via WAN. The server can be configured so that the data on it remains synchronized with data on other cluster nodes, while users access the off-site server only in the event of an emergency.
What’s Needed for Clustering?
In setting up a Domino cluster, the first step is to
obtain a Domino R5 Enterprise Server license. This
license gives you the necessary access rights to install
Domino’s Advanced Services: clustering, partitioning, and billing.
Hardware Requirements
Because you can have separate hardware and operating systems, the hardware in your Domino cluster
may not need to be exactly the same for every server.
For example, the hardware in a cluster that includes a
Windows NT server and IBM iSeries server would
hardly be similar. What is necessary is that you have
hardware that can support the additional load that
either of these servers may be put under if one of the
other nodes fails for any reason.
In determining your hardware needs for any
single cluster node, I recommend working on the
assumption that one of the other nodes will have a
complete failure that will result in downtime measured in days, not minutes. You may think, “I’ve
never had a failure like that!” But remember, just
because you haven’t experienced a serious failure yet
doesn’t mean that you never will. Recall that hardware reliability is measured with something called “Mean Time
Between Failures.” If the people who built the server
know that it will fail at some point, we administrators
should know that it will, too. No one wants to be the
person who gets a rude awakening early one morning
when the CEO is ranting about not being able to read
this morning’s e-mail because a cluster node went
down and the other node couldn’t handle the additional workload. At my company, we once experienced a severe hardware outage — I can tell you that
being prepared for this scenario certainly beats being
unprepared!
When calculating hardware needs, be conservative. Determine the number of users that the server
will support and increase that by half. That’s the
number to shoot for when configuring the server. For
example, if there are two servers in your cluster, and
there are 1000 users in the Domino domain, that’s
500 users per server. The server you configure will
need to handle at least 750 users to prepare for
failover. Talk to your hardware supplier or check out
the sizing guides that are provided for many platforms. (Most of these guides are fairly conservative
in the recommendations that they make.)
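The sizing rule of thumb above is simple arithmetic, and a short Python sketch makes it easy to rerun for other cluster sizes. The function name `sizing_target` and the `headroom` parameter are hypothetical; the 50 percent default simply encodes the "increase that by half" guideline from the text and is not a vendor formula.

```python
import math


def sizing_target(total_users, servers_in_cluster, headroom=0.5):
    """Return (normal_load, target_load) per server using the increase-by-half rule."""
    if servers_in_cluster < 1:
        raise ValueError("a cluster needs at least one server")
    # Users each server carries when every node is up.
    normal_load = math.ceil(total_users / servers_in_cluster)
    # Users each server should be configured to carry to prepare for failover.
    target_load = math.ceil(normal_load * (1 + headroom))
    return normal_load, target_load


normal, target = sizing_target(1000, 2)
print(f"normal load: {normal} users, size each server for: {target} users")
```

For the article's example of 1000 users on two servers, this yields a normal load of 500 and a sizing target of 750 users per server.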
Server Requirements
In addition to obtaining the license and your hardware, you need to ensure that all servers in the cluster
meet the following requirements. They must:
• Be in the same Domino domain
• Use TCP/IP for server-to-server communication (clients can communicate with the servers using any supported protocol)
• Be in the same Notes Named Network
• Have AdminP running (for automation of cluster tasks)
Ideally, all servers in a cluster will be on the same release of Domino, but it’s possible to have one node on R4.6 and another node on R5; for example, you might have to mix Domino releases in a Domino cluster because you’re migrating between releases or upgrading and changing hardware.
Important!
A server can be in only one Domino cluster at a time, but you can have as many clusters as you want in your Domino domain.
Other Considerations
In addition to hardware and server requirements, you’ll want to consider the usual performance enhancements, such as Transaction Logging, the R5 on-disk structure (ODS) for databases, and multiple MAIL.BOX databases.
In larger installations, you need to consider the limit for databases that can be clustered. According to Lotus, the number is between 4000 and 8000 databases per cluster, depending on the size of the databases. The largest cluster that I’ve ever worked with had over 2500 clustered databases.
Clustering: Essential Tasks and Databases
The more you know about Domino clusters and how
they work, the better able you are to maximize
failover and load balancing. This section familiarizes
you with the inner workings of a Domino cluster.
Failover
Failover is the redirection of a request from one
server to another server when the servicing
server has failed to respond. It is a means for
providing high availability of applications and, as
such, is perhaps the primary reason for setting up
a cluster.
Failover is triggered when a client attempts a
request to a database that is not available.
Unavailability can be caused at the server or
the client.
Server-side triggers of failover include:
• The Domino server is not available due to hardware, software, or network issues.
• The Domino server is in a restricted state (that is, Server_Restricted is not 0).
• The Domino server is busy.
• The Domino server has reached the maximum number of user sessions as set by the notes.ini parameter Server_MaxUsers.
• The database is unavailable because it has been marked Out of service or Pending delete.
A Notes client attempting to do one of the following can trigger failover:
• Opening a database from a bookmark, Workspace icon, document link, view link, or database link, or by using one of the following programming methods:
  - Notes Formula @Command([FileOpenDatabase])
  - LotusScript db.openwithfailover. (Using db.open does not initiate failover.)
• Initiating replication from a workstation.
• Performing an operation relating to the user’s home server, such as:
  - Name resolution
  - Compose mail
  - Send mail
  - Server lookups
• Executing an agent that sends mail (triggers failover if all the Domino servers in the cluster are configured for mail transfer and delivery failover).
Events that do not trigger failover:
• A server becoming unavailable while the database is open. However, exiting the database and attempting to reopen it will trigger failover.
• A server becoming unavailable while a user is editing a document. Exiting will result in unsaved data being lost. A user may want to copy what she was working on to the Clipboard in order to save it temporarily.
• Using the command Tools > Open a Calendar. This operation uses @Command([OpenCalendar]), which is not cluster-aware. There is no “open with failover” option for calendars.
• Selecting a database icon or bookmark and choosing Properties, Access Control, Open, or New Copy from the Database menu.
• Using the command View > Go To.
• Replicating with a server that is restricted or that has met its MaxUser or Session settings, or replicating with a database that is marked Out of service. Replication will occur regardless of these restrictions, so failover is not needed.
• The router attempting to deliver mail when routing failover is not configured, or when cluster replication is disabled on the mail server. The mail will sit in MAIL.BOX pending delivery or delivery timeout.
• Using the server as a template server when creating a database.
• The router attempting to deliver mail when MailClusterFailover is set to 0 in the notes.ini file or disabled in a configuration document.
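The server-side triggers listed in the Failover sidebar amount to a short chain of checks, which can be sketched in Python. This is an illustrative model only: `ServerState`, `failover_trigger`, and the field names are hypothetical, and treating a `max_users` of 0 as "no limit" is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    reachable: bool = True      # False for a hardware, software, or network failure
    server_restricted: int = 0  # mirrors the Server_Restricted setting
    busy: bool = False
    sessions: int = 0
    max_users: int = 0          # mirrors Server_MaxUsers; 0 is treated as "no limit" here


UNAVAILABLE_DB_STATES = {"Out of service", "Pending delete"}


def failover_trigger(server, db_state="Enabled"):
    """Return the reason a new open request would fail over, or None if it would not."""
    if not server.reachable:
        return "server not available"
    if server.server_restricted != 0:
        return "server restricted"
    if server.busy:
        return "server busy"
    if server.max_users and server.sessions >= server.max_users:
        return "maximum user sessions reached"
    if db_state in UNAVAILABLE_DB_STATES:
        return "database " + db_state.lower()
    return None


print(failover_trigger(ServerState(sessions=500, max_users=500)))
```

The sketch covers only new requests; as the sidebar notes, a server that goes down while a database is already open does not trigger failover until the user reopens the database.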
Figure 1
Server Tasks Related to Clustering
The Cluster Administration Task
The Cluster Administration task (CLADMIN) runs
automatically when a clustered server starts. This
task manages all of the other cluster components; it
starts the other tasks, adds and removes servers from
the cluster, and starts the AdminP task if it isn’t
running.
When a server is added to a cluster, the
CLADMIN task is started; it then starts the other
cluster tasks and adds these tasks to the ServerTasks
setting in the notes.ini file. Once the CLADMIN
task has completed its job, it ends itself.
The Cluster Manager Task
The Cluster Manager task is responsible for taking the pulse of all the servers in a cluster and maintaining a list of all those servers with their current status. It runs on each server in the cluster, polling the other servers using probes that are sent once a minute to determine the other servers’ availability and workload. Because it is part of the base server task, it cannot be launched separately like the AdminP or Fixup tasks. Instead, it starts automatically when the server is added to a cluster or when a change to a cluster is detected in the Domino Directory. In the Server Tasks window, it appears as one of the activities listed in the Database Server task. (See Figure 1, which also shows some of the other Domino cluster-related server tasks as they appear in this window.)
The Cluster Manager also does the following:
• Handles database requests for failover or load balancing from clients. (For an explanation of how failover functions in a Domino cluster, see the sidebar on page 12.)
• Determines which servers are part of the cluster by checking the Domino Directory for changes to the server document and the clusters view.
• Checks server availability and workload.
• Communicates with other cluster nodes regarding changes in cluster availability.
• Logs failover and load balancing events to the log.nsf file.
Figure 2
Cluster Database Directory
The Cluster Database Directory Task
and the Cluster Database Directory
When a cluster is created and a server joins the cluster, the Cluster Database Directory task (CLDBDIR)
is added to the ServerTasks setting in the notes.ini
file. This task creates a new database on the server
(the Cluster Database Directory, or cldbdir.nsf) and
populates it with documents for each database on the
server. CLDBDIR is started at server startup and
runs while the server is up. A replica of cldbdir.nsf
exists on every server in the cluster and is updated by
cluster replication so that if a database is added,
removed, or marked for maintenance, the other cluster members are immediately made aware of it for
failover and load balancing.
The Cluster Database Directory (cldbdir.nsf) is the heart of a Domino cluster. It contains an entry for each database on every member of the cluster, indicating whether the database is enabled for clustering, and including the database’s name, file path, replica ID, and its state in the cluster. (Cluster states include Enabled, Disabled, Out of service, and Pending delete.) The Cluster Database Directory is used by the Cluster Manager task to determine failover routes when a database or server is unavailable. Figure 2 shows the Cluster Database Directory displayed in the Administrator client.
Load Balancing
Load balancing in a Domino cluster is the
redirection of a client to another node as a
means of distributing the workload of the
server. The server monitors its workload
(in terms of either response time or total
user sessions) and compares it against
settings in the notes.ini file or
configuration document to determine
whether it will accept new user sessions.
If the workload is too high, the server will
redirect client requests to another node in
the cluster. It’s important to understand
that in a Domino cluster, workload is
measured per server, not per database.
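The load-balancing decision described in the sidebar can be sketched in Python as follows. The function `choose_server`, the `busy_threshold` parameter, and the workload figures are hypothetical; the sketch uses session counts as the workload measure and assumes that a request stays on the requested server when no other node is below the threshold.

```python
def choose_server(requested, workloads, busy_threshold):
    """Pick the server for a new session.

    workloads maps server name -> current workload for the whole server
    (for example, total user sessions), not for any single database.
    """
    if workloads[requested] < busy_threshold:
        return requested  # the requested server accepts the session
    others = {name: load for name, load in workloads.items() if name != requested}
    if not others:
        return requested
    candidate = min(others, key=others.get)  # least-loaded other node
    # Redirect only if that node is itself below the threshold (an assumption here).
    return candidate if others[candidate] < busy_threshold else requested


workloads = {"ServerA": 480, "ServerB": 200, "ServerC": 350}
print(choose_server("ServerA", workloads, busy_threshold=400))
```

Because the workload is tracked per server, a single heavily used database cannot be balanced on its own; the redirect happens only when the server as a whole crosses its threshold.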
The Cluster Replicator Task
If the Cluster Database Directory is the heart of your
Domino cluster, the Cluster Replicator task
(CLREPL) is its lifeblood. It is responsible for the
near real-time synchronization of the databases in the
cluster. CLREPL executes event-driven replication,
based on database activity, using the Cluster Database
Directory (cldbdir.nsf) to determine what databases
need to be replicated.2
Multiple CLREPL tasks can be run on a server
depending on the load. The Domino administrator
configures the number of tasks that may run, as discussed later in “Configuring a Domino Cluster.”
2 Standard replication is based on a scheduled connection document, client demand, or some other external request.
The Clustered Free-Time Database (clubusy.nsf)
In an R5 environment, clustered mail servers have a clustered free-time-lookup database, clubusy.nsf. This database allows any user in the cluster to look up free-time information on another user without having to make a call to that user’s home server (if it’s not the same as the first user’s home server). It also allows free-time lookups to continue if a user’s mail server is unavailable; the lookups simply fail over to another node. In R4.6, free-time lookups are not clustered, so a lookup to a user on an unavailable server will return the message “No Information.”
The clubusy.nsf database is created automatically by the Schedule Manager task when a mail server is added to a cluster. The Schedule Manager deletes the existing busytime.nsf and creates a replica of clubusy.nsf. Each server in the cluster has its own replica of clubusy.nsf, which is updated through cluster replication when any information for one of its users changes.
Note!
If a user’s client fails over to a server that isn’t his home server, clubusy.nsf won’t be updated with any changes to his free time until his home server is available again and the Schedule Manager updates clubusy.nsf on that server. (The Schedule Manager only updates a user’s free-time information on the user’s home server.)
Note!
If a non-clustered server performs a free-time search on a cluster server that goes down, the search will not fail over to another cluster node.
Additional Cluster Databases
In addition to the Cluster Database Directory and the clustered free-time database, Domino clusters use documents contained in the Domino Directory, most notably server documents. The initial configuration of the cluster is completed in the Domino Directory, together with the Administration Request database. Once changes are made in the Domino Directory, AdminP requests are placed in the Administration Request database to be processed.
Finally, there is a client portion to clustering; it isn’t all done on the server. Notes R4.5 and later releases have a cluster cache file, cluster.ncf. (See Figure 3.) This file is created when a client accesses a clustered server for the first time. It lists the last time the file was updated, the name of the cluster, and all the member servers of the cluster. When the current server is unavailable, the Notes client looks in this file for a server to which it can redirect its request. Without a cluster.ncf file, the client will not know what servers it can try to fail over to. The role of cluster.ncf is explained and illustrated in the sidebar “How Failover Works” on page 12.
Figure 3
Contents of cluster.ncf file
Note!
The client must have successfully accessed a clustered server and have a cluster.ncf file generated before failover will work for the user.
Now that you have the theoretical basis for understanding how a cluster works, let’s discuss configuring a Domino cluster.
Configuring a Domino Cluster
Before you create your Domino cluster, verify that you have TCP/IP installed and configured on all servers. Also ensure that you have Domain Name System (DNS) or host file entries for all the servers in the cluster.
Figure 4
Software Installation
If you haven’t already done so, install Advanced/Enterprise Services for Domino, using your Domino server installation software. (Remember to get the R5 Enterprise Server license so that you can access these services.) Figure 4 shows the initial dialog box for the
server software installation. You must select Domino Enterprise Server on this dialog box. In some versions, you may be prompted to select the pieces of the Enterprise license that you want to install.
After the server software has been installed, you can start the Domino server. It must be up and running for the cluster to be configured. Now it’s time to create the cluster. Follow these steps:
1. Open the Domino Administrator client, and select the Configuration tab.
2. Select Clusters, and then select the All Server Documents view.
3. Select the servers that you want to include in the cluster, and then click the Add to Cluster button.
4. A dialog box like the one shown in Figure 5 now prompts you to create a cluster or add to an existing cluster. You’re creating a new cluster, so enter the name for your cluster. (If you were adding a server to an existing cluster instead, you would simply select the cluster from the list.) Give your cluster a simple but descriptive name.
5. To add the servers to the cluster immediately, click Yes at the prompt. To have the Administration Process add the servers to the cluster, click No.
Figure 5
Creating a Cluster
How Failover Works
The accompanying diagram depicts the failover process as it unfolds on a clustered Domino server. Let’s go through the process step-by-step:
1. The Notes client user clicks on the bookmark to open her mail file on Server A.
2. Transparent to the user, the client gets the “Server Not Responding” message.
3. The client looks at the cluster.ncf file to determine what other servers are in the cluster.
4. The client tries to access another server in the cluster, Server B.
5. The Cluster Manager on Server B determines the locations of replicas of the database and directs the Notes client to a server containing the replica. It performs the failover lookup in one of two ways:
  – The primary lookup method is performed by the replica ID of the requested database. The Cluster Manager checks the cldbdir.nsf file for another server that has a database with the same replica ID.
  – If there are multiple replicas of the database on a server, the Cluster Manager matches the file path and replica ID of the requested database.*
Note! The replicas may not be on this server. Recall that you don’t need to have a database replica on every cluster node. If the database replica isn’t on this server, the client is given the name of the server that has a replica and the client will attempt to access that server.
6. The client then accesses the server that has the mail file; the server serves the file to the user.
On the client, a new icon appears on the Workspace for the replica database, and in Bookmarks, the server name is added to the Open Replica menu item when the user right-clicks on a bookmark or Workspace icon.
The user will continue to use the database on the failover server even after the original server is available again, unless she manually changes back to the home server. This is standard Notes client behavior: the user continues using the last replica opened.
You can force users back to the first server by making the failover server unavailable or by setting it to busy. If you’re hoping to get users back on their home server, this technique may not work as expected in a cluster of more than two servers.
* It’s best to avoid having multiple replicas of a database on one server. You cannot enable such replicas for cluster replication, because the Cluster Replicator task ignores selective replication formulas.
[Diagram: the failover process. (1) The Notes client sends a database request to Server A. (2) The client receives a “Server Not Responding” message. (3) The client checks cluster.ncf for another clustered server. (4) The client sends the database request to Server B. (5) Server B consults cldbdir.nsf and returns the name of the server with a replica (Server C). (6) The client sends the database request to Server C, which holds the requested database.]
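The lookup described in the “How Failover Works” sidebar can be sketched in Python to make the roles of cluster.ncf and cldbdir.nsf concrete. Everything here is a simplified model: `DirectoryEntry`, `find_replica_server`, `open_with_failover`, the replica ID value, and the file path are hypothetical, the cluster cache is reduced to a list of server names, and the Cluster Database Directory to a list of entries.

```python
from typing import NamedTuple, Optional


class DirectoryEntry(NamedTuple):
    server: str
    path: str
    replica_id: str
    state: str  # "Enabled", "Disabled", "Out of service", or "Pending delete"


def find_replica_server(entries, replica_id, unavailable, path=None) -> Optional[str]:
    """Return an available server holding an enabled replica, or None."""
    for entry in entries:
        if entry.server in unavailable or entry.state != "Enabled":
            continue
        if entry.replica_id != replica_id:
            continue
        if path is not None and entry.path != path:
            continue  # secondary method: match file path as well as replica ID
        return entry.server
    return None


def open_with_failover(home_server, cluster_cache, entries, replica_id, down):
    """Follow the sidebar's steps from the client's point of view."""
    if home_server not in down:
        return home_server  # step 1 succeeds, so no failover is needed
    for candidate in cluster_cache:  # steps 3 and 4: consult the cluster cache
        if candidate in down:
            continue
        # Step 5: the reachable server looks up replicas in its directory.
        return find_replica_server(entries, replica_id, down)
    return None


REPLICA = "85256A1B:004F3C2D"  # made-up replica ID for illustration
entries = [
    DirectoryEntry("ServerA", "mail/jdoe.nsf", REPLICA, "Enabled"),
    DirectoryEntry("ServerC", "mail/jdoe.nsf", REPLICA, "Enabled"),
]
cache = ["ServerA", "ServerB", "ServerC"]
print(open_with_failover("ServerA", cache, entries, REPLICA, down={"ServerA"}))
```

As in the sidebar, Server B answers the request even though it holds no replica itself; it only tells the client where an enabled replica lives.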
When the Domino Directory is replicated to all
the cluster nodes, the cluster is operative. When a
Domino server in the cluster reads the updates to its
server document and recognizes that it’s part of a
cluster, the following activities take place on each
clustered server:
• The Cluster Administration and Cluster Manager tasks start on the server.
• The CLDBDIR and CLREPL tasks are added to the ServerTasks setting in the notes.ini file so that they start with the server. The tasks will also be started at this point.
• The line Server_Cluster_On=1 is added to the notes.ini.
• The AdminP task is started if needed. AdminP is used by the cluster to update servers and server documents and to work on databases, so it is important that it be running.
• The CLDBDIR task runs and adds documents for every database on the server to the Cluster Database Directory and begins replicating with the other cluster members.
• The Cluster Manager sends probes to the other servers in the cluster to determine their status, and it informs each server of the status of the other servers.
If any of your clustered servers are running multiple protocols on the server, you have an additional
step. Because clustering only works with the TCP/IP
protocol, you must tell the server which port to use by
adding two lines to the notes.ini or to the server’s
configuration document.
Server_Cluster_Default_Port=PortName
Server_Cluster_Probe_Port=PortName
PortName is the name of the TCPIP port listed in
the server document. Both entries can refer to the
same port or they can be configured separately for
cluster traffic and probes.
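To confirm that a server's notes.ini contains the cluster settings described above, a small Python sketch like the following could be used. The helper names `parse_notes_ini` and `cluster_settings_report` are hypothetical, the sample text is invented for illustration, and the sketch assumes that setting names are case-insensitive and that lines starting with `;` or `[` can be skipped.

```python
def parse_notes_ini(text):
    """Parse key=value lines into a dict keyed by lowercase setting name."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("[", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def cluster_settings_report(settings):
    """Summarize the cluster-related settings named in this article."""
    raw_tasks = settings.get("servertasks", "")
    tasks = [t.strip().lower() for t in raw_tasks.split(",") if t.strip()]
    return {
        "cluster_on": settings.get("server_cluster_on") == "1",
        "cldbdir_in_servertasks": "cldbdir" in tasks,
        "clrepl_instances": tasks.count("clrepl"),
        "default_port": settings.get("server_cluster_default_port"),
        "probe_port": settings.get("server_cluster_probe_port"),
    }


SAMPLE = """[Notes]
ServerTasks=Router,Replica,AdminP,CLDBDIR,CLREPL,CLREPL
Server_Cluster_On=1
Server_Cluster_Default_Port=TCPIP
Server_Cluster_Probe_Port=TCPIP
"""

report = cluster_settings_report(parse_notes_ini(SAMPLE))
print(report)
```

Settings made in the server's configuration document rather than in notes.ini would not show up in such a check, so a missing value here is a prompt to look there, not proof of a misconfiguration.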
How Many Cluster Replication Tasks
Should You Run?
As you just saw, when you create a cluster, the
CLREPL task is automatically added to ServerTasks=
in the notes.ini file. You can run multiple instances
of this task by adding one entry for each instance to
the ServerTasks= line. Running multiple instances
helps to ensure that replication takes place more often
so that all database replicas are as up-to-date as
possible.
There are two different guidelines for determining the number of CLREPL tasks that should be run:
(1) Take the number of processors on your server,
subtract one, and that equals the number of CLREPL
tasks to run; or (2) Run one instance for each server
in the cluster with which the server is replicating. In
the second case, a four-server cluster would indicate
running three CLREPL tasks. Depending on your
server, you may be able to run more than three
CLREPL tasks in the short-term. At my company,
we normally run two CLREPL tasks, but we have
used up to three tasks on a dual processor AS/400 on
an as-needed basis with little negative impact.
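The two guidelines are easy to compare side by side with a short Python sketch. The function names are hypothetical, and the floor of one task is an assumption of the sketch (a clustered server runs at least one CLREPL task); treat the results as starting points to be adjusted by observation, as the experience described above suggests.

```python
def clrepl_by_processors(processor_count):
    """Guideline 1: number of processors minus one (never fewer than one task)."""
    return max(1, processor_count - 1)


def clrepl_by_cluster_size(servers_in_cluster):
    """Guideline 2: one task per other server this server replicates with."""
    return max(1, servers_in_cluster - 1)


for cpus, nodes in [(2, 2), (4, 4), (2, 4)]:
    print(cpus, nodes, clrepl_by_processors(cpus), clrepl_by_cluster_size(nodes))
```

When the two guidelines disagree, as they do for a dual-processor server in a four-server cluster, the choice comes down to how much load the server can absorb.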
Note!
You may need to run an additional Cluster
Replication task after a server has been down for
some time, to help get everything back in sync and
speed recovery after a crash.
When you decide you don’t need to run the
additional CLREPL task any longer, issue the Tell
CLREPL Quit command at the console. This ends
all CLREPL tasks, so you must then restart each
CLREPL task that you normally run.
Private LANs for Cluster Traffic
Depending on your infrastructure, you may want to
consider setting up a private local area network
(LAN) for your cluster traffic to help improve response to client requests. To set up a private LAN
that separates cluster replication traffic from all other
traffic, follow these steps:
1. Have two network cards in each cluster node, with separate addresses assigned to each card and a host name assigned to each secondary address.
2. In the Server document, create and enable a new Notes port for each server.
3. In the Server Configuration document or the notes.ini file, assign each Notes Network port to a corresponding IP address by adding the following lines. Below, I’ve defined two ports, one named TCPIP and the other named CLUSTER:
TCPIP_TcpipAddress=0,xxx.xxx.xxx.xxx:1352
CLUSTER_TcpipAddress=0,xxx.xxx.xxx.xxx:1352
4. Add the following lines to the Server Configuration document or the notes.ini file:
Server_Cluster_Default_Port=CLUSTER
Server_Cluster_Probe_Port=CLUSTER
5. Restart the server for the changes to be put in place.
6. To see if the changes have taken effect, enter the Show Cluster command from the administrative console, as shown in Figure 6. You’ll see the port name listed in the data returned. Please note that the port name only appears in the “show cluster” information list if you’ve used the settings above. If a default cluster port is not defined, no port will appear in the list.
Figure 6
Results of “Show Cluster” Command
You can also check the statistics net.PORTNAME.BytesReceived and net.PORTNAME.BytesSent to determine how much data the CLREPL task is sending and to verify that cluster replication is working on the correct port. Just remember to change PORTNAME to the actual name of your port.
Next Steps
For database requests to be redirected when the current server isn’t available, the following two conditions must be met:
• The database must be enabled for clustering.
• At least one replica of the database must exist on another server in the cluster.
We’ll look next at how to decide which databases to enable for clustering, which databases not to enable, and how to create replicas for cluster support.
Enabling Databases
for Cluster Support
By default, Domino will enable every database and template for clustering. I recommend that you change the default so that no databases are initially clustered. Once you have disabled all databases for clustering, you can proceed to make considered decisions about which databases to cluster on a case-by-case basis.

Figure 7: Distribution of Mail Files in a Four-Server Cluster
Server 1: a, b, c
Server 2: b, c, d
Server 3: c, d, a
Server 4: d, a, b
(1000 mail files: a = 1 - 250, b = 251 - 500, c = 501 - 750, d = 751 - 1000)

In the Cluster Database Directory, you can enable
or disable a database for clustering by simply selecting the database and choosing the action bar item,
Tools > Enable (or Disable) > Clustering. This action
wasn’t available in R4.6 and previous releases; however, you can add it to the database. Here’s how:
Open the database in Domino Designer and add an
action that calls an agent that changes the value of the
field Cluster Replicate to 1 for enabled or 0 for
disabled.
Mail Databases
In a two-server mail cluster, it’s a safe bet that all the mail files will be on both nodes; however, as you add servers to the cluster, you should start thinking about file placement. More than likely, you don’t need a replica of every mail file on every node.

In a three-server mail cluster, for example, you might designate two nodes as “active,” making each node the home server for half of your mail files. The third server can then be designated as the “passive” node, meaning that no users have it as their home server. This server would have all the mail files, but it would only be used as needed for backup and failover. You might consider placing the passive node in an off-site location as part of your disaster recovery plan.

In a four-server mail cluster (for example, Server 1, Server 2, Server 3, and Server 4), you could split your mail files into fourths (let’s call them a, b, c, and d) and put three of these fourths (but never the same three!) on each server, like so:
• a-b-c on Server 1.
• b-c-d on Server 2.
• c-d-a on Server 3.
• d-a-b on Server 4.

Figure 7 illustrates how the mail files are distributed across the servers. In this way, you do not have all mail files on all servers, but every mail file has two nodes available for failover.

There are many ways to think about splitting data. You’ll need to take your organization’s needs into account to determine what’s best.
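
The four-server layout described above is a simple rotation, and a short Python sketch can make the pattern easier to adapt to other cluster sizes. The function names and the idea of parameterizing the number of copies are my own assumptions; the group letters and server names are the ones used in the example.

```python
def rotate_groups(groups, servers, copies):
    """Give each server `copies` consecutive groups, shifting the start by one per server."""
    layout = {}
    for i, server in enumerate(servers):
        layout[server] = [groups[(i + k) % len(groups)] for k in range(copies)]
    return layout


def replica_count(layout, group):
    """Count how many servers hold a replica of the given group of mail files."""
    return sum(group in held for held in layout.values())


layout = rotate_groups(
    ["a", "b", "c", "d"],
    ["Server 1", "Server 2", "Server 3", "Server 4"],
    copies=3,
)
for server, held in layout.items():
    print(server, "-".join(held))
for group in "abcd":
    print(group, "is held by", replica_count(layout, group), "servers")
```

Each group ends up on three servers: the home server plus two nodes available for failover, which matches the distribution in the text.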
Databases That Must Be Clustered
Some databases must be clustered, two in particular:
• clubusy.nsf (the clustered free-time database): Disabling the cluster status of this database on one cluster server effectively disables free-time searches of users on other cluster nodes. Once cluster replication is disabled on one replica of the clubusy.nsf, all other replicas essentially have bad data. Any free-time lookups done for users on the disabled server will not be correct. It is the same behavior you would see in a non-clustered network if the server itself was unavailable for free-time lookups.
• cldbdir.nsf (the Cluster Database Directory): If the Cluster Database Directory is not enabled for clustering, failover and load balancing may not work properly due to database information not being up-to-date.

Databases That Should Not Be Clustered
There are some databases that you won’t want to cluster:
• Directory catalogs and other server-specific “internal” databases: This category includes statrep.nsf, events4.nsf, catalog.nsf, and the like.
• MAIL.BOX: Clustering MAIL.BOX does not create mail-routing failover. It will likely only cause routing problems and not gain you anything. You need to set up mail-routing failover separately. (I explain how to set up mail-routing failover later, in the section “Configuring Mail Routing Failover.”)
• Resource Reservations databases: Because the free-time database (clubusy.nsf) only gets updated by a user’s home server, in the event of a server outage, there will be replicas of clubusy.nsf that do not contain correct free-time information for some users. If a Resource Reservations database is clustered, the potential for double bookings is then increased as some bookings will be based on incorrect information.

Working with Database Replicas
The easiest way to create database replicas is by using
the Administrator client and AdminP. Follow these
steps:
1. Go to the Files tab of the Domino Administrator
client and select the database or databases that
you want to replicate.
2. From the Tools > Database Task list on the right-hand side of the screen, select the Create Replicas
task.
3. Open the Create Replicas dialog box to display
only cluster members. Select the servers on
which you want to create a replica of the selected
database and click OK.
4. The Administration Process will check the
server’s access rights to create a replica and add
the replica stub to the destination servers. It will
take a couple of cycles through AdminP to complete this step; you may want to manually replicate the AdminP databases to expedite this
process.
5. Initiate a replication session between the servers
to initialize the replica stub, or create a connection document that replicates between the servers
and wait for the scheduled replication to take
place. The Cluster Replication task will not
initialize a stub, so it won’t start replication until
the Replica task has done its job.
Note!
Do not create multiple replicas of a database on
the same server. You will not be able to enable
the database replicas for clustering, since the
Cluster Replication task ignores selective replication
formulas.
Why Scheduled Replication Is Necessary
Even though the Cluster Replication task runs on every clustered server, each server in the cluster still needs the Replication task (REPLICA), because not all databases on the server will necessarily be clustered. It’s also helpful for keeping the clustered databases up-to-date, especially after a server crash.

Cluster replication works on events that are written to memory, not on the database. This means that if the server goes down, any pending cluster replication activity is lost. This is another reason to have scheduled replication for all servers in your cluster. In case of a server outage, standard replication will retrieve any lost events and update all replicas in the cluster.

Determine what schedule works best in your domain: perhaps it’s once an hour, or only once every few hours, but it should be at least once a day. At our company, for example, we replicate between three servers every two hours, alternating between pairs of servers. Servers 1 and 2 replicate on even hours, and servers 1 and 3 replicate on odd hours.

Configuring Mail Routing Failover
Failover for mail routing provides complete redundancy for your mail users. If a user’s home server is unavailable, the mail routed to that user will be delivered to one of the cluster nodes that has a replica of the mail file on it.

To enable mail routing failover, do one of the following:
• For R4.6x servers: Add the following line to the notes.ini:
MailClusterFailover=1
• For R5 servers: Edit the configuration document for the server(s) that you want to update and go to the Router/SMTP / Advanced / Controls tab. (See Figure 8.) Select one of the following options for “Cluster failover”:
  – Enabled for last hop only (the default): This is the hop to the user’s home server.
  – Enabled for all transfers in this domain: If any clustered server in the delivery route is unavailable, the router will find another server.
  – Disabled: Prevents mail routing failover.

To be useful, mail routing failover needs to be set up on every clustered server in the domain. You can use a global server configuration to do this, or if you prefer, use individual or group configuration documents. I suggest creating a server group for your cluster and using one configuration document for this group. Be sure to use a group name that is different from the cluster name to avoid confusion.

Figure 8: Configuring Mail Routing Failover

Configuring Load Balancing
For automatic workload balancing to work, each server in the cluster must be made aware of its own workload and availability as well as the availability of the other servers in the cluster. Each server computes its Server.Availability.Index (SAI) once per minute. The SAI starts at 100 and is reduced, depending on how the server responds to Remote Procedure Calls (RPCs) made to it. The longer the response time, the lower the availability index. A value of 100 indicates that the server is performing optimally; 0 indicates that the server is completely unavailable. It’s the job of the Cluster Manager task to keep all the servers in the cluster aware of each other’s workload and availability. The Cluster Manager sends out probes to all other servers in the cluster to determine availability.

Important!
Don’t wait until a server’s index is 0 before
redirecting requests to other servers. That would
mean that users would be experiencing slower and
slower response times as the server’s index is
dropping. Instead, redirect the workload to a
server with a high SAI before response time has
become unacceptable.
Now, let’s learn how to configure load balancing
to achieve the best possible response times across all
servers in the cluster.
The Server_Availability_Threshold Setting
To configure load balancing, use the Server_Availability_Threshold setting (SAT). When the Server.Availability.Index (SAI) falls below the SAT, the server is considered busy and new client requests are redirected to other servers in the cluster. The Cluster Manager redirects requests to the most available server that has a replica of the database. If other servers are less available than the originally requested server, the original server serves the request.
So, for example, if a server’s SAT is 90 and its
SAI drops from 91 to 89, requests to this server will
be redirected. When the SAI improves and is once
again greater than the SAT value, new requests are
accepted.
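
To make the redirection rule concrete, here is a minimal Python sketch of the decision as this section describes it. It is not Domino’s actual implementation: the function name, the dictionaries, and the server names are hypothetical, and the handling of ties (a server whose SAI equals its SAT is treated as busy, and the original server keeps the request when no other server is strictly more available) is my assumption based on the wording above.

```python
def pick_server(requested, availability, threshold, replica_servers):
    """Return the server that should handle a new request for a database.

    availability:    server name -> current availability index (SAI)
    threshold:       server name -> availability threshold (SAT)
    replica_servers: cluster members that hold a replica of the database
    """
    # New requests are accepted while the SAI is greater than the SAT.
    if availability[requested] > threshold[requested]:
        return requested
    # Otherwise look for the most available other server with a replica.
    candidates = [s for s in replica_servers if s != requested]
    if not candidates:
        return requested
    best = max(candidates, key=lambda s: availability[s])
    # If the others are no more available, the original server serves the request.
    return best if availability[best] > availability[requested] else requested


availability = {"Server1": 89, "Server2": 95, "Server3": 92}
threshold = {"Server1": 90, "Server2": 90, "Server3": 90}
print(pick_server("Server1", availability, threshold,
                  ["Server1", "Server2", "Server3"]))
```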
The SAT is set on a server-by-server basis in the server configuration document or the notes.ini file. To set the SAT, add or modify the following line, where n = 0 - 99:
Server_Availability_Threshold=n
To determine which Server_Availability settings
to use, monitor your servers for peak usage and check
your load balancing settings. Look at the peak user
and SAI statistics. Don’t forget to obtain feedback
from users about how they perceive the databases to
be working.
Start the SAT at a number around 95 and work
down until you begin observing unacceptable response times. SAI is a logarithmic function and not a
straight percentage. For example, an SAI value of 90
indicates that the server is taking 10 times longer to
respond to a request than in optimum conditions (that
is, SAI=100). SAI only measures the server response,
not network or client-side issues; therefore, your
users may be experiencing slower performance than
what is indicated by the SAI statistic.
One thing to keep in mind is that setting the SAT
value too high can cause failover to happen before it
really needs to happen.
Note!
The Server_Availability_Threshold setting should be different on each server in your cluster. This is especially true when you have different platforms.

The Server_MaxUsers Setting
Another way to measure workload and configure load balancing is purely in terms of the number of users on the server. Use the Server_MaxUsers setting to set the maximum number of client sessions that are permitted to access the server. Setting the maximum number of users to 1 effectively causes failover to other nodes, since the server would reach its maximum once one user opens a session.

When the maximum number of users has been reached on a server, users with existing sessions will be allowed to remain on the server, but they will not be able to open another database.

Note!
The Server_MaxUsers setting applies only to active user sessions; it doesn’t affect server-to-server communication, so cluster replication will continue. Also, users who do not have a database open are not counted when setting the maximum number of users permitted to access the server.

Here are some caveats when using the Server_MaxUsers setting:
• Users may get a “Server Not Responding” error if failover doesn’t occur. This behavior is different from what happens when the SAT setting is reached!
• Setting Server_MaxUsers to 0 in the notes.ini file means there is no limit to user sessions; it does not mean that 0 users are permitted!
• Avoid using the notes.ini setting Server_MaxSessions because it restricts server-to-server sessions as well as client sessions.

Important!
Don’t use load balancing with mail clusters. Many mail events require users to be on their home server to work properly.
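
The Server_MaxUsers rules described in this section, including the special meaning of 0, can be summarized in a few lines of Python. This is only a sketch of the behavior as described here; the function name and arguments are hypothetical and do not correspond to any Domino API.

```python
def can_open_session(max_users, active_sessions):
    """Return True if one more user session may be opened under Server_MaxUsers."""
    if max_users == 0:
        # 0 means "no limit", not "no users permitted".
        return True
    return active_sessions < max_users


# With Server_MaxUsers=1, the first user gets in and everyone after that
# is turned away (and fails over if another replica is available).
print(can_open_session(1, 0), can_open_session(1, 1))
```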
Clustering Tools
Administrators have several tools available for monitoring and managing a Domino cluster. In this section, we’ll look at each clustering tool, starting with the primary one, which is, of course, the Domino Administrator client.
The Domino Administrator Client
In the Domino Administrator client’s tabbed interface, there are two tabs that you can use for monitoring and managing clustering: the Server tab and the Files tab.

To get essential information about a cluster, open the Server tab. The Status subtab shows the current tasks running on the server being administered. There should be only one CLDBDIR task running on a server at a time, although I have seen two tasks running occasionally. If you see more than one CLDBDIR task running, end the tasks and restart CLDBDIR.

At the far right of the Status subtab is the Console tab, which you can select to get to the console window. Use the Show Cluster command to see information about the cluster, including its name, the current server, the probe timing, port, availability threshold and index, the number of servers in the cluster, and the server names and their availability index.

To monitor replication, go to the Replica section of the Statistics subtab, which is also on the Server tab. Look at the replica.cluster.workqueuedepth statistic. This statistic should be as close to 0 as possible. If it is regularly over 0 for long periods of time, you may need to add another CLREPL task. The replica.cluster.workqueuedepth statistic is calculated every 15 minutes, so a long period of time would be 30 to 60 minutes. Add another instance of CLREPL to the ServerTasks= line of the notes.ini file to make it permanent, or start the CLREPL task from the Administrator client as a temporary fix. Just remember that when you decide that you don’t need two or three CLREPL tasks running any longer, you will need to use the Tell CLREPL Quit command at the console. This ends all CLREPL tasks, so you must then restart each CLREPL task that you normally run.

Note!
You also may need to run an additional Cluster Replication task after a server has been down for some time, to help get everything back in sync and speed up recovery after a crash.

In the Files tab, you can do all of the following:
• Create replicas of databases via the AdminP task. This feature is useful for creating replicas of mail files for new mail users.
• Enable and disable clustering for specific databases in this section as well as in the CLDBDIR database.
• Set the clustered database properties Out of service, In service, and Pending delete:
  – Out of service: Prevents users from opening a database and redirects them to another replica if available. Otherwise, they will get the message: “Access to the database has been restricted by the administrator.” Users already in the database will remain in the database. Once they close the database session, they won’t be able to get back in. The server maintains connections to the database and allows replication to continue. The server can operate on the database to run tasks such as fixup, updall, or compact. Setting a database Out of service is helpful if you need to perform maintenance such as running fixup and don’t want users in the database.
    Use the setting Server_Restricted to take a server out of service. A value of 1 restricts a server until it is restarted; a value of 2 restricts a server even after it has been restarted. You can also take a server out of service at the console by using the Set Config Server_Restricted=X command.
  – In service: Makes an out of service database available to users.
  – Pending delete: Prohibits new requests to open a database; however, it does allow users currently working in the database to continue working. The database will be removed after the last person has closed her connection to the database and any changes written to this database have been replicated to other replicas in the cluster.
  Figure 9 shows the Manage Clusters dialog box, where you can set the clustered database properties. To open this dialog box, select the database for which you want to set the properties, select the Tools tab, then select Databases and the Cluster option (or simply right-click Databases and select Clusters).

Figure 9: Cluster Dialog Box in Admin Client

Tip!
If you want to set a server into a busy state for
maintenance or to force client requests back to
users’ home servers, you can set the notes.ini
setting Server_Availability_Threshold to 100.
This will force any new connections to another
server.

Domino Server Log
The Domino server log file, log.nsf, displays records of failover and load balancing, as well as records of replication events. The cluster replicator logs all cluster replication events once an hour. Any cluster replication errors that occurred in the last hour are also logged in the document.

You can force the Cluster Replicator to write the information to the log, using the console command Tell CLREPL Log.

Cluster Analysis
Cluster Analysis enables you to check your cluster to ensure that it is configured correctly and running the way that you configured it. The results of the cluster analysis are written to a Notes database that you specify; by default, it is on your local client.

There are several different checks that you can
perform using this tool. I’ve put checkmarks next to
the ones that I recommend running every time.

Cluster analysis checks at the server level:
✓ Consistent domain membership: Checks that all servers in the cluster are members of the same domain, which is a requirement for clustering.
✓ Number of cluster members: Just what it says.
✓ Consistent protocols: Checks that all servers are running the same protocols. Cluster members can’t communicate if they don’t have the same protocols. In cases where you have clients using protocols other than TCP/IP, you must ensure that every server in the cluster is using these protocols. If they aren’t, failover and load balancing will not work properly for these clients.
✓ Required server tasks: Checks the ServerTasks setting in the notes.ini file to ensure that CLDBDIR and CLREPL tasks are listed so that they can start on server startup. Clustering won’t work properly if these tasks don’t start.

Cluster analysis checks at the database level:
✓ Consistent ACL: Compares the ACLs of databases in the cluster to ensure that the ACLs are the same throughout. Inconsistent or missing ACLs can cause cluster replication or failover problems.
✓ Disabled replication: Checks to see if cluster replication is enabled for a database so that when a client fails over to the database, no data is missing.
• Consistent replication formulas: Checks for consistent replication formulas for databases that have the same file path. Replicas with the same path should have the same replication formulas. (Note that this is the only check I’m not recommending that you run every time.)
✓ Replicas exist in the cluster: Verifies that the databases on the current server have replicas in the cluster. Returns “failed” if no other replica exists. All it checks for is the existence of one other replica, not a replica on each cluster server.

Run only those reports for which you want to have results. Running just one test can take a considerable amount of time. For example, when I run an analysis, it can take approximately three hours for just a couple of checks. Therefore, I recommend that you run your analysis checks on another PC, go to lunch, or go home.

To run a cluster analysis, follow these steps:
1. Open the Domino Administrator client and select the server that you want to check.
2. Select the Server tab, then the Analysis subtab.
3. In the Tools pane, select Analyze > Cluster. The Cluster Analysis dialog box opens. (See Figure 10.)

Figure 10: Cluster Analysis Choices

4. To store the results in a database other than the Cluster Analysis database, click Results Database and select the database that you want.
5. If you already have a Cluster Analysis database, you can append the new results to the database, or overwrite the existing database, which is the default.
6. Choose the types of reports that you want to run.
7. For databases, choose the details that you want to have included in the database.
8. Click OK to run the analysis and open the database.

The report will open automatically when it is completed; you’ll see something similar to the screen in Figure 11. You can view the Cluster Analysis database by opening the database and selecting one of the following views:
• By Cluster: Breaks down the test by cluster, so you can tell how each cluster in your environment is behaving.
• By Date: Presents all tests by date. You will probably use this view the least.
• By Test: Groups all clusters by test. It’s the easiest view to read of all three views. (See Figure 11.)

Figure 11: Cluster Analysis Results

When looking at the analysis results, you want to
look at any failed tests and determine why they failed.
The test that fails most often, especially in mail clusters, is “Replicas Exist within Cluster.” Because this
test looks for databases that don’t have replicas, it can
help you discover:
•
Databases that have been cluster-enabled by
default, but for which you never created a replica
•
Database replicas that were not removed when
users were removed
•
Database replicas that were not created when
users were added
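
To show what the “Replicas Exist within Cluster” test is looking for, here is a small Python sketch of the same idea. It assumes you have a list of (server, path, replica ID) entries, similar in spirit to what the Cluster Database Directory tracks; the function name, the tuple layout, and the sample values are hypothetical. Like the real test, it only checks for one other replica, not a replica on every cluster server.

```python
def databases_without_replicas(directory, server):
    """Return paths on `server` whose replica ID appears on no other cluster member.

    directory: iterable of (server_name, file_path, replica_id) tuples.
    """
    # Replica IDs that exist somewhere other than the server being checked.
    elsewhere = {rid for srv, _path, rid in directory if srv != server}
    return sorted(
        path for srv, path, rid in directory
        if srv == server and rid not in elsewhere
    )


# Made-up sample entries; real replica IDs are 16-character hex strings.
directory = [
    ("Server1", "mail/asmith.nsf", "REPID-A"),
    ("Server2", "mail/asmith.nsf", "REPID-A"),
    ("Server1", "mail/bjones.nsf", "REPID-B"),
    ("Server1", "apps/orders.nsf", "REPID-C"),
    ("Server3", "apps/orders.nsf", "REPID-C"),
]
print(databases_without_replicas(directory, "Server1"))
```

Any path this returns is a candidate for one of the three situations listed above: a cluster-enabled database with no replica, a leftover replica, or a replica that was never created.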
Looking Ahead to
Notes/Domino 6
In Domino 6, there are some improvements in how
clustering works and how it’s configured:
• The Cluster Administration task is now a thread in the server task, similar to the Cluster Manager task. This thread is responsible for starting the CLREPL and CLDBDIR tasks at server startup. The CLREPL and CLDBDIR tasks are removed from the ServerTasks= line in the notes.ini file.
• You can set the number of CLREPL tasks to run on the server using the new notes.ini setting Cluster_Replicators=x. The default is one task, but if you need more than one Cluster Replicator task, you can change this setting.
• You can disable cluster replication on a server using the new notes.ini setting Disable_Cluster_Replication=1. To enable cluster replication on the server, you can either remove the setting or set the value to 0.
• You can configure directory assistance for failover if you have replicas of a Domino Directory database within a cluster that are part of a directory assistance database. In the Directory Assistance document under Replicas, specify one of the replicas as enabled for clustering.

Clustering Resources
For more information on clustering, I recommend the following resources:
• The IBM Redbook Lotus Domino R5 Clustering with IBM eServer xSeries and Netfinity Servers (available on www.lotus.com/redbooks). This book uses Windows and Linux as the platforms, but much of the information is good for any platform.
• “Flexing Domino’s Clustering Muscles” by James Grigsby, THE VIEW (September/October 1997).
• The Lotus Yellowbooks Administering the Domino System and Administering Domino Clusters (available on www-10.lotus.com/ldd; click the Documentation link). These books provide good basic information that will help you get started with clustering. The Admin Help database is the same information in electronic format.
• Lotus Developer Domain (www-10.lotus.com/ldd). Look at the LDD Today section for articles on clustering. Much of the earlier information is still valid in R5.

Conclusion
Perhaps the best thing about Domino clustering is that it provides greatly increased server availability and performance with only a little effort required on the part of the administrator. When we set out to implement our first cluster at my company, it initially seemed like a daunting task. After setting it up, however, we were amazed that “that’s all it took.”

In a future article, I’ll take you deeper into the
mysteries of Domino clustering — how you can get
the most out of your Domino clusters through monitoring and tuning, how database quotas function in a
cluster, how a tool can alleviate some of the pain that
quotas can cause, and things to consider when designing an application that will be clustered.
Ted Hardenburgh is the senior Notes and Domino
Administrator for Apria Healthcare in Lake
Forest, CA. Ted is a Principal CLP in R5
Administration and a CLP in R5 Development. He
has been involved with Lotus Notes and Domino
since release 4. His recent projects include a
LearningSpace and iNotes Web Access
implementation. Ted has experience with Notes
and Domino on the Windows and iSeries
platforms, migrations from cc:Mail and R4 to R5,
clustering, and security. In the past, he also
worked with OS/2, cc:Mail, and LMEF. Ted
works in southern California, where he lives with
his wife and family. He can be reached at
[email protected].