How to Configure a Domino Cluster

Ted Hardenburgh

Ted Hardenburgh is the senior Notes and Domino Administrator for Apria Healthcare in Lake Forest, CA. Ted is a Principal CLP in R5 Administration and a CLP in R5 Development. He has been involved with Lotus Notes and Domino since release 4. His recent projects include iNotes Web Access and LearningSpace implementations. He can be reached at [email protected]. (complete bio appears on page 26)

For many companies, e-mail and calendaring are the lifeblood of the enterprise. Users expect these and other applications to be always available, like water from a tap or electricity from the nearest wall outlet. Fortunately, Domino server clustering provides high availability and load balancing so our users can get to their e-mail and other critical applications 24 hours a day, 7 days a week.

For many Domino administrators, however, the word clustering suggests a long, arduous process involving endless tinkering with hardware and software to get the system up and running. It’s clear from conversations with my fellow administrators that there is often some trepidation prior to creating a cluster.

This article’s mission is to dispel any confusion or concern that you may have about clustering. It provides the basic information that you need to take advantage of Domino clustering, covering what clustering is, what’s required to set up clustering in a Domino environment, and the tasks and databases that are used in Domino clustering. Then it guides you through the implementation of a Domino cluster, step-by-step, from initial software installation to final configuration. Along the way, I’ll provide some insight on how to identify the databases that you need to enable for cluster support, as well as those databases that might not need to be enabled.
You will learn all about failover: how it works and doesn’t work, how to set up mail-routing failover, and what you need to know about failover of calendaring and scheduling lookups. You’ll also discover how load balancing works and what you can do to configure it properly. Finally, I’ll introduce you to some tools that you can use for monitoring and managing Domino clusters in an R5 environment. By the time you’re finished reading this article, you should be ready to implement Domino clustering in your own environment.

No portion of this publication may be reproduced without written consent. THE VIEW September/October 2002

What Is Domino Clustering?

A Domino cluster is a group of two to six servers containing database replicas that are kept in near real-time synchronization across the clustered servers; the servers act in concert to provide failover and load balancing support. (For an explanation of failover support in Domino clusters, see the sidebar on page 7; for an explanation of load balancing, see the sidebar on page 9.) If you’re familiar with Domino’s replication task, which keeps Notes databases up-to-date on a scheduled or on-demand basis, you can think of Domino clustering as replication on steroids. Technically speaking, Domino clustering is event-driven replication, which means that when an event occurs in a cluster-enabled database, that event is queued for replication to the other replicas of the database in the cluster. Event-driven replication is what makes high availability and failover possible.
In a Domino server cluster, if one server becomes unavailable for any reason, another server in the cluster will pick up the load for it, and do so in a way that is, for the most part, seamless to the Notes client user.¹ A user who is accessing a database on one server in the cluster (Server A) does not notice when requests to that server are redirected to another server in the cluster (Server B), because the replica on Server B is synchronized with the replica on Server A. However, there are some events that prevent failover from happening seamlessly. (See the Failover sidebar on page 7.)

¹ On the client side, Domino clustering only supports requests from Notes clients. Web clients cannot use the services provided by clustering. To provide clustering support to Web clients, use Internet Cluster Manager (ICM).

In most settings, Domino clustering is used to provide high availability, failover support, and load balancing for messaging, though many installations use it to provide support to other applications. For example, the primary use of clustering where I work has been for e-mail, but we have also implemented clustering to support an application whose users require 24-hour availability.

Clustering was first released to the user community as part of the Domino Server R4.5 and R4.6 Advanced Services. Now it is part of the Domino Enterprise Server of R5, which requires a different license from the one used for a standard Domino Mail or Application Server.

For an in-depth look at Domino clustering technology, including how it provides failover while scaling to accommodate growing workloads and numbers of users, see “Flexing Domino’s Clustering Muscles,” by James Grigsby, THE VIEW (September/October 1997).

Operating System (OS) Clustering Compared with Domino Clustering

Many operating systems support clustering, either natively or through third-party support. The most commonly used clustering support may be that provided by Windows NT and Windows 2000.
With Windows NT 4, a cluster can have two nodes or servers, but they can’t both be up at the same time. This is called an active-passive cluster. This type of cluster shares the same disk structure and is only useful for failover. Load balancing is not supported. With Windows 2000 Advanced Server, the ability to have active-active clusters has been added, so each of the two servers that are up and running can compensate for the other server’s workload in addition to its own workload; however, they still use the same shared disk structure. One reason to cluster servers is to avoid single points of failure, but a shared disk structure is itself such a point: while the server may be protected by failover, the disk structure remains exposed to a failure of a Redundant Array of Inexpensive Drives (RAID) adapter. Also, both types of clusters require dedicated network links between the node servers. While this is not a disadvantage, it is an additional hardware requirement.

Domino clustering is done at the application level, not at the operating-system level. A Domino cluster can have up to six nodes. Domino clusters are of the active-active type, so they can support both failover and load balancing. They are operating-system agnostic, meaning that the nodes in the cluster can have different operating systems. Domino clusters also do not require a dedicated connection between cluster nodes; you can even have a Domino cluster that is connected over a wide area network (WAN), if desired.

In choosing whether to use OS clustering or Domino clustering, you will need to consider your company’s environment. It is possible and sometimes desirable to run a Domino cluster on top of an operating-system cluster, but that topic is outside the scope of this article. For an article that discusses this kind of clustering, see “From High Availability to Disaster Tolerance: Layering Domino Clusters on Wolfpack,” by Peter Eggimann, THE VIEW (March/April 1998).

www.eVIEW.com ©2002 THE VIEW. All rights reserved.

Why Cluster Your Domino Servers?

There are several reasons why you might want to take advantage of clustering:

• High availability: If you have users who require 24-hour a day access to their application, you can provide it using clustered servers (barring a disaster that affects all the servers in the cluster, of course). You can put replicas of the application across multiple servers in the cluster, so that if a user’s home server becomes unavailable, the user’s Notes client will be automatically redirected to another node in the cluster. For example, clustering can be a vehicle for meeting a Service Level Agreement that mandates 24-hour availability for e-mail.
• Scalability: Domino clustering enables you to tune servers to support a certain number of users, so your application performance doesn’t degrade as demand for it increases. You can think of load balancing as planned failover.
• Disaster preparedness: Domino clustering can be an important part of a company’s disaster recovery plan. A cluster node can be located in an off-site location connected via WAN. The server can be configured so that the data on it remains synchronized with data on other cluster nodes, while users can access the off-site server only in the event of an emergency.

What’s Needed for Clustering?

In setting up a Domino cluster, the first step is to obtain a Domino R5 Enterprise Server license. This license gives you the necessary access rights to install Domino’s Advanced Services: clustering, partitioning, and billing.

Hardware Requirements

Because you can have separate hardware and operating systems, the hardware in your Domino cluster may not need to be exactly the same for every server. For example, the hardware in a cluster that includes a Windows NT server and IBM iSeries server would hardly be similar.
What is necessary is that you have hardware that can support the additional load that either of these servers may be put under if one of the other nodes fails for any reason.

In determining your hardware needs for any single cluster node, I recommend working on the assumption that one of the other nodes will have a complete failure that will result in downtime measured in days, not minutes. You may think, “I’ve never had a failure like that!” But remember, just because you haven’t experienced a serious failure yet doesn’t mean that you never will. Recall that hardware is measured with something called “Mean Time Between Failures.” If the people who built the server know that it will fail at some point, we administrators should know that it will, too. No one wants to be the person who gets a rude awakening early one morning when the CEO is ranting about not being able to read this morning’s e-mail because a cluster node went down and the other node couldn’t handle the additional workload. At my company, we once experienced a severe hardware outage. I can tell you that being prepared for this scenario certainly beats being unprepared!

When calculating hardware needs, be conservative. Determine the number of users that the server will support and increase that by half. That’s the number to shoot for when configuring the server. For example, if there are two servers in your cluster, and there are 1000 users in the Domino domain, that’s 500 users per server. The server you configure will need to handle at least 750 users to prepare for failover. Talk to your hardware supplier or check out the sizing guides that are provided for many platforms. (Most of these guides are fairly conservative in the recommendations that they make.)
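The sizing rule above is simple arithmetic, so a short Python sketch makes it easy to rerun for other cluster sizes. The function names and the default 50 percent headroom are illustrative assumptions taken from the example in the text, not part of any Domino tool. The second function is an added comparison, not the rule stated above: it assumes one node is completely down and its users spread evenly over the surviving nodes.

```python
import math


def sizing_target(total_users, servers, headroom=0.5):
    """Users one node should be sized for under the rule in the text:
    an even share of users per server, increased by half (headroom=0.5)."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    per_server = total_users / servers
    return math.ceil(per_server * (1 + headroom))


def full_failover_load(total_users, servers):
    """Assumption for comparison only (not the rule in the text): users per
    surviving node if one node fails completely and load spreads evenly."""
    if servers < 2:
        raise ValueError("failover needs at least two servers")
    return math.ceil(total_users / (servers - 1))


if __name__ == "__main__":
    # The example from the text: 1000 users on a two-server cluster.
    print(sizing_target(1000, 2))
    print(full_failover_load(1000, 2))
```

For the two-server example this gives 750 under the increase-by-half rule and 1000 under the full-failover assumption, which is one way to see why the text says "at least" 750.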
Server Requirements

In addition to obtaining the license and your hardware, you need to ensure that all servers in the cluster meet the following requirements. They must:

• Be in the same Domino domain
• Use TCP/IP for server-to-server communication (clients can communicate with the servers using any supported protocol)
• Be in the same Notes Named Network
• Have AdminP running (for automation of cluster tasks)

Ideally, all servers in a cluster will be on the same release of Domino, but it’s possible to have one node on R4.6 and another node on R5; for example, you might have to mix Domino releases in a Domino cluster because you’re migrating between releases or upgrading and changing hardware.

Important! A server can be in only one Domino cluster at a time, but you can have as many clusters as you want in your Domino domain.

Other Considerations

In addition to hardware and server requirements, you’ll want to consider the usual performance enhancements, such as Transaction Logging, the R5 on-disk structure (ODS) for databases, and multiple MAIL.BOX databases. In larger installations, you need to consider the limit for databases that can be clustered. According to Lotus, the number is between 4000 and 8000 databases per cluster, depending on the size of the databases. The largest cluster that I’ve ever worked with had over 2500 clustered databases.

Clustering: Essential Tasks and Databases

The more you know about Domino clusters and how they work, the better able you are to maximize failover and load balancing. This section familiarizes you with the inner workings of a Domino cluster.

Failover

Failover is the redirection of a request from one server to another server when the servicing server has failed to respond. It is a means for providing high availability of applications and, as such, is perhaps the primary reason for setting up a cluster.
Failover is triggered when a client attempts a request to a database that is not available. Unavailability can be caused at the server or the client.

Server-side triggers of failover include:

• The Domino server is not available due to hardware, software, or network issues.
• The Domino server is in a restricted state (that is, Server_Restricted is not 0).
• The Domino server is busy.
• The Domino server has reached the maximum number of user sessions as set by the notes.ini parameter Server_MaxUsers.
• The database is unavailable because it has been marked Out of service or Pending delete.

A Notes client attempting to do one of the following can trigger failover:

• Opening a database from a bookmark, Workspace icon, document link, view link, or database link, or by using one of the following programming methods:
  - Notes Formula @Command([FileOpenDatabase])
  - LotusScript db.openwithfailover. (Using db.open does not initiate failover.)
• Initiating replication from a workstation.
• Performing an operation relating to the user’s home server, such as:
  - Name resolution
  - Compose mail
  - Send mail
  - Server lookups
• Executing an agent that sends mail (triggers failover if all the Domino servers in the cluster are configured for mail transfer and delivery failover).

Events that do not trigger failover:

• A server becoming unavailable while the database is open. However, exiting the database and attempting to reopen it will trigger failover.
• A server becoming unavailable while a user is editing a document. Exiting will result in unsaved data being lost. A user may want to copy what she was working on to the Clipboard in order to save it temporarily.
• Using the command Tools > Open a Calendar. This operation uses @Command([OpenCalendar]), which is not cluster-aware. There is no “open with failover” option for calendars.
• Selecting a database icon or bookmark and choosing Properties, Access Control, Open, or New Copy from the Database menu.
• Using the command View > Go To.
• Replicating with a server that is restricted or that has met its MaxUser or Session settings, or replicating with a database that is marked Out of service. Replication will occur regardless of these restrictions, so failover is not needed.
• The router attempting to deliver mail and routing failover is not configured, or cluster replication is disabled on the mail server. The mail will sit in MAIL.BOX pending delivery or delivery timeout.
• Using the server as a template server when creating a database.
• The router attempting to deliver mail, but the MailClusterFailover is set to 0 in the notes.ini file or disabled in a configuration document.

The Cluster Administration Task

The Cluster Administration task (CLADMIN) runs automatically when a clustered server starts. This task manages all of the other cluster components; it starts the other tasks, adds and removes servers from the cluster, and starts the AdminP task if it isn’t running. When a server is added to a cluster, the CLADMIN task is started; it then starts the other cluster tasks and adds these tasks to the ServerTasks setting in the notes.ini file. Once the CLADMIN task has completed its job, it ends itself.

The Cluster Manager Task

The Cluster Manager task is responsible for taking the pulse of all the servers in a cluster and maintaining a list of all those servers with their current status. It runs on each server in the cluster, polling the other servers using probes that are sent once a minute to determine the other servers’ availability and workload. Because it is part of the base server task, it cannot be launched separately like the AdminP or Fixup tasks. Instead, it starts automatically when the server is added to a cluster or when a change to a cluster is detected in the Domino Directory. In the Server Tasks window, it appears as one of the activities listed in the Database Server task. (See Figure 1, which also shows some of the other Domino cluster-related server tasks as they appear in this window.)

Figure 1 Server Tasks Related to Clustering

The Cluster Manager also does the following:

• Handles database requests for failover or load balancing from clients. (For an explanation of how failover functions in a Domino cluster, see the sidebar on page 12.)
• Determines which servers are part of the cluster by checking the Domino Directory for changes to the server document and the clusters view.
• Checks server availability and workload.
• Communicates with other cluster nodes regarding changes in cluster availability.
• Logs failover and load balancing events to the log.nsf file.

The Cluster Database Directory Task and the Cluster Database Directory

When a cluster is created and a server joins the cluster, the Cluster Database Directory task (CLDBDIR) is added to the ServerTasks setting in the notes.ini file. This task creates a new database on the server (the Cluster Database Directory, or cldbdir.nsf) and populates it with documents for each database on the server. CLDBDIR is started at server startup and runs while the server is up. A replica of cldbdir.nsf exists on every server in the cluster and is updated by cluster replication so that if a database is added, removed, or marked for maintenance, the other cluster members are immediately made aware of it for failover and load balancing.

The Cluster Database Directory (cldbdir.nsf) is the heart of a Domino cluster. It contains an entry for each database on every member of the cluster, indicating whether the database is enabled for clustering, and including the database’s name, file path, replica ID, and its state in the cluster. (Cluster states include Enabled, Disabled, Out of service, and Pending delete.) The Cluster Database Directory is used by the Cluster Manager task to determine failover routes when a database or server is unavailable. Figure 2 shows the Cluster Database Directory displayed in the Administrator client.

Figure 2 Cluster Database Directory

Load Balancing

Load balancing in a Domino cluster is the redirection of a client to another node as a means of distributing the workload of the server. The server monitors its workload (in terms of either response time or total user sessions) and compares it against settings in the notes.ini file or configuration document to determine whether it will accept new user sessions. If the workload is too high, the server will redirect client requests to another node in the cluster. It’s important to understand that in a Domino cluster, workload is measured per server, not per database.

The Cluster Replicator Task

If the Cluster Database Directory is the heart of your Domino cluster, the Cluster Replicator task (CLREPL) is its lifeblood. It is responsible for the near real-time synchronization of the databases in the cluster. CLREPL executes event-driven replication, based on database activity, using the Cluster Database Directory (cldbdir.nsf) to determine what databases need to be replicated.² Multiple CLREPL tasks can be run on a server depending on the load. The Domino administrator configures the number of tasks that may run, as discussed later in “Configuring a Domino Cluster.”

² Standard replication is based on a scheduled connection document, client demand, or some other external request.

The Clustered Free-Time Database (clubusy.nsf)

In an R5 environment, clustered mail servers have a clustered free-time-lookup database, clubusy.nsf. This database allows any user in the cluster to look up free-time information on another user without having to make a call to that user’s home server (if it’s not the same as the first user’s home server). It also allows free-time lookups to continue if a user’s mail server is unavailable; the lookups simply fail over to another node. In R4.6, free-time lookups are not clustered, so a lookup to a user on an unavailable server will return the message “No Information.”

Note! If a non-clustered server performs a free-time search on a cluster server that goes down, the search will not fail over to another cluster node.

The clubusy.nsf database is created automatically by the Schedule Manager task when a mail server is added to a cluster. The Schedule Manager deletes the existing busytime.nsf and creates a replica of clubusy.nsf. Each server in the cluster has its own replica of clubusy.nsf, which is updated through cluster replication when any information for one of its users changes.

Note! If a user’s client fails over to a server that isn’t his home server, clubusy.nsf won’t be updated with any changes to his free time until his home server is available again and the Schedule Manager updates clubusy.nsf on that server. (The Schedule Manager only updates a user’s free-time information on the user’s home server.)

Additional Cluster Databases

In addition to the Cluster Database Directory and the clustered free-time database, Domino clusters use documents contained in the Domino Directory, most notably server documents. The initial configuration of the cluster is completed in the Domino Directory, along with the Administration Request database. Once changes are made in the Domino Directory, AdminP requests are placed in the Administration Request database to be processed.

Finally, there is a client portion to clustering; it isn’t all done on the server. Notes R4.5 and later releases have a cluster cache file, cluster.ncf. (See Figure 3.) This file is created when a client accesses a clustered server for the first time. It lists the last time the file was updated, the name of the cluster, and all the member servers of the cluster. When the current server is unavailable, the Notes client looks in this file for a server to which it can redirect its request. Without a cluster.ncf file, the client will not know what servers it can try to fail over to. The role of cluster.ncf is explained and illustrated in the sidebar “How Failover Works” on page 12.

Figure 3 Contents of cluster.ncf file

Note! The client must have successfully accessed a clustered server and have a cluster.ncf file generated before failover will work for the user.

Now that you have the theoretical basis for understanding how a cluster works, let’s discuss configuring a Domino cluster.

Configuring a Domino Cluster

Before you create your Domino cluster, verify that you have TCP/IP installed and configured on all servers. Also ensure that you have Domain Name System (DNS) or host file entries for all the servers in the cluster.

If you haven’t already done so, install Advanced/Enterprise Services for Domino, using your Domino server installation software. (Remember to get the R5 Enterprise Server license so that you can access these services.) Figure 4 shows the initial dialog box for the server software installation. You must select Domino Enterprise Server on this dialog box. In some versions, you may be prompted to select the pieces of the Enterprise license that you want to install.

Figure 4 Software Installation

After the server software has been installed, you can start the Domino server. It must be up and running for the cluster to be configured.

Now it’s time to create the cluster. Follow these steps:

1. Open the Domino Administrator client, and select the Configuration tab.
2. Select Clusters, and then select the All Server Documents view.
3. Select the servers that you want to include in the cluster, and then click the Add to Cluster button.
4. A dialog box like the one shown in Figure 5 now prompts you to create a cluster or add to an existing cluster. You’re creating a new cluster, so enter the name for your cluster. (If you were adding a server to an existing cluster instead, you would simply select the cluster from the list.) Give your cluster a simple but descriptive name.
5. To add the servers to the cluster immediately, click Yes at the prompt. To have the Administration Process add the servers to the cluster, click No.

Figure 5 Creating a Cluster

When the Domino Directory is replicated to all the cluster nodes, the cluster is operative.

How Failover Works

The accompanying diagram depicts the failover process as it unfolds on a clustered Domino server. Let’s go through the process step-by-step:

1. The Notes client user clicks on the bookmark to open her mail file on Server A.
2. Transparent to the user, the client gets the “Server Not Responding” message.
3. The client looks at the cluster.ncf file to determine what other servers are in the cluster.
4. The client tries to access another server in the cluster, Server B.
5. The Cluster Manager on Server B determines the locations of replicas of the database and directs the Notes client to a server containing the replica. It performs the failover lookup in one of two ways:
   – The primary lookup method is performed by the replica ID of the requested database. The Cluster Manager checks the cldbdir.nsf file for another server that has a database with the same replica ID.
   – If there are multiple replicas of the database on a server, the Cluster Manager matches the file path and replica ID of the requested database.*
   Note! The replicas may not be on this server. Recall that you don’t need to have a database replica on every cluster node. If the database replica isn’t on this server, the client is given the name of the server that has a replica and the client will attempt to access that server.
6. The client then accesses the server that has the mail file; the server serves the file to the user. On the client, a new icon appears on the Workspace for the replica database, and in Bookmarks, the server name is added to the Open Replica menu item when the user right-clicks on a bookmark or Workspace icon.

* It’s best to avoid having multiple replicas of a database on one server, since you cannot enable the database replicas for cluster replication, because the Cluster Replicator task ignores selective replication formulas.

The user will continue to use the database on the failover server even after the original server is available again, unless she manually changes back to the home server. This is standard Notes client behavior: the user continues using the last replica opened.

You can force users back to the first server by making the failover server unavailable or by setting it to busy. If you’re hoping to get users back on their home server, this technique may not work as expected in a cluster of more than two servers.

[Diagram: the Notes client sends a database request to Server A (1) and gets a “Server Not Responding” message (2); the client checks cluster.ncf for another clustered server (3) and sends the database request to Server B (4); Server B consults cldbdir.nsf and returns the name of the server with the replica, Server C (5); the client sends the database request to Server C, which holds the requested database (6).]
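The sequence in the “How Failover Works” sidebar can be sketched as a small Python simulation. This is a toy model under stated assumptions, not Domino code: the `DirectoryEntry` class, the `fail_over` function, the server names, and the replica ID string are all hypothetical stand-ins for cluster.ncf, cldbdir.nsf, and the Cluster Manager lookup by replica ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectoryEntry:
    """One row of a toy Cluster Database Directory (hypothetical model)."""
    server: str
    path: str
    replica_id: str
    enabled: bool = True


def fail_over(cluster_cache, reachable, directory, home_server, replica_id):
    """Return the server the client ends up on, or None if failover fails."""
    # Steps 3-4: the client reads its cached member list (cluster.ncf)
    # and contacts the first other member that responds.
    contact = next(
        (s for s in cluster_cache if s != home_server and s in reachable), None
    )
    if contact is None:
        return None
    # Step 5: the contacted server looks up replicas by replica ID.
    holders = [
        e.server
        for e in directory
        if e.replica_id == replica_id and e.enabled and e.server in reachable
    ]
    if not holders:
        return None
    # Step 6: stay on the contacted server if it holds a replica,
    # otherwise go to the server it names.
    return contact if contact in holders else holders[0]


if __name__ == "__main__":
    cache = ["ServerA", "ServerB", "ServerC"]
    directory = [
        DirectoryEntry("ServerA", "mail/jdoe.nsf", "REPLICA-1"),
        DirectoryEntry("ServerC", "mail/jdoe.nsf", "REPLICA-1"),
    ]
    # Server A is down; Server B answers and points the client at Server C.
    print(fail_over(cache, {"ServerB", "ServerC"}, directory, "ServerA", "REPLICA-1"))
```

Passing an empty `cluster_cache` returns `None`, which mirrors the point that a client without a cluster.ncf file has no server to fail over to.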
When a Domino server in the cluster reads the updates to its server document and recognizes that it’s part of a cluster, the following activities take place on each clustered server:

• The Cluster Administration and Cluster Manager tasks start on the server.
• The CLDBDIR and CLREPL tasks are added to the ServerTasks setting in the notes.ini file so that they start with the server. The tasks will also be started at this point.
• The line Server_Cluster_On=1 is added to the notes.ini.
• The AdminP task is started if needed. AdminP is used by the cluster to update servers and server documents and to work on databases, so it is important that it be running.
• The CLDBDIR task runs and adds documents for every database on the server to the Cluster Database Directory and begins replicating with the other cluster members.
• The Cluster Manager sends probes to the other servers in the cluster to determine their status, and it informs each server of the status of the other servers.

If any of your clustered servers are running multiple protocols on the server, you have an additional step. Because clustering only works with the TCP/IP protocol, you must tell the server which port to use by adding two lines to the notes.ini or to the server’s configuration document.

Server_Cluster_Default_Port=PortName
Server_Cluster_Probe_Port=PortName

PortName is the name of the TCPIP port listed in the server document. Both entries can refer to the same port or they can be configured separately for cluster traffic and probes.

How Many Cluster Replication Tasks Should You Run?

As you just saw, when you create a cluster, the CLREPL task is automatically added to ServerTasks= in the notes.ini file. You can run multiple instances of this task by adding one entry for each instance to the ServerTasks= line.
Running multiple instances helps to ensure that replication takes place more often so that all database replicas are as up-to-date as possible.

There are two different guidelines for determining the number of CLREPL tasks that should be run: (1) Take the number of processors on your server, subtract one, and that equals the number of CLREPL tasks to run; or (2) Run one instance for each server in the cluster with which the server is replicating. In the second case, a four-server cluster would indicate running three CLREPL tasks. Depending on your server, you may be able to run more than three CLREPL tasks in the short-term. At my company, we normally run two CLREPL tasks, but we have used up to three tasks on a dual processor AS/400 on an as-needed basis with little negative impact.

Note! You may need to run an additional Cluster Replication task after a server has been down for some time, to help get everything back in sync and speed recovery after a crash. When you decide you don’t need to run the additional CLREPL task any longer, issue the Tell CLREPL Quit command at the console. This ends all CLREPL tasks, so you must start again each CLREPL task that you normally run.

Private LANs for Cluster Traffic

Depending on your infrastructure, you may want to consider setting up a private local area network (LAN) for your cluster traffic to help improve response to client requests. To set up a private LAN that separates cluster replication traffic from all other traffic, follow these steps:

1. Have two network cards in each cluster node, with separate addresses assigned to each card and a host name assigned to each secondary address.
2. In the Server document, create and enable a new Notes port for each server.
3. In the Server Configuration document or the notes.ini file, assign each Notes Network port to a corresponding IP address by adding the following lines. Below, I’ve defined two ports, one named TCPIP and the other named CLUSTER:

TCPIP_TcpipAddress=0,xxx.xxx.xxx.xxx:1352
CLUSTER_TcpipAddress=0,xxx.xxx.xxx.xxx:1352

4. Add the following lines to the Server Configuration document or the notes.ini file:

Server_Cluster_Default_Port=CLUSTER
Server_Cluster_Probe_Port=CLUSTER

5. Restart the server for the changes to be put in place.
6. To see if the changes have taken effect, enter the Show Cluster command from the administrative console, as shown in Figure 6. You’ll see the port name listed in the data returned. Please note that the port name only appears in the “show cluster” information list if you’ve used the settings above. If a default cluster port is not defined, no port will appear in the list.

Figure 6 Results of “Show Cluster” Command

You can also check the statistics net.PORTNAME.BytesReceived and net.PORTNAME.BytesSent to determine how much data the CLREPL task is sending and to verify that cluster replication is working on the correct port. Just remember to change PORTNAME to the actual name of your port.

Next Steps

For database requests to be redirected when the current server isn’t available, the following two conditions must be met:

• The database must be enabled for clustering.
• At least one replica of the database must exist on another server in the cluster.

We’ll look next at how to decide which databases to enable for clustering, which databases not to enable, and how to create replicas for cluster support.

Enabling Databases for Cluster Support

By default, Domino will enable every database and template for clustering. I recommend that you change
the default so that no databases are initially clustered. Once you have disabled all databases for clustering, you can proceed to make considered decisions about which databases to cluster on a case-by-case basis.

In the Cluster Database Directory, you can enable or disable a database for clustering by simply selecting the database and choosing the action bar item, Tools > Enable (or Disable) > Clustering. This action wasn’t available in R4.6 and previous releases; however, you can add it to the database. Here’s how: Open the database in Domino Designer and add an action that calls an agent that changes the value of the field Cluster Replicate to 1 for enabled or 0 for disabled.

Mail Databases

In a two-server mail cluster, it’s a safe bet that all the mail files will be on both nodes; however, as you add servers to the cluster, you should start thinking about file placement. More than likely, you don’t need a replica of every mail file on every node.

In a three-server mail cluster, for example, you might designate two nodes as “active,” making each node the home server for half of your mail files. The third server can then be designated as the “passive” node, meaning that no users have it as their home server. This server would have all the mail files, but it would only be used as needed for backup and failover. You might consider placing the passive node in an off-site location as part of your disaster recovery plan.

In a four-server mail cluster (for example, Server 1, Server 2, Server 3, and Server 4), you could split your mail files into fourths (let’s call them a, b, c, and d) and put three of these fourths (but never the same three!) on each server, like so:

• a-b-c on Server 1.
• b-c-d on Server 2.
• c-d-a on Server 3.
• d-a-b on Server 4.

Figure 7 illustrates how the mail files are distributed across the servers. In this way, you do not have all mail files on all servers, but every mail file has two nodes available for failover.

Figure 7: Distribution of Mail Files in a Four-Server Cluster

    Server 1: a, b, c
    Server 2: b, c, d
    Server 3: c, d, a
    Server 4: d, a, b

    1000 mail files: a = 1 - 250, b = 251 - 500, c = 501 - 750, d = 751 - 1000

There are many ways to think about splitting data. You’ll need to take your organization’s needs into account to determine what’s best.

Databases That Must Be Clustered

Some databases must be clustered, two in particular:

• clubusy.nsf: The clustered free-time database. Disabling the cluster status of this database on one cluster server effectively disables free-time searches of users on other cluster nodes. Once cluster replication is disabled on one replica of the clubusy.nsf, all other replicas essentially have bad data. Any free-time lookups done for users on the disabled server will not be correct. It is the same behavior you would see in a non-clustered network if the server itself was unavailable for free-time lookups.

• cldbdir.nsf: The Cluster Database Directory. If the Cluster Database Directory is not enabled for clustering, failover and load balancing may not work properly due to database information not being up-to-date.

Databases That Should Not Be Clustered

There are some databases that you won’t want to cluster:

• Directory catalogs and other server-specific “internal” databases: This category includes statrep.nsf, events4.nsf, catalog.nsf, and the like.

• MAIL.BOX: Clustering MAIL.BOX does not create mail-routing failover. It will likely only cause routing problems and not gain you anything. You need to set up mail-routing failover separately. (I explain how to set up mail-routing failover later, in the section “Configuring Mail Routing Failover” on page 18.)

• Resource Reservations databases: Because the free-time database (clubusy.nsf) only gets updated by a user’s home server, in the event of a server outage, there will be replicas of clubusy.nsf that do not contain correct free-time information for some users. If a Resource Reservations database is clustered, the potential for double bookings is then increased as some bookings will be based on incorrect information.

Working with Database Replicas

The easiest way to create database replicas is by using the Administrator client and AdminP. Follow these steps:

1. Go to the Files tab of the Domino Administrator client and select the database or databases that you want to replicate.

2. From the Tools > Database Task list on the right-hand side of the screen, select the Create Replicas task.

3. Open the Create Replicas dialog box to display only cluster members. Select the servers on which you want to create a replica of the selected database and click OK.

4. The Administration Process will check the server’s access rights to create a replica and add the replica stub to the destination servers. It will take a couple of cycles through AdminP to complete this step; you may want to manually replicate the AdminP databases to expedite this process.

5. Initiate a replication session between the servers to initialize the replica stub, or create a connection document that replicates between the servers and wait for the scheduled replication to take place. The Cluster Replication task will not initialize a stub, so it won’t start replication until the Replica task has done its job.

Note! Do not create multiple replicas of a database on the same server. You will not be able to enable the database replicas for clustering, since the Cluster Replica task ignores selective replication formulas.
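Returning to the mail file placement discussed under Mail Databases, the rotating layout for a four-server cluster can be sketched in a few lines of Python. This is only an illustration of the pattern (each server holds its own group plus the next ones in rotation); the function names are hypothetical and nothing here calls Domino.

```python
from string import ascii_lowercase


def mail_file_layout(num_servers, groups_per_server):
    """Return {server name: [mail file groups]} using a rotating assignment.

    Mail files are split into num_servers groups (a, b, c, ...). Server i holds
    group i plus the next groups in rotation, so no two servers hold exactly the
    same set when groups_per_server < num_servers.
    """
    if not 1 <= groups_per_server <= num_servers <= len(ascii_lowercase):
        raise ValueError("need 1 <= groups_per_server <= num_servers <= 26")
    groups = ascii_lowercase[:num_servers]
    layout = {}
    for s in range(num_servers):
        layout[f"Server {s + 1}"] = [
            groups[(s + k) % num_servers] for k in range(groups_per_server)
        ]
    return layout


def servers_holding(layout, group):
    """List the servers that hold a replica of the given mail file group."""
    return [server for server, held in layout.items() if group in held]


if __name__ == "__main__":
    layout = mail_file_layout(num_servers=4, groups_per_server=3)
    for server, held in layout.items():
        print(server, "-".join(held))
    print("Group a is on:", ", ".join(servers_holding(layout, "a")))
```

With four servers and three groups per server, every group ends up on three servers: one home server plus two nodes available for failover, which matches the arrangement described in the text.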
Why Scheduled Replication Is Necessary

Even though the Cluster Replica task runs on every clustered server, each server in the cluster still needs the Replication task (REPLICA), because not all databases on the server will necessarily be clustered. It’s also helpful for keeping the clustered databases up-to-date, especially after a server crash.

Cluster replication works on events that are written to memory, not to the database. This means that if the server goes down, any pending cluster replication activity is lost. This is another reason to have scheduled replication for all servers in your cluster. In case of a server outage, standard replication will retrieve any lost events and update all replicas in the cluster.

Determine what schedule works best in your domain: perhaps it’s once an hour, or only once every few hours, but it should be at least once a day. At our company, for example, we replicate between three servers every two hours, alternating between pairs of servers. Servers 1 and 2 replicate on even hours, and servers 1 and 3 replicate on odd hours.

Configuring Mail Routing Failover

Failover for mail routing provides complete redundancy for your mail users. If a user’s home server is unavailable, the mail routed to that user will be delivered to one of the cluster nodes that has a replica of the mail file on it.

To enable mail routing failover, do one of the following:

• For R4.6x servers: Add the following line to the notes.ini:

    MailClusterFailover=1

• For R5 servers: Edit the configuration document for the server(s) that you want to update and go to the Router/SMTP / Advanced / Controls tab. (See Figure 8.) Select one of the following options for “Cluster failover”:

  – Enabled for last hop only (the default): This is the hop to the user’s home server.
  – Enabled for all transfers in this domain: If any clustered server in the delivery route is unavailable, the router will find another server.
  – Disabled: Prevents mail routing failover.
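The three “Cluster failover” choices can be summarized as a small decision function. The Python sketch below is a simplified model of the descriptions above, not the router’s actual logic; the enum and function names are hypothetical and exist only for illustration.

```python
from enum import Enum


class ClusterFailover(Enum):
    """The three "Cluster failover" choices named in the configuration document."""
    DISABLED = "Disabled"
    LAST_HOP_ONLY = "Enabled for last hop only"
    ALL_TRANSFERS = "Enabled for all transfers in this domain"


def may_fail_over(option, is_last_hop):
    """Return True if, in this simplified model, the router may try another
    cluster member when the next server in the route is unavailable.

    is_last_hop: True when the next hop is the recipient's home server.
    """
    if option is ClusterFailover.DISABLED:
        return False
    if option is ClusterFailover.LAST_HOP_ONLY:
        return is_last_hop
    return True  # ALL_TRANSFERS: any hop may fail over


if __name__ == "__main__":
    for option in ClusterFailover:
        print(option.value,
              "| last hop:", may_fail_over(option, True),
              "| intermediate hop:", may_fail_over(option, False))
```

The model makes the practical difference visible: with the default setting, only delivery to the user’s home server is protected, while intermediate hops are not.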
To be useful, mail routing failover needs to be set up on every clustered server in the domain. You can use a global server configuration to do this, or if you prefer, use individual or group configuration documents. I suggest creating a server group for your cluster and using one configuration document for this group. Be sure to use a group name that is different from the cluster name to avoid confusion.

Figure 8: Configuring Mail Routing Failover

Configuring Load Balancing

For automatic workload balancing to work, each server in the cluster must be made aware of its own workload and availability as well as the availability of the other servers in the cluster. Each server computes its Server.Availability.Index (SAI) once per minute. The SAI starts at 100 and is reduced, depending on how the server responds to Remote Procedure calls (RPC) made to it. The longer the response time, the lower the availability index. A value of 100 indicates that the server is performing optimally; 0 indicates that the server is completely unavailable.

It’s the job of the Cluster Manager task to keep all the servers in the cluster aware of each other’s workload and availability. The Cluster Manager sends out probes to all other servers in the cluster to determine availability.

Important! Don’t wait until a server’s index is 0 before redirecting requests to other servers. That would mean that users would be experiencing slower and slower response times as the server’s index is dropping. Instead, redirect the workload to a server with a high SAI before response time has become unacceptable.

Now, let’s learn how to configure load balancing to achieve the best possible response times across all servers in the cluster.

The Server_Availability_Threshold Setting

To configure load balancing, use the Server_Availability_Threshold setting (SAT).
When the Server.Availability.Index (SAI) falls below the SAT, new client requests are redirected to other servers in the cluster. The Cluster Manager redirects requests to the most available server that has a replica of the database. If other servers are less available than the originally requested server, the original server serves the request. So, for example, if a server’s SAT is 90 and its SAI drops from 91 to 89, requests to this server will be redirected. When the SAI improves and is once again greater than the SAT value, new requests are accepted.

The SAT is set on a server-by-server basis in the server configuration document or the notes.ini file. To set the SAT, add or modify the following line, where n = 0 - 99:

    Server_Availability_Threshold=n

To determine which Server_Availability settings to use, monitor your servers for peak usage and check your load balancing settings. Look at the peak user and SAI statistics. Don’t forget to obtain feedback from users about how they perceive the databases to be working. Start the SAT at a number around 95 and work down until you begin observing unacceptable response times.

SAI is a logarithmic function and not a straight percentage. For example, an SAI value of 90 indicates that the server is taking 10 times longer to respond to a request than in optimum conditions (that is, SAI=100). SAI only measures the server response, not network or client-side issues; therefore, your users may be experiencing slower performance than what is indicated by the SAI statistic.

One thing to keep in mind is that setting the SAT value too high can cause failover to happen before it really needs to happen.

Note! The Server_Availability_Threshold setting should be different on each server in your cluster. This is especially true when you have different platforms.

The Server_MaxUsers Setting

Another way to measure workload and configure load balancing is purely in terms of the number of users on the server. Use the Server_MaxUsers setting to set the maximum number of client sessions that are permitted to access the server. Setting the maximum number of users to 1 effectively causes failover to other nodes, since the server would reach its maximum once one user opens a session.

When the maximum number of users has been reached on a server, users with existing sessions will be allowed to remain on the server, but they will not be able to open another database.

Note! The Server_MaxUsers setting applies only to active user sessions; it doesn’t affect server-to-server communication, so cluster replication will continue. Also, users who do not have a database open are not counted toward the maximum number of users permitted to access the server.

Here are some caveats when using the Server_MaxUsers setting:

• Users may get a “Server Not Responding” error if failover doesn’t occur. This behavior is different from what happens when the SAT setting is reached!
• Setting Server_MaxUsers to 0 in the notes.ini file means there is no limit to user sessions; it does not mean that 0 users are permitted!
• Avoid using the notes.ini setting Server_MaxSessions because it restricts server-to-server sessions as well as client sessions.

Important! Don’t use load balancing with mail clusters. Many mail events require users to be on their home server to work properly.

Clustering Tools

Administrators have several tools available for monitoring and managing a Domino cluster. In this section, we’ll look at each clustering tool, starting with the primary one, which is, of course, the Domino Administrator client.

Note!
You also may need to run an additional Cluster Replication task after a server has been down for some time, to help get everything back in sync and speed up recovery after a crash.

The Domino Administrator Client

In the Domino Administrator client’s tabbed interface, there are two tabs that you can use for monitoring and managing clustering: the Server tab and the Files tab.

To get essential information about a cluster, open the Server tab. The Status subtab shows the current tasks running on the server being administered. There should be only one CLDBDIR task running on a server at a time, although I have seen two tasks running occasionally. If you see more than one CLDBDIR task running, end the tasks and restart CLDBDIR.

At the far right of the Status subtab is the Console tab, which you can select to get to the console window. Use the Show Cluster command to see information about the cluster, including its name, the current server, the probe timing, port, availability threshold and index, the number of servers in the cluster, and the server names and their availability index.

To monitor replication, go to the Replica section of the Statistics subtab, which is also on the Server tab. Look at the replica.cluster.workqueuedepth statistic. This statistic should be as close to 0 as possible. If it is regularly over 0 for long periods of time, you may need to add another CLREPL task. The replica.cluster.workqueuedepth statistic is calculated every 15 minutes, so a long period of time would be 30 to 60 minutes. Add another instance of CLREPL to the ServerTasks= line of the notes.ini file to make it permanent, or start the CLREPL task from the Administrator client as a temporary fix. Just remember that when you decide that you don’t need two or three CLREPL tasks running any longer, you will need to use the Tell CLREPL Quit command at the console. This ends all CLREPL tasks, so you must restart each CLREPL task that you normally run.

In the Files tab, you can do all of the following:

• Create replicas of databases via the AdminP task. This feature is useful for creating replicas of mail files for new mail users.

• Enable and disable clustering for specific databases in this section as well as in the CLDBDIR database.

• Set the clustered databases properties Out of service, In service, and Pending delete:

  – Out of service: Prevents users from opening a database and redirects them to another replica if available. Otherwise, they will get the message: “Access to the database has been restricted by the administrator.” Users already in the database will remain in the database. Once they close the database session, they won’t be able to get back in. The server maintains connections to the database and allows replication to continue. The server can operate on the database to run tasks such as fixup, updall, or compact. Setting a database Out of service is helpful if you need to perform maintenance such as running fixup and don’t want users in the database.

    Use the setting Server_Restricted to take a server out of service. A value of 1 restricts a server until it is restarted; a value of 2 restricts a server even after it has been restarted. You can also take a server out of service at the console by using the Set Config Server_Restricted=X command.

  – In service: Makes an out of service database available to users.

  – Pending delete: Prohibits new requests to open a database; however, it does allow users currently working in the database to continue working. The database will be removed after the last person has closed her connection to the database and any changes written to this database have been replicated to other replicas in the cluster.
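Returning briefly to the redirect rules from the Server_Availability_Threshold discussion, they can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not Domino’s actual implementation: the class and function names are hypothetical, a threshold of 0 is treated here as “never busy,” and ties in availability are resolved in favor of the originally requested server.

```python
from dataclasses import dataclass


@dataclass
class ClusterServer:
    name: str
    sai: int                  # Server.Availability.Index, 0 to 100
    sat: int = 0              # Server_Availability_Threshold (0 = never busy in this model)
    has_replica: bool = True  # does this server hold a replica of the database?

    @property
    def busy(self):
        """A server is busy when its availability index is below its threshold."""
        return self.sai < self.sat


def choose_server(requested, others):
    """Pick the server that handles a new request to open a database.

    - If the requested server is not busy, it serves the request.
    - Otherwise, redirect to the most available server that has a replica.
    - If no such server is more available than the requested one, the
      requested server serves the request anyway.
    """
    if not requested.busy:
        return requested
    candidates = [s for s in others if s.has_replica and s.sai > requested.sai]
    if not candidates:
        return requested
    return max(candidates, key=lambda s: s.sai)


if __name__ == "__main__":
    home = ClusterServer("Server1", sai=89, sat=90)
    peers = [
        ClusterServer("Server2", sai=95, sat=90),
        ClusterServer("Server3", sai=99, sat=90, has_replica=False),
    ]
    print("Request goes to:", choose_server(home, peers).name)
```

In the sample run, Server1 has dropped to an SAI of 89 against an SAT of 90, so the request is redirected. Server3 is more available but holds no replica, so the model picks Server2.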
Figure 9 shows the Manage Clusters dialog box, where you can set the clustered database properties. To open this dialog box, select the database for which you want to set the properties, select the Tools tab, then select Databases and the Cluster option (or simply right-click Databases and select Clusters).

Figure 9: Cluster Dialog Box in Admin Client

Tip! If you want to set a server into a busy state for maintenance or to force client requests back to users’ home servers, you can set the notes.ini setting Server_Availability_Threshold to 100. This will force any new connections to another server.

Domino Server Log

The Domino server log file, log.nsf, displays records of failover and load balancing, as well as records of replication events. The cluster replicator logs all cluster replication events once an hour. Any cluster replication errors that occurred in the last hour are also logged in the document. You can force the Cluster Replicator to write the information to the log, using the console command Tell CLREPL Log.

Cluster Analysis

Cluster Analysis enables you to check your cluster to ensure that it is configured correctly and running the way that you configured it. The results of the cluster analysis are written to a Notes database that you specify; by default, it is on your local client.

There are several different checks that you can perform using this tool. I’ve put checkmarks next to the ones that I recommend running every time.

Cluster analysis checks at the server level:

✓ Consistent domain membership: Checks that all servers in the cluster are members of the same domain, which is a requirement for clustering.

✓ Number of cluster members: Just what it says.

✓ Consistent protocols: Checks that all servers are running the same protocols. Cluster members can’t communicate if they don’t have the same protocols. In cases where you have clients using protocols other than TCP/IP, you must ensure that every server in the cluster is using these protocols. If they aren’t, failover and load balancing will not work properly for these clients.

✓ Required server tasks: Checks the ServerTasks setting in the notes.ini file to ensure that CLDBDIR and CLREPL tasks are listed so that they can start on server startup. Clustering won’t work properly if these tasks don’t start.

Cluster analysis checks at the database level:

✓ Consistent ACL: Compares the ACLs of databases in the cluster to ensure that the ACLs are the same throughout. Inconsistent or missing ACLs can cause cluster replication or failover problems.

✓ Disabled replication: Checks to see if cluster replication is enabled for a database so that when a client fails over to the database, no data is missing.

• Consistent replication formulas: Checks for consistent replication formulas for databases that have the same file path. Replicas with the same path should have the same replication formulas. (Note that this is the only check I’m not recommending that you run every time.)

✓ Replicas exist in the cluster: Verifies that the databases on the current server have replicas in the cluster. Returns “failed” if no other replica exists. All it checks for is the existence of one other replica, not a replica on each cluster server.

Run only those reports for which you want to have results. Running just one test can take a considerable amount of time. For example, when I run an analysis, it can take approximately three hours for just a couple of checks. Therefore, I recommend that you run your analysis checks on another PC, go to lunch, or go home.

To run a cluster analysis, follow these steps:

1. Open the Domino Administrator client and select the server that you want to check.

2. Select the Server tab, then the Analysis subtab.

3. In the Tools pane, select Analyze > Cluster. The Cluster Analysis dialog box opens. (See Figure 10.)

4. To store the results in a database other than the Cluster Analysis database, click Results Database and select the database that you want.

5. If you already have a Cluster Analysis database, you can append the new results to the database, or overwrite the existing database, which is the default.

6. Choose the types of reports that you want to run.

7. For databases, choose the details that you want to have included in the database.

8. Click OK to run the analysis and open the database. The report will open automatically when it is completed; you’ll see something similar to the screen in Figure 11.

Figure 10: Cluster Analysis Choices

Figure 11: Cluster Analysis Results

You can view the Cluster Analysis database by opening the database and selecting one of the following views:

• By Cluster: Breaks down the test by cluster, so you can tell how each cluster in your environment is behaving.
• By Date: Presents all tests by date. You will probably use this view the least.
• By Test: Groups all clusters by test. It’s the easiest view to read of all three views. (See Figure 11.)

When looking at the analysis results, you want to look at any failed tests and determine why they failed.
The test that fails most often, especially in mail clusters, is “Replicas Exist within Cluster.” Because this test looks for databases that don’t have replicas, it can help you discover:

• Databases that have been cluster-enabled by default, but for which you never created a replica
• Database replicas that were not removed when users were removed
• Database replicas that were not created when users were added

Looking Ahead to Notes/Domino 6

In Domino 6, there are some improvements in how clustering works and how it’s configured:

• The Cluster Administration task is now a thread in the server task, similar to the Cluster Manager task. This thread is responsible for starting the CLREPL and CLDBDIR tasks at server startup. The CLREPL and CLDBDIR tasks are removed from the ServerTasks= line in the notes.ini file.

• You can set the number of CLREPL tasks to run on the server using the new notes.ini setting Cluster_Replicators=x. The default is one task, but if you need more than one Cluster Replicator task, you can change this setting.

• You can disable cluster replication on a server using the new notes.ini setting Disable_Cluster_Replication=1. To enable cluster replication on the server, you can either remove the setting or set the value to 0.

• You can configure directory assistance for failover if you have replicas of a Domino Directory database within a cluster that are part of a directory assistance database. In the Directory Assistance document under Replicas, specify one of the replicas as enabled for clustering.

Clustering Resources

For more information on clustering, I recommend the following resources:

• The IBM Redbook Lotus Domino R5 Clustering with IBM eServer xSeries and Netfinity Servers (available on www.lotus.com/redbooks). This book uses Windows and Linux as the platforms, but much of the information is good for any platform.

• “Flexing Domino’s Clustering Muscles” by James Grigsby, THE VIEW (September/October 1997).

• The Lotus Yellowbooks Administering the Domino System and Administering Domino Clusters (available on www-10.lotus.com/ldd; click the Documentation link). These books provide good basic information that will help you get started with clustering. The Admin Help database is the same information in electronic format.

• Lotus Developer Domain (www-10.lotus.com/ldd). Look at the LDD Today section for articles on clustering. Much of the earlier information is still valid in R5.

Conclusion

Perhaps the best thing about Domino clustering is that it provides greatly increased server availability and performance with only a little effort required on the part of the administrator. When we set out to implement our first cluster at my company, it initially seemed like a daunting task. After setting it up, however, we were amazed that “that’s all it took.”

In a future article, I’ll take you deeper into the mysteries of Domino clustering: how you can get the most out of your Domino clusters through monitoring and tuning, how database quotas function in a cluster, how a tool can alleviate some of the pain that quotas can cause, and things to consider when designing an application that will be clustered.

Ted Hardenburgh is the senior Notes and Domino Administrator for Apria Healthcare in Lake Forest, CA. Ted is a Principal CLP in R5 Administration and a CLP in R5 Development. He has been involved with Lotus Notes and Domino since release 4. His recent projects include a LearningSpace and iNotes Web Access implementation. Ted has experience with Notes and Domino on the Windows and iSeries platforms, migrations from cc:Mail and R4 to R5, clustering, and security. In the past, he also worked with OS/2, cc:Mail, and LMEF. Ted works in southern California, where he lives with his wife and family. He can be reached at [email protected].

www.eVIEW.com ©2002 THE VIEW. All rights reserved.