Update a SharePoint farm with minimal site downtime

It was that time of the year again to upgrade several SharePoint farms for my customer. This upgrade installed SP2 and the June Cumulative Update on an entire farm, and the requirement was to keep downtime caused by the installation to a minimum. This post covers the process I used to upgrade the entire farm.

Farm setup:
– 1 hardware load balancer
– 2 Web Front End servers (I will call them WFE1 and WFE2)
– 1 index server
– 1 SQL cluster

What I needed to upgrade the farm smoothly was a way to show a maintenance page for all web applications while the content is really down. I used the simple ASP.NET trick of creating an App_Offline.htm file (a custom one, in my case) which mentions that the site is down for maintenance. Copying this file into the root location of each IIS website used by the SharePoint Web Applications will show this message instead of the SharePoint content.
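
As an illustration, a minimal App_Offline.htm could look like the sketch below; the wording is hypothetical and any static HTML will do. One thing to keep in mind: Internet Explorer replaces very small error pages with its own friendly error message, so it is common practice to pad the file above 512 bytes, for example with an HTML comment.

   <html>
     <head><title>Site maintenance</title></head>
     <body>
       <h1>This site is temporarily down for maintenance</h1>
       <p>We are performing a scheduled upgrade. Please check back later.</p>
       <!-- pad the file above 512 bytes here so browsers do not substitute their own error page -->
     </body>
   </html>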

Another thing I wanted to do was to detach the content databases before running the Configuration Wizard. Why? To avoid having the upgrade fail on a single content database, and to shorten the upgrade time. Once the config wizard completes, I reattach the content databases one by one, causing each to be upgraded at that moment.

Preparation tasks:
– create a custom App_Offline.htm file for showing a maintenance page
– create a batch file that conveniently copies the App_Offline.htm file to all Web Applications (make sure not to copy it to the Central Admin Web App); a sketch follows this list
– create a batch file that conveniently deletes the App_Offline.htm file from all Web Applications
– create a batch file that detaches all content databases for all Web Applications, with the exception of the Central Admin and SSP Web Apps (add an stsadm -o preparetomove command before each detach if you are still running MOSS SP1 pre-Infrastructure Update)
– create a batch file that attaches the content databases
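
A sketch of the maintenance-on batch file, assuming hypothetical IIS root folders (your paths will depend on where the IIS websites live on disk); the maintenance-off script is the same loop with del instead of copy:

   @echo off
   rem copy the maintenance page to the root folder of each SharePoint
   rem web application (Central Admin is deliberately not in this list)
   for %%W in ("E:\IIS\webapp1" "E:\IIS\webapp2" "E:\IIS\webapp3") do (
      copy /Y e:\App_Offline.htm "%%~W\"
   )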

Upgrade process:

1. Make sure that the hardware load balancer stops sending traffic to WFE1 and only uses WFE2 to service user requests. We have an internal procedure that allows for manipulation of the load balancer: in practice we simply stop a custom IIS web site on the WFE server, which causes the load balancer to fail over to the second WFE automatically (see the sketch below).
   Availability Result: Users are still able to access SharePoint content through WFE2.
   Timing result: this operation took 2 minutes
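
For reference, on Windows Server 2003 / IIS 6 the stop can be scripted with the built-in IIS admin scripts; the site name below is hypothetical and depends on how your load balancer health check is set up:

   rem stop the health-check site so the load balancer fails over to WFE2
   cscript %windir%\system32\iisweb.vbs /stop "LB-HealthCheck"
   rem use /start on the same site to bring WFE1 back into rotation later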

2. Install the binaries for your SharePoint upgrade on WFE1. In my case: WSS SP2 + MOSS SP2 + all SP2 versions of the WSS and SharePoint Language Packs, and finally the June Cumulative Update for WSS and MOSS. When the installation completes, reboot the server.
   Availability Result: Users are still able to access SharePoint content through WFE2.
   Timing result: this operation took 50 minutes

3. Simultaneously install the same binaries on the index server. When installation completes, reboot the server.
   Availability Result: Users are still able to access SharePoint content through WFE2.
   Timing result: this operation took 40 minutes

OK, so far so good. At this point I have installed the binaries on 2 servers and I still have 1 to go: WFE2, which is still serving the SharePoint sites. I have two options to continue:
– option 1: install the binaries on WFE2 and reboot
– option 2: run the configuration wizard on the upgraded WFE1 or the index server

Option 1 would take all the sites down, because the installation of new binaries stops IIS = downtime and 404 errors. I cannot redirect my users to the upgraded WFE1, because the Configuration Wizard has not run there yet. So I am going with option 2.

4. On WFE2, I launch my script that sets all my sites in maintenance mode (copies the App_Offline.htm file, that is)
   Availability Result: Users are not able to access SharePoint content, but WFE2 serves them a nice page stating that their site is down for maintenance.
   Timing result: this operation took 1 minute

5. On WFE2, I launch my script for detaching all content databases
   – this script launches a stsadm -o preparetomove command for each content database (except Central Admin and SSP databases). This command is no longer required if you have at least SP1 with the Infrastructure Update installed.
   – this script launches a stsadm -o deletecontentdb command for each content database (except Central Admin and SSP databases)
   Availability Result: Users are still not able to access SharePoint content, but WFE2 serves them a nice page stating that their site is down for maintenance.
   Timing result: this operation took 5 minutes (I had 5 content databases)

6. On WFE1, run the SharePoint Products and Technologies Configuration Wizard (see the command-line sketch below).
If the upgrade process fails, investigate the log specified by the wizard, but also check 12-Hive\LOGS\Upgrade.log and the default SharePoint ULS logs. I have seen the SharePoint logs being written to the 12-Hive\LOGS folder during this upgrade process instead of the location specified in Central Admin; after the upgrade, your specified logging location is used again.
   Availability Result: Users are still not able to access SharePoint content, but they receive a nice page stating that their site is down for maintenance.
   Timing result: this operation took 15 minutes
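
If you prefer the command line over the wizard, psconfig from the 12-Hive\BIN folder should be the unattended equivalent (the wizard is a front end for the same configuration engine), and findstr is a quick way to scan Upgrade.log for failures afterwards. The paths below assume a default installation:

   cd /d "%CommonProgramFiles%\Microsoft Shared\web server extensions\12\BIN"
   psconfig -cmd upgrade -inplace b2b -wait
   findstr /i "error failed" "%CommonProgramFiles%\Microsoft Shared\web server extensions\12\LOGS\Upgrade.log"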

7. Now that the farm configuration databases have been upgraded, WFE1 is ready to start serving users again as soon as the content databases have been reattached. So, on WFE1 I launch my script to reattach the content databases. If one of the operations generates an error, you can find the specific error in the 12-Hive\LOGS\Upgrade.log file.
   Availability Result: Users are still not able to access SharePoint content, but they receive a nice page stating that their site is down for maintenance.
   Timing result: this operation took 10 minutes.

8. Make sure that the hardware load balancer starts sending user requests to WFE1 and stops sending them to WFE2.
   Availability Result: Users are again able to access SharePoint content through WFE1.
   Timing result: this operation took 2 minutes

My upgrade is now complete as far as the SharePoint content is concerned. For the moment my farm is servicing users through a single Web Front End server, but it is servicing them, which is my main concern at this point. I no longer have downtime towards my users. If you add up the minutes from steps 4 through 8 (1 + 5 + 15 + 10 + 2), I have had a downtime towards my users of 33 minutes, which can be considered a small downtime. Now I continue with the rest of the upgrade process.

9. WFE2 is now free for me to do with as I please, since it is no longer included in the load balancer pool.
– first, I launch my script to deactivate site maintenance, which simply deletes all App_Offline.htm files
– next, I install the binaries for the SharePoint upgrade on WFE2 and reboot the server
   Availability Result: Users are able to access SharePoint content through WFE1.
   Timing result: this operation took 50 minutes

10. While WFE2 is installing the new binaries, I can run the SharePoint Products and Technologies Configuration Wizard on the index server.
   Availability Result: Users are able to access SharePoint content through WFE1.
   Timing result: this operation took 6 minutes

11. Run the SharePoint Products and Technologies Configuration Wizard on WFE2
   Availability Result: Users are able to access SharePoint content through WFE1.
   Timing result: this operation took 8 minutes

12. Final step: Add WFE2 back into the load balancer pool

Conclusion:
Although the entire operation took about 4 hours, there was a downtime of only 33 minutes for our users. Furthermore, our users did not hit any 404 pages, but received a nice site maintenance page telling them exactly what was going on. Needless to say, my customer was satisfied with the limited downtime.

Hopefully this process is of some use to you guys.

 

Samples of my script files, as requested by KbNk:

The maintenance mode scripts and the detach/reattach scripts are simple batch files (*.bat).

Here are samples of the scripts:

Example data:

-> 1 Web Application with url http://webapp1.contoso.local
-> SQL server name: sqlserver01
-> content database name for the webapp: wss_content_webapp1
-> IIS Site directory location on file system: E:\IIS\mywebapp

– maintenance mode on script = simple copy command, no rocket science
   copy e:\App_Offline.htm E:\IIS\mywebapp\

– maintenance mode off script
   del e:\IIS\mywebapp\App_Offline.htm

– detach database batch file sample:
   stsadm -o preparetomove -contentdb sqlserver01:wss_content_webapp1 -Site http://webapp1.contoso.local  (remove this line if you have SP1 with the Infrastructure Update or later installed)
   stsadm -o deletecontentdb -url http://webapp1.contoso.local -databaseserver sqlserver01 -databasename wss_content_webapp1

– attach database batch file sample:
   stsadm -o addcontentdb -url http://webapp1.contoso.local -databaseserver sqlserver01 -databasename wss_content_webapp1
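
If you have more than a handful of web applications, the detach and attach batch files can be driven by a small subroutine so the URL / database pairs are maintained in one place. A sketch with hypothetical names, assuming stsadm is on the PATH or the script runs from the 12-Hive\BIN folder; the attach version is identical with -o addcontentdb:

   @echo off
   rem detach_all.bat - detach every content database, one call per web application
   call :detach http://webapp1.contoso.local wss_content_webapp1
   call :detach http://webapp2.contoso.local wss_content_webapp2
   goto :eof

   :detach
   stsadm -o deletecontentdb -url %1 -databaseserver sqlserver01 -databasename %2
   goto :eof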

11 thoughts on “Update a SharePoint farm with minimal site downtime”

  1. Hello, thanks for the great post, just two questions:
    1. Can you also share your scripts? The maintenance mode one and the detach/reattach one?
    2. Is it OK in this case to run and finish the config wizard one by one? And not to run it on the 1st server, then on the other, etc.? Thanks

  2. Hi KbNk,

    1. on your request, I have added some samples of the batch files I used. Of course I cannot post my complete batch files, but with the information you have here you have enough to build your own. All the commands you need are there.

    2. For your second question, it is indeed OK to run the Configuration Wizard and complete it one server at a time. Just know that as soon as you have completed it on the first server, the Web front end servers that have not yet run the Configuration Wizard will most probably refuse to serve your SharePoint content sites until you have run the configuration wizard on them as well. If I remember correctly, you get an error message indicating that the configuration database is of a newer version than the system.

    Thanks for the feedback anyway.

    Dirk

  3. Thanks for these amazing steps. I have a very, very similar configuration and the very same patches to be installed, as you might remember, and with this I was able to patch our test servers with 50 min of downtime, even though it took 12 hours for those slowww machines to install the binaries .. thanks again!! I am looking forward to patching production 😉

  4. Hello again KbNk 🙂

    I have missed that post 🙂
    With your comment I ran straight away to my favorite Microsoft Premier Support engineer, who indeed confirmed that you no longer need to use this command.

    I will update my post immediately accordingly 🙂

    Thanks for your comment

    Dirk

  5. Hi Dirk,

    Need your help here. I’m going to apply SP2 to a SharePoint farm with 1 CA and 7 WFEs, so I can’t really follow your article directly because you didn’t mention the CA server and used only 2 WFEs. Are the basic rules below correct?

    1. SP2 binaries can be installed simultaneously
    2. The Configuration Wizard can only be run on 1 server at a time

    Then, as per your article, you divide the WFEs into 2 parts, WFE1 and WFE2. In my case WFE1 would be my CA plus WFE1, and WFE2 would be WFE2 through WFE7, is this correct?

    Thanks before.

  6. Hi Vincent,

    Your scenario will depend on the number of WFE you need to have available at all times.

    I would select the number of WFEs that need to be available quickly, let’s say 3.

    Take the group of 3 WFEs offline (out of the load balancer) and start installing the binaries on them, including the CA server; no config wizard at this time on any of them. In the meantime your users will still be able to access data through the 4 remaining ones. Once the binaries are installed on the 3 WFEs, activate your maintenance page on the remaining 4 WFEs = site down, and immediately start running the config wizard on the CA, then on the first upgraded WFE. Once done, add this WFE to the load balancer again and remove the 4 WFEs that were serving the maintenance page. At this time you will have 1 WFE available to your users for serving content = sites available.

     Continue to the next upgraded WFE (2 out of 3) and run the config wizard, after which you add it too to the load balancer. In the meantime, upgrade the 4 remaining WFEs and don’t forget to remove the maintenance page. Again, do not run the config wizard on them. If you feel that you really do need a minimum number of WFEs available because of high usage, then do not add upgraded WFEs to the load balancer before you have the required number of WFEs upgraded.

    Hope this helps you out.

    Best regards,

    Dirk

  7. Great post Dirk,
    Is there any way to do this for a single site? That is, not for a Web Application or Site Collection, but for a site? I’m looking for a way to take a site offline for intranet users while allowing internal users (site administrators) to rework permissions on the site.

  8. Hi Terry,

    Not out of the box. Possible ways I am thinking of that might be worth looking into are:
    – URL redirection or rewriting, to make IIS redirect you to a different location if it detects the URL of the subsite in the HTTP request
    – a redirect page as the home page of the site (will only redirect the homepage, though)
    – export/import the site to a different web application and delete the original one during the update
    – rename the subsite and change the URL

    hope this helps
