It was that time of the year again: time to upgrade several SharePoint farms for my customer. This upgrade installed SP2 and the June cumulative update on an entire farm, and the requirement was to keep downtime during the installation to a minimum. This post covers the process I used to upgrade the entire farm.
Farm setup:
- 1 hardware load balancer
- 2 Web Front End servers (I will call them WFE1 and WFE2)
- 1 index server
- 1 SQL cluster
What I needed to upgrade the farm smoothly was a way to show a maintenance page for all web applications while the content is actually down. I used a simple ASP.NET trick: creating an App_Offline.htm file (a custom one, in my case) stating that the site is down for maintenance. Copying this file into the root directory of each IIS web site used by the SharePoint Web Applications shows this message instead of the SharePoint content.
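As an illustration of the copy and delete trick described above, here is a minimal Python sketch. It is not the batch file I actually used; the function name `set_maintenance` and its parameters are hypothetical, and the demo runs against throwaway directories rather than real IIS site roots.

```python
import shutil
import tempfile
from pathlib import Path


def set_maintenance(site_dirs, offline_page, enabled):
    """Copy App_Offline.htm into each site root (enabled=True) or
    delete it again (enabled=False). Returns the files that were touched."""
    changed = []
    for site_dir in map(Path, site_dirs):
        target = site_dir / "App_Offline.htm"
        if enabled:
            # ASP.NET serves this file instead of the site while it is present.
            shutil.copyfile(offline_page, target)
            changed.append(target)
        elif target.exists():
            # Removing the file brings the site back.
            target.unlink()
            changed.append(target)
    return changed


if __name__ == "__main__":
    # Demo with temporary directories standing in for the IIS site roots.
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "App_Offline.htm"
        page.write_text("<html><body>Down for maintenance</body></html>")
        site = Path(tmp) / "mywebapp"
        site.mkdir()
        print(set_maintenance([site], page, enabled=True))
        print(set_maintenance([site], page, enabled=False))
```

Leave the Central Admin site directory out of the list you pass in, so that Central Admin stays reachable during the upgrade.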
Another thing I wanted to do was detach the content databases before running the Configuration Wizard. Why? To prevent the whole upgrade from failing on a single content database, and to shorten the upgrade time. Once the Configuration Wizard completes, I reattach the content databases one by one, which causes each of them to be upgraded at that moment.
Preparation tasks:
- create a custom App_Offline.htm file for showing a maintenance page
- create a batch file that conveniently copies the App_Offline.htm file to all Web Applications (make sure not to copy it to the Central Admin WebApp)
- create a batch file that conveniently deletes the App_Offline.htm file from all Web Applications
- create a batch file that detaches all content databases for all Web Applications, with the exception of the Central Admin and SSP Web Apps (add a stsadm -o preparetomove command before each detach if you are still running MOSS SP1 without the Infrastructure Update)
- create a batch file that attaches the content databases
Upgrade process:
1. Make sure that the hardware load balancer stops sending traffic to WFE1 and uses only WFE2 to service user requests. We have an internal procedure for manipulating the load balancer: we simply stop a custom IIS web site on the WFE server, which causes the load balancer to fail over to the second WFE automatically.
Availability Result: Users are still able to access SharePoint content through WFE2.
Timing result: this operation took 2 minutes
2. Install the binaries for your SharePoint upgrade on WFE1. In my case these were WSS SP2, MOSS SP2, and the SP2 versions of all WSS and SharePoint Language Packs, followed by the June Cumulative Update for WSS and MOSS. When the installation completes, reboot the server.
Availability Result: Users are still able to access SharePoint content through WFE2.
Timing result: this operation took 50 minutes
3. Simultaneously install the same binaries on the index server. When installation completes, reboot the server.
Availability Result: Users are still able to access SharePoint content through WFE2.
Timing result: this operation took 40 minutes
So far so good. At this point I have installed the binaries on two servers and have one to go: WFE2, which is still serving the SharePoint sites. I have two options to continue:
- Option 1: install the binaries on WFE2 and reboot
- Option 2: run the configuration wizard on the upgraded WFE1 or the index server
Option 1 would take all the sites down, because installing the new binaries stops IIS, which means downtime and 404 errors. I cannot redirect my users to the upgraded WFE1 either, because the Configuration Wizard has not run there yet. So I go with option 2.
4. On WFE2, I launch my script that puts all my sites in maintenance mode (that is, it copies the App_Offline.htm file).
Availability Result: Users are not able to access SharePoint content, but they receive a nice page stating that their site is down for maintenance through WFE2.
Timing result: this operation took 1 minute
5. On WFE2, I launch my script for detaching all content databases.
- this script runs a stsadm -o preparetomove command for each content database (except the Central Admin and SSP databases). This command is no longer required if you have at least SP1 with the Infrastructure Update installed.
- this script then runs a stsadm -o deletecontentdb command for each content database (except the Central Admin and SSP databases)
Availability Result: Users are still not able to access SharePoint content, but they receive a nice page stating that their site is down for maintenance through WFE2.
Timing result: this operation took 5 minutes (I had 5 content databases)
6. on WFE1, run the SharePoint Products and Technologies Configuration Wizard.
If the upgrade process fails, investigate the log specified by the wizard, but also check 12-Hive\LOGS\Upgrade.log and the default SharePoint ULS logs. I have seen cases where, during the upgrade, the SharePoint logs were written to the 12-Hive\LOGS folder instead of the location specified in Central Admin. After the upgrade, the configured logging location is used again.
Availability Result: Users are still not able to access SharePoint content, but they receive a nice page stating that their site is down for maintenance
Timing result: this operation took 15 minutes
7. Now that the farm configuration databases have been upgraded, WFE1 is ready to start serving users again as soon as the content databases have been reattached. So, on WFE1, I launch my script to reattach the content databases. If one of the operations generates an error, you can find the specific error in the 12-Hive\LOGS\Upgrade.log file.
Availability Result: Users are still not able to access SharePoint content, but they receive a nice page stating that their site is down for maintenance
Timing result: this operation took 10 minutes.
8. Make sure that the hardware load balancer starts sending traffic to WFE1 again and stops sending traffic to WFE2.
Availability Result: Users are again able to access SharePoint content through WFE1.
Timing result: this operation took 2 minutes
The upgrade is now complete as far as the SharePoint content is concerned. My farm is servicing users again, for the moment through a single Web Front End server, but it is servicing them, which is my main concern at this point. There is no more downtime for my users. Adding up the minutes for steps 4 through 8, my users had 33 minutes of downtime, which can be considered small. Now I continue with the rest of the upgrade process.
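For reference, the downtime figure is simply the sum of the timings of the steps during which users could not reach their content (steps 4 through 8). A small Python sketch of that arithmetic, with step labels paraphrased from the list above:

```python
# Minutes per step during which users saw the maintenance page
# or were being switched over to WFE1 (steps 4 through 8).
downtime_steps = {
    "4: enable maintenance page on WFE2": 1,
    "5: detach content databases": 5,
    "6: run Configuration Wizard on WFE1": 15,
    "7: reattach content databases": 10,
    "8: switch load balancer to WFE1": 2,
}

total_downtime = sum(downtime_steps.values())
print(f"Total downtime: {total_downtime} minutes")  # prints "Total downtime: 33 minutes"
```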
9. WFE2 is now free for me to work on, since it is no longer included in the load balancer pool.
- First, I launch my script to deactivate the site maintenance, which simply deletes all App_Offline.htm files
- Next, I install the binaries for the SharePoint upgrade on WFE2 and reboot the server
Availability Result: Users are able to access SharePoint content through WFE1.
Timing result: this operation took 50 minutes
10. While WFE2 is installing the new binaries, I run the SharePoint Products and Technologies Configuration Wizard on the index server.
Availability Result: Users are able to access SharePoint content through WFE1.
Timing result: this operation took 6 minutes
11. Run the SharePoint Products and Technologies Configuration Wizard on WFE2.
Availability Result: Users are able to access SharePoint content through WFE1.
Timing result: this operation took 8 minutes
12. Final step: Add WFE2 back into the load balancer pool
Conclusion:
Although the entire operation took about 4 hours, our users had only 33 minutes of downtime. Furthermore, they did not hit any 404 pages, but received a nice site maintenance page telling them exactly what was going on. Needless to say, my customer was satisfied with the resulting downtime.
Hopefully this process is of some use to you.
Sample of my script files as requested by KbNk:
The maintenance mode scripts and the detach/reattach scripts are simple batch files (*.bat).
Here is a sample for the scripts:
example data:
-> 1 Web Application with url http://webapp1.contoso.local
-> SQL server name: sqlserver01
-> content database name for the webapp: wss_content_webapp1
-> IIS Site directory location on file system: E:\IIS\mywebapp
- maintenance mode on script = simple copy command, no rocket science
copy e:\App_Offline.htm E:\IIS\mywebapp\
- maintenance mode off script
del e:\IIS\mywebapp\App_Offline.htm
- detach database batch file sample:
rem Remove the next line if you have SP1 with the Infrastructure Update or later installed
stsadm -o preparetomove -contentdb sqlserver01:wss_content_webapp1 -Site http://webapp1.contoso.local
stsadm -o deletecontentdb -url http://webapp1.contoso.local -databaseserver sqlserver01 -databasename wss_content_webapp1
- attach database batch file sample:
stsadm -o addcontentdb -url http://webapp1.contoso.local -databaseserver sqlserver01 -databasename wss_content_webapp1
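If you have more than a handful of content databases, typing these lines by hand gets tedious. The following is a minimal Python sketch, not part of my original scripts, that generates the same stsadm lines for a list of databases. It only uses the stsadm operations and parameters shown in the samples above; the names `ContentDb`, `detach_commands` and `attach_command` are hypothetical, and the second web application in the demo is made up.

```python
from typing import NamedTuple


class ContentDb(NamedTuple):
    url: str     # Web Application URL
    server: str  # SQL server name
    name: str    # content database name


def detach_commands(db, prepare_to_move=False):
    """Return the stsadm lines that detach one content database."""
    cmds = []
    if prepare_to_move:
        # Only needed on MOSS SP1 without the Infrastructure Update.
        cmds.append(
            f"stsadm -o preparetomove -contentdb {db.server}:{db.name} -Site {db.url}"
        )
    cmds.append(
        f"stsadm -o deletecontentdb -url {db.url} "
        f"-databaseserver {db.server} -databasename {db.name}"
    )
    return cmds


def attach_command(db):
    """Return the stsadm line that reattaches one content database."""
    return (
        f"stsadm -o addcontentdb -url {db.url} "
        f"-databaseserver {db.server} -databasename {db.name}"
    )


if __name__ == "__main__":
    databases = [
        ContentDb("http://webapp1.contoso.local", "sqlserver01", "wss_content_webapp1"),
        # Hypothetical second web application, for illustration only.
        ContentDb("http://webapp2.contoso.local", "sqlserver01", "wss_content_webapp2"),
    ]
    for db in databases:
        print("\n".join(detach_commands(db)))
    for db in databases:
        print(attach_command(db))
```

Redirect the output into a .bat file and review it before running it on the farm. Leave the Central Admin and SSP databases out of the list.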