Migrating Your Legacy ECM System to SharePoint: Part IV

In Part I and Part II of this blog series, we discussed why and when to consider an Enterprise Content Management (ECM) legacy system migration to SharePoint. In Part III, we provided planning and migration strategies to help you prepare for a successful migration. Taking that one step further, it is important to reduce risk factors by considering a few key best practices before, during, and after the actual migration.

Best Practices Before Migration

We’ve already touched on migration strategies, specifically the planning questions regarding the actual content. However, before a migration begins, it’s important to take steps to ensure that the destination platform is ready to rapidly accept potentially large volumes of content.

Storage Platform

For most migrations, the legacy solution has been running for a while, which results in millions of documents that need to be migrated. That means you’re about to blast SharePoint with a lot of content in a very short time frame. It is important to take a look at the profile of the documents that you will be migrating to see how that will impact your destination SharePoint platform.

  • Identify raw document storage. Analyze the average file size, total number of documents, and total storage size of the content that will be migrated to SharePoint.
  • Identify document load rates. Use historical information from the legacy system to determine document load rates and year-over-year load rate changes.
  • Ensure sufficient storage architecture. Use document volumes and load rates to determine the necessary SharePoint storage, from both a raw storage standpoint as well as storage performance. It’s important to understand that migrating to SharePoint is not a simple 1-to-1 migration calculation. There are several factors that determine how to plan for overhead data. There are a couple of Microsoft TechNet articles that can help with this planning. They can be found here for SP2010 and here for SP2013.
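
To make the sizing steps above concrete, here is a minimal Python sketch of the arithmetic. It is only an illustration: the class name, the function names, the sample numbers, and especially the overhead_factor default are assumptions made for this example, not values from Microsoft’s guidance. Take the real overhead figure from the capacity planning articles for your SharePoint version.

```python
from dataclasses import dataclass


@dataclass
class ContentProfile:
    """Profile of the legacy content (all values come from your own analysis)."""
    document_count: int          # documents in the legacy system today
    average_file_size_mb: float  # average document size in MB
    docs_loaded_last_year: int   # historical load rate (documents per year)
    yearly_load_growth: float    # year-over-year load rate change, 0.10 = 10%


def projected_documents(profile: ContentProfile, years: int) -> int:
    """Current documents plus the documents expected over the next `years`."""
    future = sum(
        profile.docs_loaded_last_year * (1 + profile.yearly_load_growth) ** year
        for year in range(1, years + 1)
    )
    return profile.document_count + round(future)


def estimate_storage_gb(profile: ContentProfile,
                        years: int = 3,
                        overhead_factor: float = 1.3) -> float:
    """Raw storage for the projected documents, scaled by an overhead factor.

    overhead_factor is a hypothetical placeholder for metadata, versions,
    and other overhead; it is NOT an official number.
    """
    raw_mb = projected_documents(profile, years) * profile.average_file_size_mb
    return raw_mb / 1024 * overhead_factor


profile = ContentProfile(
    document_count=5_000_000,
    average_file_size_mb=0.25,
    docs_loaded_last_year=500_000,
    yearly_load_growth=0.10,
)
print(f"Plan for roughly {estimate_storage_gb(profile):,.0f} GB")
```

The point of separating the document projection from the storage estimate is that the overhead factor is the part most likely to change once you work through the official planning guidance.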

Other SharePoint Farm Resources

During the migration, the destination web servers and SQL database servers will be hit pretty hard by a properly multi-threaded migration solution. It’s important to ensure that the SharePoint web servers, SQL Servers and storage subsystems can handle typical business user requests without being impacted by the significant additional load added by the migration servers.

  • SharePoint web servers. For existing SharePoint users, estimate the impact that a high volume migration solution will have on the existing web servers. If possible, stand up dedicated “target” servers that the migration software can use without impacting end users.
  • SharePoint SQL Servers (log files). During the migration, data will be pushed into content databases at a very rapid pace. Without question, the #1 impact the migration has on the farm is on content database LOG FILES. Most SharePoint deployments do not plan for the intense load rate of a migration. Without proper planning, the log files WILL fill up during a migration. This can cause the migration to perform erratically or experience errors, and end users will also be affected.
  • SQL Server resources. SQL Server is used by SharePoint for the storage and management of document data. During a migration the content databases are hit pretty hard. Also, a proper migration solution should document an audit trail of all operations performed between the source and destination systems. This can result in a heavy performance impact on the migration database server. For larger migrations it may be necessary to dedicate a SQL Server to handle any migration databases that the migration software uses. It’s always possible to throttle back a migration, but in most cases time is money and the longer a migration runs the higher the migration cost.
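
As a rough way to reason about the log file warning above, the following Python sketch estimates how much transaction log a migration could generate between log backups. It assumes that every migrated document is written through the log and that log space is only reclaimed by a log backup. The function names, the logging_overhead default, and the reserve fraction are hypothetical values chosen for illustration, so measure your own farm before trusting any number here.

```python
def estimate_log_growth_gb(docs_per_minute: float,
                           average_file_size_mb: float,
                           minutes_between_log_backups: float,
                           logging_overhead: float = 1.2) -> float:
    """Approximate log volume (GB) written between two log backups.

    logging_overhead is an assumed multiplier for metadata and row
    overhead on top of the raw document bytes.
    """
    raw_mb = docs_per_minute * average_file_size_mb * minutes_between_log_backups
    return raw_mb * logging_overhead / 1024


def log_has_headroom(log_file_size_gb: float,
                     expected_growth_gb: float,
                     reserve_fraction: float = 0.25) -> bool:
    """True if the expected growth fits while keeping a reserve free.

    reserve_fraction is the share of the log file kept free for end-user
    activity (an assumption, tune it for your environment).
    """
    usable_gb = log_file_size_gb * (1 - reserve_fraction)
    return expected_growth_gb <= usable_gb


growth = estimate_log_growth_gb(docs_per_minute=1000,
                                average_file_size_mb=0.25,
                                minutes_between_log_backups=60)
print(f"Expected log growth between backups: {growth:.1f} GB")
print("50 GB log file is enough:", log_has_headroom(50, growth))
```

If the check fails, the options are the ones already discussed: a larger log file, more frequent log backups, or throttling back the migration.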

Best Practices During Migration

Now that the farm is ready to accept large volumes of content, it’s important to consider a few things that need to be managed during a migration. In general, we need to make sure we’re not impacting end users.

Search / Crawl Impact

When content is added to SharePoint, under most circumstances it’s flagged for crawling during the next incremental, continuous, or full crawl. This can be problematic, particularly when migrating content into a SharePoint web application in which end users are also collaborating. For example, let’s say a user uploads a new document to portal.acme.com. If the migration solution is also blasting portal.acme.com with hundreds or thousands of documents per minute, the incremental/continuous crawl is going to be VERY overloaded. This means that the user’s document won’t be available for searching for a lot longer than normal.

Ideally, it’s best not to migrate content into the same “content source” that end users may be using in a live system. The search content source in SharePoint determines which start addresses are crawled by a given crawl schedule. Check out this TechNet article for more information. The bottom line is that you should design the migration such that you’re pushing content into a different web application (or possibly a host-named site collection) that is serviced by a completely different content source. For example, if the primary collaboration site is “portal.acme.com”, perhaps the migrated content could be pushed to “archive.acme.com”.

Since we are now pushing our migration content into archive.acme.com we can control the crawl schedule for just that content source. Ideally, the crawl should be completely turned off during the migration. This is because the crawl subsystem will become overloaded with “changed/new” content. This will ultimately result in end user content being buried in the migration content. By turning off all crawls for the migrated content, the crawl subsystem can focus on crawling new/changed content that end users submit.

Monitoring SharePoint

By far, the most important task during a migration is to ensure that SharePoint remains in good health at all times. The following SharePoint metrics need to be monitored regularly during a migration.

  • Content database data and log files. Did I mention that logs fill up fast? Keep an eye on them. Learn what is “normal” based on your migration load rate. Stay ahead of any problems by having a large enough log file to ingest the high volume of transactions between log file backups.
  • Migration libraries. You should be migrating your content in such a way that you are not exceeding SharePoint software boundaries. Specifically, don’t just push all your migrated documents into the root of some poor library. You will kill your performance and make it very difficult to programmatically fix the problem due to the way that the SharePoint server-side development API works when enumerating content in a given parent object (library, folder, etc.).
  • Gut check. Keep an eye on the destination locations for all of your content. Do the overall document counts make sense? Navigate into the libraries. Do you see content? Sometimes it appears nothing is in the libraries when you KNOW you’ve migrated content. For example, this can happen when a required field is not getting data. SharePoint will accept the document, but it will remain in a “checked out” state, so other user accounts (including yours) can’t see the content that the migration service account uploaded.
  • Resource Metrics. Keep an eye on CPU utilization, memory usage (available RAM vs. paged RAM), available disk space, and disk I/O. Disk I/O can be monitored by watching Average Disk Queue Length (ADQL) for the volume where the content databases are stored on the SQL Server. Ideally, the ADQL should be in the decimal range (below 1). If it’s in single digits, it’s probably doing OK. If it’s in high double digits, triple digits, or even quadruple digits, then you’re in a world of hurt: your disk spindles aren’t fast enough to keep up with your load rate. In addition to bottlenecking your migration, your disk I/O is also affecting your end users. If any of these metrics are severely out of balance, it’s time to throttle back the migration.
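
The ADQL rule of thumb above can be turned into a simple check. The sketch below is a hedged illustration in Python: it takes queue length samples that you have already collected (for example, from Performance Monitor) and maps their average to a status. The function name, the status labels, and the cutoff of 50 for “high double digits” are assumptions for this example. The text above does not say how to treat low double digits, so the sketch labels that range “watch”.

```python
from statistics import mean
from typing import Sequence


def classify_disk_queue(samples: Sequence[float],
                        high_double_digits: float = 50.0) -> str:
    """Map averaged Average Disk Queue Length samples to a status.

    high_double_digits is a hypothetical cutoff for where trouble starts.
    """
    if not samples:
        raise ValueError("at least one ADQL sample is required")
    adql = mean(samples)
    if adql < 0:
        raise ValueError("ADQL cannot be negative")
    if adql < 1:
        return "healthy"   # decimal range
    if adql < 10:
        return "ok"        # single digits
    if adql < high_double_digits:
        return "watch"     # assumption: not covered by the rule of thumb
    return "throttle"      # high double digits and beyond


print(classify_disk_queue([0.2, 0.4, 0.3]))
print(classify_disk_queue([120.0, 240.0, 90.0]))
```

Averaging several samples rather than reacting to a single reading keeps a short spike from triggering a throttle decision.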

Best Practices After a Migration

After the migration is complete, there are a few tasks that need to be performed before you consider the data “live”.

  • Back up everything! Do not pass “Go”. Do not “collect your $200” migration bonus. After your migration completes, the very next thing you should do is back up all content databases affected by the migration as well as the migration database. The audit trail results and the status of the content in SharePoint must be in harmony! If a failure of some sort gets the migration database out of sync with SharePoint then you have no way of proving that the migration was successful. So do yourself a favor and execute (and verify) full backups of all migration related databases.
  • Validate your results. Ideally, the migration platform will have a mechanism for performing programmatic validation of content. KnowledgeLake’s tools are able to do this with reliable results for most migrations. I say “most” because there are some documents that SharePoint modifies after they are migrated into the library. When SharePoint modifies a document, it can’t be programmatically compared to the source document. That said, KnowledgeLake has ways of working around this limitation to get a reliable validation.
  • Have your business users review the migrated content. The business users know the content better than anyone. They have the ability to look at the migrated content and know if something is wrong. Are pages being cut off? Is metadata missing? Are versions or annotations missing? Your business users can help you ensure that the migration was truly successful. The migration engineer is usually not the best person to do this. They are too close to the migration and will take certain assumptions for granted. A migration is not “officially” successful until the business says it is successful.
  • Turn those crawls on. Based on the guidance of this post, you probably migrated to a web application that is serviced by a search content source that is currently disabled. Now is the time to configure the full crawl schedule, kick off a full crawl, and then configure the incremental/continuous crawls (in that order). You want the full crawl to run first. An incremental crawl on tens of thousands, hundreds of thousands, or millions of documents is nothing short of a painful event on the crawl servers! But a full crawl is designed to handle the load of a raw, not-previously-crawled content source.
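
To illustrate what programmatic validation can look like in its simplest form, here is a Python sketch that compares SHA-256 hashes of source files against copies exported from SharePoint. This is a generic technique and not a description of KnowledgeLake’s tools. It assumes both copies are reachable as local file paths, and the function names and result keys are made up for this example. As noted above, SharePoint modifies some documents after upload, so a mismatch here means “needs manual review”, not necessarily “failed”.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large documents don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_pairs(pairs: Iterable[Tuple[Path, Path]]) -> Dict[str, List[Path]]:
    """Sort (source, migrated) path pairs into matched, mismatched, and missing.

    The returned lists contain the source paths for each outcome.
    """
    results: Dict[str, List[Path]] = {"matched": [], "mismatched": [], "missing": []}
    for source, migrated in pairs:
        if not migrated.exists():
            results["missing"].append(source)
        elif sha256_of(source) == sha256_of(migrated):
            results["matched"].append(source)
        else:
            # Could be corruption, or SharePoint may have modified the file.
            results["mismatched"].append(source)
    return results
```

Calling validate_pairs with a list of (source, migrated) paths returns three lists that can be reconciled against the migration audit trail before the business users begin their review.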

Wrapping Up

In this post, the important before, during, and after migration concepts were addressed. If your organization has not taken on a migration before, take your time, test, validate, test again, migrate, then validate again. It is very easy to make a mistake that calls into question the reliability of the migration. If there is any doubt, the doubt must be mitigated. In many cases, there are compliance laws that must be taken into consideration. If a lawsuit requires that archive documents be produced and those documents were lost during a migration, it could get very expensive!

KnowledgeLake has a skilled migration staff with extensive experience and a proven track record of successful migrations. Our detailed methodology and flexible tool set provide us with the ability to execute migrations with predictable results time and time again. If you have any questions, don’t hesitate to give us a call. We can help.
