/

/

Data Extraction: The Underestimated Factor in a CRM Data Migration

Data migration from a legacy system or systems is often the wild card item on a CRM services proposal. It can represent the most commonly underestimated set of tasks.

Technically, data migration involves three components: extract, transform, and load. These are commonly referred to as ETL.

Data Extraction

From a non-technical perspective, data migration should involve, among other things, understanding legacy system use cases to preserve key ones; developing field and record exclusion criteria; and making recommendations for improved data management within the new system. But more on the business side of data migration in a future post.

The data extraction component of ETL, which should be easy, can have some unexpected complexities.

An important question to ask of your implementation partner is whether they are a specialist in extracting data from your specific legacy system(s). If your partner is counting on you to provide data extracts, do you have the internal skill sets and/or time?

What happens if you get well into your CRM implementation project and find out that nobody knows how to properly extract data from your legacy system(s)?

When Standard Data Repositories Are Non-Standard

Often, legacy data is in a seemingly “standard” store such as Microsoft SQL Server, MySQL, Postgres, or Oracle, so data extraction should be easy, right?

Not necessarily. Just because a data source is a standard repository doesn’t mean the data itself is in a simple or standard format, or that extraction is a straightforward exercise.

1. SQL Views

What’s exposed in the user interface of a legacy system can, under the hood, be a SQL view that’s comprised of multiple database tables. For someone unfamiliar with the legacy system’s structure, it can take significant effort to determine what’s behind the data displayed to users.

2. Volume of Tables

Legacy CRM systems can sometimes have hundreds of tables with many joins. If your implementation partner doesn’t have experience extracting data from a specific legacy CRM system, they may not be able to identify all the tables from which data needs to be extracted.

3. Formatting Tags

What is readable to a user in a rich text field in the legacy system may have proprietary formatting tags behind the scenes. For example, while later versions of GoldMine run exclusively on Microsoft SQL Server, the memo field in the Mailbox (email) table contains proprietary tags that need to be removed for the email body to be readable in a new CRM system.

4. BLOB Fields

As one example, file attachments can be stored directly in a legacy database within BLOB (binary large object) fields. In this case, attachments can only be extracted via the system’s API — a direct extraction would yield large amounts of meaningless binary data.


If you’re halfway through a CRM implementation project and discover that there’s no one who can properly extract legacy system data, can you call in a data extraction specialist to save the day? Yes, but that specialist may need to understand and engage with the bigger picture.

Data extraction should be part of the overall implementation process and will be suboptimal if performed in isolation or in a time crunch. It’s always better to ensure, ahead of time, that informed data extraction is part of the project plan.