DemandTools Module Detailed Help Single Table Deduplicator

Updated February 20, 2007

OVERVIEW

The Single Table Deduplicator allows extremely flexible viewing of potential duplicates within salesforce.com. This detailed document will walk you through selecting an object, criteria and merge process. Alternatively, CRMfusion is now offering a NO DUPES service that you can take advantage of!

***IMPORTANT NOTES ABOUT MERGING***

General Duping Guidelines:

1. Run dupes in the proper order. Ensures most information retained during merging with the least amount of time and effort expended by the CRM Admin.

Recommended Order: Account; Contact; Lead; Lead to Contact; Lead to Account; Opportunity; Custom Objects

2. Use atleast a 3 pass strategy for identifying dupes

-Start with very rigid criteria
-Loosen criteria with each pass
-STD comes with pre-built scenarios for primary objects - These are just a starting point, NEED REVIEW AND MODIFICATION TO SPECIFIC CUSTOMER NEEDS.

3. Objects owned by inactive users

In PRE-1.7 versions of DemandTools, Contacts, Opportunities, Leads, and Cases would only be merged if they were owned by an Active user. For example, if an opportunity within an Account that is being merged into a master Account was owned by an Inactive User, the Opportunity would NOT be merged to the master account. You would need to first change the ownership of the object to an active user, and then run the merge.

In this release, we have added the ability to merge objects owned by inactive users, and KEEP THE OWNERSHIP HISTORY. In step#3 Merge Control, you can select the “Keep Ownership” option under “Inactive Ownership Exceptions”. This will re-activate the inactive user just long enough to transfer the object to the master, and then de-activate the user once again. YOU WILL NEED A SPARE SF LICENSE TO USE THIS OPTION. You also have the option to change the ownership of these objects to the Master Owner, and/or the Current User.

Tasks and Events that are owned by inactive users will be moved in the deduplication process.

4. Archived completed activities

Completed Activities will only be moved if they are less than a year old. Salesforce.com will "Archive" activities older than a year and they are not accessible via the API, and hence, the DemandTools.

You can request from salesforce.com that only activities older than 3 years be archived so you will be able to move tasks and events less than 3 years old. You will need to contact salesforce.com technical support to have this done.

UPDATE:  All requests for archive date changes are evaluated individually and implementation of the request is not guaranteed. Possible ramifications of changes to the archive period include a possible performance impact on activity related reports and list views for the specific organization.  Unarchiving activites will also increase your storage usage.

NEW:  Many customers choose to change the archive period temporarily (i.e. 30 days) then run their first round of de-duplication.  Going forward running"maintenance" dupes (i.e. weekly, monthly etc.), select the "oldest record" as the master.  This assumes that all new duplicates will have been created byrecently added records that would not have any activities greater than one year old.

 

The process to do this as follows:
  1. Create a case salesforce.com with a subject of "unarchive activity request"
  2. In the body of the message specify:
    - the justification for this request.
    - how many years you'd like your activities to remain active (unarchived.)
    - if the previously archived records should be unarchived as well.
  3. The request must be made by a user with the System Administrator profile on the Account.

All requests for archive date changes are evaluated individually and implementation of the request is not guaranteed. Possible ramifications of changes to the archive period include a possible performance impact on activity related reports and list views for the specific organization.

5. In order for objects that have to be cloned vs. reparented to retain the proper CreatedDate and CreatedByID field values, you need to create a case with salesforce.com to allow you to modify system fields.

There is a feature in salesforce.com called "Modifiable System Fields" and is only available when you are creating new records (cannot be used when updating existing records).

The process to do this is as follows:

1. Create a case with Salesforce.com with a subject of "Modifiable System Fields Request"
2. In the body of the message specify the justification for this request. Include as MUCH DETAIL as possible. (i.e. migrating from a legacy system and wanting to keep the integrity of the original data, merging data using DT ).
3. The request must be made by a user with the System Administrator profile on the Account.

All requests for are evaluated individually and implementation of the request is not guaranteed.

HOW TO FIND IT?

Access the Single Table Deduplicator (STD) from the DemandTools Today screen using the menu in the left column. Select the Single Table Deduplication option from this menu or from the pull down menus across the top of the interface to be taken to the Single Table Deduplicators first wizard screen.

WHAT DO YOU WANT TO DEDUPLICATE?

There are two options for running the STD:

  1. Use Scenarios
    1. Pre-loaded Scenario (object and merge settings already selected)
    2. Customize a Pre-loaded Scenario to fit your business needs
    3. Build your own Scenario and save for future use.
  2. Choose the table/object you would like to deduplicate and all associated settings for merging
    1. Select the Object (Table) to Deduplicate
    2. Select the fields to show on the Potential Duplicates List
    3. Develop Criteria to Determine Which Objects in the Table are Searched for Potential Duplicates
    4. Proceed to Mapping Screen, SCREEN 2
    5. Add Mapping Options
    6. Select Fields to Match
    7. Choose a Match Category and Mapping Type
    8. Mapping Options
    9. Searching for Duplicates

 

WHERE TO VIEW THE DUPLICATES AND SELECT MASTER OPTIONS?

  1. View the duplicates
  2. Selecting a master record
  3. Merge Options
  4. Executing the Merge

Known Limitations

    Back to top

 Section 1. Scenarios

The scenarios are listed in the left hand column, descriptions of the scenarios are listed in the middle column and the fields that will be viewable are listed in the right hand column.

 

Load Scenario
Loads the scenario and all settings. User must manually select the next screen for each step.
Run Scenario
Selection of Run Scenario takes the user to the final Single Table Deduplication screen. Duplicates are displayed, master records are selected. User must execute the merge.
Delete Scenario
Deletes the source xml file and removes the file from the interface listing
Save Scenario
Used for newly created or modified scenarios. Scenarios are stored in the Demand Tools directory.
Clear Scenario
Lets you start over with either selecting another scenario or selecting an object to deduplicate

 

    Back to top

Preloaded Scenarios:

Section B. Choose the table/object you would like to deduplicate and all associated settings for merging

Screen 1 of Single Table Deduper - Steps #1 - #4

 Back to top

Step #1 - Select the Object (Table) to Deduplicate

Select the proper salesforce.com database to deduplicate. All tables are available beneath the pull down list and are sorted in alphabetical order. After selecting the object in the pull down menu, press the "Use Object" button to begin.

 Back to top

Step #2 - Select the fields to show on the Potential Duplicates List

The primary goal of step two is to select the fields to display in the found duplicates grid (Screen Three of the STD). Most users will want to display a combination of fields that would help verify that the records are truly duplicates and fields that display the most important data values. DemandTools shows both the table field name as well as the Field Label from the interface. Select the fields in the order you would like to see them when the potential duplicates list is shown. The Order column will be populated numerically to indicate you selections as you make them.

NOTE: DemandTools gives you the ability to show additional fields in the display screen (Screen 3) even after duplicates are located. So, if important selections are forgotten here, you will have another opportunity later to do so as well.

Back to top

Step #3 - Develop Criteria to Determine Which Objects in the Table are Searched for Potential Duplicates

Select either the default radio button "Use All" (to select all objects to search for potential duplicates) or the radio button "Use Conditions" (develop conditions to narrow the search range).

 

After each selection of field and value to narrow by the user must press "Add Condition" so that the expression builder will place the expression in the large expression display box.


Developing conditions to narrow the records in the Table that are being searched for potential duplicates.

Due to a salesforce.com API limitation, it is not possible currently to select records in one table based upon another tables conditions. For example you cannot select Contacts associated with Accounts that start with the letter A.

Back to top

Step #4 - Proceed to Mapping Screen

After selecting the table (object) to deduplicate, the fields to show on the potential duplicates list, and the subset of object to search, select " Screen 2 - Mapping" to proceed.

 

Screen 2 of Single Table Deduper - Steps #5 - # 10

Back to top

Step #5 - Select "Add Mapping Condition"

Mapping conditions consist of Fields to Match, Match Category and Mapping Options.

TIP: To remove a mapping condition, highlight the row by clicking on the box to the left of the Match Blank Values Column and hit Delete on your keyboard.

Back to top

Step #6 - Selecting "Fields to Match "

If you select the MATCH BLANK VALUES checkbox, DemandTools will return records where the values for that particular field are blank in all records that match on the other fields you select for matching.

Select the salesforce.com field that will be used to identify duplicates. Multiple selections of course may be necessary in order to properly determine duplicates with confidence.

Back to top

Step #7 - Choose a Mapping Technique
Select the 'Match Category" to look through the techniques by category or go directly to the "Match Type" (leaving "Match Category" as All) to see a list of all available match types.

The available list of Match Categories includes the following options:

In this version of the STD the following is a list of all available Matching types (techniques), they can be found under their respective "Match Category".


Back to top

Mapping Types Explained in Detail

Accessing Account Name Cleaning List

Replace Value Editor

The Nickname list allows the deduplication tool to see Bill, William, Billy etc as potential duplicates of each other. This list is also customizable by the end user for localization or even in theory for non contact substitution on any field by replacing the nickname list with synonyms.

With relaxed address match, all of the followingl seen as the deduper in the lowest common denominator of "123 Pavillion":

Back to top

Step #8 - Select Mapping Options
In addition to the already covered mapping types there are also options that can be applied to many of them. The STD provides for three different options to be augmented to the appropriate types of mappings. The available options are:

 

Type Description Mapping Types
Phonetics engine capable of analyzing words for how they sound when pronounced. Through a technique of removing vowels and analyzing the remaining consonants the fuzzy engines works very well for matching fields with spelling mistakes. Cleaned Account Name
Exact
FirstName

The transpositional engine allows for fields to appear to be duplicates even if the have differences in their word order. For example Jones, Smith and Jackson will appear to be a duplicate of Jackson, Smith and Jones.

Cleaned Account Name
Exact
FirstName
Street

The alpha cleaner extends some of the capabilities of the account name cleaner to other fields for matching. The alpha cleaner is used when you know you only have ascii (north american) data and you would like to ensure that the only characters that are analyzed are the 26 characters of the english alphabet and the numbers 0-9. Any other character that the field may contain will be ignored and not seen by the deduplication matching algorithms.

Cleaned Account Name
Exact
FirstName
Numeric
Street
Zip 5 and 9

Back to top

Step #9 - Go back to Step 5 and define more Mapping Options until your criteria is complete!

Back to top

Step #10 - Search for Potential Duplicates

Select the "Screen 3 - Merge Control" Button to search for potential duplicates and move to screen 3 to analyze and eventually process.

Screen 3 - The Merge Grid

Back to top

Step #1 - Work in the Grid:   View, analyze, edit, process, single record merge

Step #2 - Choose or design a master rule

The Single Table Deduplicator is packaged at initial installation to have default master rules ready for use for most of the standard salesforce.com objects that are typically chosen for deduplication.   These may include rules like oldest or newest record, most recently modified record, owned by, etc.   Custom objects and less frequently de-duped objects may require that even the simplest rules such as oldest record and newest record will have to be created by the end user.

Select a Master Rule

When using strict deduplication criteria (looking for records that are nearly for sure duplicates) you may decide that (after a quick scan maybe) that you want to process all the records in the grid view.  

In order to process duplicates in this mode you do not need to apply master rules to each duplicate set. Simply select the button "Check All" at the bottom of the grid and all the duplicate sets will be checked.     All records are now checked but no master servant selection has been pre-made.   Masters  will automatically be selected based on the currently selected master rule in the upper right hand of the interface when the end user selects "Merge Checked Objects" and the merge process will be completed.


Currently selected master rule

Back to top

Master Rules Creation and Editing of Master Rules

Any object can have a master rule written for it. When building a master rule you have the option to add unlimited conditions for analyzing either the record (as a whole) or the field values that the record contains.

At the record level, options such as oldest, newest, most recently modified and many other advanced constraints are available. For field values, the STD displays constraints appropriate for the data type of the field being analyzed.

Sample constraints on different field types
Rule Dialog Box

After the setup of the initial field and constraint you have the ability to score the positive incident of the constraint with an arbitrary point value. The default rules that ship with the software are based upon a base 100 scoring system but any system can be used. When adding multiple constraints the record is scored in cumulative fashion with each additional positive expression adding to the total score of the records.   Additionally negative scores can also be applied to penalize a record.


A complex rule with multiple expressions and a score delta set


The final component to building rules after determining the desired expressions is to decide to assign a score delta. The score delta will be used by the deduper to automatically determine master vs. servant records. The value of the score delta determines how many points away a master record has to score from the next nearest servant record in order to automatically be selected as the master.

If the score delta threshold is set and not met:

If no score delta is set:

Back to top

Methods for Applying Master Rules Against Duplicates (Semi-Automatic Deduplication)

In order to apply rules on a record by record basis the end user can right click in the duplicate key set box. This will display the popup menu with the manual merge options on the left and the available master rules in the right hand column. Simply select the master rule to apply to the duplicate set and the records will be scored according to the rules scoring system.

If a score delta has been set, the master record will be automatically selected (pin turns green) for all the records with the largest cumulative point score that surpass the next nearest servant by at least the delta point value.   If no records are determined to be master worthy, then the pins will remain gray (no master selected) but the score of the records will be shown in the interface.

As opposed to right clicking in each duplicate set the interface provides for two more automated processes for apply master rules to larger duplicate sets.

The button "Apply Rule to Checked" will find all records/record sets that have a check mark in the square left box but do not currently have a master vs . servant selection made to them. The records will be scored according to the selected Master Rule and assigned a master vs. servant if the score delta is set and surpassed by any one record.   As above, if no records are determined to be master worthy, then the pins will remain gray (no master selected).

The "Apply Rules to All " button which will process all the records in the duplicate grid (regardless of check status) and score and select all records.

Back to top

Step #3 -Select the merge control options

Update Fields where Master is Empty

Combine Field Values

Merge Objects

Reassign to Master Owner

Options

Non-Master Objects

Object
Default Prefix Field
Account
Name
Contact
Last Name
Lead
Last Name

 

Duplicate Campaign Exception

Inactive Ownership Exceptions

Reassignment

Additional log fields

Merging of custom object sub-objects

Step #4 - Selecting to process records in batch mode

Once the records you wish to merge are checked, and the master records determined (either by score delta or manual selection) select "Merge Checked Objects" to begin the merge process.


Executing the duplicates batch

 

Back to top

 

     

Back to top

Fully Automatic Deduplication

Schedule Automatic Non-Attended Dedupe Scenarios

Instructions for the scheduling of the Single Table Deduplication module are located here: Scheduling Events. The STD is fully capable of working within the Windows scheduler interface and will allow for times execution of scenarios as long as the host cpu has an active internet connection.

     

Back to top

Match Categories

Back to top

Mapping Types

Cleaned Account Name:   Uses the built in Account Name Cleaning List. To see the Cleaning list tool select the "Edit Account Cleaning List" button.   The cleaning list standardizes punctuation, spaces, removes common business prefixes and suffixes. These lists are customizable to your language(s) and/or line of business.

Country Match:   The country mapping type is used to standardize field values for the recognized countries of the world. It makes the long name, 2 digit ISO short form, 3 digit ISO short form and the numeric ISO country value all to appear to be matches of each other.

Domain:   The domain mapping type is used when mapping web pages and/or email addresses. It allows for the independent analysis of the domain information contained within the URL or the email address. For email addresses it uses any information to the right of the @ sign. For web pages it parses the XXXXX.com portion. This tool allows for easy comparison of web page field vs. web page field or email field vs. email field. It also by nature allows for the comparison of email addresses compared to web pages and vice versa.

Exact : The exact mapping type in the Single Table Deduplication tool is exactly that, a 100% match of every character (assuming no options apply).

FirstName:   Uses the built in Nickname List. To see the Nickname tool select the "Edit Nickname List" button at the top of the interface.

The Nickname list allows the deduplication tool to see Bill, William, Billy, etc. as potential duplicates of each other. This list is also customizable by the end user for localization or even in theory for non contact substitution on any field by replacing the nickname list with synonyms.

First XX Letters:   Compares only the first XX letters in a field. Text fields are the only applicable field type.   The user can select as many letters as they would like to compare.

Numeric:   Compares only the numeric values in a field.   Other characters that the field contains, such as spaces or punctuation, will be ignored and not seen by the deduper.   A field with a value of "Apt # 31" is seen to the deduper as only the numeric characters "31".   This is often used with phone number fields, so that (999) 555-1212 will match to 999-555-1212.   In this case the deduper will see this as 9995551212.

Relaxed Address Match:   Parses the street address to the lowest common denominator. Based on North American standards, it has also proved effective with most country address formats.

With relaxed address match the following addresses are all seen in the lowest common denominator of: 123 Pavillion:


Relaxed NA Phone Match: Removes all non-numeric characters and spaces. If the first is a 1 or 0 removes it. If just 7 digits are left use those seven digits, else just return digits 4 - 10. It will not match the "Phone-word" values and will trim off the "SPOT" in the phone number and only look at the numeric portion.

Salesforce.com ObjectID Match :   Used to match any Salesforce.com objects' 15 digit ID to its 18 digit equivalent ID and vice versa.

Street Address Match:   The street address match is a slightly more rigid criteria than the relaxed address match tool. It will ignore the differences in street type short forms such as crescent - cres, road - rd, street – st.

Zip 5 and 9 Match:   This mapping type will automatically match USPS 5 and 9 digit zip codes together without the need to standardize them first to a common number of digits.

Back to top

Mapping Options

Type Description Mapping Types
Phonetics engine capable of analyzing words for how they sound when pronounced. Through a technique of removing vowels and analyzing the remaining consonants the fuzzy engines works very well for matching fields with spelling mistakes. Cleaned Account Name
Exact
FirstName

The transpositional engine allows for fields to appear to be duplicates even if the have differences in their word order. For example Jones, Smith and Jackson will appear to be a duplicate of Jackson, Smith and Jones.

Cleaned Account Name
Exact
FirstName
Street

The alpha cleaner extends some of the capabilities of the account name cleaner to other fields for matching. The alpha cleaner is used when you know you only have ascii (North American) data and you would like to ensure that the only characters that are analyzed are the 26 characters of the english alphabet and the numbers 0-9. Any other character that the field may contain will be ignored and not seen by the deduplication matching algorithms.

Cleaned Account Name
Exact
FirstName
Numeric
Street
Zip 5 and 9

Back to top

Master Rule Options

Field

Constraint

Value

Record

  • Owned by
  • Not Owned by
  • Oldest
  • Newest
  • Most Recently Modified
  • Most Opportunities
  • Most Contacts
  • Equals
  • Not Equal to
  • Prepopulates with sfdc values

Field Name

  • Equals
  • Not Equal to
  • Starts With
  • Does Not Start With
  • Ends With
  • Does Not End With
  • Is Populated By
  • Not Populated By
  • Contains
  • Does Not Contain

This will vary based on the field selected.   If there is a pick list associated with the field the possible values will be shown in a pop-up menu.

 

Back to top

Known Limitations

Object Area Description
Contacts SELF SERVICE USERS When a servant contact is merged into a master contact and the servant contains a self service user the SSU will be inactivated on the servant contact and will be recreated on the master contact. You will lose all login information for the SSU and will have to send the user a new password as this is not moved in the creation of the SSU.

Additionally, if the servant was also a "Super User" you will need to manually re-activate this status on the master contact.
Contracts ACTIVE CONTRACTS The SF.com API does not allow the re-parenting of "Active" contracts. Therefore when duping account records DemandTools CANNOT merge "active" contracts from slave accounts to the selected master account.

As a workaround we have added a master rule for "Most Active Contracts", so you can set the master based on active contracts. When you run your Single Table De-dupe on your accounts you will recieve errors if there are still contracts that could not be re-parented. In the log file, it will indicate which slave accounts had active contracts and that it was unable to delete these slave records. You can then select these using SF.com's merge feature and merge them one at a time.
Activities CREATED DATE In order for tasks and activities older than one year to be moved/merged, the customer must contact salesforce.com to 'open the window' for the API to functionally be able to do this. A 'work' around may be that the customer selects the oldest record as the master.
Contacts CAMPAIGN MEMBER STATUS DATES

When Deduping contacts and opting to use the "Duplicate Campaign Exception - Use Most Recent, if the most recent resord is on the slave, when it is cloned to the master, the "Last Modified Date" (which is also the status date) will become the date of the merge. If the most recent record is on the master, we do not touch the record, so the original date stays intact.

This will happen EVEN If you have "Modify System Fields" turned on by salesforce.com. The only workaround is to create a custom object to track campaign history. There are some 3rd party vendors that have developed software that does this for you. Search the SF AppExchange for a current list.

Use Conditions OWNER ROLE

Using the "Owner Role" in your "Use Conditions" for selecting which objects to check for duplicates will physically convert the Owner Role to the individual user id's.

If you update the members of a Role, you should re-do the "use Conditions" (clear and re-add) so the correct user will be inserted.

     
     
     

 

: