Explore LABS


S2.4.1 - SSIS Tutorial - setting up a basic matching process

Previous Article matchIT SQL Index Next Article 

The first thing is to understand the matching levels, to read more on the matching levels - please see this article first


  1. Open up a the package from the previous article that already has key generation set up
  2. Drag and drop a findmatches and groupmatches tasks, connect them to generatekeys in the specified order 
    • Generatekeys --> findmatches --> groupmatches
    • it is important to connect the tasks in the specified order, as one task is dependent on the tables created by the previous task.
  3. Open up findmatches, go through iterations of choosing a level, depending on what fields you mapped back in generatekeys, if you can't choose the level you want, then cancel out and go back to your generatekeys task and double check your mappings
    • you can select individual alone, business alone, all four main levels, or can deselect by clicking again, company only/name only should just be selected by themselves
    • If you want to do a custom level such as name/address/email, then we would still suggest choosing individual level and adding keys/weights for email, and unchecking constraints like 'must match location' to let in email and name matches as opposed to choosing custom and building a level and choosing keys/weights from scratch
  4. Go back to choosing individual only
  5. In order to proceed, you need to also choose a volume level, choose low volume
  6. Click next, stick with the default keys unless you're sure you want to change it, then save.
    • If you go back to the task, switch to business level (assuming your sample data had an organization as well) and then go to the keys tab and refresh the keys, you'll notice that they changed slightly, you can modify existing tasks, but we suggest building from scratch instead of copy/pasting, in this case we don't want to switch matching levels, so click cancel
  7. Open up group matches
    the matching level and source are already read in from the previous task
    You can also chose to 'Exclude Duplicates from Further Processing' by checking the option. This option will prevent any records that were found as duplicates, from being included in any further processing, by matchIT SQL SSIS tasks that may follow. E.g. this option would prevent duplicate records from being included in Mail Sortation or Output. Also, duplicates would not be uploaded to the Hosted Service by the HostedService task, and would not be included in any further matching analysis should any additional matching or overlap tasks follow this task. 
  8. Select the matching groups, duplicates, and deduped tables for output, if you're proficient with SQL, you may just select the matching groups table and can build the other tables based off what is in that table.
  9. Execute the package
  10. go to SSMS, refresh tables – you should now see the deduped, duplicates and matching groups
  11. open up the matching groups table
  12. Take note of the relationship of the ID to the matchref, when they’re equal, this indicates that record is the master record in the group and that record will appear in the deduped table. When the ID and matchref are unequal – this indicates the record is a duplicate and that record will appear in the duplicates table
  13. Open up the duplicates and deduped and the counts of those two tables added up should be the same as the count of your original table
  14. When you rerun the matching, it will just automatically overwrite the results tables each time.  (we do suggest your database be set to a simple recovery model as matchIT does to a lot of reading and writing, and that can easily eat up lots of hard drive space over time)


Previous Article matchIT SQL Index Next Article
Was this article helpful?
0 out of 0 found this helpful


Please sign in to leave a comment.