Previous Article | matchIT SQL Index | Next Article |
The first thing is to understand the matching levels, to read more on the matching levels - please see this article first
- Open up a the package from the previous article that already has key generation set up
- Drag and drop a findmatches and groupmatches tasks, connect them to generatekeys in the specified order
- Generatekeys --> findmatches --> groupmatches
- it is important to connect the tasks in the specified order, as one task is dependent on the tables created by the previous task.
- Open up findmatches, go through iterations of choosing a level, depending on what fields you mapped back in generatekeys, if you can't choose the level you want, then cancel out and go back to your generatekeys task and double check your mappings
- you can select individual alone, business alone, all four main levels, or can deselect by clicking again, company only/name only should just be selected by themselves
- If you want to do a custom level such as name/address/email, then we would still suggest choosing individual level and adding keys/weights for email, and unchecking constraints like 'must match location' to let in email and name matches as opposed to choosing custom and building a level and choosing keys/weights from scratch
- Go back to choosing individual only
- In order to proceed, you need to also choose a volume level, choose low volume
- Click next, stick with the default keys unless you're sure you want to change it, then save.
- If you go back to the task, switch to business level (assuming your sample data had an organization as well) and then go to the keys tab and refresh the keys, you'll notice that they changed slightly, you can modify existing tasks, but we suggest building from scratch instead of copy/pasting, in this case we don't want to switch matching levels, so click cancel
- Open up group matches
the matching level and source are already read in from the previous task
You can also chose to 'Exclude Duplicates from Further Processing' by checking the option. This option will prevent any records that were found as duplicates, from being included in any further processing, by matchIT SQL SSIS tasks that may follow. E.g. this option would prevent duplicate records from being included in Mail Sortation or Output. Also, duplicates would not be uploaded to the Hosted Service by the HostedService task, and would not be included in any further matching analysis should any additional matching or overlap tasks follow this task. - Select the matching groups, duplicates, and deduped tables for output, if you're proficient with SQL, you may just select the matching groups table and can build the other tables based off what is in that table.
- Execute the package
- go to SSMS, refresh tables – you should now see the deduped, duplicates and matching groups
- open up the matching groups table
- Take note of the relationship of the ID to the matchref, when they’re equal, this indicates that record is the master record in the group and that record will appear in the deduped table. When the ID and matchref are unequal – this indicates the record is a duplicate and that record will appear in the duplicates table
- Open up the duplicates and deduped and the counts of those two tables added up should be the same as the count of your original table
- When you rerun the matching, it will just automatically overwrite the results tables each time. (we do suggest your database be set to a simple recovery model as matchIT does to a lot of reading and writing, and that can easily eat up lots of hard drive space over time)
Previous Article | matchIT SQL Index | Next Article |
0 Comments