

Configuration UI - Advanced Configuration


Advanced Configuration

Advanced settings are edited in a popup tabbed dialog.

Matching Tab

Minimum scores Specifies a matching threshold score for each matching level enabled.
Maximum cluster size All processed data is added to clusters. When a record is added to an existing cluster, it's then compared to each record already in the cluster, provided the maximum cluster size has not been exceeded. If the cluster has reached the maximum size, then no more comparisons will be performed on that cluster and it will be logged as a large cluster.
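
The maximum cluster size behaviour described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the function name, the `large_clusters` set, and the assumption that a record is still added to a cluster that has reached its limit are all hypothetical.

```python
def add_record(cluster, record, max_cluster_size, large_clusters, key):
    """Add a record to a cluster and return the pairs that should be compared.

    Hypothetical sketch: once the cluster has reached max_cluster_size,
    no further comparisons are generated and the cluster key is noted
    as a large cluster.
    """
    pairs = []
    if len(cluster) < max_cluster_size:
        # Compare the new record to every record already in the cluster.
        pairs = [(existing, record) for existing in cluster]
    else:
        # Limit reached: skip comparisons and record the cluster as large.
        large_clusters.add(key)
    cluster.append(record)  # assumption: the record is still stored
    return pairs
```

With a limit of 2, the third record added to a cluster produces no comparisons, which is why an overly small limit can hide genuine matches.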

Output Options

Output unique refs only If enabled, then only unique refs are output. If disabled (default), then the output contains a copy of the input data, which can include the unique ref.
Output component scores If enabled, then scores for mapped components are output for each matching level in addition to total scores. If disabled (the default), then only the total score for each matching level is output.
Output exact match scores If enabled, exact matches are output with a total score equal to the sum of the sure score settings for all mapped components, plus one. Otherwise, the score field is blank for exact matches. Regardless of this setting, the component scores for exact matches are always blank.
Output all exact matches When disabled (the default), matching pairs are only output if a record exactly matches the first record of a cluster. If enabled, then all matching pairs are output.
Output highest scores If enabled, highest scores are also output to the Grouped Matching Pairs and Matching Groups output types. This is the highest score achieved by any matching pair within each group. (Conversely, the base score is the lowest score achieved by the pairs within a group and is always output.)
Output duplicates count If enabled, the number of duplicates in each group is output to the Matching Groups output type. This is one less than the number of records in the group.
Output compare results If enabled, the matching matrix indices and the acronym match flag are included in the Matching Pairs output type.
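
The arithmetic behind the exact match score, base score, highest score, and duplicates count options can be illustrated with a short sketch. The function names and the dictionary layout are hypothetical and chosen only for this example; the component names and score values are made up.

```python
def exact_match_score(sure_scores):
    """Sum of the sure score settings for all mapped components, plus one."""
    return sum(sure_scores.values()) + 1


def summarise_group(record_ids, pair_scores):
    """Summarise one matching group from the scores of its matching pairs."""
    return {
        "base_score": min(pair_scores),      # lowest pair score, always output
        "highest_score": max(pair_scores),   # only output when enabled
        "duplicates_count": len(record_ids) - 1,  # one less than group size
    }
```

For example, a group of three records whose pairs scored 82, 95, and 77 has a base score of 77, a highest score of 95, and a duplicates count of 2.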

Grouping Options

Name bridging prevention Prevents records with different forenames being grouped together because they all match to a record that is missing a forename.
Prefix bridging prevention Prevents records with ‘Miss’ and ‘Mrs’ being grouped together because they match to a record with ‘Ms’.
Company bridging prevention Prevents records with different company names being grouped together because they all match to a record with an acronym (e.g. IBM matches “International Business Machines” and “Injection Blow Moulding”).
Aggressive splitting If enabled, bridging records will be disassociated from all matching records. If disabled (the default), bridging records will remain matched to one sub-group of non-bridging records.
Master record identification If enabled (default), the master record in each group is chosen according to: Master Priorities rules, then address length, then lowest UniqueRef. If disabled, the master record in each group is simply the record with the lowest UniqueRef.
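
The master record ordering described above can be sketched as a sort key. This is an illustration under stated assumptions: the record fields (`priority`, `address`, `unique_ref`) are hypothetical, a higher `priority` value is assumed to stand in for the outcome of the Master Priorities rules, and a longer address is assumed to be preferred.

```python
def choose_master(group, master_identification=True):
    """Pick the master record from a list of record dictionaries."""
    if not master_identification:
        # Disabled: simply the record with the lowest UniqueRef.
        return min(group, key=lambda r: r["unique_ref"])
    # Enabled: priority rules first, then address length, then lowest UniqueRef.
    # Negation makes min() prefer higher priority and longer addresses
    # (both preferences are assumptions made for this sketch).
    return min(
        group,
        key=lambda r: (-r["priority"], -len(r["address"]), r["unique_ref"]),
    )
```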

Match Keys

Match keys determine how records are clustered. When a new record is added to an existing cluster (containing one or more existing records) the record is compared to each of those existing records. Clusters are used to group potentially matching records.

Keys Lists the keys that will be used to cluster records for matching.
Key types Keys are grouped into ‘exact keys’ and ‘fuzzy keys’. All the records in a fuzzy key cluster are compared to one another. All the records in an exact cluster are automatically considered matching, without needing to compare.
Key fields Each key is a combination of key fields, e.g. Address Key + Premise.
Key functions Functions (such as UPPER, TRIM, etc.) can be applied to key fields. Functions are best used with raw input data (names, address lines, postcodes, etc.) rather than with the key fields generated by the Hub engine (NameKey, AddressKey, etc.).
Allow blank keys Key fields can be marked as 'optional' by enclosing them within square brackets. Alternatively, enabling ‘allow blank keys’ makes all key fields optional. The two methods cannot be used at the same time.
Dynamic keys In overlap mode (and lookup mode), enabling this option instructs Matching to dynamically choose which keys to use (from those defined), on a record-by-record basis, depending on which input columns are populated.
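
As a rough illustration of how keys place records into clusters, the sketch below builds a key from a combination of key fields, treats a field in square brackets as optional, and rejects mixing the two blank-key methods. It is not the Hub engine's implementation: the function names, the `|` separator, the use of TRIM and UPPER on every field, and the choice to skip records with a blank required field are all assumptions.

```python
from collections import defaultdict


def build_key(record, fields, allow_blank=False):
    """Return the cluster key for a record, or None if a required field is blank."""
    if allow_blank and any(f.startswith("[") for f in fields):
        raise ValueError("use either [optional] fields or allow_blank, not both")
    parts = []
    for field in fields:
        optional = field.startswith("[") and field.endswith("]")
        name = field.strip("[]")
        value = (record.get(name) or "").strip().upper()  # TRIM, then UPPER
        if not value and not (optional or allow_blank):
            return None  # assumption: no key is generated for this record
        parts.append(value)
    return "|".join(parts)


def cluster_records(records, fields, allow_blank=False):
    """Group records that share the same key into clusters."""
    clusters = defaultdict(list)
    for record in records:
        key = build_key(record, fields, allow_blank)
        if key is not None:
            clusters[key].append(record)
    return dict(clusters)
```

With the fields `["AddressKey", "Premise"]`, a record with a blank Premise gets no key, whereas `["AddressKey", "[Premise]"]` still clusters it on the address key alone.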

Matching Rules

Matching levels The Matching Rules dialog has one tab for each matching level enabled (Individual, Name only, Family, Address, Business, Company only).
Weights Weights are used when compared records are scored. Weights are configured automatically when the basic configuration settings (nationality, tightness) are specified, so there is no need to configure them manually unless customizations are being made.
Thresholds Scoring thresholds can be applied to provide further matching requirements when two records are compared. It is not recommended to change these settings.


Must match gender When enabled, potential matches will be disregarded if their genders differ. However, if the gender is unknown in one or both of the records, the records will potentially be classed as a match.
Must match suffix When enabled, potential matches will be disregarded if their suffixes differ. However, if the suffix is unknown in one or both of the records, the records will potentially be classed as a match.
Must match joint names When enabled, potential matches will be disregarded if one record has a joint name but the other doesn't. For example, normal behaviour will match "Mr and Mrs J Smith" with "Mr J Smith"; enabling must match joint names will prevent such matches.
Address constraints The address matching constraints (must match location, premise, directional, etc.) are now implemented via post-matching rules, so they do not need to be configured here.
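
The "must match" options for gender and suffix share one rule: a pair is rejected only when both values are known and differ. A minimal sketch, with a hypothetical function name and record layout:

```python
def passes_must_match(record_a, record_b, field):
    """Return False only when both records have a known value and the values differ."""
    value_a = record_a.get(field)
    value_b = record_b.get(field)
    if not value_a or not value_b:
        # Unknown in one or both records: the pair may still be a match.
        return True
    return value_a == value_b
```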

Matching Matrices

Three-dimensional matching matrices are used to decide the match level that a pair of compared records achieves. In the name matching matrix, the three dimensions represent the individual name fields: last name, first name, and middle name. In the company matching matrix, the three dimensions are name1, name2, and name3. The matrix maps the match types for these individual name fields (equal, both_empty, one_empty, sounds_equal, etc.) to an overall match level (sure, likely, possible, etc.).
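
One way to picture a matching matrix is as a lookup keyed by the three per-field match types. The sketch below uses a dictionary for that lookup. The entries shown are invented for illustration and are not the shipped matrix values; the `"none"` default is also an assumption.

```python
# Hypothetical entries: (last name, first name, middle name) -> match level.
NAME_MATRIX = {
    ("equal", "equal", "equal"): "sure",
    ("equal", "equal", "one_empty"): "likely",
    ("equal", "sounds_equal", "both_empty"): "possible",
}


def match_level(last, first, middle, matrix=NAME_MATRIX, default="none"):
    """Map the match types of the three name fields to an overall match level."""
    return matrix.get((last, first, middle), default)
```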


Post-Matching Rules

Advanced Post-Matching rules are applied to matching pairs prior to grouping. They only apply to fuzzy-compared matches. Each rule specifies a condition, written in a SQL-like syntax, and an action that determines what happens when the condition is satisfied.

Conditions Rule conditions are logical expressions that evaluate to a Boolean (true or false). An expression can be a function, such as “matches(city)”, or a logical operation, such as “AddressScore >= 30” or “City == ‘RALEIGH’”. Conditions can consist of a single logical expression or of multiple expressions (combined using “and” or “or”).
Actions Rule actions are either "Keep" or "Delete". If any successful rule specifies a Keep action, then the match is kept. If any successful rule specifies a Delete action, then the match is deleted, but only if the match isn’t being kept.
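
The precedence of Keep over Delete can be sketched as follows. Python callables stand in for the SQL-like rule conditions here; the function name, the pair layout, and the assumption that a pair with no satisfied rule is left in place are all hypothetical.

```python
def keep_match(pair, rules):
    """Return True if the matching pair survives the post-matching rules.

    rules is a list of (condition, action) tuples, where condition is a
    callable taking the pair and action is "Keep" or "Delete".
    """
    actions = {action for condition, action in rules if condition(pair)}
    if "Keep" in actions:
        return True   # any satisfied Keep rule wins
    if "Delete" in actions:
        return False  # deleted only when no Keep rule was satisfied
    return True       # assumption: no rule satisfied, the match is unchanged
```

For instance, a pair that satisfies both a Delete rule on a low address score and a Keep rule on a matching city is kept.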

Master Priorities

Master priority rules are used to determine which record in a matching group should be marked as the master record (i.e. the best record).


Word Lookup

The Names and Words tables (NAMES.DAT & NAMES2.DAT) control:

  • the matching equivalent of words e.g. Tony = Anthony
  • the gender of forenames e.g. John = Male, Susan = Female, Chris = Either
  • casing rules e.g. PO Box, IBM, 360Science
  • expansion/contraction of abbreviations and correction of typing errors e.g. Svcs = Services, Finacial = Financial
  • attributing type to these and other words e.g. Mr = Prefix, Ltd = Business, FL = State, The = Noise.