M4 - Names and Words

Names and Words Tables

The Names and Words tables (NAMES.DAT & NAMES2.DAT) control:

  • the matching equivalent of words e.g. Tony = Anthony
  • the gender of forenames e.g. John = Male, Susan = Female, Chris = Either
  • casing rules e.g. PO Box, IBM, 360Science
  • expansion/contraction of abbreviations and correction of typing errors e.g. Svcs = Services, Finacial = Financial
  • attributing type to these and other words e.g. Mr = Prefix, Ltd = Business, FL = State, The = Noise.

These are fixed-width text files. The layout of the NAMES.DAT & NAMES2.DAT files is as follows:

Property     Width  Description
TYPE         1      Type of entry (see below)
NAME         25     Matching equivalent of the entry (e.g. 'Anthony' has a matching equivalent of 'Tony', enabling these two names to be matched)
EQUIVALENT   10     The word which is actually looked up
GENDER       1      Indicates the gender of the forename or prefix
SALUTATION   2      Indicates the type of salutation to be generated for a particular prefix (see below)
PROPER CASE  30     Proper case value for the entry
SWITCH       1      Indicates whether this entry is the first part of a two word lookup

Note: matchIT only looks up words in the EQUIVALENT column, not the NAME column. This means that every name must have an entry in which NAME is the same as EQUIVALENT.
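
As an illustration of the layout above, the following Python sketch slices one record into its fields and pads it back out again. It assumes the fields sit side by side in the order listed, with no separators, giving a 70-character record; check this against your own copy of the file before relying on it. The function names and the sample "MR" entry are made up for the example and are not part of matchIT.

```python
# Field names and widths as listed in the layout table.
# ASSUMPTION: fields are contiguous, in this order, with no separators.
FIELDS = [
    ("type", 1),
    ("name", 25),
    ("equivalent", 10),
    ("gender", 1),
    ("salutation", 2),
    ("proper_case", 30),
    ("switch", 1),
]
RECORD_WIDTH = sum(width for _, width in FIELDS)  # 70 under the assumption above


def parse_record(line):
    """Slice one line into its fields, stripping the padding spaces."""
    # Pad short lines in case an editor trimmed trailing spaces.
    line = line.rstrip("\r\n").ljust(RECORD_WIDTH)
    record = {}
    pos = 0
    for field, width in FIELDS:
        record[field] = line[pos:pos + width].strip()
        pos += width
    return record


def format_record(record):
    """Pad each field back to its fixed width (the inverse of parse_record)."""
    return "".join(
        record.get(field, "").ljust(width)[:width] for field, width in FIELDS
    )


# Hypothetical prefix entry, for illustration only.
sample = format_record({
    "type": "P",
    "name": "MR",
    "equivalent": "MR",
    "gender": "M",
    "salutation": "S",
    "proper_case": "Mr",
})
print(repr(sample))
print(parse_record(sample))
```

Writing records through a helper like format_record, rather than typing spaces by hand, keeps every field at its fixed position.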

The different types that can be entered in the table are as follows:

Type Description
'A' Address Word, such as "Rd" or "Street"
'B' Business word, such as "Ltd" or "Printers"
'C' UK county, such as "Kent" or "Glos"
'E' Exclusion word, such as "Deceased" or "Moved"
'F' Female forename (note the gender has to be set for these entries too)
'I' Initials, such as "E" or "W"; these entries are in the table as they may be the first part of a two word phrase, such as "E Midlands"
'J' Job title word, such as "Manager"
'M' Male forename (note the gender has to be set for these entries too)
'N' Noise word (i.e. ignored when generating keys or address matching), such as “The” or “House”
'O' Overseas i.e. foreign country
'L' Local country, such as "UK" or "Scotland"; this enables local countries to be identified as countries, without the record being marked as foreign
'P' Prefix, such as "Mr" or "Captain" (note that the gender has to be set for these entries too, as does the salutation type; see below)
'Q' Qualification word, such as "PhD" or "ARICS"; these entries usually need a proper case entry, as the casing of qualifications can be unusual
'S' Special casing word, i.e. a word that is cased unusually but doesn't fall into any of the above categories, such as "PhotoMe"
'T' State or province, such as "Pennsylvania" or "PA"
'U' Unknown word; this is for the first word of a two word phrase which, on its own, has no special meaning, such as the "Hong" in "Hong Kong"

Each prefix entry must have a salutation type associated with it. The following list shows the salutation types, along with an example of the type of salutation that will be generated:

Type  Rule                           Example
S     Dear Prefix Surname            Dear Mr Smith
C     Dear Prefix Surname            Dear Mr Smith
FS    Dear Prefix Forename Surname   Dear Mr John Smith
FF    Dear Forename                  Dear John
F     Dear Prefix Forename           Dear Sir John
B     Dear Prefix                    Dear Sir
T     Prefix                         My Lord

Salutation type C generates the same salutation as type S, but differs in one respect: a prefix with type C is treated as a name even when it is found in address line 1 or 2, provided the Scan Address Lines for Names option is switched on. For example, with that option on, if MR has salutation type C then "Mr J Smith" in address line 1 or 2 is identified as a name; if MR has salutation type S, it is not.
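
The salutation rules listed above can be read as simple templates. The Python sketch below reproduces the examples from the list; it only illustrates the rules and is not how matchIT itself builds salutations. The function name build_salutation and its parameters are made up for the example.

```python
# One template per salutation type, taken from the rule list above.
# S and C produce the same text; they differ only in address-line scanning.
SALUTATION_RULES = {
    "S": "Dear {prefix} {surname}",
    "C": "Dear {prefix} {surname}",
    "FS": "Dear {prefix} {forename} {surname}",
    "FF": "Dear {forename}",
    "F": "Dear {prefix} {forename}",
    "B": "Dear {prefix}",
    "T": "{prefix}",
}


def build_salutation(sal_type, prefix="", forename="", surname=""):
    """Fill in the template for the given salutation type (illustrative only)."""
    template = SALUTATION_RULES[sal_type.strip().upper()]
    text = template.format(prefix=prefix, forename=forename, surname=surname)
    # Collapse any double spaces left behind by an empty name part.
    return " ".join(text.split())


print(build_salutation("S", "Mr", "John", "Smith"))
print(build_salutation("FS", "Mr", "John", "Smith"))
print(build_salutation("F", "Sir", "John", "Smith"))
print(build_salutation("T", "My Lord"))
```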

Additionally, each prefix, male forename and female forename must have a gender associated with it, taking a value of either ‘M’ (Male), ‘F’ (Female), or ‘E’ (Either).
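
Before saving an edited table, it can help to check these requirements mechanically. The following Python sketch checks a single entry against the rules stated above: types 'P', 'M' and 'F' need a gender of M, F or E, and type 'P' also needs a salutation type. The function check_entry is hypothetical, and it assumes the field values have already been extracted from the record and stripped of padding.

```python
VALID_GENDERS = {"M", "F", "E"}
VALID_SALUTATIONS = {"S", "C", "FS", "FF", "F", "B", "T"}
GENDERED_TYPES = {"P", "M", "F"}  # prefix, male forename, female forename


def check_entry(entry_type, gender, salutation):
    """Return a list of problems found in one entry (empty list if none)."""
    problems = []
    if entry_type in GENDERED_TYPES and gender not in VALID_GENDERS:
        problems.append(
            f"type {entry_type!r} needs a gender of M, F or E, got {gender!r}"
        )
    if entry_type == "P" and salutation not in VALID_SALUTATIONS:
        problems.append(
            f"prefix needs a salutation type, got {salutation!r}"
        )
    return problems


print(check_entry("P", "M", "S"))   # complete prefix entry
print(check_entry("M", "", ""))     # forename with no gender set
print(check_entry("P", "", ""))     # prefix missing both values
```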

These tables are stored in a fixed width format that can be edited via any text editor (including, for example, Notepad, Notepad++, and Programmer’s Notepad).

Note: if you inadvertently change the record length or field positions of these files, it may cause a failure in the matchIT API. You may find it useful to ensure that your text editor is set to display whitespace characters when editing these files.
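
A quick check after editing can catch this kind of accident. The Python sketch below flags lines whose length differs from the most common line length in the file, and lines that contain a tab character. It assumes every record is padded to the same length and that tabs are never valid in these files; neither assumption is confirmed here, so treat the output as a hint rather than a verdict. The function name find_suspect_lines is made up for the example.

```python
from collections import Counter


def find_suspect_lines(lines):
    """Return (line_number, problem) pairs for lines that look misaligned."""
    stripped = [line.rstrip("\r\n") for line in lines]
    if not stripped:
        return []
    # ASSUMPTION: the most common line length is the intended record length.
    expected = Counter(len(line) for line in stripped).most_common(1)[0][0]
    suspects = []
    for number, line in enumerate(stripped, start=1):
        if len(line) != expected:
            suspects.append((number, f"length {len(line)}, expected {expected}"))
        if "\t" in line:
            suspects.append((number, "contains a tab character"))
    return suspects


# Made-up sample data: line 3 is one character short, line 4 contains a tab.
sample = ["A" * 70, "B" * 70, "C" * 69, "D\t" + "x" * 68]
for number, problem in find_suspect_lines(sample):
    print(number, problem)
```

To run it against a real file, pass the open file object, for example find_suspect_lines(open("NAMES.DAT", encoding="latin-1")); the encoding shown is a guess and may need changing.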


Surnames and Towns Tables

Surnames Table

SURNAMES.DAT - used for casing surname prefixes such as "de" in Charles de Gaulle. To add or modify entries, follow the layout of the existing entries.

Towns Table

TOWNS.DAT - used for extracting towns from address lines into a specific Town field, and for upper-casing towns. This file covers UK "post towns" only, i.e. towns defined as such by Royal Mail.

