SOUNDEX "Fuzzy" Match Function
Write a routine that converts a word or a name to a four-character Soundex code, as described below. The routine developed in this assignment may well be handy in future applications that you write in Java. Soundex is the most commonly used "fuzzy" match algorithm for looking up people's names in lists.
The Soundex algorithm is probably the oldest of the "fuzzy match" algorithms. It was patented twice, in 1918 and again in 1922! It is documented best in Donald Knuth's "The Art of Computer Programming, Volume 3: Sorting and Searching". The basic intent of the Soundex algorithm is to convert an input word, usually a person's name, into a four-character code (one letter followed by three digits) that represents how the word "sounds", rather than depending on its exact spelling. The Soundex code is a "many to one" code, i.e. many words convert to the same Soundex code, but there is only one Soundex code for a given word.
The algorithm works as follows:
Code Table:
The following table gives the letters that each code number represents:
Code 0 - A, E, I, O, U, H, W, Y
Code 1 - B, F, P, V
Code 2 - C, G, J, K, Q, S, X, Z
Code 3 - D, T
Code 4 - L
Code 5 - M, N
Code 6 - R
A list of names with their corresponding Soundex codes will be available for verifying your Soundex program.
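The code table above can be applied along the following lines. This is only a minimal sketch, not the required solution: the class name Soundex and the method encode are hypothetical, and the rules it follows are assumptions based on the common description of the algorithm (keep the first letter, replace the remaining letters by their code digits, collapse adjacent letters that share a code, drop code 0, then pad with zeros or truncate to four characters). In particular it treats H and W exactly like vowels, as the table suggests; some Soundex variants handle H and W differently, so check your results against the verification list.

```java
import java.util.Locale;

/** Hypothetical helper class; the names here are illustrative only. */
final class Soundex {
    // Code digit for each letter A..Z, copied from the code table above.
    //                                   ABCDEFGHIJKLMNOPQRSTUVWXYZ
    private static final String CODES = "01230120022455012623010202";

    private Soundex() {}

    /** Returns a four-character code, or "" if the word contains no letters A-Z. */
    static String encode(String word) {
        StringBuilder out = new StringBuilder(4);
        char previous = '0'; // code of the most recently seen letter
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                continue; // assumption: ignore spaces, hyphens, apostrophes, etc.
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as a letter
            } else if (code != '0' && code != previous) {
                out.append(code); // skip code 0 and adjacent repeats of a code
                if (out.length() == 4) {
                    break; // truncate to four characters
                }
            }
            previous = code;
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }
}
```

Under these assumptions, Soundex.encode("Robert") and Soundex.encode("Rupert") both return R163, which illustrates the "many to one" property, and Soundex.encode("Lee") returns L000 after padding.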
The most familiar application of Soundex is its use by the US Bureau of the Census to create an index of individuals listed in the US census records after 1880. The 1880, 1900, and 1910 US Censuses are indexed on microfilm by the National Archives using the Soundex code. This index was prepared for Social Security purposes in the 1930s. The 1880 Soundex indexes only those households having children ten years of age and younger. The 1900 Soundex indexes all households, and the 1910 Soundex indexes all households in only 21 states.
Within each state, the Soundex index groups all last names (surnames) by their phonetic Soundex code. To use the index, you must know the Soundex code for the surname you are looking for.
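The grouping idea can be sketched in a few lines of Java. This is an illustration only, not a description of how the census index was actually produced: the class SoundexIndex and its build method are hypothetical names, and the encoder is passed in as a function so that any Soundex routine can be plugged in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical helper that groups surnames under their Soundex code. */
final class SoundexIndex {
    private SoundexIndex() {}

    /**
     * Builds a map from code to the surnames sharing that code.
     * The encoder parameter is assumed to be a Soundex routine
     * that maps a surname to its four-character code.
     */
    static Map<String, List<String>> build(List<String> surnames,
                                           Function<String, String> encoder) {
        // TreeMap keeps the codes in sorted order, like a printed index.
        Map<String, List<String>> index = new TreeMap<>();
        for (String surname : surnames) {
            String code = encoder.apply(surname);
            index.computeIfAbsent(code, k -> new ArrayList<>()).add(surname);
        }
        return index;
    }
}
```

Looking up a surname then means computing its code once and reading the list stored under that code, which is why the code must be known before the index can be used.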