Levenshtein Automaton

Generation

The data structures JAR comes packaged with serialized instances of distance 1, 2, and 3 LevenshteinAutomaton. This saves developers the time and processing power of generating their own automata. The real speed behing a Levenshtein automaton is best realized when the generation of the automaton is done prior to error correction.

However, generating instances of LevenshteinAutomaton can be done by the developer. Note: distances greater than 3 may require extremely large heap spaces.

import com.infiauto.datastr.auto.LevenshteinAutomaton; int edit_distance = 2; LevenshteinAutomaton.generateLevenshtein(edit_distance);

The serialized LevenshteinAutomaton is written to the current working directory.

Error Correction

The first step to using a LevenshteinAutomaton for error correction is to organize the list of known, correct strings into a dictionary. This implementation requires all strings to be stored in an instance of the DictionaryAutomaton class.

import com.infiauto.datastr.auto.DictionaryAutomaton; import java.util.Arrays; import java.util.List; String[] words = new String[] { }; List<String> word_list = Arrays.asList(words); DictionaryAutomaton dictionary = new DictionaryAutomaton(word_list);

The second step is to load a pre-generated instance of LevenshteinAutomaton.

import com.infiauto.datastr.auto.LevenshteinAutomaton; int edit_distance = 2; LevenshteinAutomaton la = LevenshteinAutomaton.loadLevenshteinAutomaton(edit_distance);

The final step is to call the recognize method of the LevenshteinAutomaton, passing in the DictionaryAutomaton and the candidate string.

import java.util.Collection; String input_string = "misspelld"; Collection<String> corrected_strings = LevenshteinAutomaton.recognize(input_string, dictionary);

Documentation

  1. DictionaryAutomaton JavaDoc
  2. LevenshteinAutomaton JavaDoc
  3. Levenshtein Distance and Levenshtein Automata
  4. http://en.wikipedia.org/wiki/Levenshtein_transducer