How to Calculate Edit Distance Using Python
Calculate the Edit Distance
To calculate the edit distance between two strings using Python, you can use the Levenshtein
library. Here’s an example of how you can use it:
# pip install python-Levenshtein
Calling Levenshtein.distance("kitten", "sitting") returns the edit distance between the two strings, which in this case is 3. Three operations are required to transform “kitten” into “sitting”: substitute “k” with “s”, substitute “e” with “i”, and insert “g” at the end.
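One way to see where the value 3 comes from is to apply a three-step edit sequence by hand. The sketch below is an illustration only: the helpers substitute and insert are hypothetical names, not part of the Levenshtein library, and the code shows that three single-character edits are enough, not that no shorter sequence exists.

```python
def substitute(s, index, char):
    """Return s with the character at index replaced by char."""
    return s[:index] + char + s[index + 1:]


def insert(s, index, char):
    """Return s with char inserted before position index."""
    return s[:index] + char + s[index:]


# Each step applies exactly one single-character edit to the previous string.
steps = ["kitten"]
steps.append(substitute(steps[-1], 0, "s"))  # kitten -> sitten
steps.append(substitute(steps[-1], 4, "i"))  # sitten -> sittin
steps.append(insert(steps[-1], 6, "g"))      # sittin -> sitting

print(" -> ".join(steps))
print(len(steps) - 1)  # number of edits applied
```

The number of edits is the length of the chain minus one, which matches the distance reported by the library.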
Normalise the Edit Distance
To get a normalised edit distance, this example calls the distance function with the optional normalized parameter set to True. The intended result is the edit distance as a float in the range [0, 1], where 0 means the strings are identical and 1 means they are completely different. Not every version of the library accepts this parameter; if the call fails, see Troubleshooting.
import Levenshtein
The output is a float representing the normalized edit distance between the two strings. In this case the value is approximately 0.43: three operations are required to transform “kitten” into “sitting” (substitute “k” with “s”, substitute “e” with “i”, and insert “g”), and the longer of the two strings has 7 characters, so the result is 3 / 7.
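The arithmetic behind that value can be checked without the library. This is a minimal sketch that assumes the edit distance of 3 is already known and that normalisation means dividing by the length of the longer string; the variable names are illustrative.

```python
s1, s2 = "kitten", "sitting"
distance = 3  # edit distance between the two strings, taken as given

longer = max(len(s1), len(s2))  # length of the longer string: 7
normalized = distance / longer  # 3 / 7

print(round(normalized, 2))  # 0.43
```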
Troubleshooting
If Python raises an error like the one below, you should either upgrade the library or define your own function.
Traceback (most recent call last):
It looks like the version of the python-Levenshtein library you have installed does not accept a normalized argument in distance. Recent releases do not document such a parameter for distance, so check the documentation for your installed version before relying on it.
To fix the error, you can either upgrade to the latest version of the library by running pip install python-Levenshtein --upgrade, or you can calculate the normalized edit distance manually by dividing the edit distance by the length of the longer string.
Here’s an example of how you can do this:
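def get_normalized_edit_distance(s1, s2):
    return Levenshtein.distance(s1, s2) / max(len(s1), len(s2), 1)  # the 1 avoids dividing by zero when both strings are empty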
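If installing or upgrading the library is not an option, the whole calculation can also be done without it. The following is a minimal sketch, assuming the standard unit-cost Levenshtein distance computed with a two-row dynamic-programming table; the names edit_distance and normalized_edit_distance are illustrative and not part of any library.

```python
def edit_distance(s1, s2):
    """Unit-cost Levenshtein distance, keeping only two table rows in memory."""
    # previous[j] holds the distance between the empty prefix of s1 and s2[:j].
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # distance between s1[:i] and the empty string
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(
                previous[j] + 1,         # delete c1
                current[j - 1] + 1,      # insert c2
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(s1, s2):
    """Edit distance divided by the length of the longer string, in [0, 1]."""
    longer = max(len(s1), len(s2))
    if longer == 0:
        return 0.0  # two empty strings are identical
    return edit_distance(s1, s2) / longer


print(edit_distance("kitten", "sitting"))                       # 3
print(round(normalized_edit_distance("kitten", "sitting"), 2))  # 0.43
```

This runs in time proportional to the product of the two string lengths, which is fine for short strings; for large inputs the compiled library is the more practical choice.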