Thursday, December 02, 2004

Word Distance

Copied from Javalobby

Here's a small implementation of the Levenshtein Word Distance algorythm written in Java.

The algorythm is good for measuring edit distance between two words and is normally used for spell checking and similarity sorting.


    public static int LD( String s, String t) {

int n = s.length();
int m = t.length();

if (n == 0) return m;
if (m == 0) return n;

int[][] d = new int[n + 1][m + 1];

for ( int i = 0; i <= n; d[i][0] = i++ );
for ( int j = 1; j <= m; d[0][j] = j++ );

for ( int i = 1; i <= n; i++ ) {
char sc = s.charAt( i-1 );
for (int j = 1; j <= m;j++) {
int v = d[i-1][j-1];
if ( t.charAt( j-1 ) != sc ) v++;
d[i][j] = Math.min( Math.min( d[i-1][ j] + 1, d[i][j-1] + 1 ), v );
}
}
return d[n][m];
}


Someone said the N-gram could also give very good results.

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

<< Home