Friday, November 10, 2006

Running the interwiki bot for Wiktionary

I run the interwiki functionality of the pywikipedia bot on all the Wiktionaries. It is something I started, and it is the kind of public service that needs doing. The bot links the entries for words that are spelled the same across the Wiktionaries by adding "interwiki" links. These are the links you see on the left-hand side of a page, indicating that a Wiktionary in another language also has information about the word.
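To make the idea concrete, here is a minimal sketch of the matching step in plain Python. It does not use the pywikipedia API and is not how the bot itself is written; the function name `build_interwiki_links` and the sample data are made up for illustration. It assumes we already have, for each language code, the set of page titles on that Wiktionary, and it only pairs titles that are spelled exactly the same.

```python
from collections import defaultdict


def build_interwiki_links(titles_by_lang):
    """Return {(lang, title): [interwiki link, ...]} for identically spelled titles.

    titles_by_lang is a hypothetical input: language code -> set of page titles.
    """
    # First pass: record which Wiktionaries have a page with each spelling.
    langs_by_title = defaultdict(set)
    for lang, titles in titles_by_lang.items():
        for title in titles:
            langs_by_title[title].add(lang)

    # Second pass: every page gets a link to each other language with the same title.
    links = {}
    for title, langs in langs_by_title.items():
        for lang in langs:
            others = sorted(langs - {lang})
            if others:
                links[(lang, title)] = [f"[[{other}:{title}]]" for other in others]
    return links


# Made-up sample data, not real Wiktionary contents.
sample = {
    "en": {"water", "chat"},
    "fr": {"chat", "eau"},
    "vi": {"chat"},
}
links = build_interwiki_links(sample)
```

In this sample only "chat" exists on more than one Wiktionary, so it is the only title that receives links; "water" and "eau" are left alone because no other project has a page with that spelling.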

I have been doing this for over a year now, and I have just noticed that the number of words I do not understand is growing rapidly. On the one hand this is to be expected, as it is in line with the rapid growth of projects like the Vietnamese Wiktionary. What is now happening more often is that several Wiktionaries have the same words in common. That is what I see when I watch the bot.

In a way it would be fun to have WiktionaryZ in there. Currently we have 159,004 Expressions and 10,557 DefinedMeanings. Based on the Expressions we would be the fourth project in size. It would be more reasonable to use the DefinedMeanings for the comparison, and that would make us the 26th in size.

Comparing Wiktionary with WiktionaryZ is like comparing apples and oranges. Where a Wiktionary has each spelling only once, WiktionaryZ counts an expression once for each language in which it exists. Where a Wiktionary page can contain many red links to articles that do not exist yet, the corresponding WiktionaryZ expressions are implicitly there.

It makes more sense to appreciate what the numbers imply. In lexicology, size counts. People will find a resource useful only when they have a good chance of finding the information they are looking for. This is one reason why it makes sense to concentrate on certain topics or domains. WiktionaryZ is rich in ecological terminology thanks to the information we gained by including the GEMET thesaurus. By working on the OLPC children's dictionary we get a lot of the basic vocabulary that is the bread and butter of dictionaries.
