Tk Kanji

Tk Kanji is a graphical user interface to the freely available kanji dictionary files compiled by Jim Breen at Monash University in Melbourne. It is a very rough cut at an application which could do a lot more. For the time being, it provides a functional browser for the dictionary; a couple of kanji study drills to aid my progress through James W. Heisig's Remembering the Kanji, I: A complete course on how not to forget the Meaning and Writing of Japanese characters; a game to entertain my 3-year-old; an example of an internationalized application in Tcl/Tk; and, no doubt, a large number of flawed design decisions and bugs which I'll correct when I start doing a better job of earning a living.

Kanji means "Chinese character" in Japanese. It denotes the set of Chinese pictographs and ideographs adopted for writing Japanese, starting in the 4th century of the common era. There are 2229 kanji in common use in Japan, according to the Jouyou/Jinmeiyou lists compiled by the Japanese government. The JISX-0208-1990 encoding specifies the 6355 kanji which are most likely to be encountered, including those on the Jouyou/Jinmeiyou lists. The JISX-0212-1990 encoding specifies an additional 5801 kanji which are less frequently encountered. Tk Kanji allows you to browse kanjidic and kanjd212, Jim Breen's dictionaries describing the 12156 kanji covered by the two JISX encodings.

Tk Kanji is a Tcl/Tk application. Tcl is a scripting language which runs on Unix, Windows 95/98/NT, and Apple Macintosh computers. Tk is a graphical user interface toolkit based on Tcl which runs on the same platforms. Tk Kanji illustrates the speed with which cross-platform applications can be built in Tcl/Tk, the extent and ease of use of the internationalization features of Tcl/Tk, and a bug in the Tcl internationalization support.

It's hard to appreciate how rapidly a Tcl/Tk programmer can generate useful applications. My study of kanji began when the books which Amazon shipped on July 31, 1999 arrived. I began building a kanji study application sometime after that. But work on Tk Kanji could only begin when I finally stumbled onto Jim Breen's dictionaries on August 13, 1999. So in Tk Kanji version 0.1 you are seeing the fruits of no more than 7 days' work.

It's also very hard to spot the part of Tk Kanji which makes it a Japanese-competent program, so I'll tell you where it is. If you look at the procedures which read files, you'll find one or the other of these lines:

fconfigure $fp -encoding $encoding
fconfigure $fp -encoding euc-jp

These lines tell Tcl the encoding used in the files it is reading. The first line takes the encoding name from a variable; the second explicitly specifies that the file uses the Extended Unix Code for Japanese. Knowing the encoding of a file, Tcl is able to read the file and translate its contents into Unicode, a character set which aims to represent all the writing systems currently used by human beings. That, and installation of the appropriate fonts, is all it took to bootstrap Tcl/Tk to the point where it was displaying error alerts with kanji embedded in them.
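
To make the idea concrete, here is a minimal sketch of a file-reading procedure built around that fconfigure call. The procedure name read_encoded_file and its arguments are hypothetical, chosen for illustration; this is not the code from the Tk Kanji source. The second half shows what I take to be the in-memory equivalent, using the encoding convertfrom command.

```tcl
# Hypothetical helper (not from the Tk Kanji source): read a whole file,
# decoding its bytes with the named encoding (EUC-JP by default).
proc read_encoded_file {path {encoding euc-jp}} {
    set fp [open $path r]
    # Tell Tcl how the bytes on disk map to characters.
    fconfigure $fp -encoding $encoding
    set text [read $fp]
    close $fp
    return $text
}

# In-memory equivalent: decode raw EUC-JP bytes into a Tcl (Unicode) string.
# The four bytes C6 FC CB DC are the EUC-JP codes for the two kanji of "Nihon".
set raw [binary format H* c6fccbdc]
set text [encoding convertfrom euc-jp $raw]
puts [string length $text]   ;# counts characters, not bytes
```

Once the text has been decoded, the rest of the program deals only in characters; the byte-level encoding matters only at the point where the file is opened.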

The bug in Tcl/Tk's internationalization that kanjd212 turned up is that only half of the euc-jp encoding is implemented. The Extended Unix Code is used in Chinese, Japanese, and Korean to mix single byte codes for the Latin alphabet, double byte codes for kanji, an alternate single byte code set for hangul and kana, and an alternate double byte code set for even more kanji. The Japanese code is the only one that uses the alternate double byte coding, which it signals with the prefix byte 0x8F. Tcl/Tk doesn't currently support the alternate double byte coding. I expect this to be fixed in a future release of Tcl/Tk. Tk Kanji uses a workaround to read kanjd212, but other files will display the string "\x8f" followed by a kanji from the JISX-0208-1990 character set whenever a JISX-0212-1990 character is encountered. My apologies for any confusion this causes.
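
The four code sets described above can be told apart by the first byte of each character. The following is a rough sketch of that byte layout as I understand it for EUC-JP; the procedure name eucjp_scan and its labels are made up for illustration, and this is not the workaround Tk Kanji itself uses to read kanjd212.

```tcl
# Hypothetical helper: walk a raw EUC-JP byte string and label each
# character with the code set its lead byte selects.
# Returns a list of {codeset hexbytes} pairs.
proc eucjp_scan {bytes} {
    set result {}
    set n [string length $bytes]
    set i 0
    while {$i < $n} {
        # Fetch the lead byte as an unsigned value.
        binary scan [string index $bytes $i] c lead
        set lead [expr {$lead & 0xff}]
        if {$lead < 0x80} {
            set len 1; set codeset ascii      ;# single byte Latin
        } elseif {$lead == 0x8e} {
            set len 2; set codeset kana       ;# prefix + one byte
        } elseif {$lead == 0x8f} {
            set len 3; set codeset jisx0212   ;# prefix + two bytes
        } else {
            set len 2; set codeset jisx0208   ;# plain double byte
        }
        # Record the bytes of this character as hex digits.
        binary scan [string range $bytes $i [expr {$i + $len - 1}]] H* hex
        lappend result [list $codeset $hex]
        incr i $len
    }
    return $result
}

# One character from each code set: "A", a JISX-0208 kanji,
# a half-width kana, and a JISX-0212 kanji.
puts [eucjp_scan [binary format H* 41c6fc8eb18fb0a1]]
```

A decoder that knows nothing about the 0x8F prefix would emit that byte by itself and then read the next two bytes as an ordinary JISX-0208 character, which matches the symptom described above.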

Installation

To install Tk Kanji you will need:

Credit Where Due

Thanks are due to several people and organizations.

The work of the Unicode Consortium made all the world's writing systems available for inclusion in computer programs.

The developers of Tcl/Tk made Unicode available to programs running on Unix, Windows, and Macintosh computers.

GNU and Microsoft supplied the fonts to display the Unicode.

Jim Breen and his students and collaborators at the Nihongo Archives compiled and provided the dictionaries which make the program really interesting.

Bruce Gingery and Larry Virden provided feedback on lapses in the web page configuration and installation instructions.

Domo arigato.

Copyright

Tk Kanji is Copyright © 1999 by Roger E Critchlow Jr, Santa Fe, New Mexico, USA.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.