Tarja

From TORI

Tarja is an artificial, technical language designed for the interpretation of Japanese.

Tarja uses ASCII characters, as well as those Japanese characters that have a unique graphic representation in the free software popular at the beginning of the 21st century.

Ambiguous symbols, which do not yet have a unique graphic representation, are replaced with Hiragana, with Romanji transliterations, and/or with words borrowed from other languages.

The grammar of Tarja is borrowed from Japanese, at least in cases where this causes no ambiguity.

The pronunciation and semantics of Tarja are kept as close to Japanese as possible.

The Tarja-Japanese dictionary is under construction.

Examples

The characters with code points X2FC3 and X9CE5 are excluded from Tarja, as they look very similar in some software.
However, the words X2FC3 and X9CE5 are elements of Tarja. The words tori, とり, and "Bird" are also elements of Tarja.

Similarly, none of the characters X2F25, X5973, and XF981 is an element of Tarja, because, as of 2025, no unique graphic representation has been established for them; in many programs, these characters look the same. The terms «Female», «Onna», and «おんな» may be used instead. If a specific character needs to be referred to in Tarja, its explicit Unicode number (X2F25, X5973, or XF981) can be indicated.
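The X-notation used on this page appears to be the letter X followed by the hexadecimal Unicode code point. Below is a minimal Python sketch of the conversion in both directions, assuming uppercase hexadecimal digits padded to at least four places (the page does not fix this convention); the names `to_tarja_code` and `from_tarja_code` are hypothetical.

```python
def to_tarja_code(ch: str) -> str:
    """Return the X-notation of a single character, e.g. 'X5973'."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # Uppercase hexadecimal, zero-padded to at least 4 digits (assumed convention).
    return f"X{ord(ch):04X}"


def from_tarja_code(code: str) -> str:
    """Parse an X-notation string such as 'XF981' back into its character."""
    if not code.startswith("X"):
        raise ValueError("X-notation must start with 'X'")
    return chr(int(code[1:], 16))


# The three look-alike characters, built from code points so that
# this source file does not depend on how a font draws them.
for cp in (0x2F25, 0x5973, 0xF981):
    ch = chr(cp)
    assert from_tarja_code(to_tarja_code(ch)) == ch
    print(to_tarja_code(ch))
```

Building the characters with `chr` rather than typing them keeps the example itself free of the ambiguity it describes.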

Confusion

The need for Tarja became apparent when it turned out that even a native Japanese speaker, looking at the pictures of the three characters mentioned in the previous section, would be unlikely to guess:
Which of them is character X2F25?
Which of them is character X5973?
Which of them is character XF981?
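A program, unlike a human reader, can answer these three questions from the code points alone. A short sketch using the standard `unicodedata` module of Python; the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and the official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


# Three characters that many fonts draw with the same picture.
for ch in ("\u2F25", "\u5973", "\uF981"):
    print(describe(ch))
# U+2F25 KANGXI RADICAL WOMAN
# U+5973 CJK UNIFIED IDEOGRAPH-5973
# U+F981 CJK COMPATIBILITY IDEOGRAPH-F981
```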

The confusion becomes even worse if various encoding schemes (not only Unicode) are involved [1].
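To illustrate how additional encoding schemes add to the confusion, the sketch below encodes X5973 with three codecs that ship with Python (`utf-8`, `shift_jis`, `euc_jp`) and prints the raw bytes: the same character yields three different byte sequences. The look-alike X2F25 is encoded too; whether a legacy codec can represent it is checked at run time instead of being assumed. The helper name `byte_views` is hypothetical.

```python
# Encode one character with several encodings and show the raw bytes.
CODECS = ("utf-8", "shift_jis", "euc_jp")


def byte_views(ch: str) -> dict:
    """Map each codec name to the hex bytes of ch, or None if unencodable."""
    views = {}
    for codec in CODECS:
        try:
            views[codec] = ch.encode(codec).hex(" ")
        except UnicodeEncodeError:
            views[codec] = None  # the codec has no code for this character
    return views


for ch in ("\u5973", "\u2F25"):
    print(f"X{ord(ch):04X}", byte_views(ch))
```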

The difficulty with different Unicode characters having the same picture runs parallel to the difficulty with Kanji whose pictures are not identical but merely similar [2].

A partial solution is to avoid using confusing characters
until font designers provide a unique, exclusive, and easily recognizable image for each character, and these images become the standard, default, predetermined graphic representation of the characters.

Fiction

Tarja was initially invented for the post-apocalyptic sci-fi utopia «Tartaria». It is supposed to be used by the fictional inhabitants of that fantastic country.

The utopia is built on the assumption that, in the 21st century, the next collapse of the administrative system of Moscovia (the collapse of the RF) happens. It is followed by a horrible war («Bigpuf») between Russian officials; they try to eliminate colleagues and competitors, using «private military companies» (id est, personal armies) and all kinds of weapons, without following any international convention on the protection of civilians or prisoners of war.

As a result of the Bigpuf, the northern, central, and eastern parts of Eurasia are depopulated and contaminated with dangerous viruses and bacteria, mutant animals, mushrooms and plants, toxic chemicals, and unstable isotopes. There are several spots where the contamination is low, and some people try to live there. With the help of international organizations, the survivors set up the primitive state «Tartaria», with a simple (and somewhat barbarian) Federal Constitution, National Laws, and customs that are horrible from the point of view of so-called Western civilization. Tartaria accepts migrants from any country. Many of them are native speakers of Chinese or Japanese, id est, they use Kanji for writing. The official language of Tartaria is English, but other languages (Russian, Ukrainian, Japanese, Chinese, Kazakh) are accepted as «reserve languages»; every citizen of Tartaria is generally required to know at least one reserve language in addition to "basic English" ("Tartarian English", "Taren").

Strictly speaking, the «official language» of Tartaria is not English but a kind of Surjik, with nearly English grammar, omitted articles, and words borrowed from other languages (transliterated to ASCII). To avoid confusion with traditional English, the language is called «Tareng» (or «Taren»); it is closer to Basic English than to English.

The same applies to other languages and, in particular, to Tartarian Japanese. In fact, it is not Japanese at all; to avoid confusion with Japanese, the language is called «Tarja». Tarja is a kind of Surjik with Japanese rules of composition and words borrowed from all languages (when the foreign word has no Japanese homonym), written with KanjiLiberal (CJK) characters that cannot be confused with a KanjiRadical, and/or with Hiragana, and/or with Romanji (ASCII). In some cases, to specify a confusable Kanji, its ASCII name or its hexadecimal number is indicated.
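Replacing confusable Kanji by their hexadecimal numbers can be automated. The following Python sketch rests on an assumption that is not part of Tarja itself: two code points are treated as look-alikes when Unicode compatibility normalization (NFKC) maps a KanjiRadical or KanjiConfudal character onto a single other character, and then both members of the pair are written in X-notation. This heuristic catches X2F25, X5973, XF981, X2FC3, and X9CE5, but it cannot see look-alikes that Unicode does not link by normalization. The names `CONFUSABLE` and `to_tarja` are hypothetical.

```python
import unicodedata

# Blocks scanned for look-alikes (assumed to cover KanjiRadical and KanjiConfudal).
SCAN_RANGES = (
    (0x2E80, 0x2EFF),    # CJK Radicals Supplement
    (0x2F00, 0x2FDF),    # Kangxi Radicals
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x2F800, 0x2FA1F),  # CJK Compatibility Ideographs Supplement
)


def build_confusable_set() -> frozenset:
    """Collect characters that share a picture with another code point."""
    excluded = set()
    for start, end in SCAN_RANGES:
        for cp in range(start, end + 1):
            ch = chr(cp)
            target = unicodedata.normalize("NFKC", ch)
            # A one-character target that differs from ch marks a look-alike pair.
            if target != ch and len(target) == 1:
                excluded.update((ch, target))
    return frozenset(excluded)


CONFUSABLE = build_confusable_set()


def to_tarja(text: str) -> str:
    """Replace every confusable character with its X-notation."""
    return "".join(f"X{ord(c):04X}" if c in CONFUSABLE else c for c in text)


print(to_tarja("\u5973 = \u304a\u3093\u306a"))  # X5973 = おんな
```

Under this heuristic many everyday Kanji are replaced as well, since every Kanji that doubles as a Kangxi radical is caught; this matches the strictness of the Examples section but makes the output harder to read.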

The concepts of the Tartarian Tarja turn out to be useful for learning real, existing Japanese. Thus, the development of Tarja can continue even without reference to the sci-fi, at least until an exclusive, unique picture is designed for each 3-byte Unicode character. (In UTF-8, most Japanese characters, including kana and the common Kanji, are encoded with 3 bytes.)
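The 3-byte figure refers to UTF-8. It holds for every character from U+0800 to U+FFFF, a range that contains kana, the common Kanji, and the three look-alikes discussed above; rarer Kanji outside the Basic Multilingual Plane, such as CJK Extension B (starting at U+20000), take 4 bytes. A quick Python check, in which the labels KanjiRadical, KanjiLiberal, and KanjiConfudal are assumed to correspond to the Kangxi Radicals, CJK Unified Ideographs, and CJK Compatibility Ideographs blocks:

```python
# Sample characters, given by code point to avoid look-alike glyphs in source.
samples = {
    "hiragana a": 0x3042,
    "KanjiRadical X2F25": 0x2F25,
    "KanjiLiberal X5973": 0x5973,
    "KanjiConfudal XF981": 0xF981,
    "CJK Extension B X20000": 0x20000,
}

# Number of bytes each character occupies in UTF-8.
lengths = {label: len(chr(cp).encode("utf-8")) for label, cp in samples.items()}

for label, n in lengths.items():
    print(f"{label}: {n} bytes")
```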

Reality

Real historical events have developed faster than was assumed in the utopia. The Editor expected to have half a century to finish and debug it without hurry. Those expectations failed.

Several horrible customs expected for the end of the 21st century have already been realized in Russia. The Editor suspects that the bad guys looked at the draft of Tartaria and used it as a guide in their practical activity, in the same manner as, at the end of the 20th century, Soviet veterans and Moscovian fascists constructed the pahanat («Вертикаль власти», "power vertical") as an imitation of capitalism, using ideas from the utopias «Orwell1984» and «Moscow2042», the fairy tales «Gelsomino nel paese dei bugiardi» and «Neznaika on the Moon» («Незнайка на Луне»), comedy and parody movies («Fantomas», «It's a Mad, Mad, Mad, Mad World», «Strike First Freddy») and, of course, Soviet propaganda about «totally criminalized» capitalist countries.

Some ideas of the Russian invasion of Ukraine, especially of the «Dvizhuha», the «спецоперация» ("special operation", since 2022.02.24), seem to have been stolen from the emulation-parody «Остов Крым» (without citing the original).

For this reason, the Editor has stopped uploading new updates of «Tartaria», at least the parts that touch on the non-traditional methods of modern fascism, the Russki mir («Русский мир») and the New World Order («Новый Мировой Порядок»). Fascism can be defeated by the militaries of civilized countries (see «I bombed Dresden») and by the Anti-Putin coalition, but not by a scientific open-access site that can be used both by honest readers and by apologists of the Russki mir (as a source of ideas for new war crimes).

However, Tarja by itself turns out to be useful for interpreting Japanese and for translating it into other languages, especially when automatic translators are used.

The same applies to translation into Japanese: first, the sentence is composed in Tarja; then it is translated into Japanese using the primitive Tarja dictionary, since the grammar rules are supposed to coincide with those of Japanese.
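A minimal Python sketch of this two-step scheme, assuming a whitespace-separated Tarja sentence whose word order already follows Japanese grammar, so that translation reduces to word-by-word dictionary lookup. The entries Bird, tori, Female, and Onna come from the Examples section; their Japanese outputs (X9CE5 and X5973), the handling of X-notation words, and the names `TARJA_TO_JA` and `tarja_to_japanese` are illustrative assumptions.

```python
import re

# Hypothetical fragment of the Tarja-Japanese dictionary (entries from this page).
# Japanese outputs are written as escapes so the source does not rely on glyphs.
TARJA_TO_JA = {
    "Bird": "\u9CE5",    # X9CE5
    "tori": "\u9CE5",
    "Female": "\u5973",  # X5973
    "Onna": "\u5973",
}

# Explicit code point such as X9CE5: uppercase hex, at most U+10FFFF (assumed form).
X_CODE = re.compile(r"X(?:[0-9A-F]{4,5}|10[0-9A-F]{4})")


def tarja_to_japanese(sentence: str) -> str:
    """Translate word by word; the word order is assumed to be Japanese already."""
    out = []
    for word in sentence.split():
        if word in TARJA_TO_JA:
            out.append(TARJA_TO_JA[word])
        elif X_CODE.fullmatch(word):
            out.append(chr(int(word[1:], 16)))  # the number names the character
        else:
            out.append(word)  # kana and unknown words pass through unchanged
    return "".join(out)  # Japanese is written without spaces between words


print(tarja_to_japanese("Female \u306f Bird \u3092 \u307f\u308b"))
```

The Tarja sentence "Female は Bird を みる" already has Japanese word order, so the output is 女は鳥をみる; a fuller dictionary would also map みる to its Kanji form.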

Such a translation cannot compete with automatic translation by Google Translate or by ChatGPT, but Tarja, as a meta-language, helps in learning real, existing Japanese.

For the reasons mentioned, Tarja is developed at TORI almost without any connection to the post-apocalyptic utopia «Tartaria».
Tarja is just a tool, useful for non-trivial, outstanding research and investigations, following the general ideology of TORI.

Motivation

One of the goals of creating Tarja is to draw the attention of both professional teachers of Japanese and professional programmers who develop support for non-ASCII characters to the problem of ambiguous pictures of Unicode characters, especially those classified as Kanji.

The Tarja language is an ugly patch to cover the big hole (ana, あな), gap, fault, or bug in most language-supporting software, and the curse that Japanese poses for foreigners and migrants who use computers to learn it.

As soon as programmers develop different pictures for different Unicode characters, these pictures become standard, the pictures of the same character on different computers become more similar, and the pictures of different characters on the same computer become distinguishable, at least part of the goals of Tarja will be achieved. This standardization of exclusive pictures for Kanji may take a century.

Meanwhile, Tarja serves as a way to boycott each Kanji that does not yet have an exclusive, unique, and standard picture.

The most frequent confusion occurs among the similar pictures of KanjiRadical, KanjiLiberal (CJK), and KanjiConfudal characters.

An intermediate solution could be:
1. the use of a significantly reduced font size for KanjiRadicals (as their pictures appear as parts of the pictures of KanjiLiberals), and
2. the suppression of automatic conversion of KanjiConfudal to CJK, together with a significantly bigger font size for KanjiConfudal characters, to make them visible (and easy to correct if a KanjiConfudal character appears by error).
This could help with at least plain Japanese texts that do not change the font settings inside a document.
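Both proposals can be tried in any HTML page. The Python sketch below wraps characters in `<span>` elements with a reduced or enlarged font size, assuming that KanjiRadical means the Kangxi Radicals and CJK Radicals Supplement blocks and KanjiConfudal means the CJK compatibility ideographs. The 60% and 160% sizes and the function names are arbitrary illustrative choices. The text is deliberately not normalized, because NFC or NFKC normalization would silently turn KanjiConfudal characters into CJK ones, which is exactly the automatic conversion the second proposal wants to suppress.

```python
import html
import unicodedata

# Assumed mapping of the page's terms onto Unicode blocks.
RADICAL_RANGES = ((0x2E80, 0x2EFF), (0x2F00, 0x2FDF))     # KanjiRadical
CONFUDAL_RANGES = ((0xF900, 0xFAFF), (0x2F800, 0x2FA1F))  # KanjiConfudal


def in_ranges(ch: str, ranges) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_confudal(ch: str) -> bool:
    # Compatibility ideographs are the characters in these blocks that NFC
    # changes; the few unified ideographs stored there are left alone.
    return in_ranges(ch, CONFUDAL_RANGES) and unicodedata.normalize("NFC", ch) != ch


def mark_up(text: str) -> str:
    """Return HTML with KanjiRadicals shrunk and KanjiConfudals enlarged.

    The input is NOT normalized, so KanjiConfudal characters survive as-is.
    """
    parts = []
    for ch in text:
        esc = html.escape(ch)
        if in_ranges(ch, RADICAL_RANGES):
            parts.append(f'<span style="font-size:60%">{esc}</span>')
        elif is_confudal(ch):
            parts.append(f'<span style="font-size:160%">{esc}</span>')
        else:
            parts.append(esc)
    return "".join(parts)


print(mark_up("\u2F25\u5973\uF981"))
```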

Warning

Tarja is created as a technical language with scientific and educational purpose.

The description of Tarja is not an attempt to replace or modify existing, ordinary Japanese.

However, some concepts of Tarja may be useful for constructing models of Japanese and its future evolution (an increasing share of Romanji, a decreasing share of confusable terms and characters, etc.).

Thus, the Editor reserves the right
to call things with their proper names,
to suggest definitions for ambiguous terms (in particular for euphemisms),
to define new terms as soon as they seem to be useful,
to construct the historic models and
to compare their predictions to later, a posteriori publications and observations.

References

  1. https://www.sljfaq.org/afaq/encodings.html Encodings of Japanese By Alexandre Elias (modified for the sci.lang.japan FAQ) (2025).
  2. https://www.nihongomaster.com/sheets/view/181/the-most-confusing-kanji The Most Confusing Kanji! (2025)

https://en.wikipedia.org/wiki/Kanji Kanji (/ˈkændʒi, ˈkɑːn-/; Japanese: 漢字, pronounced [kaɲ.dʑi], 'Chinese characters') are logographic Chinese characters, adapted from Chinese script, used in the writing of Japanese. They comprised a major part of the Japanese writing system during the time of Old Japanese and are still used, along with the subsequently derived syllabic scripts of hiragana and katakana. The characters have Japanese pronunciations; most have two, with one based on the Chinese sound. A few characters were invented in Japan by constructing character components derived from other Chinese characters. After the Meiji Restoration, Japan made its own efforts to simplify the characters, now known as shinjitai, by a process similar to China's simplification efforts, with the intention to increase literacy among the general public. Since the 1920s, the Japanese government has published character lists periodically to help direct the education of its citizenry through the myriad Chinese characters that exist. There are nearly 3,000 kanji used in Japanese names and in common communication.

https://japanese.stackexchange.com/questions/42953/variations-in-the-same-kanji-how-do-you-know-which-one-to-use Variations in the "same" kanji, how do you know which one to use? Japanese Language Stack Exchange (asked and last modified in 2017)

https://www.edrdg.org/~jwb/paperdir/kanjicomp.html Kanji and the Computer A Brief History of Japanese Character Set Standards James Breen, Monash University (2025)

Keywords

«CJK», «Confusion», «Japanese», «Kanji», «KanjiRadical», «KanjiLiberal», «KanjiConfudal», «Tarja-Japanese», «Tarja.Questions», «Tartaria», «Unicode», «UTF-8», «SomeU»