Character

From TORI
Revision as of 14:51, 7 November 2025 by T (talk | contribs) (Created page with "{{top}} Character is a numerical code unit (usually 1-3 bytes) representing a symbolic item of writing. The universal default scheme of encoding of characters is denoted...")

A character is a numerical code unit (usually 1-3 bytes) representing a symbolic item of writing.

The universal default scheme for encoding characters is called Unicode.
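
As a rough illustration of the "1-3 bytes" remark, the following Python sketch prints how many bytes some characters occupy. It assumes the UTF-8 encoding, which the text above does not name explicitly; the helper name `utf8_length` is hypothetical.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-8 (assumed encoding)."""
    return len(ch.encode("utf-8"))


# One Latin, one Cyrillic and one Kanji character.
samples = ["A", "Я", "心"]
for ch in samples:
    # The X.... notation follows the character numbering used in this article.
    print(f"X{ord(ch):04X} {ch} {utf8_length(ch)} byte(s)")
```

Under this assumption, the Latin letter takes one byte, the Cyrillic letter two, and the Kanji three.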

Many characters are assigned a specific glyph that allows the character to be identified.

As of the 21st century, there is no one-to-one relation between glyphs and characters; this causes confusion.

In the Utopia Tartaria, the special font Uniglif is mentioned; it provides a one-to-one relation between characters and their glyphs.

In reality, as of the 21st century, no such font has been established, and the default fonts cause a lot of confusion.
In many cases, the glyph does not identify a Unicode character; many characters have not yet been assigned a unique exclusive glyph.
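
The confusion can be observed directly. The Python sketch below uses the standard `unicodedata` module to compare characters listed in the Keywords of this article whose glyphs look alike in most fonts. Using Unicode NFKC normalization as a test for "same glyph" is an assumption made for illustration, not a method defined in TORI.

```python
import unicodedata

# Pairs of distinct characters that most fonts draw with near-identical glyphs.
pairs = [
    ("\u2f3c", "\u5fc3"),  # X2F3C (Kangxi radical) vs X5FC3 (CJK ideograph)
    ("\u2f92", "\u898b"),  # X2F92 (Kangxi radical) vs X898B (CJK ideograph)
    ("\ufa0a", "\u898b"),  # XFA0A (compatibility ideograph) vs X898B
]

for a, b in pairs:
    same_character = a == b
    # NFKC folds compatibility variants onto one character (illustrative test only).
    same_after_nfkc = unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)
    print(f"X{ord(a):04X} {a}  X{ord(b):04X} {b}  "
          f"equal: {same_character}  NFKC-equal: {same_after_nfkc}")
```

Each pair consists of two different characters, although a reader looking only at the glyphs can hardly tell them apart.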

The technical language Tarja is suggested as a modification of Japanese in order to avoid ambiguous kanji, that is, glyphs that have not yet been assigned a unique exclusive Unicode character.

Terminology

The distinction between glyph, character, letter, symbol, and byte is shown in the table below:

Term | TORI interpretation | Typical misuse / ambiguity
Byte | The smallest addressable unit of computer memory, 8 bits. | Often confused with “character” in one-byte encodings such as ASCII.
Character | Numerical code unit (usually 1-3 bytes) representing a symbolic item of writing. | Sometimes called “letter” or “symbol.”
Glyph | Visible image of a character in some font or handwriting. | Commonly confused with “character,” especially in Unicode terminology.
Letter | Concept of a particular alphabet; one letter may correspond to several characters (uppercase, lowercase, variants). | Used where “glyph” or “character” would be more precise.
Symbol | General sign that may represent a concept, not necessarily linguistic. A symbol may combine many characters. Each name can be qualified as a symbol. | Used broadly in many senses.

The table above was generated using ChatGPT; it may require correction(s) by a professional linguist.

The terms Byte, Character, Glyph, Letter, and Symbol are often treated as synonyms and misused.

In TORI, a character is interpreted as the primary item; glyphs appear as graphical representations of characters, designed for human reading.

In some cases, the glyph allows its character to be identified.

In the sci-fi «Tartaria», the special font Uniglif is mentioned; it provides a one-to-one relation between glyphs and characters.

In reality, many glyphs have not yet been assigned a unique exclusive character; this causes confusion, mistakes, and errors. In particular, this applies to the Kanji used in Japanese. Teachers of Japanese, manuals on Japanese, and dictionaries often simply ignore the confusion, which makes the problem severe and dangerous.

Even if the confusion is revealed in time (before it leads to a catastrophe or disaster), the user of Japanese meets the problem unprepared and has to investigate the case by himself or herself.

While neither Uniglif nor any equivalent is available, the technical language Tarja is suggested to reduce the number of these confusions. It avoids Kanji glyphs that have not yet been assigned a unique exclusive character.
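
A check in the spirit of this suggestion could be automated. The Python sketch below flags the characters of a text that change under Unicode NFKC normalization, on the assumption that such characters share a glyph with another character. Both this criterion and the function name `find_ambiguous` are hypothetical; they are not part of the definition of Tarja.

```python
import unicodedata


def find_ambiguous(text: str) -> list[tuple[int, str, str]]:
    """Return (position, character, NFKC form) for each character that NFKC alters."""
    found = []
    for position, ch in enumerate(text):
        folded = unicodedata.normalize("NFKC", ch)
        if folded != ch:
            found.append((position, ch, folded))
    return found


# The sample holds X2F3C followed by X5FC3; only the first one is flagged.
sample = "\u2f3c\u5fc3"
for position, ch, folded in find_ambiguous(sample):
    print(f"position {position}: X{ord(ch):04X} {ch} could be replaced by X{ord(folded):04X} {folded}")
```

Such a scan only covers the pairs known to the normalization tables; glyphs that are confusing for other reasons would need a separate list.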

Kanji

In TORI, the additional concept «Kanji» («漢字») is used.
Kanji may refer to a 3-byte character used in the Chinese, Japanese, and/or Korean languages, but also to a glyph that corresponds to at least one such character.
The term «Kanji» may also refer to the set of such characters and/or their glyphs.
Characters with numbers from X4E00 to X9FAF are especially popular in writing; they are denoted with the term «CJK»[1] and qualified as «KanjiLiberal» (named in analogy with KanjiRadical).
Characters with numbers X3041 - X3096 are qualified as Hiragana, and
characters with numbers X30A1 - X30F6 are qualified as Katakana.
The same applies to their glyphs; they are also denoted with the terms «Hiragana» and «Katakana».
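
The ranges above can be turned into a small classifier. The Python sketch below is only an illustration: the function name `classify` is hypothetical, and the upper bound X9FAF for KanjiLiberal is an assumption taken from the range quoted in reference [1].

```python
# (name, first code, last code); Hiragana and Katakana bounds are those given in the text.
RANGES = [
    ("Hiragana", 0x3041, 0x3096),
    ("Katakana", 0x30A1, 0x30F6),
    # Assumption: upper bound taken from the range quoted in reference [1].
    ("KanjiLiberal", 0x4E00, 0x9FAF),
]


def classify(ch: str) -> str:
    """Return the name of the range containing the character, or "other"."""
    code = ord(ch)
    for name, low, high in RANGES:
        if low <= code <= high:
            return name
    return "other"


for ch in "ひカ心A":
    print(f"X{ord(ch):04X} {ch} {classify(ch)}")
```

Note that such a classifier works on characters; a glyph alone is not sufficient input, because look-alike glyphs may belong to characters outside these ranges.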

Warning

The description above is oriented mainly toward one-byte, two-byte, and 3-byte characters. These characters cover most of the glyphs of English and other European languages, Russian, Hebrew, Arabic, Chinese, Korean, and Japanese.
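
Characters outside this scope exist as well. As a small illustration (assuming the UTF-8 encoding, with the hypothetical helper `beyond_three_bytes`), the Python sketch below lists the characters of a text that need more than 3 bytes.

```python
def beyond_three_bytes(text: str) -> list[str]:
    """Return the characters of text whose UTF-8 encoding is longer than 3 bytes."""
    return [ch for ch in text if len(ch.encode("utf-8")) > 3]


# "\U00020BB7" is a CJK ideograph from Extension B; it takes 4 bytes in UTF-8.
sample = "A\u5fc3\U00020BB7"
print([f"X{ord(ch):X}" for ch in beyond_three_bytes(sample)])
```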

The description is aimed especially at the Japanese language.

The interpretation above may require correction(s) by a native Japanese speaker.

References

  1. http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml CJK unified ideographs - Common and uncommon kanji (4e00 - 9faf)

Keywords

«Character», «ChatGPT», «Chikara», «CJK», «Confuse», «Confusing Glyphs», «Glyph», «Hiragana», «Japanese», «Kanji», «KanjiConfudal», «KanjiLiberal», «KanjiRadical», «Nichi», «Onna», «Tarja», «Unicode», «Uniglif», «X2F3C», «X5FC3», «X2F92», «X898B», «XFA0A», «⼼confuse», «心confuse»