Difference between revisions of "Sandbox"
(Replaced content with "{{top}} test .. ج: صباح الخير، يا جون! تفضل، اجلس. J: Good morning, John! Please, sit down. د: شكراً، يا عمار. سمعت أن ا...") Tag: Replaced |
|||
| Line 1: | Line 1: | ||
{{top}} |
{{top}} |
||
| + | test |
||
| − | [[GlyphChatGPT]] is a version of the article «[[Glyph]]» created by [[ChatGPT]]. |
||
| − | Something seems to be wrong with the interface; most of the prototype's content seems to be lost. |
||
| + | .. ج: صباح الخير، يا جون! تفضل، اجلس. |
||
| − | '''Glyph''' is a visible [[picture]], a shape that represents one or more [[character]]s. |
||
| − | Each glyph has its own visual form, while a '''character''' is a numerical code unit — an integer, usually expressed in one, two, or three [[byte]]s. |
||
| + | J: Good morning, John! Please, sit down. |
||
| − | In a consistent writing system, every visible glyph should correspond to one and only one character, and vice versa. |
||
| − | In practice, however, this one-to-one mapping is often violated, producing confusion and errors in [[text processing]], [[education]], and [[information exchange]]. |
||
| + | د: شكراً، يا عمار. سمعت أن العمل صعب هذه الأيام. |
||
| − | == Distinction between glyph, character, letter, symbol, and byte == |
||
| + | D: Thank you, Amar. I heard work is hard these days. |
||
| − | {| class="wikitable" |
||
| − | ! Term !! TORI interpretation !! Typical misuse / ambiguity |
||
| − | |- |
||
| − | | [[Byte]] || The smallest addressable unit of computer memory, 8 [[bit]]s. || Often confused with “character” in one-byte encodings such as [[ASCII]]. |
||
| − | |- |
||
| − | | [[Character]] || Numerical code unit (1–3 bytes here) representing a symbolic item of writing. || Sometimes called “letter” or “symbol.” |
||
| − | |- |
||
| − | | [[Glyph]] || Visible image of a character in some [[font]] or handwriting. || Commonly confused with “character,” especially in [[Unicode]] terminology. |
||
| − | |- |
||
| − | | [[Letter]] || Concept of a particular alphabet; one letter may correspond to several characters (uppercase, lowercase, variants). || Used where “glyph” or “character” would be more precise. |
||
| − | |- |
||
| − | | [[Symbol]] || General sign that may represent a concept, not necessarily linguistic. || Used too broadly in many sources. |
||
| − | |} |
||
| + | ج: صعب؟ يا رجل، مستحيل! نعمل في الشمس، والناس لا ترانا. |
||
| − | ---- |
||
| + | J: Hard? Man, impossible! We work under the sun, and people don’t even see us. |
||
| − | == Motivation and goal == |
||
| − | |||
| − | The purpose of this article is practical: |
||
| − | to help [[programmer]]s, [[teacher]]s, and [[student]]s avoid misuse of these terms and to promote consistent, bijective mapping between [[glyph]]s and [[character]]s in digital systems. |
||
| − | |||
| − | ---- |
||
| − | |||
| − | == History == |
||
| − | |||
| − | Written human communication began with drawings — early glyphs — that evolved into alphabets. |
||
| − | For centuries, ambiguity was tolerated: two scribes might draw slightly different glyphs for the same [[letter]], and meaning remained clear to the reader. |
||
| − | |||
| − | The invention of [[printing]] reduced such ambiguity: reproducible glyphs for each character were produced, and readers learned that each distinct shape corresponded to one particular code of writing. |
||
| − | |||
| − | With computers, ambiguity re-appeared: |
||
| − | the same glyph may correspond to several code units, and one character may appear with several glyphs, depending on [[font]] or [[locale]]. |
||
| − | |||
| − | ---- |
||
| − | |||
| − | == Encoding and digital typography == |
||
| − | |||
| − | During the 20th century, [[ASCII]] (American Standard Code for Information Interchange) established a one-byte encoding for English characters. |
||
| − | ASCII defined 128 codes of 7 bits each (95 printable characters and 33 control codes), each normally stored in one byte; its structure was logical and efficient. |
||
| − | |||
| − | Other languages did not receive such unified treatment. |
||
| − | The [[Soviet Union]], [[Japan]], [[China]], [[Vietnam]], and [[Arab countries]] each developed their own mutually incompatible encodings, some single-byte and some multi-byte. |
||
| − | This fragmentation slowed the exchange of information among citizens and complicated international collaboration. |
||
| − | |||
| − | During the Soviet period, the creation of a universal Cyrillic-compatible ASCII analogue was never prioritized. |
||
| − | Later standards such as [[KOI8-R]] appeared, but remained incompatible with Western ones. |
||
| − | Similar divergence happened in Asia: [[JIS]] for Japan, [[GB2312]] for China, [[VSCII]] for Vietnam. |
||
| − | Each used different byte arrangements, none achieving the universality of ASCII. |
||
| − | Consequently, glyph-character ambiguity persisted. |
||
| − | |||
| − | ---- |
||
| − | |||
| − | == Ambiguous glyphs == |
||
| − | |||
| − | Ambiguous glyphs cause both technical and educational errors. |
||
| − | |||
| − | * Latin '''A''' (U+0041) and Cyrillic '''А''' (U+0410) look identical but represent different code points. |
||
| − | Software may treat them as distinct even when they are visually the same. |
||
| − | * In Japanese [[font]]s, some [[kanji]] have slightly different glyphs depending on era or platform. |
||
| − | * Type designers sometimes create stylistic variants that obscure the glyph-character relationship. |
||
| − | |||
| − | Ambiguity confuses learners, corrupts data, and even enables [[homoglyph]] attacks, where distinct characters appear identical to a human observer. |
||
| − | |||
| − | ---- |
||
| − | |||
| − | == TORI interpretation == |
||
| − | |||
| − | In [[TORI]] terminology, the '''character''' is primary; the '''glyph''' is secondary. |
||
| − | A glyph exists only as the visual realization of a character. |
||
| − | |||
| − | This view contrasts with [[Wikipedia]], which often treats glyphs as the primary items and characters as abstract sets of them. |
||
| − | |||
| − | From the TORI viewpoint: |
||
| − | |||
| − | # The set of characters forms the backbone of a language’s digital representation. |
||
| − | # The glyph is a function of the character and the [[font]]: |
||
| − | <code>glyph = f(character, font)</code>. |
||
| − | # A good encoding system makes ''f'' bijective for any given font: one glyph per character and one character per glyph. |
||
| − | |||
| − | ---- |
||
| − | |||
| − | == Historical context == |
||
| − | |||
| − | The absence of early coordination among linguistic authorities had long-term effects. |
||
| − | When computers became available, most national projects ignored the need for unified glyph numbering. |
||
| − | Instead of extending [[ASCII]] to cover other major alphabets within a single-byte range, isolated encodings were invented. |
||
| − | |||
| − | These technical decisions, amplified by secrecy and isolation during the [[Cold War]], prevented efficient international standardization. |
||
| − | They belong to [[History]], not [[Politics]]; but understanding them helps to avoid similar mistakes in future computer linguistics. |
||
| − | |||
| − | ---- |
||
| − | |||
| − | == Practical implications == |
||
| − | |||
| − | * '''Software designers''' should ensure that every visible glyph corresponds uniquely to one character code. |
||
| − | * '''Teachers of Japanese''' and other languages should explain to students that a kanji’s shape may correspond to multiple encodings. |
||
| − | * '''Researchers''' should maintain clear, non-overlapping definitions. |
||
| − | * '''Students and users''' can consult this article to verify which concept — glyph, character, byte, or symbol — applies in context. |
||
| − | |||
| − | ---- |
||
| − | |||
| − | == Examples == |
||
| − | |||
| − | # Compare '''A''' (U+0041) and '''А''' (U+0410). |
||
| − | Same glyph, different characters. |
||
| − | # Observe differences between '''日''' in Japanese and Chinese fonts. |
||
| − | Same Unicode code, slightly different glyphs. |
||
| − | # Check that a file saved in [[UTF-8]] may use one to four bytes per character (one to three for characters in the Basic Multilingual Plane). |
||
| − | The glyph itself gives no hint how many bytes were used. |
||
| − | # Identify fonts where zero “0” and uppercase “O” look almost identical — a design error leading to ambiguity. |
||
| − | |||
| − | ---- |
||
| − | |||
| − | == See also == |
||
| − | [[Character]] |
||
| − | [[Symbol]] |
||
| − | [[Letter]] |
||
| − | [[Byte]] |
||
| − | [[Font]] |
||
| − | [[Encoding]] |
||
| − | [[Unicode]] |
||
| − | [[ASCII]] |
||
| − | [[Homoglyph]] |
||
| − | [[KOI8-R]] |
||
| − | [[JIS]] |
||
| − | [[GB2312]] |
||
| − | [[VSCII]] |
||
| − | [[Tarja]] |
||
| − | [[TORI axioms]] |
||
| − | |||
| − | ---- |
||
| − | |||
| − | == References == |
||
| − | |||
| − | * ANSI X3.4-1967 — American Standard Code for Information Interchange (ASCII). |
||
| − | * Unicode Consortium, ''The Unicode Standard, Version 15.0''. |
||
| − | * Historical documentation of KOI8-R, JIS, GB2312, VSCII encodings. |
||
| − | * Articles about suppression of cybernetics and computer standardization in the USSR (see also [[KeyRus]] and [[Gurtyak, Dmitry Alexandrovich]]). |
||
| − | * Ilene Strizver, “What Is a Glyph?”, fonts.com. |
||
| − | |||
| − | ---- |
||
| − | |||
| − | == Summary == |
||
| − | |||
| − | A '''glyph''' is a visible image of a [[character]]. |
||
| − | The TORI interpretation makes the character — the numerical code unit — the primary object, and treats glyphs as its visible realizations. |
||
| − | Historical experience shows that neglect of this mapping complicated early computerization and multilingual communication. |
||
| − | Clear distinction between ''glyph'', ''character'', ''symbol'', and ''byte'' helps to avoid confusion and improves the design of digital writing systems. |
||
| − | |||
| − | ---- |
||
Revision as of 20:39, 1 November 2025
test
.. ج: صباح الخير، يا جون! تفضل، اجلس.
J: Good morning, John! Please, sit down.
د: شكراً، يا عمار. سمعت أن العمل صعب هذه الأيام.
D: Thank you, Amar. I heard work is hard these days.
ج: صعب؟ يا رجل، مستحيل! نعمل في الشمس، والناس لا ترانا.
J: Hard? Man, impossible! We work under the sun, and people don’t even see us.