Difference between revisions of "Sandbox"

From TORI
(Replaced content with "{{top}} test .. ج: صباح الخير، يا جون! تفضل، اجلس. J: Good morning, John! Please, sit down. د: شكراً، يا عمار. سمعت أن ا...")
Tag: Replaced
 
{{top}}
 
[[GlyphChatGPT]] is a version of the article «[[Glyph]]» created by [[ChatGPT]].
 
Something seems to be wrong with the interface; most of the content of the prototype seems to be lost.
 
   
A '''glyph''' is a visible [[picture]], a shape that represents one or more [[character]]s.
 
Each glyph has its own visual form, while a '''character''' is a numerical code unit — an integer, usually expressed in one, two, or three [[byte]]s.
 
   
In a consistent writing system, every visible glyph should correspond to one and only one character, and vice versa.
 
In practice, however, this one-to-one mapping is often violated, producing confusion and errors in [[text processing]], [[education]], and [[information exchange]].
 
   
== Distinction between glyph, character, letter, symbol, and byte ==
 
   
{| class="wikitable"
 
! Term !! TORI interpretation !! Typical misuse / ambiguity
 
|-
 
| [[Byte]] || The smallest addressable unit of computer memory, 8 [[bit]]s. || Often confused with “character” in one-byte encodings such as [[ASCII]].
 
|-
 
| [[Character]] || Numerical code unit (1–3 bytes here) representing a symbolic item of writing. || Sometimes called “letter” or “symbol.”
 
|-
 
| [[Glyph]] || Visible image of a character in some [[font]] or handwriting. || Commonly confused with “character,” especially in [[Unicode]] terminology.
 
|-
 
| [[Letter]] || Concept of a particular alphabet; one letter may correspond to several characters (uppercase, lowercase, variants). || Used where “glyph” or “character” would be more precise.
 
|-
 
| [[Symbol]] || General sign that may represent a concept, not necessarily linguistic. || Used too broadly in many sources.
 
|}
 
   
----
 
   
== Motivation and goal ==
 
 
The purpose of this article is practical: to help [[programmer]]s, [[teacher]]s, and [[student]]s avoid misuse of these terms and to promote a consistent, bijective mapping between [[glyph]]s and [[character]]s in digital systems.
 
 
----
 
 
== History ==
 
 
Written human communication began with drawings — early glyphs — that evolved into alphabets.
 
For centuries, ambiguity was tolerated: two scribes might draw slightly different glyphs for the same [[letter]], and meaning remained clear to the reader.
 
 
The invention of [[printing]] reduced such ambiguity: reproducible glyphs for each character were produced, and readers learned that each distinct shape corresponded to one particular code of writing.
 
 
With computers, ambiguity re-appeared:
 
the same glyph may correspond to several code units, and one character may appear with several glyphs, depending on [[font]] or [[locale]].
 
 
----
 
 
== Encoding and digital typography ==
 
 
During the 20th century, [[ASCII]] (American Standard Code for Information Interchange) established a one-byte encoding for English characters.
 
ASCII defines 128 codes, each of which fits in 7 bits and is stored in one byte; its structure was logical and efficient.
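
The 128-code limit can be checked directly. The following is a minimal sketch in Python; the helper name <code>fits_ascii</code> and the sample words are assumptions chosen for illustration, not part of any standard.

```python
def fits_ascii(text):
    """True when every character code is below 128, i.e. representable in 7 bits."""
    return all(ord(ch) < 128 for ch in text)

# Build a string containing all 128 ASCII codes by decoding the bytes 0..127.
ascii_table = bytes(range(128)).decode("ascii")

print(len(ascii_table))        # number of codes defined by ASCII
print(fits_ascii("Glyph"))     # an English word stays inside the ASCII range
print(fits_ascii("Глиф"))      # a Cyrillic word does not
```

Any text that fails this check needs some encoding beyond ASCII, which is where the incompatibilities described in this section begin.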
 
 
Other languages did not receive such unified treatment.
 
The [[Soviet Union]], [[Japan]], [[China]], [[Vietnam]], and [[Arab countries]] each developed their own mutually incompatible encodings, some single-byte and some multi-byte.
 
This fragmentation slowed the exchange of information among citizens and complicated international collaboration.
 
 
During the Soviet period, the creation of a universal Cyrillic-compatible ASCII analogue was never prioritized.
 
Later standards such as [[KOI8-R]] appeared, but remained incompatible with Western ones.
 
Similar divergence happened in Asia: [[JIS]] for Japan, [[GB2312]] for China, [[VSCII]] for Vietnam.
 
Each used different byte arrangements, none achieving the universality of ASCII.
 
Consequently, glyph-character ambiguity persisted.
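
The practical effect of incompatible encodings can be sketched in Python, assuming its built-in <code>koi8_r</code> and <code>latin_1</code> codecs. The sample word is chosen only for illustration: the same bytes give different characters depending on which encoding the reader assumes.

```python
# Illustrative sketch: the same bytes read under two different encodings.
text = "Привет"                     # sample Cyrillic word (an assumption for this demo)
raw = text.encode("koi8_r")         # KOI8-R stores one byte per character

as_koi8 = raw.decode("koi8_r")      # bytes read with the encoding that wrote them
as_latin1 = raw.decode("latin_1")   # the same bytes misread as a Western encoding

print(len(text), len(raw))          # character count and byte count
print(as_koi8 == text)              # the round trip recovers the original text
print(as_latin1 == text)            # the misreading yields different characters
```

Nothing in the bytes themselves says which interpretation is intended; that information has to travel separately from the text.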
 
 
----
 
 
== Ambiguous glyphs ==
 
 
Ambiguous glyphs cause both technical and educational errors.
 
 
* Latin '''A''' (U+0041) and Cyrillic '''А''' (U+0410) look identical but represent different code points. Software treats them as distinct characters even though they are visually the same.
 
* In Japanese [[font]]s, some [[kanji]] have slightly different glyphs depending on era or platform.
 
* Type designers sometimes create stylistic variants that obscure the glyph-character relationship.
 
 
Ambiguity confuses learners, corrupts data, and even enables [[homoglyph]] attacks, where distinct characters appear identical to a human observer.
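
The Latin and Cyrillic case can be inspected with Python's standard <code>unicodedata</code> module. This is a rough sketch: the helper names <code>describe</code> and <code>scripts_used</code> are hypothetical, and taking the first word of the Unicode character name as the script is a crude heuristic assumed here for illustration, not a real script-detection algorithm.

```python
import unicodedata

latin_a = "\u0041"      # LATIN CAPITAL LETTER A
cyrillic_a = "\u0410"   # CYRILLIC CAPITAL LETTER A

def describe(ch):
    """Code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def scripts_used(word):
    """Crude heuristic: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in word if ch.isalpha()}

print(describe(latin_a))
print(describe(cyrillic_a))
print(latin_a == cyrillic_a)           # the two characters compare as different
print(scripts_used("P\u0410YPAL"))     # a word mixing Latin and Cyrillic letters
```

A word that mixes scripts in this way is a typical sign of a homoglyph substitution, although legitimate mixed-script text also exists.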
 
 
----
 
 
== TORI interpretation ==
 
 
In [[TORI]] terminology, the '''character''' is primary; the '''glyph''' is secondary.
 
A glyph exists only as the visual realization of a character.
 
 
This view contrasts with [[Wikipedia]], which often treats glyphs as the primary items and characters as abstract sets of them.
 
 
From the TORI viewpoint:
 
 
# The set of characters forms the backbone of a language’s digital representation.
 
# The glyph is a function of the character and the [[font]]: <code>glyph = f(character, font)</code>.
 
# A good encoding system makes ''f'' bijective for any given font: one glyph per character and one character per glyph.
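
The bijectivity requirement can be illustrated with a toy model. In the sketch below, a font is represented as a Python dictionary from character codes to glyph identifiers; the table <code>toy_font</code>, the glyph names, and the function <code>glyph_collisions</code> are all hypothetical and serve only to show the idea for one fixed font.

```python
from collections import defaultdict

def glyph_collisions(font_table):
    """Return glyphs that are shared by more than one character code.

    font_table maps character code -> glyph identifier for one fixed font.
    An empty result means the mapping is one-to-one onto its glyphs.
    """
    by_glyph = defaultdict(list)
    for code, glyph in font_table.items():
        by_glyph[glyph].append(code)
    return {g: sorted(codes) for g, codes in by_glyph.items() if len(codes) > 1}

# Toy font: Latin A and Cyrillic A are drawn with the same shape.
toy_font = {0x0041: "shape_A", 0x0410: "shape_A", 0x0042: "shape_B"}
collisions = glyph_collisions(toy_font)
print(collisions)   # codes 0x41 and 0x410 share one glyph, so f is not bijective
```

In this model, a font satisfies the third requirement exactly when the function returns an empty dictionary.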
 
 
----
 
 
== Historical context ==
 
 
The absence of early coordination among linguistic authorities had long-term effects.
 
When computers became available, most national projects ignored the need for unified glyph numbering.
 
Rather than extending [[ASCII]] to cover other major alphabets within a single-byte range, designers invented isolated encodings.
 
 
These technical decisions, amplified by secrecy and isolation during the [[Cold War]], prevented efficient international standardization.
 
They belong to [[History]], not [[Politics]]; but understanding them helps to avoid similar mistakes in future computer linguistics.
 
 
----
 
 
== Practical implications ==
 
 
* '''Software designers''' should ensure that every visible glyph corresponds uniquely to one character code.
 
* '''Teachers of Japanese''' and other languages should explain to students that a kanji’s shape may correspond to multiple encodings.
 
* '''Researchers''' should maintain clear, non-overlapping definitions.
 
* '''Students and users''' can consult this article to verify which concept — glyph, character, byte, or symbol — applies in context.
 
 
----
 
 
== Examples ==
 
 
# Compare '''A''' (U+0041) and '''А''' (U+0410): same glyph, different characters.
 
# Observe differences between '''日''' in Japanese and Chinese fonts: same Unicode code point, slightly different glyphs.
 
# Check that a file saved in [[UTF-8]] may use one, two, three, or four bytes per character. The glyph itself gives no hint of how many bytes were used.
 
# Identify fonts where zero “0” and uppercase “O” look almost identical — a design error leading to ambiguity.
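
The byte counts in the UTF-8 example can be verified with a short Python sketch. The helper <code>utf8_length</code> and the choice of sample characters are assumptions made for illustration. The last sample lies above U+FFFF, a range where UTF-8 uses four bytes.

```python
def utf8_length(ch):
    """Number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))

samples = {
    "Latin A": "\u0041",
    "Cyrillic A": "\u0410",
    "Kanji 日": "\u65e5",
    "Emoji": "\U0001F600",
}
for label, ch in samples.items():
    print(f"{label}: U+{ord(ch):04X} uses {utf8_length(ch)} byte(s)")
```

The first two samples look the same on screen in most fonts, yet they occupy a different number of bytes in a UTF-8 file.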
 
 
----
 
 
== See also ==
 
* [[Character]]
* [[Symbol]]
* [[Letter]]
* [[Byte]]
* [[Font]]
* [[Encoding]]
* [[Unicode]]
* [[ASCII]]
* [[Homoglyph]]
* [[KOI8-R]]
* [[JIS]]
* [[GB2312]]
* [[VSCII]]
* [[Tarja]]
* [[TORI axioms]]
 
 
----
 
 
== References ==
 
 
* ANSI X3.4-1967 — American Standard Code for Information Interchange (ASCII).
 
* Unicode Consortium, ''The Unicode Standard, Version 15.0''.
 
* Historical documentation of KOI8-R, JIS, GB2312, VSCII encodings.
 
* Articles about suppression of cybernetics and computer standardization in the USSR (see also [[KeyRus]] and [[Gurtyak, Dmitry Alexandrovich]]).
 
* Ilene Strizver, “What Is a Glyph?”, fonts.com.
 
 
----
 
 
== Summary ==
 
 
A '''glyph''' is a visible image of a [[character]].
 
The TORI interpretation makes the character — the numerical code unit — the primary object, and treats glyphs as its visible realizations.
 
Historical experience shows that neglect of this mapping complicated early computerization and multilingual communication.
 
Clear distinction between ''glyph'', ''character'', ''symbol'', and ''byte'' helps to avoid confusion and improves the design of digital writing systems.
 
 
----
 

Revision as of 20:39, 1 November 2025


test

.. ج: صباح الخير، يا جون! تفضل، اجلس.

J: Good morning, John! Please, sit down.

د: شكراً، يا عمار. سمعت أن العمل صعب هذه الأيام.

D: Thank you, Amar. I heard work is hard these days.

ج: صعب؟ يا رجل، مستحيل! نعمل في الشمس، والناس لا ترانا.

J: Hard? Man, impossible! We work under the sun, and people don’t even see us.