Difference between revisions of "Sandbox"

Revision as of 20:39, 1 November 2025

test

.. ج: صباح الخير، يا جون! تفضل، اجلس.

J: Good morning, John! Please, sit down.

د: شكراً، يا عمار. سمعت أن العمل صعب هذه الأيام.

D: Thank you, Amar. I heard work is hard these days.

ج: صعب؟ يا رجل، مستحيل! نعمل في الشمس، والناس لا ترانا.

J: Hard? Man, impossible! We work under the sun, and people don’t even see us.

@@ Line 1: / Line 1: @@
 {{top}}
+test
-[[GlyphChatGPT]] is a version of the article «[[Glyph]]» created by [[ChatGPT]].
-Something seems to be wrong with the interface; most of the content of the prototype seems to be lost.
+.. ج: صباح الخير، يا جون! تفضل، اجلس.
-'''Glyph''' is a visible [[picture]], a shape that represents one or more [[character]]s.
-Each glyph has its own visual form, while a '''character''' is a numerical code unit — an integer, usually expressed in one, two, or three [[byte]]s.
+J: Good morning, John! Please, sit down.
-In a consistent writing system, every visible glyph should correspond to one and only one character, and vice versa.
-In practice, however, this one-to-one mapping is often violated, producing confusion and errors in [[text processing]], [[education]], and [[information exchange]].
+د: شكراً، يا عمار. سمعت أن العمل صعب هذه الأيام.
-== Distinction between glyph, character, letter, symbol, and byte ==
+D: Thank you, Amar. I heard work is hard these days.
-{| class="wikitable"
-! Term !! TORI interpretation !! Typical misuse / ambiguity
-|-
-| [[Byte]] || The smallest addressable unit of computer memory, 8 [[bit]]s. || Often confused with “character” in one-byte encodings such as [[ASCII]].
-|-
-| [[Character]] || Numerical code unit (1–3 bytes here) representing a symbolic item of writing. || Sometimes called “letter” or “symbol.”
-|-
-| [[Glyph]] || Visible image of a character in some [[font]] or handwriting. || Commonly confused with “character,” especially in [[Unicode]] terminology.
-|-
-| [[Letter]] || Concept of a particular alphabet; one letter may correspond to several characters (uppercase, lowercase, variants). || Used where “glyph” or “character” would be more precise.
-|-
-| [[Symbol]] || General sign that may represent a concept, not necessarily linguistic. || Used too broadly in many sources.
-|}
+ج: صعب؟ يا رجل، مستحيل! نعمل في الشمس، والناس لا ترانا.
-----
+J: Hard? Man, impossible! We work under the sun, and people don’t even see us.
-== Motivation and goal ==
-The purpose of this article is practical:
-to help [[programmer]]s, [[teacher]]s, and [[student]]s avoid misuse of these terms and to promote consistent, bijective mapping between [[glyph]]s and [[character]]s in digital systems.
-----
-== History ==
-Written human communication began with drawings — early glyphs — that evolved into alphabets.
-For centuries, ambiguity was tolerated: two scribes might draw slightly different glyphs for the same [[letter]], and meaning remained clear to the reader.
-The invention of [[printing]] reduced such ambiguity: reproducible glyphs for each character were produced, and readers learned that each distinct shape corresponded to one particular code of writing.
-With computers, ambiguity re-appeared:
-the same glyph may correspond to several code units, and one character may appear with several glyphs, depending on [[font]] or [[locale]].
-----
-== Encoding and digital typography ==
-During the 20th century, [[ASCII]] (American Standard Code for Information Interchange) established a one-byte encoding for English characters.
-ASCII used 128 symbols, each represented by one byte; its structure was logical and efficient.
-Other languages did not receive such unified treatment.
-The [[Soviet Union]], [[Japan]], [[China]], [[Vietnam]], and [[Arab countries]] each developed their own incompatible encodings, some single-byte and some multi-byte.
-This fragmentation slowed the exchange of information among citizens and complicated international collaboration.
-During the Soviet period, the creation of a universal Cyrillic-compatible ASCII analogue was never prioritized.
-Later standards such as [[KOI8-R]] appeared, but remained incompatible with Western ones.
-Similar divergence happened in Asia: [[JIS]] for Japan, [[GB2312]] for China, [[VSCII]] for Vietnam.
-Each used different byte arrangements, none achieving the universality of ASCII.
-Consequently, glyph-character ambiguity persisted.
-----
-== Ambiguous glyphs ==
-Ambiguous glyphs cause both technical and educational errors.
-* Latin '''A''' (U+0041) and Cyrillic '''А''' (U+0410) look identical but represent different code points.
-  Software may treat them as distinct even when they are visually the same.
-* In Japanese [[font]]s, some [[kanji]] have slightly different glyphs depending on era or platform.
-* Type designers sometimes create stylistic variants that obscure the glyph-character relationship.
-Ambiguity confuses learners, corrupts data, and even enables [[homoglyph]] attacks, where distinct characters appear identical to a human observer.
-----
-== TORI interpretation ==
-In [[TORI]] terminology, the '''character''' is primary; the '''glyph''' is secondary.
-A glyph exists only as the visual realization of a character.
-This view contrasts with [[Wikipedia]], which often treats glyphs as the primary items and characters as abstract sets of them.
-From the TORI viewpoint:
-# The set of characters forms the backbone of a language’s digital representation.
-# The glyph is a function of the character and the [[font]]:
-   <code>glyph = f(character, font)</code>.
-# A good encoding system makes ''f'' bijective — one glyph per character and one character per glyph.
-----
-== Historical context ==
-The absence of early coordination among linguistic authorities produced long-term effects.
-When computers became available, most national projects ignored the need for unified glyph numbering.
-Instead of extending [[ASCII]] to cover other major alphabets within a single-byte range, isolated encodings were invented.
-These technical decisions, amplified by secrecy and isolation during the [[Cold War]], prevented efficient international standardization.
-They belong to [[History]], not [[Politics]]; but understanding them helps to avoid similar mistakes in future computer linguistics.
-----
-== Practical implications ==
-* '''Software designers''' should ensure that every visible glyph corresponds uniquely to one character code.
-* '''Teachers of Japanese''' and other languages should explain to students that a kanji’s shape may correspond to multiple encodings.
-* '''Researchers''' should maintain clear, non-overlapping definitions.
-* '''Students and users''' can consult this article to verify which concept — glyph, character, byte, or symbol — applies in context.
-----
-== Examples ==
-# Compare '''A''' (U+0041) and '''А''' (U+0410).
-   Same glyph, different characters.
-# Observe differences between '''日''' in Japanese and Chinese fonts.
-   Same Unicode code, slightly different glyphs.
-# Note that a file saved in [[UTF-8]] may use one, two, three, or four bytes per character.
-   The glyph itself gives no hint how many bytes were used.
-# Identify fonts where zero “0” and uppercase “O” look almost identical — a design error leading to ambiguity.
-----
-== See also ==
-[[Character]]
-[[Symbol]]
-[[Letter]]
-[[Byte]]
-[[Font]]
-[[Encoding]]
-[[Unicode]]
-[[ASCII]]
-[[Homoglyph]]
-[[KOI8-R]]
-[[JIS]]
-[[GB2312]]
-[[VSCII]]
-[[Tarja]]
-[[TORI axioms]]
-----
-== References ==
-* ANSI X3.4-1967 — American Standard Code for Information Interchange (ASCII).
-* Unicode Consortium, ''The Unicode Standard, Version 15.0''.
-* Historical documentation of KOI8-R, JIS, GB2312, VSCII encodings.
-* Articles about suppression of cybernetics and computer standardization in the USSR (see also [[KeyRus]] and [[Gurtyak, Dmitry Alexandrovich]]).
-* Ilene Strizver, “What Is a Glyph?”, fonts.com.
-----
-== Summary ==
-A '''glyph''' is a visible image of a [[character]].
-The TORI interpretation makes the character — the numerical code unit — the primary object, and treats glyphs as its visible realizations.
-Historical experience shows that neglect of this mapping complicated early computerization and multilingual communication.
-Clear distinction between ''glyph'', ''character'', ''symbol'', and ''byte'' helps to avoid confusion and improves the design of digital writing systems.
-----
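
The examples listed in the removed text (the Latin/Cyrillic pair that shares a glyph, and the byte count of a character in UTF-8) can be checked with a short Python sketch. It is only an illustration and uses nothing beyond the standard library; the helper name `describe` is hypothetical and does not come from the article.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, UTF-8 byte length) for one character.

    Hypothetical helper for illustration only.
    """
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), len(ch.encode("utf-8")))

latin_a = "\u0041"      # LATIN CAPITAL LETTER A
cyrillic_a = "\u0410"   # CYRILLIC CAPITAL LETTER A

# The two characters usually render with the same glyph,
# yet they are different code points and compare unequal.
print(describe(latin_a))
print(describe(cyrillic_a))
print(latin_a == cyrillic_a)

# UTF-8 uses between one and four bytes per code point;
# the glyph itself does not reveal how many.
for ch in ("A", "\u0410", "\u65e5", "\U0001F600"):
    print(describe(ch))
```

Comparing code points rather than rendered shapes is what lets software tell such characters apart, which is also why a human reader can be misled by a homoglyph while a program is not.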
