Sergey Pshenichnikov
Algebra of text
iEi,f(i)
qX. . . =X. . .
Moscow
2022
Contents
Foreword
0 Text algebra without formulas
0.1 Coordinating texts
0.1.1 Rules
0.1.2 Examples
0.1.3 Requirements for coordinate objects
0.2 Matrix units
0.2.1 Definition
0.2.2 Product
0.2.4 Order
0.2.5 Subtraction and division
0.2.6 Phantom multiplier comparisons
0.2.7 Transformations and equations
0.3 Matrix texts
0.3.1 The hyperbinary coordinate formula
0.3.2 Properties
0.3.3 Fragments
0.3.4 Example of a linguistic text
0.3.5 Example of matrix mathematical text
0.3.6 Example of matrix Morse code
0.4 Algebra of text
0.4.1 Definitions of algebraic systems
0.4.2 Free semi-module
0.4.3 Fragment algebra
0.5 Algebraic structurization
0.5.1 Structurization
0.5.2 Example of a linguistic text
0.5.3 An example of a mathematical text
0.5.4 Example of Morse code
Algebra of text – Pshenichnikov S. B.
0.6 Context category
0.6.1 Definitions
0.6.2 Example
0.7 Concordance of meaning
0.7.1 Contextual concordance of words
0.7.2 Meaningful Noetherian chains
0.7.3 Meaning equalizers
1 Text coordination
1.1 Rules
1.1.1 Texts and algebra
1.1.2 Coordinating
1.1.3 The aim of algebraization
1.1.4 Dictionaries and alphabets
1.1.5 Repetitions
1.1.6 Meaningful markup
1.1.7 Coordinating rules
1.2 Examples
1.2.1 Similarity and sameness
1.2.2 Example on an abacus
1.2.3 Chess example
1.2.4 Example of a language text
1.2.5 An example of a mathematical text
1.2.6 Example of Morse-Vail-Gerke code
1.3 Requirements for coordinate objects
2 Matrix units
2.1 Definition
2.2 The product
2.2.1 The defining relation
2.2.2 Indexes
2.2.3 Idempotent and nilpotent
2.2.4 Distribution
2.2.5 Phantom
2.2.6 Hyperbinary factoriality
2.2.7 Classification
2.2.8 Constituents
2.2.9 Similarity on the left
2.2.10 Similarity on the right
2.2.11 Left-similarity transitivity
2.2.12 Transitivity of similarity to the right
2.3.1 Problem
2.3.3 Addition by agreement (concordance)
2.3.4 The defining relation
2.3.5 Singularity of zero
2.3.6 Same hyperbinary numbers
2.3.7 Total sum of hyperbinaries
2.3.8 Phantom terms
2.4 Order
2.4.1 Magnitude
2.4.2 Noetherian chains
2.5 Subtraction and division
2.5.1 Subtraction
2.5.2 Division
2.6 Phantom multiplier comparisons
2.7 Transformations and equations
2.7.1 Transformations
2.7.2 Equations
3 Matrix texts
3.1 The hyperbinary coordination formula
3.2 Properties
3.3 Fragments
3.4 Example of a linguistic text
3.5 Example of a matrix mathematical text
3.6 Example of matrix Morse code
4 Algebra of text
4.1 Definitions of algebraic systems
4.2 Free semi-module
4.3 Fragment algebra
5 Algebraic structurization
5.1 Structuring
5.2 Example of a language text
5.3 An example of a mathematical text
5.4 An example of Morse code
6 Context category
6.1 Definitions
6.2 Example
7 Concordance of meaning
7.1 Contextual word concordance
7.2 Meaningful Noetherian chains
7.3 The equalizers of sense
Conclusion
A Collective meaning recognition
A.1 Historical examples
A.1.1 The Persians and the Scythians
A.1.2 Babylon and Rome
A.1.3 Prototype algorithm for collective understanding
A.2 Hyperbinary questions
A.2.1 Closely related approaches and concepts
A.3 Hyperbinary algebra of questions
A.3.1 Question logic
A.3.2 Definition
A.3.3 Hyperbinarization
A.4 Hyperbinary evaluations
A.4.1 Close approaches
A.4.2 Hyperbinarization
A.5 Hyperbinary philosophical categories
A.5.1 Philosophical categories
A.5.2 Hyperbinarization
A.6 Mentoring
A.6.1 Mental cloning
A.6.2 Hyperbinarization
Bibliography
Foreword
This text integrates, supplements, and corrects the materials presented earlier [1–15].
My teachers are Georgy Alexandrovich Zaitsev and Pobisk Georgievich Kuznetsov (see Wikipedia), an algebraist and a philosopher respectively. The material in this book is therefore composite. For a long time I did not dare to publish it; I kept correcting and editing. I wanted to reach mental maturity before senile dementia set in. I hope I made it in time.
I first heard Pobisk's phrase "mathematical objects do not change over time" from him back in my youth. I felt hurt on behalf of words, whose meaning is constantly changing. I observed unsuccessful attempts to create universal languages. Another statement by P. G., which he attributed to Fichte, helped: "If the problem is not solved, it is probably set incorrectly." The solution may be not to simplify natural language, turning it into yet another special and unambiguous language, but to transform the text into a mathematical object that allows for multiple meanings.
Forty years ago I was lucky enough to construct a method for the exact linearization of systems of nonlinear algebraic equations, based on a corresponding algebra of hypercomplex numbers [16–26]. The elements of this algebra can be represented by matrices. The original scalar equations turn into matrix equations that are linear in the unknowns.
In text algebra, scalar sign sequences are likewise transformed into matrix texts, on which algebraic operations can be performed, e.g. long division of one text by another. Texts can be handled like numbers, and the results of number theory can be used to analyze and synthesize texts, provided one takes into account that they are hypercomplex numbers (matrices).
The goal of text algebraization is to be able to calculate, from solutions of systems of nonlinear algebraic equations:
the meaning of a text (similarities and differences between the text and other texts)
structuring variants (different fragmentations of the text and reorderings of the text-forming fragments)
the general meaning, i.e. structurization invariants (a summary of the text that does not depend on structuring variants)
context dictionaries (tables of relations between words and their various contexts)
semantic (contextual) translation (another text built on the basis of context dictionaries)
titles of text fragments (on the basis of the calculated meaning and context dictionaries)
restructuring (a different structuring according to a target requirement)
summaries (title, introduction, conclusion, abstract, outline, presentation)
versions of the text according to target requirements (a text based on a given short content)
Chapter 0
Text algebra without formulas
0.1. Coordinating texts
0.1.1. Rules
0.1.1.1. Texts and algebra
Numbers and letters are two kinds of ideal sign-objects for studying the relations of real objects. Understanding (interpretation) of texts is personal: it depends on a person's genotype and phenotype. Also, the meaning of words can change over time. All words of a contextual language are homonyms. A word has as many properties (relationships with other words) as there are contexts in the entire corpus of the natural language.
People understand numbers in roughly the same way, regardless of where and when they are used. The language of numbers is universal and eternal.
Algebra (a symbolic generalization of arithmetic) and text (a sequence of symbols) are so far two very different tools of cognition.
0.1.1.2. Coordination
The application of mathematical methods in any subject area is preceded by coordinatization, which begins with digitization. Coordinatization is the replacement (modeling) of the object of research by its digital copy. It is followed by the correct replacement of the model's numbers with symbols and by the determination of the properties and regularities of combinations of these symbols.
If a correct coordinatization is applied to a text, the text can be reduced to algebra.
For successful algebraization, it is extremely important to describe the coordinatization rules and the properties of the coordinating objects in such a way as to reduce the arbitrariness in their choice.
0.1.1.3. Purpose of Algebraization
The goal of text algebraization is to be able to compute, from solutions of systems of algebraic equations, the meaning of a text, its structuring variants, vocabularies, summaries, and versions of the text according to a target function.
Texts are understood as sequences of signs (letters, words, notes, etc.). There are ﬁve types of sign systems: natural,
ﬁgurative, linguistic, records, and codes.
0.1.1.4. Dictionaries and Alphabets
The carrier of a sign sequence is the collection of all its signs without repetition. A carrier may be called the alphabet or the dictionary of the sign sequence.
Words are sequences of letters or elementary phonemes. The meaning of a letter lies only in its form or sound. The letters of an alphabet have no contextual dependence.
Context dependence is at its most extreme in the homonyms of a natural language. For example, the Russian word "kosa" has four different meanings. It is believed that homonyms make up a fifth of the vocabulary of the English language.
0.1.1.5. Repetition
A text is a sign sequence with at least one repetition. A vocabulary is a sign sequence without repetitions. Repetitions make it possible to reduce the number of sign-words used (to shrink the vocabulary). But then repeated signs can differ in meaning. The meaning of a word depends on the words around it. The problems of understanding words and texts arise because this part of the meaning is determined or guessed subjectively and ambiguously. Different readers and listeners take different meanings from the same word.
When a word is repeated in a text, the author of the text prefers certain connections (relationships) between the repeated word and other words. These relationships are recorded as the new meaning of the repeated word.
0.1.1.6. Semantic markup
If a particular text has no explicit repetitions, this does not mean that repetitions are not hidden in it in semantic (contextual) form. The meaning can be repeated, not only the sign (word) denoting it. The context here is the fragment of text between repeating words. If the contexts of different words are similar, then the different words are similar in the sense of their common contexts, that is, of the sign-words of those contexts. Contexts are similar if they share at least one sign-word.
A context is not defined only for two words repeated in a row. The meaning or context of a word can be indicated by a reference to any suitable fragment of text, not necessarily located in the immediate vicinity of the repeated word. In this case the text loses its linear order, as when words are made out of letters. If there are no such fragments, the meaning of the word is borrowed by reference to a suitable context from another text of the corpus (library).
The common words of contexts, in turn, have contexts of their own, which gives rise to the notion of a refined word context.
0.1.1.7. Coordination rules
In a finite sign sequence, each sign has a unique number that determines its place in the sequence. No two signs can occupy the same place. But a text requires a second index to indicate the repetition of a sign in the text. This second index creates an equivalence relation on the finite set of words. It is reasonable to match each sign with some two-index object (for example, a matrix). The first index of the matrix is the number of the sign in the sequence. The second index is the number of the position at which this sign is first encountered in the sequence. The carrier (dictionary) of a sign sequence (text) is the part of it whose signs have equal indices. Gaps in the word numbers of the dictionary can be eliminated by consecutive renumbering.
Text coordinating rules:
The first index of the coordinating object (matrix) is the ordinal number of the word in the text; the second index is the ordinal number of the first occurrence of the same word in the text. If the word has not been encountered before, the second index is equal to the first.
The dictionary is the original text with repetitions deleted. The dictionary can be renumbered to exclude gaps in word numbering.
For two or more texts that do not form a single text, the word numbering in each text is independent. In two texts, the initial words are both first, just as in two books the page numbering of each starts from one.
The common dictionary of a set of texts is the dictionary of all the texts after their concatenation. It can likewise be renumbered to remove gaps in the numbering of words.
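The coordinating rules can be tried out on a toy text. The following is a minimal sketch, not the author's implementation: the function names `coordinate`, `dictionary` and `renumber` and the sample texts are assumptions introduced here for illustration only.

```python
def coordinate(words):
    """Return the index pairs (i, j) of a text, numbering positions from 1.

    i is the position of the word in the text;
    j is the position of the first occurrence of the same word.
    """
    first_seen = {}
    coords = []
    for i, word in enumerate(words, start=1):
        j = first_seen.setdefault(word, i)  # j == i for a new word
        coords.append((i, j))
    return coords


def dictionary(words):
    """The text with repetitions deleted: words whose two indices coincide."""
    return [w for w, (i, j) in zip(words, coordinate(words)) if i == j]


def renumber(coords):
    """Replace second indices by gap-free dictionary numbers 1, 2, 3, ..."""
    dense = {}
    for i, j in coords:
        if i == j:
            dense[j] = len(dense) + 1
    return [(i, dense[j]) for i, j in coords]


# Two hypothetical texts; each is numbered independently, starting from one.
text_a = "to be or not to be that".split()
text_b = "that is to be".split()

coords_a = coordinate(text_a)
dense_a = renumber(coords_a)

# The common dictionary is the dictionary of the concatenated texts.
common = dictionary(text_a + text_b)

print(coords_a)
print(dense_a)
print(common)
```

In `coords_a` the last word "that" receives the pair (7, 7), which leaves a gap in the dictionary numbering after 4; `renumber` closes the gap by turning it into (7, 5).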
0.1.2. Examples
0.1.2.1. Similarity and sameness
According to G. Frege, any object that has relations with other objects and their combinations has as many properties (meanings) as it has such relations of similarity and sameness (tolerance and equivalence). The part of the meanings taken into account is called the sense by which the object is represented in a given situation. The naming of an object by a number, a symbol, a word, a picture, a sound or a gesture, in order to describe it briefly, is called the sign of the object (this is one of its meanings).
Every possible subset of an object's meanings (every element of the power set of meanings), that is, every sense, corresponds to one and the same sign. This is the main problem of meaning recognition, but it is also what makes it possible to make do with minimal sets of signs. It is impossible to assign a unique sign to each subset of meanings. The objects of information exchange are minimal sets of signs (notes, an alphabet, a language dictionary). The meaning of signs is usually not calculated; so far it is determined intuitively from the contexts (neighborhoods) of the sign.
0.1.2.2. Example on the abacus
A solution to the problem of sign ambiguity is the semantic markup of text. Semantic markup can be explained using a limiting case. On a Russian abacus the text is a sequence of identical signs (beads). The vocabulary of such a text consists of a single word. This is even more extreme than Morse code, where the dictionary consists of two words. Without semantic markup, such texts cannot be used. Therefore the vocabulary is changed and the signs are divided into groups: ones, tens, hundreds, etc. These group names (numbers) become unique word numbers. The vocabulary consists of the digits from zero to nine. Each bead can also be represented by a matrix, as yet unspecified, on such a Cartesian abacus.
The transformation of identical objects into similar ones has taken place. The measure of similarity is the coordinate values of the words. In addition to positional repetitions, repetitions of dictionary digits occur when arithmetic operations are performed. Equivalence relations are established: if an arithmetic operation yields 9+1 in some position, then 0 appears in that position and 1 is added to the next digit. On the abacus, all the beads of the wire are shifted back to the original (zero) position, and one bead is added on the next wire. On the matrix abacus, some matrix transformation is performed.
If one sets a measure of the sameness of signs, then a relation of tolerance (similarity) can be transformed back into a relation of equivalence (sameness) by this measure, for example by rounding numbers. One can tell tolerance from equivalence by the violation of transitivity: for tolerance relations transitivity may fail. For example, let an element A be similar to B in one sense. If the sense of B does not coincide with the sense of element C, then A can be similar to C only in the intersection of their senses (a part of their properties). The transitivity of the relation is restored (closed), but only for this common part of the sense. Once sameness is achieved, A is equivalent to C. For example, the above transformation (closure) over some coordinates provides arithmetic operations on a matrix abacus.
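The difference between tolerance and equivalence can be checked on a small example. The sketch below is only an illustration under assumptions made here: the property sets in `senses`, the threshold `eps` and the sample numbers are invented, and "similar" is read as "the senses intersect".

```python
# Hypothetical senses: each element is described by a set of properties.
senses = {
    "A": {"red", "round"},
    "B": {"round", "heavy"},
    "C": {"heavy", "square"},
}


def similar(x, y):
    """Tolerance: two elements are similar if their senses intersect."""
    return bool(senses[x] & senses[y])


# A is similar to B, B is similar to C, but A is not similar to C:
# transitivity is violated.
ab, bc, ac = similar("A", "B"), similar("B", "C"), similar("A", "C")


def close(x, y, eps=0.3):
    """Tolerance on numbers: values closer than eps are similar."""
    return abs(x - y) <= eps


def same(x, y):
    """Equivalence obtained from a measure of sameness: rounding."""
    return round(x) == round(y)


values = [1.0, 1.25, 1.5]

# Rounding splits the values into classes; inside a class the relation
# is transitive by construction.
classes = {}
for v in values:
    classes.setdefault(round(v), []).append(v)

print(ab, bc, ac)
print(close(1.0, 1.25), close(1.25, 1.5), close(1.0, 1.5))
print(classes)
```

Both tolerance relations fail transitivity on these inputs, while `same` groups the values into disjoint classes.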
0.1.2.3. Chess example
For chess, the vocabulary of the matrix text of a game is the numbers of one of the pieces of each color and the move separator (from 1 to 11). A word of the chess text is also a kind of matrix. The first coordinate i is unique and is the number of the square on the chessboard (from 1 to 64). The second coordinate j is a number from the dictionary. The chess matrix text at any moment of the game is a sum of matrices, each showing a piece on the corresponding square of the chessboard. Repetitions in the text appear both because pieces are duplicated and because, during the game, all pieces except the king constantly pass from similarity to sameness and back. The game consists in carrying out the most effective of these transitions and in the actual classification of the pieces. Pawns that are identical at the beginning later become similar only by their move rule, and sometimes a pawn becomes identical with a queen.
The tool of matrix text analysis is transitivity control, which checks the difference between similarity and sameness. A lack of transitivity control is the algebraic explication of misunderstanding in language texts, of a loss in chess, or of errors in numerical calculations.
Relational transitivity is a condition for transforming a set of objects into a mathematical category. The semantic markup of a text can become the computation of its categories by means of transitive closure. The objects of the category are the contexts of matrix words; the morphisms are the transformation matrices of these contexts.
0.1.2.4. Example of a language text
Example text:
A set is an object that is a set of objects. A polynomial is a set of monomial objects which are a set of objects-factors.
A text in normal form is coordinated according to the above rules. The vocabulary of a text is the text itself without repetitions. Coordinating a text means indexing it and matching its words with indexed matrix words.
0.1.2.5. Example of a mathematical text
The formulas for the volume of a cone, a cylinder and a torus are selected as an example of a mathematical text. The formulas are treated as texts. This means that the signs included in the texts are not treated as mathematical objects, and no algebraic operations are defined for them.
For the semiotic analysis of formulas as texts, the repetition of signs is important: the repetitions determine the patterns. Following the coordinating rules, the formulas are presented in index form with a single numbering, as if they were not three texts but one. The coordinated text is written out through matrices in tabular form.
0.1.2.6. Example of a Morse-Vail-Gerke code
This example is chosen because of the extreme brevity of the dictionary. In Morse code, the sign sequences of the 26 Latin letters can be considered as texts consisting of words, namely dots and dashes. The order of the words (dots and dashes) is extremely important in each individual text (letter of the alphabet). In linguistic texts the order is also important ("mom's dad" is not "dad's mom"), but there are exceptions ("languid evening" and "evening languid").
The dictionary and carrier of Morse code is a sequence of two signs ("dot" and "dash") that coincides with the letter A. The order of the signs in the dictionary or the carrier is not important, so the carrier may equally be the letter N. One letter is the carrier (dictionary); the remaining 25 letters are code texts. Defining the 26 letters of Morse code as texts made of words is unusual from the point of view of linguistic texts, where words are composed of letters. But for codes, as relations of signs, composing letters (ciphers) from words is natural.
Each code word (a dot or a dash), as an object, has two coordinates. The first coordinate is the number of the word within the letter (from one to four). The second coordinate is its number in the dictionary (1 or 2). The dictionary is the same for all 26 texts.
All 26 texts (Latin letters) are independent of each other: the presence of dots or dashes in one text (letter) and their order have no effect on the composition of another text (another letter). Therefore, according to the third coordinating rule, the numbering of the signs in every letter begins with one.
Each dot or dash of which a letter consists is assigned, taking their order into account and following the coordinating rules, a coordinating object: a matrix, whose choice must satisfy certain requirements.
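The two coordinates of each dot and dash can be written out for a few letters. This is a minimal sketch: the names `DICTIONARY`, `MORSE` and `coordinate_letter` are assumptions made here, and only five letters of the standard code are included.

```python
# The dictionary (carrier) is the letter A: dot is word 1, dash is word 2.
DICTIONARY = {".": 1, "-": 2}

# A few letters of Morse code, each treated as a separate text.
MORSE = {"A": ".-", "N": "-.", "B": "-...", "S": "...", "O": "---"}


def coordinate_letter(code):
    """Pairs (position within the letter, number in the dictionary)."""
    return [(position, DICTIONARY[sign])
            for position, sign in enumerate(code, start=1)]


# Every letter is numbered independently, starting from one.
coords = {letter: coordinate_letter(code) for letter, code in MORSE.items()}

for letter, pairs in coords.items():
    print(letter, pairs)
```

The letters A and N contain both dictionary words, which is why either of them can serve as the carrier; S and O use only one word each.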
0.1.3. Requirements for coordinate objects
Coordinating a text consists in matching the words of the text with some "number-like objects" that satisfy three general requirements:
The objects must be individual, like numbers;
The objects must be abstract (the volume of the concept is maximal, the content of the concept is minimal);
Algebraic operations (addition, multiplication, comparison) can be performed on the objects.
The objects in algebra that suit texts are two-index matrix units:
They are individual: all matrix units are different as matrices.
An arbitrary matrix of order n can be represented as a decomposition over matrix units. Matrix units are the basis of a full matrix algebra and of a matrix ring. This means that the requirement of maximal concept volume is satisfied. A matrix unit contains only a single 1, so the content is minimal.
All algebraic operations required of a coordinating object can be performed on matrices.
0.2. Matrix units
This section constructs and investigates, on the basis of matrix units (hyperbinary numbers), the algebraic systems needed to transform coordinated texts into matrix texts. The matrix representation of texts makes it possible to recognize and create the meaning of texts by mathematical methods.
0.2.1. Deﬁnition
Matrix units are matrices whose single nonzero element, a 1, stands at the intersection of the row given by the first index and the column given by the second index. In the following, only square matrix units are considered.
The number of all square matrix units of a given dimension (the full set) is equal to the total number of elements of a square matrix of that dimension.
Hereinafter matrix units are considered as a matrix generalization of the integers 0 and 1. The main difference between such hyperbinary numbers and integers is the noncommutativity of their product.
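The definition can be made concrete with plain nested lists. This is an illustrative sketch only: `matrix_unit` and `matmul` are helper names assumed here, and the dimension n = 3 is an arbitrary choice.

```python
def matrix_unit(i, j, n):
    """n x n matrix with a single 1 in row i, column j (indices from 1)."""
    return [[1 if (row, col) == (i, j) else 0 for col in range(1, n + 1)]
            for row in range(1, n + 1)]


def matmul(a, b):
    """Ordinary product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


n = 3

# The full set: one matrix unit for every element of an n x n matrix.
full_set = [matrix_unit(i, j, n)
            for i in range(1, n + 1) for j in range(1, n + 1)]

# Noncommutativity: the two products of E12 and E21 are different units.
e12, e21 = matrix_unit(1, 2, n), matrix_unit(2, 1, n)
left = matmul(e12, e21)
right = matmul(e21, e12)

print(len(full_set))
print(left == matrix_unit(1, 1, n), right == matrix_unit(2, 2, n))
```

Here the product of E12 and E21 is E11 in one order and E22 in the other.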
0.2.2. Product
0.2.2.1. Defining relation
The product of two matrix units is different from zero (the zero matrix) only if the inner indices of the factors are equal, that is, the second index of the first factor equals the first index of the second factor. Then the product is the matrix unit with the first index of the first factor and the second index of the second factor.
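Since a matrix unit is fully described by its pair of indices, the product rule can be sketched on index pairs alone. The representation is an assumption made here: a tuple (i, j) stands for the matrix unit and `None` stands for the zero matrix.

```python
def product(a, b):
    """Product of matrix units given as index pairs.

    (i, j) * (k, l) is (i, l) when the inner indices j and k are equal,
    otherwise None, which stands for the zero matrix.
    """
    (i, j), (k, l) = a, b
    return (i, l) if j == k else None


nonzero = product((1, 2), (2, 3))   # inner indices equal
reversed_order = product((2, 3), (1, 2))  # inner indices 3 and 1 differ
square = product((1, 2), (1, 2))    # inner indices 2 and 1 differ

print(nonzero, reversed_order, square)
```

The second and third calls show two nonzero factors whose product is zero.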
Some matrix units can be called simple matrix units, by analogy with prime numbers, and the others can be called composite matrix units, because they are products of simple ones.
The complete set of matrix units can be obtained from the simple matrix units, which are called the generators of the complete set.
Matrix units are treated precisely as a matrix generalization of integers. Because the product is noncommutative, the left and right divisors of a hyperbinary number can be different, and there are divisors of zero: each factor (divisor of the product) is different from zero, but their product is the zero matrix. This property essentially distinguishes matrices from integers, which have no divisors of zero. Nevertheless, many concepts of modular arithmetic (congruences of integers modulo a number) remain valid for hyperbinary numbers, but only because of their matrix form. The elements of such matrices (zero and one) have no such properties.
Simple matrix units (the analog of prime numbers) are recognized within the full set by the relation between their indices.
0.2.2.2. Indexes
The indices of simple matrix units are of two kinds: the 1 in such a binary matrix lies immediately above or immediately below the main diagonal of the square matrix. The units with equal indices (diagonal matrix units) lie on the main diagonal, and they are not simple.
In composite matrix units, the difference between the first and second indices is either zero (diagonal matrix units) or greater than one in absolute value. The 1 of a composite matrix unit lies outside the two diagonals that hold the 1s of the simple matrix units.
The indices of composite matrix units are all pairs of indices of the elements of a square matrix of dimension n, except for the index pairs of the simple matrix units.
The relation between the indices determines the value of the product of two identical matrix units. Unlike integers, the square of any hyperbinary number is either zero (nilpotent numbers) or the number itself (idempotent hyperbinary numbers).
0.2.2.3. Idempotent and nilpotent
Idempotency is a property of an operation and an object such that applying the operation to the object repeatedly gives the same result as the first application. Examples are adding zero to a number, multiplying by one, or raising to the power of one.
Diagonal matrix units are idempotent: the square of a diagonal matrix unit is the unit itself, because its inner indices are equal. The product of diagonal matrix units with different indices is zero. Such algebraic objects are known as orthogonal projectors.
A nilpotent element is an element of an algebraic structure some power of which is zero. All matrix units (hyperbinary numbers) except the idempotent ones are nilpotent: their square is zero. A pair of identical nilpotent matrix units (hyperbinary numbers) are divisors of zero.
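The split into idempotent and nilpotent units can be verified by squaring every unit of a small full set. The sketch assumes, for illustration, that a matrix unit is written as an index pair (i, j) and the zero matrix as `None`; the dimension n = 3 is arbitrary.

```python
def product(a, b):
    """(i, j) * (k, l) = (i, l) if j == k, else None (the zero matrix)."""
    (i, j), (k, l) = a, b
    return (i, l) if j == k else None


n = 3
units = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]

# A unit is idempotent if its square is itself, nilpotent if it is zero.
idempotent = [u for u in units if product(u, u) == u]
nilpotent = [u for u in units if product(u, u) is None]

# Distinct diagonal units multiply to zero (orthogonal projectors).
orthogonal = all(product(a, b) is None
                 for a in idempotent for b in idempotent if a != b)

print(idempotent)
print(len(nilpotent), orthogonal)
```

Every unit falls into exactly one of the two lists: the n diagonal units are idempotent and the remaining n(n - 1) units are nilpotent.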
The proportion (distribution) of simple and composite hyperbinary numbers in the full set is determined by their dimension n (the dimension of the corresponding matrix units).
0.2.2.4. Distribution
The distribution of simple and composite matrix units is as follows. Of the n^2 matrix units of the full set, the 2(n - 1) units whose 1 lies immediately above or immediately below the main diagonal are simple. The remaining matrix units are composite (products of simple ones).
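That the simple units generate the whole set can be checked by repeated multiplication. The sketch below follows the description of simple units as those immediately above or below the main diagonal, which gives 2(n - 1) of them; the index-pair representation and the names `simple_units` and `generate` are assumptions made here.

```python
def product(a, b):
    """(i, j) * (k, l) = (i, l) if j == k, else None (the zero matrix)."""
    (i, j), (k, l) = a, b
    return (i, l) if j == k else None


def simple_units(n):
    """Units immediately above and immediately below the main diagonal."""
    above = {(i, i + 1) for i in range(1, n)}
    below = {(i + 1, i) for i in range(1, n)}
    return above | below


def generate(units):
    """Close a set of units under the product, ignoring zero products."""
    generated = set(units)
    while True:
        new = set()
        for a in generated:
            for b in generated:
                p = product(a, b)
                if p is not None and p not in generated:
                    new.add(p)
        if not new:
            return generated
        generated |= new


n = 4
simple = simple_units(n)
full = generate(simple)

print(len(simple), len(full))
```

For n = 4 the six simple units generate all sixteen matrix units, including the diagonal ones.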
A peculiarity of the system of hyperbinary numbers is that they have left and right (phantom) multipliers, multiplication by which does not change them.
0.2.2.5. Phantomness
Phantom multipliers are multipliers of matrix units whose multiplication leaves the matrix units unchanged. Phantomness is a generalization of unipotency. Matrix units have a countable set (an infinite set that can be enumerated by the natural numbers) of left and right phantom multipliers. They are analogous to the unit for integers.
In contrast to the integers, with their single trivial multiplication by one, the phantom multipliers of a matrix unit are countably many. If the occurrence of a particular phantom multiplier follows a pattern, then matrix units can be compared by their phantom multipliers. Phantom multipliers are free index (coordinate) parameters of hyperbinary numbers.
The motivation for using phantom multipliers is that the relations between matrix units can be extended by corresponding
equivalence and similarity relations between their phantom multipliers. Diﬀerent matrix units with the same or similar
phantom multipliers can be compared modulo this phantom multiplier. Conversely, identical matrix units may diﬀer by
their phantom multipliers.
If a one-to-one correspondence between a matrix unit and its phantom multiplier is defined, then this multiplier can serve as the modulus of comparison of matrix units.
The presence of such phantom multipliers will further be used to compare words by their contexts and to compose
systems of matrix text equations. Contexts for words will be their corresponding phantom multipliers.
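As an illustration of phantom multipliers, the sketch below enumerates the diagonal 0/1 matrices that leave a matrix unit unchanged under left multiplication. It assumes a fixed dimension n (so the set is finite rather than countable) and restricts the search to diagonal binary matrices; the helper names are hypothetical.

```python
from itertools import combinations

def unit(i, j, n):
    """n x n matrix unit E_ij (1-based indices)."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(1, n + 1)]
            for r in range(1, n + 1)]

def diag(positions, n):
    """Diagonal 0/1 matrix with ones at the given 1-based diagonal positions."""
    return [[1 if r == c and r in positions else 0 for c in range(1, n + 1)]
            for r in range(1, n + 1)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def left_phantoms(e, n):
    """Sets S of diagonal positions such that diag(S) * e == e."""
    found = []
    for size in range(n + 1):
        for s in combinations(range(1, n + 1), size):
            if matmul(diag(s, n), e) == e:
                found.append(set(s))
    return found

n = 3
e12, e13, e23 = unit(1, 2, n), unit(1, 3, n), unit(2, 3, n)

print(left_phantoms(e12, n))                            # every subset that contains 1
print(left_phantoms(e12, n) == left_phantoms(e13, n))   # True: same first index
print(left_phantoms(e12, n) == left_phantoms(e23, n))   # False: different first index
```

In this sketch two different matrix units with the same first index share all their left phantom multipliers, which is the sense in which they can be compared by them.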
The uniqueness of the decomposition of integers into prime factors (factoriality) is generalized to matrix units, taking into account their noncommutativity and the need to restrict the ambiguity of decompositions.
0.2.2.6. Hyperbinary factoriality
Matrix units have a countable set of decompositions into factors, so the factorization of matrix units is not unique. This property will prove useful for comparing text fragments at any distance from each other.
Among the decompositions of matrix units it is possible to deﬁne some canonical decompositions generalizing the
decompositions of integers into prime factors. Such decompositions are algebraically richer than decompositions of
integers due to noncommutativity of hyperbinary numbers.
There is the following classiﬁcation of canonical expansions.
0.2.2.7. Classiﬁcation
There are three classes of canonical decompositions of matrix units. A decomposition is called canonical if its factors are simple matrix units. The defining property of a canonical decomposition is that the prime factors of a composite hyperbinary number are as close as possible to one another in their coordinates.
In the general case there are three classes of canonical decompositions of an arbitrary matrix unit, depending on the ratio of its indices:
0.2.2.7.1. The first index is greater than the second. This is the first class of decomposition into prime matrix units: in each factor the first index is greater than the second by exactly one. A smaller difference is impossible.
0.2.2.7.2. The first index is less than the second. This is the second class of decomposition into prime matrix units: in each factor the first index is less than the second by exactly one.
0.2.2.7.3. The indices are equal. This is the third class of decomposition into simple matrix units: in the first factor the first index is less than the second by exactly one, and in the second factor the first index is greater than the second by one and equal to the second index of the first factor.
Such a decomposition is unique and is therefore canonical.
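The three classes can be sketched in code. Matrix units are represented here by their index pairs, and `multiply` applies the rule that a product is nonzero only when the inner indices coincide. The function names are illustrative, and the treatment of the last diagonal unit (first index equal to n, where the first factor cannot step up) is an assumption that mirrors the third class.

```python
from functools import reduce

def multiply(a, b):
    """Product of matrix units as index pairs; None stands for the zero matrix."""
    if a is None or b is None or a[1] != b[0]:
        return None
    return (a[0], b[1])

def canonical_factors(i, j, n):
    """Factor E_ij into simple units whose indices differ by exactly one (n >= 2)."""
    if i > j:                                   # first class: step down by one
        return [(k, k - 1) for k in range(i, j, -1)]
    if i < j:                                   # second class: step up by one
        return [(k, k + 1) for k in range(i, j)]
    if i < n:                                   # third class: up, then back down
        return [(i, i + 1), (i + 1, i)]
    return [(i, i - 1), (i - 1, i)]             # assumption for the corner unit E_nn

n = 4
for i in range(1, n + 1):
    for j in range(1, n + 1):
        factors = canonical_factors(i, j, n)
        assert reduce(multiply, factors) == (i, j)
        assert all(abs(a - b) == 1 for a, b in factors)

print(canonical_factors(4, 1, n))   # [(4, 3), (3, 2), (2, 1)]
print(canonical_factors(2, 2, n))   # [(2, 3), (3, 2)]
```

Every factor returned has indices that differ by one, so only units adjacent to the main diagonal are needed to build the whole set.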
The simple matrix units form the complete system of formants for the complete set of matrix units.
0.2.2.8. Formants
A comparatively small number of simple matrix units is enough to write any text: they are formants for the complete set of matrix units. The formants are the alphabet of matrix texts, and monomials, as products of formants, are the words.
A complete system of formants consists of simple matrix units. Compound matrix units (monomials) are called basis units of a complete set in systems of hypercomplex numbers, e.g. alternions.
These formants, like basis elements, are linearly independent. There is no set of coefficients, not all zero, such that a partial sum or the sum of all formants and basis elements taken with these coefficients equals the zero matrix. This follows from the fact that the units of different formants and basis elements are in different places in the matrix, so a zero matrix sum cannot be achieved by placing numbers (integers or real numbers) as coefficients before the summands.
The notion of multiples of integers is inherited by hyperbinary numbers in terms of similarity.
0.2.2.9. Similarity on the left
Matrix units having the same second indexes are multiples (similar) from the left.
0.2.2.10. Similarity on the right
Matrix units having the same ﬁrst indexes are multiples from the right.
The transitivity of multiplicity for triples of integers is inherited for triples of hyperbinary numbers.
0.2.2.11. Left similarity transitivity
The similarity relations of matrix units are transitive: if the first matrix unit is similar to the second on the left and the second is similar to the third on the left, then the first matrix unit is similar to the third on the left.
0.2.2.12. Transitivity of similarity to the right
If the first matrix unit is similar to the second on the right and the second is similar to the third on the right, then the first matrix unit is similar to the third on the right. The similarity relations on the right are transitive.
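Left and right similarity can be checked exhaustively for a small dimension. The sketch below represents matrix units by index pairs; the function names and the explicit similarity coefficient `q` are illustrative assumptions rather than the book's notation.

```python
from itertools import product

def multiply(a, b):
    """Product of matrix units as index pairs; None stands for the zero matrix."""
    if a is None or b is None or a[1] != b[0]:
        return None
    return (a[0], b[1])

def left_similar(a, b):
    """Same second index: a = q * b for some matrix unit q."""
    return a[1] == b[1]

def right_similar(a, b):
    """Same first index: a = b * q for some matrix unit q."""
    return a[0] == b[0]

n = 3
units = list(product(range(1, n + 1), repeat=2))

# Similar units really are multiples of one another.
for a, b in product(units, repeat=2):
    if left_similar(a, b):
        assert multiply((a[0], b[0]), b) == a    # q stands on the left
    if right_similar(a, b):
        assert multiply(b, (b[1], a[1])) == a    # q stands on the right

# Both relations are transitive.
for a, b, c in product(units, repeat=3):
    if left_similar(a, b) and left_similar(b, c):
        assert left_similar(a, c)
    if right_similar(a, b) and right_similar(b, c):
        assert right_similar(a, c)

print("similarity checks passed")
```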
The transitivity property of similarity of matrix units will be used further in the construction of the category of context.
0.2.3.1. Problem
The product of matrix units is a matrix unit or the zero matrix, so matrix units (together with zero) are closed with respect to multiplication. Therefore the algebraic system of matrix units under multiplication is a monoid of matrix units, that is, a semigroup with a unit matrix (a common neutral element).
The result of adding matrix units will no longer be a matrix unit in the general case.
Matrix units are matrix monomials. They are either simple matrix units or their products. A matrix polynomial is a sum of matrix monomials.
Any n×n binary matrix (basis element) can be represented as a polynomial with respect to simple matrix units (formants).
Matrices over the ring of integers and the ﬁeld of real numbers are not considered here. But the binary matrices
(consisting of 0 and 1) needed to create matrix texts should also have appropriate constraints.
Text binary matrices must not have more than one unit in a row, in accordance with the rules of text coordinatization. The first coordinate of a word is unique: it is the first index of the matrix unit and the number of the row in which the unit is located. Moreover, when identical matrix units are added, the result ceases to be a binary matrix at all. Therefore, the addition of textual matrix units must be defined in a special way.
There are diﬀerent rules for adding binary numbers.
When multiplying square binary matrices of the same dimensionality, the result will always be binary matrices. The
analogue of the binary multiplication operation in the language of logical functions is logical multiplication. The truth
table for this operation coincides with the usual 0 and 1 multiplication.
Logical addition can be considered as a candidate for the rule of addition of text binary matrices. In this case the result of their element-by-element addition and multiplication will again be a binary matrix.
Logical addition (disjunction), in turn, is of two kinds: strict disjunction (addition modulo 2) and non-strict (weak) disjunction.
Their truth tables differ in the sum of two units. In set theory addition modulo 2 corresponds to the operation of symmetric difference of two sets. Strict disjunction has the meaning of "either this or that". Non-strict disjunction has the meaning "either this, or that, or both at once". In terms of set theory, a non-strict disjunction is analogous to the union of two sets.
With logical addition there is no problem of appearance of elements in absolute value greater than one, but as a result of matrix addition, units may appear in several places of the rows of the sum matrix. According to the text coordinate rule, this means that two or more words can be in the same place of the text (defined by the first coordinate).
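The difference between the kinds of addition can be seen on two-by-two matrices. This is a minimal sketch: `combine`, `is_binary` and `is_text_matrix` are hypothetical helper names, and the row check encodes the rule that a text matrix has at most one unit per row.

```python
def combine(a, b, op):
    """Element-by-element combination of two matrices of equal shape."""
    return [[op(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def is_binary(m):
    return all(x in (0, 1) for row in m for x in row)

def is_text_matrix(m):
    """Binary matrix with at most one unit in every row."""
    return is_binary(m) and all(sum(row) <= 1 for row in m)

e11 = [[1, 0], [0, 0]]
e12 = [[0, 1], [0, 0]]

ordinary = combine(e11, e11, lambda x, y: x + y)   # [[2, 0], [0, 0]]
strict = combine(e11, e11, lambda x, y: x ^ y)     # [[0, 0], [0, 0]]
weak = combine(e11, e11, lambda x, y: x | y)       # [[1, 0], [0, 0]]
clash = combine(e11, e12, lambda x, y: x | y)      # [[1, 1], [0, 0]]

print(is_binary(ordinary))      # False: ordinary addition leaves the binary matrices
print(is_text_matrix(weak))     # True
print(is_text_matrix(clash))    # False: two units in the first row
```

The last case is the situation described above: the sum stays binary, but two words would occupy the same place in the text.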
The three types of addition operations diﬀer in the rules for matching the summand to the summand. If some rule is
accepted (concordance), then the addition rule is uniquely deﬁned.
The three types of addition can be combined into one concordance addition (concordant addition). On the basis of
concordant addition it is possible to deﬁne matrix unit concordance addition. Such multiplication must be closed by
The sum by concordance of two text binary matrices is a text binary matrix. The result of concordance addition is the
An algebraic system of matrix units by addition is a monoid of matrix units or a semigroup with zero matrix (a common
neutral element).
Before investigating the division operation of hyperbinary numbers, it is necessary to determine the order relation for
them. Division of hyperbinary numbers, as with integers, is generally possible, but only as a division with a remainder,
which by deﬁnition must be less than the divisor. The property of hyperbinary numbers to be smaller or larger needs to
be determined.
0.2.4. Order
On the set of hyperbinary numbers an order relation can be defined through the relation of membership, as for integers. Any positive integer is a sum of units. One number is greater than another if the units of the latter form a part of the units of the former. The same approach is used for hyperbinary numbers with phantom multipliers, which is one example of their usefulness.
0.2.4.1. Magnitude
The left magnitude of a hyperbinary number is the trace of the matrix of its left phantom multiplier. The right magnitude of a hyperbinary number is the trace of the matrix of its right phantom multiplier. A hyperbinary number is then larger (smaller) on the left or on the right if the corresponding trace of its phantom multiplier matrix is larger (smaller).
The scalar measure µ of the magnitude is a necessary feature of ordering of hyperbinary numbers, but not a sufficient one.
0.2.4.2. Noetherian chains
The value µ does not distinguish the distribution of units on the diagonal of the matrix. As already mentioned, the
hyperbinary number is similar on the left and right of its phantom multipliers. This means that the phantom multipliers
generate sets of hyperbinary numbers, which diﬀer from each other by corresponding similarity coeﬃcients. These
similarity coeﬃcients are themselves hyperbinary numbers.
There arise sets generated on the left or on the right by the corresponding phantom hyperbinary numbers. The set of all such subsets (the power set, or boolean) is arranged in chains according to their generating diagonal elements: one, the sum of two, the sum of three, and so on. These chains increase in the number of generating elements. For textual hyperbinary numbers the chains terminate because their dictionaries (phantom multipliers) are finite. Chains with this property are Noetherian chains.
There is a simple method for constructing such Noetherian chains for hyperbinary numbers:
1. The product of the generating phantom multipliers of neighboring links is different from the zero matrix.
2. The value µ of a link must be smaller than µ of the next link.
The increasing chains of the Noetherian booleans generated by the left and right phantom multipliers are a sufficient indication of ordering of hyperbinary numbers.
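The two conditions for a chain can be sketched as follows. The sketch assumes that each generating phantom multiplier is a diagonal 0/1 matrix and represents it by the set of diagonal positions that hold a one, so that µ (the trace) is the size of the set. These representations and function names are assumptions made for illustration.

```python
def mu(generator):
    """Trace of the diagonal 0/1 matrix with ones at the given positions."""
    return len(generator)

def product_is_nonzero(g1, g2):
    """Two diagonal 0/1 matrices have a nonzero product iff they share a position."""
    return bool(g1 & g2)

def is_increasing_chain(links):
    """Check both conditions for every pair of neighboring links."""
    return all(product_is_nonzero(a, b) and mu(a) < mu(b)
               for a, b in zip(links, links[1:]))

print(is_increasing_chain([{1}, {1, 2}, {1, 2, 3}]))   # True
print(is_increasing_chain([{1}, {2, 3}]))              # False: the product is zero
print(is_increasing_chain([{1, 2}, {1, 3}]))           # False: mu does not grow
```

With a dictionary of n words µ cannot exceed n, so under these assumptions any such chain has at most n links.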
0.2.5. Subtraction and division
0.2.5.1. Subtraction
Subtraction of text hyperbinary numbers, like subtraction of positive integers, is not defined in general: for positive integers the result can be a negative number. But subtraction of identical positive integers is always defined.
The same is true for hyperbinary numbers. The difference of two different matrix units will generally contain negative numbers, and then it is not a binary matrix. But the result of subtracting a matrix unit from itself is the zero matrix, which is a binary matrix.
0.2.5.2. Division
The division operation for hyperbinary numbers, like for integers, is undeﬁned. For integers, the division operation
is replaced by the corresponding multiplication operation, which is called division with a remainder (division by
Square matrix units are singular (have no inverse matrices). For hyperbinary numbers, there is a matrix counterpart to
division with a remainder of integers.
0.2.6. Phantom multiplier comparisons
Comparisons (congruences) of integers are generalized to the case of hyperbinary numbers.
A diagonal hyperbinary number is called a modulus of comparison of two hyperbinary numbers if the difference of the right (left) diagonal phantom multipliers of those two hyperbinary numbers is divisible without remainder by that modulus.
The set of all hyperbinary numbers comparable modulo a given modulus is called a residue class modulo that modulus. Thus, a comparison is equivalent to the equality of residue classes.
Any hyperbinary number of the class is called a residue modulo that modulus. Take the remainder from division of any member of the chosen class by the modulus: the residue equal to this remainder is called the least nonnegative residue, and the residue smallest in µ is called the absolutely least residue.
Since comparability modulo a given modulus is an equivalence relation, as it is on the set of integers, the residue classes are equivalence classes; their number is equal to the measure µ of their phantom multipliers.
0.2.7. Transformations and equations
0.2.7.1. Transformations
There is always a quadruple of hyperbinary numbers such that the product of three of them equals the fourth. Such an equality is a general formula for transforming any number of this quadruple.
0.2.7.2. Equations
The formula for transforming hyperbinary numbers is an equality on four numbers. It can be thought of as an equation in which any subset of the hyperbinary numbers and their components is unknown and the remaining ones are given. A system of linear or nonlinear equations can be composed on different words (matrix units), their places in the text, phantom multipliers and summands, provided that equivalence relations of hyperbinary numbers on their phantom multipliers and summands are given or defined on the set of equations. In this case, different words are considered equal if their phantom elements (phantoms) are similar, and vice versa: if words are similar (repeated in different places in the text), their phantoms may differ. The same applies not only to words but also to text fragments. For example, a phantom (hyperbinary number) may be common to all fragments and words of a text, such as an abstract of the text, which is its invariant under transformations and carries the meaning of the text. In turn, this common phantom can be an unknown of the corresponding system of equations.
In the case of such equivalence classes of textual hyperbinary numbers, the equations become entangled on equivalent
hyperbinary numbers.
Unlike polynomial systems of equations over a ﬁeld of numbers, in systems of hyperbinary equations the given and
unknown variables are noncommutative. A method for solving such systems of equations will be proposed in the
following.
0.3. Matrix Texts
0.3.1. The hyperbinary coordinate formula
In accordance with the rules of coordinatization, texts are transformed into matrix texts by the following formula. Each word of the text with ordinal number i corresponds to a square matrix unit with two indices, where the first index is the word number i and the second index is a function of the first. The function is defined by two cases: if the word has not occurred earlier in the text, the second index is equal to the word number in the text; if the word has already occurred earlier in the text, the second index is equal to the number under which it first occurred.
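The coordinate formula can be sketched as a short function. It is a minimal illustration: the names `coordinatize` and `reconstruct` are hypothetical, and the sketch assumes that a repeated word receives the number of its first occurrence as the second index.

```python
def coordinatize(words):
    """Return the index pairs (i, f(i)) for a list of words numbered from 1."""
    first_seen = {}                            # word -> number of its first occurrence
    pairs = []
    for i, word in enumerate(words, start=1):
        j = first_seen.setdefault(word, i)     # new word: j == i; repeated word: earlier number
        pairs.append((i, j))
    return pairs

def reconstruct(pairs, dictionary):
    """Rebuild the word sequence from index pairs and a map {second index: word}."""
    return [dictionary[j] for _, j in sorted(pairs)]

words = "to be or not to be".split()
pairs = coordinatize(words)
print(pairs)    # [(1, 1), (2, 2), (3, 3), (4, 4), (5, 1), (6, 2)]

# The pairs with equal indices mark first occurrences and so define the dictionary.
dictionary = {j: words[j - 1] for i, j in pairs if i == j}
print(reconstruct(pairs, dictionary) == words)    # True
```

Each pair (i, j) stands for one matrix unit, and the whole list stands for the sum of these units, so the order of the pairs in the list carries no information beyond the indices themselves.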
A matrix text is a special matrix polynomial - a special case of a hyperbinary number. The sum of monomials in this
polynomial should be treated as a concordance summation. After matching, this hyperbinary number acquires the
properties corresponding to the rules of coordinating texts.
A matrix text is a sum of matrix words (monomials), in some of which the second index may be repeated (repetition of words in the text). This sum is a matrix polynomial and a hyperbinary number (after coordination), since each of its summands is a matrix monomial, which may be a simple matrix unit or a product of simple matrix units (a composite matrix unit). The monomials must correspond to the coordinate formula.
The right matrix dictionary is the matrix text with the monomials that have different indices excluded; it consists of the matrix units with equal indices. The left matrix dictionary is the full sum of matrix units with equal indices, each of which is a word number in the text. The dimensionality of the square matrix units of the text and of the dictionaries is equal to the maximum dimensionality among them.
There is not more than one unit in each row of the text matrices and dictionaries, the remaining elements are equal to
zero. This property is a consequence of the uniqueness of the ﬁrst index in all matrix words of the text in accordance with
the coordinate rules and formula. In which place of the matrix row one is located is determined by the corresponding
second index.
In the dictionary matrices the units corresponding to the words of the text are on the main diagonal. The remaining elements of the diagonal and of the matrix are zero. In the matrix of the left dictionary there is a one at every place of the main diagonal: it is the unit matrix. The right dictionary is not a unit matrix.
The separator (space) between words in ordinary texts turns into the matrix addition operation. Conversely, the original text is reconstructed from the matrix text by its indices, "forgetting" the algebraic properties (by turning the addition operation back into a separating space).
The order of elements in matrix texts is no longer essential, unlike in regular texts. The summands can be swapped, but without changing their indices. Consequently, algebraic transformations can be performed with matrix texts (e.g., collecting like terms) as in the case of numerical polynomials.
0.3.2. Properties
The product on the left of the full left dictionary by the whole matrix text is, of course, the whole matrix text, because the full left dictionary is a unit matrix (phantom multiplier). A part of the left dictionary is a projector. Multiplying the text on the left by this projector extracts from the whole matrix text the part of the text corresponding to the projector.
Multiplying the whole matrix text on the right by the full right dictionary gives the whole matrix text, since this dictionary contains all the second indices present in the text, and the text monomials have no second indices that are absent from the right dictionary. The right dictionary is the right phantom multiplier of the matrix text as a hyperbinary number. The right dictionary, unlike the left dictionary, is not a unit matrix.
The squares of the matrix text and of the dictionaries are the text itself and the dictionaries themselves. The product of the right dictionary by the left one is the right dictionary. The product of the left dictionary by the right one is the right dictionary.
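These properties can be verified on a three-word text. The sketch below uses the index pairs (1,1), (2,2), (3,1) of a hypothetical text "a b a"; the helper names are illustrative, and a repeated word is assumed to carry the number of its first occurrence as second index.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def matrix_from_pairs(pairs, n):
    """Sum of the matrix units E_ij over the given 1-based index pairs."""
    m = [[0] * n for _ in range(n)]
    for i, j in pairs:
        m[i - 1][j - 1] = 1
    return m

n = 3
pairs = [(1, 1), (2, 2), (3, 1)]                      # the text "a b a"
P = matrix_from_pairs(pairs, n)
D_left = matrix_from_pairs([(i, i) for i, _ in pairs], n)    # all word numbers
D_right = matrix_from_pairs([(j, j) for _, j in pairs], n)   # second indices only

print(matmul(D_left, P) == P)                 # True: left phantom multiplier
print(matmul(P, D_right) == P)                # True: right phantom multiplier
print(matmul(P, P) == P)                      # True: the text is idempotent
print(matmul(D_right, D_left) == D_right)     # True
print(matmul(D_left, D_right) == D_right)     # True

# A part of the left dictionary acts as a projector: it extracts the third word.
E33 = matrix_from_pairs([(3, 3)], n)
print(matmul(E33, P) == matrix_from_pairs([(3, 1)], n))   # True
```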
0.3.3. Fragments
Each word of a matrix text is its minimal fragment. The sum of all minimal fragments is the text itself. In general, the fragments of a matrix text are the polynomials resulting from the product of a part of the full left dictionary by the whole matrix text. For the sum of text fragments to be this text, addition must be understood as concordance addition. After such concordance the intersection of fragments is excluded.
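The role of concordance in the sum of fragments can be seen when two fragments overlap. In the sketch below element-wise logical OR is used as a stand-in for concordance addition; this choice is an assumption made for illustration, not the book's definition, and the helper names are hypothetical.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def combine(a, b, op):
    return [[op(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

P = [[1, 0, 0],
     [0, 1, 0],
     [1, 0, 0]]                                # matrix text of "a b a"

D1 = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]         # projector on words 1 and 2
D2 = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]         # projector on words 2 and 3

F1 = matmul(D1, P)                             # fragment "a b"
F2 = matmul(D2, P)                             # fragment "b a"

plain = combine(F1, F2, lambda x, y: x + y)
merged = combine(F1, F2, lambda x, y: x | y)

print(plain)          # [[1, 0, 0], [0, 2, 0], [1, 0, 0]]: word 2 is counted twice
print(merged == P)    # True: the overlap is excluded and the text is restored
```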
The algebraic goal of transformations of matrix texts is a reasonable (with the help of phantom multipliers) fragmentation
of the original text with a signiﬁcant reduction of the number of fragments used compared to the combinatorial evaluation.
0.3.4. Example of a linguistic text
0.3.5. Example of matrix mathematical text
0.3.6. Example of matrix Morse code
0.4. Algebra of text
0.4.1. Deﬁnitions of algebraic systems
A semigroup is a non-empty set in which any pair of elements taken in a certain order is assigned an element of the same set called their product, and the product is associative for any three elements. Matrix units under multiplication form a semigroup. For matrix units the condition of associativity is satisfied because they are square matrices of the same dimensionality. Matrix units have no inverses (they are singular). A neutral element (unit matrix) and inverse elements are not required for a semigroup (unlike a group). A semigroup with a neutral element is called a monoid. Any semigroup that does not contain a neutral element can be turned into a monoid by adjoining to it an element that commutes with all elements of the semigroup and leaves them unchanged, for example a unit matrix of the same dimension for a semigroup of matrix units.
A ring is an algebraic structure: a set on whose elements an invertible addition operation and a multiplication operation are defined, similar in properties to the corresponding operations on numbers. The result of these operations must belong to the same set. The integers form a ring. Integers can be multiplied and added, and the result is an integer. For integers there are opposite numbers with respect to addition (negative integers), so addition is invertible. The integers are an infinite commutative ring with one and with no divisors of zero (an integral domain). In the ring of integers the first element (the divisor) divides the second if and only if there exists a third element such that the product of the first by the third is equal to the second.
The ring of integers is a Euclidean ring. A Euclidean ring is a ring whose elements admit an analogue of division with a remainder, and therefore of the Euclidean algorithm. The Euclidean algorithm is an efficient algorithm for computing the greatest common divisor of two integers (or the common measure of two segments).
For a ring, an ideal is a subring closed with respect to multiplication by the elements of the ring. An ideal is called left (respectively right) if it is closed with respect to multiplication from the left (respectively right) by the elements of the whole ring. A finitely generated ideal of an associative ring is an ideal that is generated by a finite number of its elements. The simplest example of an ideal is the subring of even numbers in the ring of integers.
Rings are distinguished by their characteristic: the smallest positive integer k such that the product of each element by k (the sum of k instances of this element) equals the zero element of the ring. If no such k exists, the ring has characteristic zero. For example, the finite field consisting of the two elements 0 and 1 has characteristic 2: the sum of two units here is zero.
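The definition of the characteristic can be tried on the rings of integers modulo m, which serve here only as a familiar example. The function name `characteristic` is illustrative.

```python
def characteristic(m):
    """Characteristic of the ring of integers modulo m (m >= 2)."""
    k = 1
    # Increase k until the sum of k copies of every element is zero modulo m.
    while any((k * x) % m != 0 for x in range(m)):
        k += 1
    return k

print(characteristic(2))    # 2
print(characteristic(6))    # 6
print((1 + 1) % 2)          # 0: the sum of two units in the two-element field
```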
A semi-ring is two semigroups (additive and multiplicative) connected by the law of distributivity of multiplication with respect to addition on both sides. For example, the natural numbers form a semi-ring. The result of multiplying or adding natural numbers is a natural number. But because there are no negative numbers, natural numbers have no opposite elements with respect to addition.
An algebra is a ring whose elements can, in addition, be multiplied by the elements of some field. A field is also a ring, but one whose elements commute when multiplied by each other and in which every nonzero element has an inverse (the product of an element by its inverse is the unit element).
A module over a ring is one of the basic concepts of general algebra, generalizing a vector space (a module over a field) and an abelian group (a module over the ring of integers). In a vector space, the field is a set of numbers, such as the real numbers, by which vectors can be multiplied. This operation satisfies the corresponding axioms, such as distributivity of multiplication. A module, on the other hand, only requires that the elements by which vectors are multiplied form a ring (associative, with unity), such as a ring of matrices, not necessarily a field of real numbers.
An ideal (right or left) can be defined as a submodule of a ring considered as a right or left module over itself.
A semi-module is similar to a module, but it is a module over a semi-ring (no inverse elements).
A module over a ring is free if it has a generating set of linearly independent formants. The term free means that this set remains generating after linear transformations of the formants. Every vector space, for example, is a free module.
A free semi-module is a free module over a semi-ring.
Algebraic systems are an inverted hierarchical system of concepts (an inverted pyramid), where natural numbers are at the
base and various number-like objects on top, with their properties deﬁned by axioms and correspondences ("forgetting"
some part of properties) between them. For example, complex numbers turn into real numbers by forgetting the imaginary
unit, hypercomplex numbers turn into complex numbers by forgetting their matrix nature. Free semi-modules turn into
vector spaces when vector coordinates are real numbers, not hypercomplex matrix numbers, and they have no inverse in
0.4.2. Free semi-module
A text algebra is a free noncommutative semi-module (an associative algebra with a unit) whose elements (matrix units of the text) are commutative in addition and noncommutative in multiplication, and satisfy two relations. The first relation determines the multiplication of matrix units (a semigroup under multiplication). The second relation determines the addition of matrix units by concordance. The result of such addition satisfies the rules and the formula of text coordinatization (a semigroup under addition). The sum of text matrix units by concordance is again a text matrix.
A semi-ring is two semigroups. The multiplicative semigroup and the additive semigroup are related by the law of distributivity of multiplication with respect to addition on both sides, since their elements are square binary matrices, which are distributive with respect to their joint multiplication and addition.
0.4.3. Fragment Algebra
Matrix text fragments have the following algebraic properties:
1. The dividend, divisor and quotient are defined for any matrix text fragments almost the same way as for integers. The fragments are hyperbinary numbers. Each fragment has a corresponding left and right phantom multiplier.
2. The relation of divisibility (or multiplicity) of fragments is reflexive, as for integers (a fragment is divisible by itself). However, the quotient matrix is not always unique. Uniqueness and diagonality of the quotient matrix are restored by using concordance addition. The reason for the non-uniqueness is the possible repetition of indices in the matrix units of text fragments.
3. The divisibility (multiplicity) relation is transitive.
4.–6. These describe properties of multiplication and addition of divisibility relations, similar to those for integers.
7. The properties of right and left multiplication of multiples by combinations of matrix units (matrix polynomials) distinguish the divisibility (multiplicity) of integers from the divisibility (multiplicity) of number-like elements (hyperbinary numbers) as polynomials of matrix units. For integers, the products of multiples by integers on the left or on the right always exist. For hyperbinary numbers they do not always exist.
8. and 9. The criterion of divisibility of matrix text fragments is the divisibility (multiplicity) of their right and left dictionaries.
10.–18. Definitions and criteria of the common divisor, the greatest common divisor (GCD), coprimality, and the common multiple.
19, 20. The left ideal of a matrix text is the corpus of all texts (all possible first coordinates) that can be composed from the words of a given right dictionary (second coordinates). Indeed, the left ideal is the set of all products in which an arbitrary matrix polynomial stands on the left of the right dictionary. The multiplication results in polynomials whose second coordinates are only those available in the dictionary. Also, when a polynomial is multiplied on the left by any other polynomial, the result is a matrix polynomial whose second indices are a subset of the second indices of the polynomial being multiplied. Any matrix polynomial generates a left ideal of polynomials that have the same right dictionary or a smaller one. When textual matrix polynomials are added by concordance, the result is a textual polynomial: the polynomial matrix is binary and there is at most one unit in each row of the matrix.
If a textual hyperbinary number (after concordance addition of the monomials that make it up) is multiplied on the left or on the right by any element of the matrix semi-ring, this hyperbinary number generates a left or right ideal: all matrix units that are multiples on the left or on the right of the given matrix unit. This is analogous to the fact that multiplying an even number by any integer results in an even number.
The principal left and right ideals are generated by each matrix unit of the dictionaries. The left and right ideals of the matrix semi-ring are generated by the sum of the generating elements of the principal ideals.
21. Ideals of matrix texts, by analogy with ideals of integers, make it possible to investigate not only specific texts and fragments, but also their sets (classes). The theorems for ideals of texts are the same as for ideals of integers, but take into account that matrix words are noncommutative and some of them are divisors of zero.
22. The notion of divisibility of matrix texts is generalized to the divisibility of ideals of matrix texts. The properties of divisibility of matrix fragments of the text hold for the division of ideals. The notions of GCD and LCM (least common multiple) are also generalized to the case of ideals of matrix texts.
23. Comparisons of integers are also generalized to the case of matrix texts. Fragments of matrix texts are comparable modulo (by the measure of) some fragment if the residues from dividing these fragments by the given fragment are multiples of each other. If the residues are multiples, they have the same dictionaries. Therefore fragments are comparable modulo a given fragment if the residues from dividing them by the given fragment have the same dictionaries. Comparability of texts modulo some text can be interpreted as follows. Let there be a corpus of English. Six books are chosen that best correspond to the six basic plots of Shakespeare. The matrix text of these six books is the common fragment. Then those books whose residues from dividing their matrix texts by the common fragment are multiples are comparable. This means that it is possible to make a catalog of books for those who are interested not only in Shakespearean plots. The multiplicity of residues is the classifying feature for this catalog. There are six classes of residues in this example. By taking only three books, for example, one can compare the entire corpus of English with only three plots out of six. If one has ten favorite books or authors, one can classify the corpus of the language in terms of differences from this top ten.
24. For residue classes of matrix texts, the operations of modular arithmetic are performed, taking into account that, as for ideals, matrix words and fragments are noncommutative and some of them can be zero divisors.
25. The notion of solving comparisons also generalizes to matrix texts. To solve a system of comparisons modulo a given fragment means to find all residue classes such that any combination of matrix units of fragments from these classes satisfies the comparison equation.
The unknowns in the comparison equation are the coordinates of the matrix units in the text fragments. The result of solving a system of comparisons is a replacement of words and/or of word positions in the text such that the comparison equation is satisfied. For example, if a person has read ten books, the solution of the comparison edits the remaining books into the vocabulary and phrases of those ten books. If a particular solution of the comparison is known, the general solution is the residue class of which that particular solution (e.g., the working version of the text) is a representative. The current version of the matrix text then corresponds to the set of possible matrix texts given by the solutions of the comparison system. This property of matrix texts can be used in text creation to predict variant continuations of a text fragment (auto-author).
26. Euclid's algorithm for polynomials of matrix units is simpler than for integers. The incomplete quotient is found in one step and depends on the number of common second coordinates. These common coordinates are defined as the incomplete quotient of the dictionaries of the polynomials being divided. The incomplete quotient of the fragment dictionaries is found uniquely, in contrast to the incomplete quotient of the fragments themselves, because there are no repetitions in any of the fragment dictionaries.
The ring of integers is Euclidean. The free noncommutative semi-module of hyperbinary numbers is also Euclidean.
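The catalog interpretation in item 23 can be sketched in a few lines of Python. This is only an illustration under a strong simplifying assumption that is not in the text: a fragment is reduced to its dictionary (a set of words), and the residue modulo a common fragment is taken to be the set of words not covered by that fragment. The function names and the toy data are hypothetical.

```python
from collections import defaultdict

def dictionary(words):
    """Dictionary of a fragment: its words without repetitions."""
    return frozenset(words)

def residue(words, module_words):
    """Assumed model of a residue: words of the fragment not covered by the module."""
    return dictionary(words) - dictionary(module_words)

def classify(texts, module_words):
    """Group text names by the dictionary of their residue modulo the module."""
    classes = defaultdict(list)
    for name, words in texts.items():
        classes[residue(words, module_words)].append(name)
    return dict(classes)

# Hypothetical toy data, not taken from the book.
module = "love war".split()
texts = {
    "a": "love war ship".split(),
    "b": "war ship love love".split(),
    "c": "love garden".split(),
}
catalog = classify(texts, module)
```

Texts "a" and "b" land in the same class because their residues have the same dictionary, while "c" forms a class of its own.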
Algebra of text – Pshenichnikov S. B.
0.5. Algebraic structurization
0.5.1. Structurization
Structure is the totality and arrangement of the links between the parts of a whole. The signs of a structured text are: headings of fragments at different levels (paragraphs, chapters, volumes, the whole text); summaries (preface, introduction, conclusion, abstract, extended abstract); context and frequency dictionaries; dictionaries of synonyms, antonyms and homonyms; marking of text-forming fragments with separators (commas, periods, the marks of paragraphs and chapters).
The listed structural features are corresponding parts (fragments) of the text. In the polynomial representation of a matrix text, some of these parts are the corresponding noncommutative Gröbner-Shirshov bases of the free noncommutative semi-module of hyperbinary numbers (text algebra).
0.5.2. Example of a linguistic text
Algebraic structuring of the example text is done by using the properties of matrix units to transform the original matrix text from additive form into multiplicative form (similar to long division of ordinary polynomials). The corresponding commutatives are the noncommutative analog of the Gröbner-Shirshov basis for commutative polynomials. The diamond lemma is satisfied: the summands have overlaps on the right in the second index, but these overlaps are resolvable.
During the transformation (reduction) the vocabulary of the text is itself transformed. New words appear in the new vocabulary (the basis of the ideal). The words as signs are the same, but the meaning of repeated words in the text changes. Words are defined by their contexts. Words are close if their contexts contain at least one word in common. Contexts are closer the more common words from the corresponding dictionary (common second indices) they contain.
In natural languages, the multiplicity of word contexts is the cause of ambiguity in understanding the meaning of words. Sense, according to Frege, is the corresponding part of the meanings of the sign (word). The meanings of a word are all its contexts (properties).
At the beginning of structurization the right dictionary was a dictionary of word-signs. In the process of structurization it is converted into context-dependent matrix constructions of n-grams (combinations of word-signs that take into account their mutual order and distance in the text). The semantic partitioning of the text is based on extending the original vocabulary of the text with homonyms (the signs are the same, the contextual meaning is different), and the text itself is then constructed from this extended vocabulary, taken from the noncommutative Gröbner-Shirshov basis.
The marked text, after the ﬁrst separation of homonyms and their introduction into the extended dictionary, can be
algebraically structured again for a ﬁner semantic partitioning.
The extended dictionary (Gröbner-Shirshov basis) together with the contexts of repeated words is called the matrix context dictionary of the text. The matrix synonym dictionary is a fragment of the context dictionary for words whose contexts are close in semantic distance but which differ as signs in the right dictionary. Semantic distance is a measure of synonymy.
The matrix dictionary of homonyms is a fragment of the context dictionary for words that are the same as signs, but
with zero semantic distance.
A matrix dictionary of antonyms is a fragment of the context dictionary for words with opposite contexts. A sign of
opposites in linguistic texts is the presence of negative words (particles, pronouns, and adverbs) in contexts.
The hierarchical headings of the matrix text are fragments of the Gröbner-Shirshov basis that have the corresponding frequency of words of the synonym dictionary. In the example of a linguistic text, the highest heading is the two bigrams "set object" and "object set".
The preface, the introduction, the conclusion, the abstract and the extended abstract are the headings supplemented with the elements of the Gröbner-Shirshov basis of lower frequency and with the residues included in the basis (as in the Buchberger algorithm).
The repetitive words are deﬁned by the frequency matrix dictionary of the text, which is equal to the product of the
transposed text by the matrix text itself.
The list of contexts is deﬁned by the context matrix dictionary, which is equal to the product of the matrix text by the
transposed text. The context matrix dictionary is a dictionary of intervals between repeating words of the text. The
context of non-repeating words is the whole character sequence containing them. The context of the dictionary is the
vocabulary.
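The two products just described can be checked on a small example. The sketch below assumes the coordinatization E_{i,f(i)} from the earlier sections, with f(i) taken to be the position of the first occurrence of the i-th word; the helper names and the four-word sample are illustrative only.

```python
def matrix_text(words):
    """n x n 0/1 matrix with a 1 at (i, f(i)), where f(i) is the position
    of the first occurrence of words[i] (assumed coordinatization)."""
    n = len(words)
    first = {}
    P = [[0] * n for _ in range(n)]
    for i, w in enumerate(words):
        j = first.setdefault(w, i)
        P[i][j] = 1
    return P

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    """Ordinary matrix product of list-of-lists matrices."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

words = "set object object set".split()
P = matrix_text(words)
frequency = matmul(transpose(P), P)  # diagonal entries: occurrence counts per word
context = matmul(P, transpose(P))    # entry (i, k) is 1 when positions i and k hold the same word
```

Here the transposed text times the text is diagonal, with the count of each word at the index of its first occurrence, and the text times the transposed text marks which positions repeat each other, from which the intervals between repetitions can be read off.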
The text can be restructured with a fragment of the basis. For example, the novel War and Peace can be restructured around a medical theme by taking a dictionary fragment related to the field surgery scene and decomposing the entire text modulo this fragment of the general Gröbner-Shirshov basis. In doing so, the supreme title may change. The existing title of the novel (the supreme title) is considered controversial. The Russian word "мир" ("peace") has two different meanings (an antonym of "war" and a synonym for "society"). In 1918, the spelling of the Russian language was reformed. The letters "ъ" and "i" disappeared. Two words, "миръ" (peace) and "міръ" (world, society), became one word, possibly changing the author's meaning of the novel's title. Using algebraic structurization, it is possible to calculate the title as a function of the text, using the two texts (and the two calculated context dictionaries of the novel) from before and after the spelling reform.
Two texts under algebraic structurization turn into one text with a unique ﬁrst coordinate of matrix words as follows. Let
each unique ﬁrst coordinate of a word turn into two indexes. The ﬁrst is the number of the text, the second is the number
of the word in this text. Then pairs of indexes of two texts are numbered with one index and turned into one character
sequence with unique numbers (concatenation of texts).
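The renumbering of index pairs can be sketched as follows. The function name and the two short word lists are hypothetical stand-ins for the two editions of the novel.

```python
def concatenate(texts):
    """Renumber (text number, word number) pairs with a single running index."""
    merged = []
    index_of = {}
    for t, words in enumerate(texts, start=1):
        for k, word in enumerate(words, start=1):
            index_of[(t, k)] = len(merged) + 1  # unique first coordinate in the joint text
            merged.append(word)
    return merged, index_of

# Hypothetical stand-ins for the texts before and after the reform.
before = "war and world".split()
after = "war and peace".split()
merged, index_of = concatenate([before, after])
```

Every word of the joint sequence keeps a unique first coordinate, so the two texts can be structured as a single matrix text.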
The meaning of a text and its understanding are determined by the motivation and the personal context vocabulary of the reader. If these are known, the author's text, presented in matrix form, can be restructured into a text that is as understandable to the reader as possible (in the reader's personal Gröbner-Shirshov basis), but with elements of the unknown stated in the reader's personal language and with additions to or clarifications of the reader's personal context vocabulary.
Personal adaptation of texts on the basis of such restructuring is possible. To understand a text is to put it into one's own words, which is the basic technique of semantic reading. For texts in matrix form, to understand means to decompose and restructure the author's text in one's own Gröbner-Shirshov basis.
Restructuring requires an algebraic structuring of the corpus of texts in order to compose the above vocabularies for the corpus of the language. In this case the ideals and residue classes of the matrix ring of the corpus of matrix texts should be constructed and investigated beforehand. In the Bergman-Cohn structure theory, free (finitely generated) matrix rings are related to rings of noncommutative polynomials over skew fields in the same way as commutative principal ideal domains are related to rings of polynomials in one variable over a field.
In a free semiring there are relations between the polynomials of a text, defined by the interval and semantic extended vocabularies of the corpus of texts. A particular matrix text can be defined by a system of polynomial equations on the text coordinates (the unknowns in the equations are monomials with unknown indices; the noncommutative coefficients given in the equations are monomials with known indices). Some of them will be given by extended dictionaries or by inequalities on fragments, and some of them will be unknowns. In this case it is possible to specify headings and summaries by equations, and to compute a draft text from systems of polynomial equations (the inverse problem of structurization, that is, restructuring). It is possible to find the necessary redistribution of text-forming fragments, to replace some dictionaries with others, to change the significance of repeated words, and to define neologisms.
0.5.3. An example of a mathematical text
The method of algebraic structuring of texts allows us to ﬁnd appropriate classiﬁers and dictionaries for texts of diﬀerent
nature. That is, to classify texts without a priori setting the features of classiﬁcation and naming the classes. Such
classiﬁcation is called categorization or a posteriori classiﬁcation. Using the example of a mathematical text, ﬁve
classiﬁcation attributes, their combinations and corresponding classes are calculated. The names of classes coincide
with the names of features and their combinations.
0.5.4. Example of Morse code
Morse code is algebraically structured into three ideals (classes) by the corresponding noncommutative Gröbner-Shirshov
bases.
The title of those letters that have a "dash" sign in the first place of the 4-character sequence pattern:
_BCD__G___K_MNO_Q__T___XYZ (13 letters)
The title of those letters that have a "dot" sign in the second place of the 4-character sequence pattern:
_BCD_F_HI_K__N____S_UV_XY_ (13 letters)
The title of those letters that have a "dash" sign in the third place of the 4-character sequence pattern:
__C__F___JK___OP____U_W_Y_ (9 letters)
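The three titles can be reproduced directly from the International Morse code table. The sketch below only checks the letter patterns. It does not compute Gröbner-Shirshov bases, and the function name `title` is an illustrative choice.

```python
import string

# International Morse code for the 26 Latin letters.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}

def title(position, sign):
    """26-character pattern: a letter is kept if its code has `sign`
    at the 1-based `position`, otherwise it is replaced by '_'."""
    return "".join(
        letter
        if len(MORSE[letter]) >= position and MORSE[letter][position - 1] == sign
        else "_"
        for letter in string.ascii_uppercase
    )

dash_first = title(1, "-")
dot_second = title(2, ".")
dash_third = title(3, "-")
```

The three strings coincide with the titles listed above and contain 13, 13 and 9 letters respectively.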
0.6. Context category
0.6.1. Deﬁnitions
The context of a matrix text word is a fragment of the text: the sum of the matrix units (words) between two repeated matrix words. Context is all the words of a matrix text between repeating signs of the dictionary, for example between repeated words, repeated periods, the marks of paragraphs, chapters and volumes of language texts, or the phrases, periods and parts of musical works.
The signs of text fragments look the same, but these marks are also homonyms: their contexts are the corresponding fragments. The context of a linguistic fragment (its explication or explanation) can be not only a linguistic text, but also audio (for example, music), an image (a photo) or a combination of these (video). The context of a musical text can be a linguistic text (e.g., a libretto).
Matrix words correspond to their matrix contexts, represented as algebraic objects. All possible relations between these
objects are the subject of analysis in determining the meaning of words. Category theory is useful for the study of such
constructions because it is based on the notion of transitivity.
The category of the context of a text sign is defined as follows:
1. The objects of the category are pairwise mutually multiple contexts.
2. For each pair of mutually multiple objects there exists a set of morphisms (right and left quotients); each morphism corresponds to a single context.
3. For a pair of morphisms there is a composition (the product of the square matrices of two quotients) such that if one quotient of the first and second contexts and a second quotient of the second and third contexts are given, then the quotient of the first and third contexts equals the product of the matrices of these two quotients (taking into account right and left products). This is the condition of transitivity.
4. A unit matrix is defined for each object as the identity morphism. Categorical associativity follows from the associativity of matrix multiplication.
The intersection (common words) of matrix dictionaries is their product. The proof follows from the deﬁning property
of matrix units and the deﬁnition of dictionaries. When the matrix units of dictionaries are multiplied (the lower indices
are the same in each unit), the product of their matrix words (units) with diﬀerent indices is equal to zero. In the product
there remain only common words with matching indices of all the multipliers.
The union of any pair of dictionaries is their sum minus the intersection (deleted repetitions of matrix units).
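The two identities can be verified numerically. In this sketch a dictionary is assumed to be a diagonal 0/1 matrix, that is, a sum of matrix units E_{j,j}; the helper names and the index sets are illustrative.

```python
def dictionary_matrix(indices, n):
    """Diagonal 0/1 matrix: the sum of matrix units E_{j,j} for j in indices."""
    return [[1 if r == c and r in indices else 0 for c in range(n)] for r in range(n)]

def matmul(A, B):
    """Ordinary matrix product of two square list-of-lists matrices."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

def combine(A, B, sign):
    """Entrywise A + sign * B."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def words(D):
    """Indices of the words present in a dictionary matrix."""
    return {j for j in range(len(D)) if D[j][j]}

n = 4
A = dictionary_matrix({0, 1, 2}, n)
B = dictionary_matrix({1, 2, 3}, n)
intersection = matmul(A, B)                           # common words survive the product
union = combine(combine(A, B, +1), intersection, -1)  # sum minus intersection
```

The product keeps only the indices present in both dictionaries, and subtracting it from the sum removes the doubled units, leaving a 0/1 diagonal again.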
The minimal right dictionary of a matrix text fragment is a dictionary of the text such that the dictionary and the text are mutually multiple. For a mutually multiple text and right dictionary, nonzero matrices of their quotients exist. The quotients exist if the matrix units of the text and of the right dictionary contain the same second indices (coordinates) and do not contain any other second indices.
Minimal dictionaries do not contain matrix words (second indices of matrix units) that are absent in the corresponding
text fragment.
The equivalence classes of contexts are defined by common minimal right dictionaries. If a pair of contexts has a common minimal dictionary, then these contexts are mutually multiple. Hence there exist mutual transformations (matrices) between them.
If the contexts of a word-sign have a common minimal right vocabulary, then they are multiples of each other. Hereinafter, the dictionaries of text fragments mean their minimal dictionaries.
If the given contexts are multiplied by the right dictionary such that each resulting context has a right dictionary
(minimum), they are called reduced contexts. During the reduction (multiplication by the dictionary on the right) the
part of matrix units with second indices, which are not in the corresponding dictionary, is removed in each of the given
fragments. If any of the obtained fragments lacks at least one of the dictionary indices, it should not get into the reduced
set.
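Reduction can be sketched with a sparse representation in which a fragment is a set of index pairs (i, j), one pair per matrix unit E_{i,j}, and a dictionary is a set of second indices. Right multiplication by the dictionary then keeps exactly the units whose second index belongs to it. The names and sample contexts below are hypothetical.

```python
def reduce_context(units, dictionary):
    """Right-multiply a sum of matrix units E_{i,j} by a dictionary (a sum of E_{j,j}):
    only units whose second index j is in the dictionary survive."""
    return {(i, j) for (i, j) in units if j in dictionary}

def reduced_set(contexts, dictionary):
    """Reduce every context and drop those lacking at least one dictionary index."""
    result = {}
    for name, units in contexts.items():
        reduced = reduce_context(units, dictionary)
        if {j for _, j in reduced} == set(dictionary):
            result[name] = reduced
    return result

# Hypothetical contexts given as (first index, second index) pairs.
contexts = {
    "c1": {(2, 2), (3, 3), (4, 2)},
    "c2": {(6, 3), (7, 7), (8, 2)},
    "c3": {(10, 3), (11, 11)},
}
reduced = reduced_set(contexts, {2, 3})
```

Context "c3" is excluded from the reduced set because it lacks index 2, while "c2" loses its unit with second index 7.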
Contexts with shared vocabularies, for example, after reduction of some word-sign from the dictionary, are objects of
the category of that sign.
A transitive closure can be defined for any set of fragments by specifying for them a common vocabulary that is less than or equal to every fragment in the order of the corresponding Noether chain.
0.6.2. Example
The same example of a linguistic text is used, in which the word "set" occurs four times, identical as a sign. These four signs, in turn, have four contexts and their four vocabularies.
The problem is to calculate the sameness and difference of the four words "set" depending on the sameness and difference of their contexts in some measure (modulo). The sameness of contexts is determined by the presence of common vocabularies, which are used as the module for comparing contexts. The difference is determined by the residues of the contexts with respect to the same module. The residues define their own equivalence classes (residue classes) and categories of residues, since transitive closure can also occur for them.
The general vocabulary of the four contexts is constructed as their product. Transitive closure on the common vocabulary-
module leads to the removal of "superﬂuous" words.
Thus, the reduced contexts of the sign-word "set" are the four corresponding matrix words. These words have the same matrix unit of the sign-word "object" in the unified matrix dictionary (see Dictionary Unification). The category of this sign is computed: the four matrix morphisms and their composition. The composition is an expression, in the language of category theory, of the interval partitioning of the word "set" (chapter on algebraic structurization), and the reduction is an example of solving a system of comparisons modulo a minimal dictionary. Category theory is useful here because its approach is more general and allows methods from different branches of algebra to be used.
Thus, all four fragments of the text are the same (equivalent) in the sense of the matrix-word sign "set" (comparable
modulo this word). There are four matrix-morphisms transforming these texts into each other. By analogy with a library
catalog, all these four texts (objects of the matrix-word "set" category) are in the same catalog box with the name of the
matrix-word "set". This is an example of a crude keyword classiﬁcation of texts. The contextual meaning of words is
not taken into account, all such words as signs are the same, and all cases of their occurrence in the text can be added up
to calculate the signiﬁcance of keywords by frequency of use.
The obtained result means that in the first approximation all four words "set" are contextually related to the word "object". The words "set" are the same or different according to whether their reduced contexts are the same or different.
For matrix texts, comparisons modulo a fragment are performed. Dividing fragments of matrix texts by other fragments (modules) can leave residues, which, like the modules, are classifying features.
The criterion of divisibility (multiplicity) of fragments of matrix texts is the divisibility (multiplicity) of their right dictionaries. The residues of division of the dictionaries of fragments are the dictionaries of the residues of division of these fragments.
To calculate the similarities and differences of the words, the corresponding four contexts must be compared modulo the matrix word "object". Four residues, one for each context modulo the matrix word "object", are calculated.
It follows from the result that all four contexts are incomparable modulo the matrix word "object". The residues are not pairwise multiples and do not form any residue class in pairs. This means that all the words "set" are different in sense (context). The similarity is found in the next step of the calculation (for the residues) by calculating common vocabularies for pairs of residues and performing the reduction. There is no common dictionary for all the residues. This is the reason why there is no common residue class and no corresponding category of the matrix word "object". But three pairs of residues have three corresponding common vocabularies. These pairs of residues, after reduction, form residue classes and categories with the matrix word names "this", "being" and "point". The catalog box with the matrix name "this" contains the first and second fragments, the one with the matrix name "being" contains the first and third fragments, and the one with the matrix name "point" contains the second and fourth fragments.
The matrix word "polynomial" is an annihilator (zero divisor) of the first, third, and fourth fragments.
The matrix word "monomial" is an annihilator (zero divisor) of the residues of the first, second and fourth fragments.
The matrix word "coinvariant" is an annihilator (zero divisor) of the residues of the first, second and third fragments.
These are context-free matrix words (the last three summands in the context dictionary of the chapter on algebraic structurization): when a residue is multiplied by an annihilator, the product is different from zero if the residue contains this annihilator.
So the problem statement of the above example was to calculate the sameness and diﬀerence of the four matrix words
"set" depending on the sameness and diﬀerence of their four contexts (fragments) by some measure (modulo).
The solution is obtained: the corresponding four matrix words (as their four contexts) are comparable modulo the matrix word "object" and are not comparable (are different) modulo the matrix words "polynomial", "monomial" and "coinvariant".
This means that the reduction should not be done by a common dictionary consisting of the single matrix word-sign "object". As it turns out, this word-sign has different meanings in different places of the text. We calculate the extension of the original vocabulary into the appropriate context vocabulary. In the chapter on algebraic structurization this was done using the noncommutative Gröbner-Shirshov basis.
The original dictionary is converted into a context dictionary. To the four matrix signs-words "object" additional
matrix words "polynomial", "monomial" and "multiplier" were added with the help of category calculation. With these
additional words the three non-diagonal matrix words "object" diﬀer from each other.
The above categorization is a categorization of matrix texts by dictionaries. In categorization, classes and their names
are calculated as algebraic functions of the text. The categorization was computed by dictionaries, since the categorizing
features (category names) were determined by mutual intersection of dictionaries. This categorization does not take
into account the order of words in the text, but can be further used in the construction of a more subtle categorization
that takes into account the mutual order of words. Modules of comparison in this case will not be parts of dictionaries,
but fragments of contexts. Repetition of words in contexts may appear when replacing fragments of dictionaries with
fragments of texts. There is an ambiguity in the division (construction of category morphisms). That’s why at ﬁrst a
comparison is made modulo dictionaries, similarities and diﬀerences (divisors and remainders) on this measure are
determined. Then, after establishing the similarities and diﬀerences of the word-repeats in the contexts, the dictionary
comparison module is replaced by a text fragment, which already takes into account the order of words. The category
names become the text fragments.
The general method for computing the classifying features gives an analogue of CRT for matrix texts. The Chinese
remainder theorem (CRT) for matrix texts is formulated as follows. Let be given:
1. Pairwise non-multiple minimal dictionaries of matrix text fragments (already agreed).
2. The right dictionary of some set of similar texts, as a sum of minimal dictionaries.
3. The right vocabulary of a set of other similar texts, smaller in measure (following vocabulary).
4. The second set of texts is a subset of the first in the sense that its right dictionaries are part of the right vocabulary of the first texts.
5. A tuple of residues whose elements are the comparisons of each text from the second set with the union of all the second texts, modulo the dictionaries of the first set of texts.
Then there exists a one-to-one correspondence between the texts of the second set and this tuple of residues. This is proved by induction using the definition of multiplicity of polynomials of matrix units and the minimality of the dictionary.
The tuple of residues is a classifying feature of all possible multiplicities of the texts having the vocabulary of the second set of texts or any part of it. It is by this correspondence that it is expedient to build classifiers of linguistic and other sign sequences.
0.7. Concordance of meaning
Earlier, texts (sign sequences with repetitions) were transformed (coordinatized) into algebraic systems with the help of matrix units as word images. Coordinatization is a necessary condition for the algebraization of any subject area. The function (arrow) is the matrix coordinatization of a text. One can perform algebraic operations with the words and fragments of matrix texts as with integers, while taking into account the noncommutativity of multiplication of words as matrices. Structurization of texts reduces to the calculation of ideals and categories of texts in matrix (hyperbinary) form.
Here the notion of a matrix word in context is defined. Repeated word-signs may have different fragments of text between them (contexts), and words that are the same in spelling and sound may have different meanings (as homonyms). In a text, all repeated words can be homonyms if their contexts differ by an appropriate measure (modulo). Conversely, words different in spelling and sound can have similar contexts and different measures of synonymy. In semantic analysis, the frequency of keywords is more appropriately defined as the frequency of contexts comparable by an appropriate measure than as the frequency of word-signs treated like letters of the alphabet. When the semantic frequency of words is calculated with the context taken into account, different word-signs with the same contexts should be summed in the frequency calculation and, conversely, the same word-signs with different contexts should be excluded.
Matrix words are complemented by context multipliers. These multipliers, due to the properties of matrix units, do not change words as signs, but contain signs affecting the meaning of the defined words. Phantom context multipliers are present in matrix words, but do not affect the signs. The multipliers contain relations (in Frege's sense) with other signs (the part of the properties of these signs that is their meaning in a given context). The semantic similarity and difference of words can then be calculated by comparing (matching) these phantom multipliers-contexts.
Algebraic operations with matrix words in context require concordance: the semantic agreement of signs and text fragments, which depends on the measure (module) of concordance. Matrix words can add up to a text if their contexts have a common meaning (module). The invariants of matrix texts, which preserve their meaning when words and text fragments are replaced with concordant ones, are ascending Noether chains. Noether chains make it possible to set up systems of algebraic equations for transformations of texts that preserve their meaning.
0.7.1. Contextual concordance of words
Suppose there are two repeated words of a text, the first coordinate of the second word being greater than that of the first, and a matrix text fragment between these words (the context). In this case, according to the coordinatization rules and formula, all the second indices of the fragment are less than the first coordinate of the second word. Then the product of this context fragment, on the left, with the second word is zero.
The word in context is the matrix word multiplied on the left by the sum of the context fragment and the unit matrix.
The context fragment plus the unit matrix is the left phantom multiplier of the matrix word. The phantom multiplier does not change the word, but can be used to compare two (not necessarily repeated) words by comparing their matrix contexts. Such a semantic comparison of text words by context (meaning) will hereafter be referred to as concordance by word meaning.
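The vanishing of the product can be written out as a short derivation. The notation is an assumption made for this illustration: a and b are the first coordinates of the two repeated words, F is the context fragment between them, I is the unit matrix, and f(k) is the second coordinate of the word at position k.

```latex
\begin{align*}
% defining property of matrix units
E_{i,j}\,E_{k,l} &= \delta_{jk}\,E_{i,l} \\
% context fragment between positions a < b; every second index satisfies f(k) < b
F &= \sum_{k=a+1}^{b-1} E_{k,f(k)} \\
% left product of the context with the second word
F\,E_{b,f(b)} &= \sum_{k=a+1}^{b-1} \delta_{f(k),\,b}\,E_{k,f(b)} = 0 \\
% the phantom multiplier leaves the word unchanged
(I + F)\,E_{b,f(b)} &= E_{b,f(b)}
\end{align*}
```

Every Kronecker delta in the third line is zero because no second index of the fragment equals b, which is exactly the condition stated for the fragment.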
Two words can be made concordant (coordinated) either by the intersection of their contexts or by their union. In what follows, only the intersection of contexts is considered. Algebraically, the descriptions for union and intersection are the same. In applications, their purposes differ. A human, due to natural physical limitations, can hold only a few entities (about seven) at a time in the process of comprehension. Abstraction, as an operation of thinking, is used to reduce the variety of the world to this number. Concordance by intersection is a mathematical explication of the process of abstraction in the form of reduction. The limiting case of the abstract concepts of natural language is the logical categories (Aristotle, Kant, Hegel). Hierarchical continuity of concepts (words) is necessary for the construction of part-whole relations (relations of understanding).
Concordance by union increases the number of entities. But their number matters only for humans. For machine languages, this limitation is insignificant. Therefore, concordance by union can be applied to machine interactions and to the future collective intelligence of the human population, for which technologies of collective understanding must be created. At present, acceptable understanding is achieved in collectives of programmers. For collectives of five or more, such as medics, there is no single term that they all understand in the same way. In mathematics, seemingly the universal language of humanity, with ideal objects that do not change over time, specialization has reached such a level that it is fully understood only by geographically distributed teams of three or four people.
Concordance by intersection will hereafter be called simply concordance.
Two words are concordant (matched) by the intersection of the right dictionaries of their contexts, if the intersection
(product) of the two dictionaries is diﬀerent from zero. Words are concordant in the sense that their contexts have a
common vocabulary. Contexts after reduction are congruent. Reduction (see "Context Category") is the product to
the right of each context by this common intersectional vocabulary. Each reduced context contains all words from the
common dictionary. N words are concordant if each word pair is concordant.
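The concordance test can be sketched with right dictionaries modeled as plain sets of words, so that the product of dictionaries becomes a set intersection. This is a simplification for illustration; the function names and sample dictionaries are hypothetical.

```python
from itertools import combinations

def common_vocabulary(*dictionaries):
    """Intersection of right dictionaries (the analog of their product)."""
    return set.intersection(*map(set, dictionaries))

def concordant(dict_a, dict_b):
    """Two words are concordant if their context dictionaries share a word."""
    return bool(common_vocabulary(dict_a, dict_b))

def all_concordant(dictionaries):
    """N words are concordant if every pair of them is concordant."""
    return all(concordant(a, b) for a, b in combinations(dictionaries, 2))

# Hypothetical right dictionaries of three word contexts.
d1 = {"object", "this", "being"}
d2 = {"object", "this", "point"}
d3 = {"object", "point"}
module = common_vocabulary(d1, d2, d3)
```

Here all three contexts are pairwise concordant and the shared module is the single word "object".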
The relation of concordance is an equivalence relation, because the conditions of reﬂexivity and symmetry for matrices
are satisﬁed, and the transitivity of the relation follows from the deﬁnition of word concordance.
The measure (module) of concordance is the common vocabulary. It is this modulus that explains the emergence of the
term "modulo concordance" by analogy with the term "modulo comparison" for integers. Just as diﬀerent integers can
be equal modulo, so diﬀerent (as characters) words in a text can be equivalent (interchangeable) modulo concordance.
This means that if words have concordant contexts, then the words have concordant meaning and can be considered
equivalent (interchangeable in meaning in the text).
Words and their sums can be concordant modulo a common vocabulary. From concordance relations, as from equalities and comparisons modulo, it is possible to set up systems of concordance equations. The unknowns can be the defined and defining words, concordance moduli, contexts, and text fragments. Concordance equations make it possible to compute answers to questions such as:
in what sense (here the unknown is the module of concordance) are words and texts concordant? If the meaning (modulus) is given, which set of words can be replaced with other words? In this way, it is possible to compute word definitions and sense versions of texts, to find interchangeable words, and to compute semantic markup and text structuring, drafts of annotation texts, and semantic text translation (even within the same language). New functions of text editors and readers, messengers and social networks can be based on these computational capabilities. In the latter case it is possible, by compiling a personal contextual dictionary of a participant from his messages, to accompany the communication with semantic translation of text and sound through the personal contextual languages of the other participants.
The concordant addition of a pair of words is an expression in which, in the common left phantom multiplier, the two context fragments are added and multiplied on the right (reduced) by the common vocabulary. A unit matrix is then added to this sum of context fragments. The final expression is the phantom multiplier and the concordant context of the concordant sum of the two words. The module of concordance is the common vocabulary of the two contexts. The concordant sum of n words is defined similarly. The common dictionary is the product of the n right dictionaries of their n contexts.
Two words are concordant if the right dictionaries of their contexts have a non-zero overlap. But each word of
these contexts, in turn, is also a word in context. Therefore, mutual concordance of the word being defined with the
defining words is necessary. Such reflexivity is the reason for the ambiguity of natural language and of
interpretations of texts ("I think that they think that I think, . . .").
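The overlap test above can be illustrated with a toy model. The sketch below is only an assumption-laden simplification: each right dictionary is modeled as a plain Python set of words, the product of dictionaries is modeled as set intersection, and the names `concordance_modulus`, `are_concordant` and the sample contexts are hypothetical, invented for illustration rather than taken from the book's matrix formalism.

```python
# Toy model (an assumption of this sketch): a context's right dictionary is a set
# of words, and the product of dictionaries is modeled as set intersection.

def concordance_modulus(*dictionaries):
    """Return the common vocabulary of several dictionaries (empty set plays the role of zero)."""
    common = set(dictionaries[0])
    for dictionary in dictionaries[1:]:
        common &= set(dictionary)
    return common


def are_concordant(dict_a, dict_b):
    """Two words are treated as concordant if their context dictionaries overlap."""
    return len(concordance_modulus(dict_a, dict_b)) > 0


# Illustrative context dictionaries (invented for this sketch).
context_of = {
    "car": {"drives", "on", "the", "road"},
    "automobile": {"moves", "on", "the", "highway"},
    "sonata": {"played", "by", "pianist"},
}

modulus = concordance_modulus(context_of["car"], context_of["automobile"])
print(sorted(modulus))
print(are_concordant(context_of["car"], context_of["sonata"]))
```

In this simplified picture the modulus of a pair of words is just the set of words their contexts share, and an empty modulus means the pair is not concordant.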
A mathematical explication of reflexion is the latent semantic nonlinearity of linearly ordered word-signs. Perhaps,
in the future, linguistic texts will cease to be linear and one-dimensional. Musical notation texts, for example, are
5-dimensional, although they can also be transposed onto a one-dimensional "thread"; but this would turn musical texts
into monstrously incomprehensible codes with dictionaries comparable to the dictionaries of language texts. Such
one-dimensional music texts, like language texts, would require a semantic gestalt translation, not just a personal
intonation translation, as for 5-dimensional music texts. Future multidimensional language texts will be able to point
to meaning chains to reveal the meaning of words and text fragments, rather than recognizing them intuitively or with
the help of meaningful (quick)
A fragment in the deﬁnition of a word in context, in turn, can be seen as a concordant sum of matrix words, since each
summation word in this fragment also has its own context. Then the word can be appropriately deﬁned in such a reﬁned
context. The word in the reﬁned context is a matrix bilinear form.
Two words are concordant in reﬁned contexts if the intersection (product) of all vocabularies of all contexts of both
words is diﬀerent from zero. N words can be concordant in reﬁned contexts if each pair is concordant. The modulus of
concordance is the product of all vocabularies of all contexts of all forms.
There can be concordant sums of words (text fragments) over refined contexts if each pair of sums is concordant.
A pair of word sums is concordant if the product of the vocabularies of all contexts of all words of the pair of sums
is non-zero.
If the modulus of concordance, as the product of the dictionaries of all reﬁned contexts of all words as bilinear forms, is
nonzero, then the text of these words is concordant.
All words and fragments of the matrix text can be decomposed into concordance classes.
Each word corresponds to its left phantom multiplier, defined above. Each text fragment corresponds to its
vocabulary (right phantom multiplier). These multipliers exist, but do not change the word or fragment. At the
same time, such multipliers are uniquely determined from the text by its fragments. That the multipliers do not affect
signs is a necessary but not sufficient condition for concordance relations. A sufficient condition is that the
multipliers that do not affect signs and fragments of signs are a single-valued function (property) of the text.
Each pair of words in the text corresponds to a concordance modulus (kappa): the product of all vocabularies of all
refined contexts of both words.
Each pair of text fragments corresponds to a kappa concordance modulus: the product of all the dictionaries of all the
refined contexts of all the words of the pair of fragments.
Each pair of a word and a text fragment corresponds to a kappa concordance modulus: the product of all the dictionaries
of all the refined contexts of the word and the text fragment.
Conversely, each kappa modulus (class name) corresponds to a set of refined contexts, a set of words corresponding to
these contexts, and a set of text fragments with a vocabulary equal to kappa. All three sets are mutually concordant,
and all their elements are elements of the same kappa concordance class. The set of all concordance classes modulo
kappa is the power set (Boolean) of all n words of the vocabulary of the text, or of all its partial sums (fragment
dictionaries). The number of all partial sums is two to the power of n.
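The count of two to the power of n can be checked directly on a small vocabulary. The following is a minimal sketch that treats a fragment dictionary simply as a subset of the vocabulary; the function name `fragment_dictionaries` and the three-word vocabulary are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def fragment_dictionaries(vocabulary):
    """Enumerate every subset of the vocabulary (the power set), including the empty one."""
    words = list(vocabulary)
    return [
        frozenset(subset)
        for size in range(len(words) + 1)
        for subset in combinations(words, size)
    ]


vocabulary = ["word", "text", "sign"]  # illustrative vocabulary with n = 3
classes = fragment_dictionaries(vocabulary)
print(len(classes), 2 ** len(vocabulary))
```

Each subset here stands for one possible kappa; in the matrix formalism the subsets would be the corresponding partial sums of dictionary units.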
Membership of such elements in the same class means that there are matrices transforming the elements into
each other. Indeed, if the set of refined contexts, the set of words corresponding to these contexts, and the set of
text fragments have a single dictionary equal to kappa, then all these elements are similar to each other. In this case,
the common objects of transformation in refined contexts and text fragments are matrix polynomials (hyperbinary numbers).
Mutual transformations of refined contexts, words corresponding to these contexts, and text fragments having the
vocabulary equal to kappa are as follows:
1. Conversion of a pair of refined contexts.
Let there be two matrix texts. Because they belong to the same class, they have the same kappa modulus or, what is the
same, they have the same right dictionaries. But matrix texts having the same dictionaries form ideals of the matrix
semiring (they are multiples of the dictionary). There is always a matrix polynomial whose left multiplication with
one refined fragment yields the other refined fragment. Up to this matrix multiplier (quotient),
the two refined fragments are indistinguishable (interchangeable).
2. Conversion of words in a reﬁned context.
Let there be two words in context. Since the words share a common kappa dictionary (the product of
all the dictionaries of all the refined contexts), the words are concordant modulo kappa. Like congruences of integers,
the concordance of matrix units can be written through an equality.
3. Conversion of words and contexts.
Let there be a word and a context. A word and a context are concordant if they share a common kappa modulus. There
is also a notation via equality as a formula for computing the naming of a text fragment by a word belonging to the
kappa concordance class. And vice versa, the deﬁnition of a word by a text.
0.7.2. Noetherian chains of meaning
The kappa concordance classes are distinguished by the words included in the kappa dictionary. Let a sequence of
dictionaries be given such that neighboring dictionaries differ by one word. The concordance class (capital kappa) for
each kappa is the set of all words in refined context, all refined contexts, and all text fragments sharing a common
kappa dictionary. The elements of a class (capital kappa) are mutually substitutable by the formulas.
Let the concordance classes form chains by the relations "is included in" and "includes". In such chains of
dictionaries the number of words in each dictionary either increases from left to right or decreases.
A sequence of such non-empty subsets of a corpus of texts, based on the corpus dictionary of all texts, is increasing
when each one is a subset of the next one.
Conversely, a sequence of subsets is decreasing when each of them contains the next subset.
A sequence is said to stabilize after a finite number of steps if there exists an n such that all subsequent members of
the chain of subsets coincide. This holds for matrix texts: there is no greater dictionary than the dictionary of all
texts. The set of subsets of a given set (capital kappa) satisfies the ascending chain condition, since any
increasing sequence becomes constant after a finite number of steps.
Any decreasing sequence becomes constant after a finite number of steps, since the dictionary (capital kappa) has a
minimal set, one word; hence the corresponding set of subsets satisfies the descending chain condition.
In general algebra, objects are called Noetherian if they satisfy chain conditions. Amalie Emmy Noether
made masterful use of the chain-condition technique in many of her works. Objects such as concordance classes also form
Noetherian chains.
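The stabilization of an increasing chain of dictionaries can be observed on a small word stream. The sketch below is only an illustration under simplifying assumptions: dictionaries are modeled as sets, the chain is built from growing prefixes of a made-up text, and the helper names `ascending_chain` and `stabilization_index` are hypothetical.

```python
def ascending_chain(words):
    """Dictionaries of growing prefixes of a word stream; each is a subset of the next."""
    chain, seen = [], set()
    for word in words:
        seen = seen | {word}
        chain.append(frozenset(seen))
    return chain


def stabilization_index(chain):
    """Smallest index n such that every later member of the chain equals chain[n]."""
    n = len(chain) - 1
    while n > 0 and chain[n - 1] == chain[-1]:
        n -= 1
    return n


text = "a b a c b a c a".split()  # illustrative stream over the dictionary {a, b, c}
chain = ascending_chain(text)
print([sorted(d) for d in chain])
print(stabilization_index(chain))
```

Once the prefix dictionary has reached the full dictionary of the text, further words cannot enlarge it, so the chain stays constant from that index on.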
Noetherian chains can also be defined for word order in a text. Relative word order is essential for texts. For example,
"incidental in the necessary" differs in meaning from "necessary in the incidental", as "mom's dad" differs from
"dad's mom". For musical texts and codes, the order of characters is as significant as the characters themselves.
The concordance modulus is a fragment of the vocabulary of the text. For a dictionary, the word order is insignificant.
Therefore, the concordance class contains elements without regard to the order of words in text fragments.
The word order is taken into account through the subclasses of the concordance class. As a result of the
calculations, the corresponding concordance class and the three features of its three subclasses that take word order
into account are described. The order subclasses are defined by ascending or descending Noetherian chains for the first
coordinates of matrix monomials in the left word dictionaries. For the left dictionaries there are Noetherian chains
just as for the right dictionaries, and this is how the meaning of word order in matrix texts is taken into account.
Noetherian chains for words and their order are semantic invariants of the text, preserved by appropriate concordant
word substitutions in the text (retelling the text in one's own words), substitutions of fragments with words
(abstracting and annotating), and substitutions of words with fragments (bot-authorship). The invariance comes from the
fact that the Noetherian chains are constructed from the left or right dictionaries of matrix polynomials. Invariance
on the Noetherian chains of the right dictionaries means that the places of words in the text are not important for the
meaning of the text; what matters is the system of their contextual correspondence as a function of embedding (taking
into account the order of words within n-grams). Invariance on the Noetherian chains of the left dictionaries means
that for the structure of the text the words from the right dictionary are not important; what matters is the system
of their structural correspondence as a function of embedding of the left dictionaries of the text-forming fragments
(the structural pattern of the text).
The Noetherian chains of a text are preferable for semantic analysis to frequent keywords, because they take into
account the contexts of words and also reveal how the system of concepts in the text unfolds through the sequence of
nesting of their content (context): this is the above-mentioned hierarchical continuity of concepts (words). Logical,
ethical and aesthetic categories of natural languages can be computed as Noetherian chains of meaning.
If the Noetherian chains of meaning are defined as target functions (sequences of embeddings), it is possible to compose
systems of equations on variables of bilinear forms. Because the variables are pairwise linked with each other (pairwise
nested in Noetherian chains), a system of quadratic equations can be compiled with words in refined context, their
contexts, and text-forming fragments as unknowns.
0.7.3. Meaning Equalizers
In category theory the following model is called an equalizer (a generalization of the equation) as applied to matrix
text fragments. Let four fragment objects and their dictionaries be given. The objects are connected by a pair of
morphisms. The second dictionary is part or all of the vocabulary of the ﬁrst dictionary. The third object-fragment and
its morphism-transformation to the ﬁrst object-fragment (inclusion function) is called an equalizer of the ﬁrst pair of
morphisms if the matrix products to the right of each morphism of this pair on this inclusion function are concordant.
For any other fourth object, the right products of each morphism of the same pair on the inclusion function of the fourth
object in the ﬁrst one are also concordant and there exists a single morphism of the third object in the fourth such that
the matrix product of the ﬁrst object morphism in the third one by the morphism.
The essential diﬀerence between the above deﬁnition for the equalizer of matrix fragments and the canonical deﬁnition
of the equalizer for the Set category, for example, is the replacement of the equality relation by the concordance relation.
But since equality and concordance relations are equivalence relations (they have properties of reﬂexivity, symmetry
and transitivity), such replacement is admissible and satisﬁes the axioms of the category.
The reason for using concordance is as follows. For the ﬁrst two objects-fragments of the category it is required to ﬁnd
the third fragment of the text and its corresponding matrix polynomial-transformation such that, when multiplied by it
on the right, the ambiguity connected with the repetitions of the indices in the fragment monomials is eliminated. Since
in the monomials of matrix polynomials both coordinates refer to the position of words in the text, multiplying the
ﬁrst pair of morphisms by the inclusion function is the consistent rule of choice for repeated words, which eliminates
ambiguity.
If words are considered in a reﬁned context, the semantic distinction of repeated words in the text and their concordance
by reﬁned contexts are used to achieve this unambiguity.
The system of equations for fragments in reﬁned contexts (a word is a special case of a fragment) can be compiled in
three ways:
1. By the concordance relation of text fragments in refined contexts. For example, it is the concordance
of the title of the text and the whole text or parts of the text (paragraphs, chapters, etc.), or of parts of the text
with each other (for example, the abstract and the whole text, the first paragraphs of sections, etc.). The listed
combinations of fragments are labeled, and the labels are the corresponding equation numbers in the equation systems
of the text.
2. According to the Noetherian chains of text fragments and their ordering. The equations in this case are recurrence
relations and are defined by corresponding formulas. Recurrence on the first coordinates determines the sequence of
text fragments (structural pattern of the text). Recurrence on the second coordinates determines the sequence
of fragments by continuity of meaning (contextual table of contents of the whole text and its sections). Each
Noetherian chain defines an equation in a system of equations.
3. By a combination of the two points above.
Systems of matrix equations have the general form of a bilinear form and, depending on which fragments in the
corresponding bilinear form are taken as unknowns, are systems of either linear or quadratic equations. Both the given
and the unknown quantities are matrices. For the linear case, there are matrix versions of the Gaussian method for
solving systems of linear matrix equations. For systems of quadratic matrix equations, there is also a generalization
of the Gaussian method of eliminating unknowns, which reduces a system of equations with many unknowns to an
equation with one unknown and formulas relating the unknowns.
It is possible to reduce a system of quadratic equations to a system of linear equations without loss of generality and
accuracy by using alternions, which are hypercomplex numbers consisting of hyperbinary numbers.
An example of an exact linearization is given.
For exact linearization and solution of systems of concordance equations, it is necessary that matrix words and texts
commute with the alternions and that the squares of the unknowns exist. The first requirement is satisfied by using the
corresponding property of the Kronecker (direct) product of matrices; the second requirement is always satisfied for
matrix texts, since the square of a matrix text fragment is the same fragment.
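The idempotence claim can be checked numerically on a small case. The sketch below assumes that a matrix text fragment is the sum of matrix units with a 1 in row i and column f(i), where f(i) is the position of the first occurrence of the i-th sign; the index pairs are those of the seven-sign sequence (1.1.7.3) from Chapter 1, and the helper names `matrix_unit_sum` and `matmul` are hypothetical.

```python
def matrix_unit_sum(pairs, n):
    """Sum of n-by-n matrix units E_{i,j} over the given 1-based index pairs."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in pairs:
        matrix[i - 1][j - 1] = 1
    return matrix


def matmul(a, b):
    """Plain product of two square integer matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Index pairs (i, f(i)) of the seven-sign example text with repetitions.
pairs = [(1, 1), (2, 1), (3, 3), (4, 4), (5, 1), (6, 3), (7, 4)]
F = matrix_unit_sum(pairs, 7)
F_squared = matmul(F, F)
print(F_squared == F)
```

Under this assumption the square picks out row f(i) of the same matrix for each row i, and since the first occurrence of a first occurrence is itself, the product reproduces the fragment.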
Chapter 1
Text Coordination
1.1. Rules
1.1.1. Texts and Algebra
The successes of the human population are based on the discovery of words and numbers, which are also words. But
numbers are not just words. The effectiveness of human actions depends on the degree of understanding of words,
because "a person can understand only what is expressed in words". Moreover, the understanding (interpretation) of the
words themselves is personal and historical. People understand numbers in approximately the same way, regardless of
the place and time of their use. The language of numbers is universal and eternal. People may not know each
other's language, but if there is a motivation (for example, money), then they will agree on the language of numbers.
The interpretation of words other than numbers depends on a person’s genotype and phenotype. Each person will put
what he/she has read and heard "in his/her own words" diﬀerently. Also the interpretation (meaning) of words may
change over time. The same text will be explained diﬀerently after some time if a person has grown up and read more
books. Biblical parables two thousand years ago had a diﬀerent meaning than their modern interpretation. And only
with the help of numbers (special words) were ideal (algebraic and geometric) objects created, which do not change with
time (P.G. Kuznetsov) and are equally understood - this is their main purpose and diﬀerence from the words of natural
language. The concept of a triangle has not changed in the four thousand years since it was discovered in Babylon.
Numbers and letters are two kinds of ideal objects-signs for studying relations of real objects. Algebra, as a branch of
mathematics, makes no distinction between numbers and letters. In texts (sign sequences), letters are combined into
words of language or formulas.
Algebra (a symbolic generalization of arithmetic) and text (a sequence of symbols) are so far two very diﬀerent tools of
cognition.
In mathematics, the few properties (relations) of objects that are explicitly deﬁned by axioms and theorems in terms
of vocabulary are explored. Axioms are symbols of faith in mathematics. Theorems are derived from axioms and are
recognized as properties of objects using rules of inference (formal logic). The meaning of the mathematical object of a
theory and its relations with other objects of that theory does not change over time.
In linguistic texts, the analogue of axiomatics refers to the entire corpus of language. Each word can be related to each
of its contexts in the literature where the word occurs. This is the main diﬀerence between mathematical texts and
natural language texts. In this sense, each word of natural language has an analogy with an axiom, and the context of the
word has an analogy with a theorem. At the same time, one word may correspond to opposite "theorems" in meaning,
for homonym words, for example. And this is not a disadvantage of natural language, but its deﬁning property.
The meaning of language words is in their contexts, which are also historically changeable. All words of a contextual
language are homonyms. A word has as many properties (relations between words) as there are contexts in the whole
corpus of language. To understand an individual text and manage its meaning, explicit references to other texts
cannot be avoided, since they limit polysemy. If references are missing, the reader or listener implicitly uses his or her personal
associations (references) with other texts to recognize the meaning of the information received. Discrepancies in such
references cause general misunderstanding.
References limit the meanings (all properties) of a word and give it a sense (part of the meanings) in Gottlob Frege's
semantic triangle. References for texts perform the role of logic in mathematics. References, bookmarks, footnotes,
notes, and hidden fragments formalize a language text for understanding, but do not yet become mathematics (algebra
in particular).
1.1.2. Coordinating
A necessary condition of any algebraization is coordinatization, which begins with digitization of the object under
study (recalculation).
Coordinatization is the replacement (modeling) of the object of research by its digital copy. This is followed by the
correct replacement of the model numbers themselves with symbols and the determination of the properties and
regularities of the combinations of these symbols.
If correct coordinatization (matching a mathematical object under study with a set of numbers or symbols) is applied to
a text, then the text can be reduced to algebra. The necessary requirements for the "correctness" of the coordinatization:
the mathematical objects serving as coordinates must be individual (distinguishable); the new coordinating objects
must be as general as possible and then have maximum applicability. In this sense, the new number-like objects as
coordinating tools should not be inferior to their ancestors, the natural numbers. Operations (addition, multiplication,
comparison) can be performed on these new objects. In what follows, such objects for the algebraization of texts will be
hyperbinary numbers.
Coordinatization is connected with measurements. To make measurements, it is necessary to describe the procedure for
making them, that is, the rules of coordinatization. Therefore, to coordinatize a subject domain means to describe
the rules of measurement. Galileo, the founder of coordinatization, defined its idea as follows: to measure everything
that is measurable, and to make measurable everything that is not. As a result, analytic geometry was created,
where points were coordinatized by pairs or triples of real numbers, and curves and surfaces by coefficients of
polynomials. Later it turned out that the numbers then in use were not sufficient to fulfill Galileo's task. It was not
always possible to express the roots of a polynomial by real numbers. New numbers were needed: complex numbers.
The crisis of measurements was always solved by the invention of new extensions of the number concept. So a series
from natural numbers to complex numbers arose. Matrices are completely different number-like objects. They
create their own line of numbers, which are called hypercomplex numbers. A particular case of hypercomplex numbers
is the hyperbinary numbers. It is these that will be used later for the algebraization of texts.
For successful algebraization it is crucial to describe the coordinate rules and properties of coordinating objects in such
a way as to reduce the variability of their choice.
1.1.3. The aim of algebraization
The goal of text algebraization is to be able to calculate from solutions of systems of algebraic equations:
text meaning (similarities and differences of the text from other texts)
structuring variants (different fragmentation of the text and changing the order of the text-forming fragments)
general meaning, or structurization invariants (a summary of a text that does not depend on structuring variants)
context dictionaries (tables of relations between words and their various contexts)
semantic (contextual) translation (another text on the basis of context dictionaries)
caption names of text fragments (headings of text fragments for different variants of structuring, on the basis of the
calculated meaning and context dictionaries)
restructuring (other structuring according to a target requirement)
summaries (title, introduction, conclusion, abstract, outline, presentation)
versions of the text according to target requirements (text based on a given short content)
The functions listed above are possible new user functions of voice assistants and text applications (e.g. text editors,
readers and future sense translators); they are not a substitute for creativity, but only a technical support tool.
Texts are defined as sequences of characters (letters, words, notes, etc.). The character sequence $x_1, x_2, \ldots$
is a function of the natural numbers $\mathbb{N}$: $x_n = f(n)$, where the $x_n$ are elements of sign systems.
There is a hierarchy of five types of sign systems:
natural (e.g., zoo and herbarium)
ﬁgurative (such as road signs and emoji)
language (words)
records (e.g. genetic sequences of biopolymers, cartography, sheet music, dance recording systems, mathematical
records: letters, numbers, symbols)
codes (relations of signs).
The sign systems listed above are separate species of visual and auditory systems. The sign systems of taste and smell
exist only as natural species so far. For the sense of touch, there is an embossed-dot tactile Braille code.
1.1.4. Dictionaries and alphabets
The carrier of a character sequence is the set of all its characters without repetition. The carrier can be called an
alphabet or a dictionary of a sign sequence. In what follows it is more appropriate to call the carrier a dictionary,
since an alphabet is a special case of a dictionary in which the property (meaning) of a symbol does not depend on its
place in the sequence or on the surrounding characters (contexts).
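As a small illustration of the carrier, the following sketch extracts all distinct signs of a sequence in order of first appearance. The function name `carrier` and the sample phrase are hypothetical; the same function is applied once to words (giving a dictionary) and once to letters (giving an alphabet).

```python
def carrier(sequence):
    """All signs of the sequence without repetition, in order of first appearance."""
    return list(dict.fromkeys(sequence))


phrase = "to be or not to be"  # illustrative text
dictionary = carrier(phrase.split())           # carrier over words
alphabet = carrier(phrase.replace(" ", ""))    # carrier over letters
print(dictionary)
print(alphabet)
```

Keeping the order of first appearance is a design choice of this sketch; it matches the later convention that the second coordinate of a sign points to the place where it first occurs.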
Letters in a language are basic symbols (signs). They may diﬀer in spelling (handwriting, font), but the sound
pronunciation of a letter (the meaning of a phoneme) does not depend on its position in the sign sequence. Pronunciation
can also vary (due to speech defects or voice pitch, for example), but the main thing is that phonemes can be standardized
with other more elementary "letters".
So in the Morse code, for example, the letters themselves are already recorded through extremely elementary phonemes
(a short sound and a longer sound equal to three durations of a short sound). Latin letters in Morse code then become
texts consisting of dots and dashes as words (letters as words). In this case personal pronunciation is irrelevant, because
sound recognition depends on only one physical parameter - time as the duration of the sound. Thus, the letters of the
alphabet, e.g. Latin or Russian, can be recorded as sequences of dots and dashes (texts).
Words are sequences of letters or elementary phonemes. The meaning of a letter is only in its form or sound. A letter is
also itself a word of dots and dashes. The elementary alphabet in this case consists of two "letters" - dots and dashes.
The form of such an elementary letter (its meaning) does not depend on the surrounding letters and is determined only
by the relative duration. There is no context dependence of the letters of the alphabet.
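The idea of letters as words over a two-sign elementary alphabet can be sketched as follows. Only a handful of International Morse code letters are included, the sounding time counts a dot as one unit and a dash as three while ignoring the silent gaps between signals, and the names `MORSE`, `to_morse` and `sound_units` are hypothetical.

```python
# A few International Morse code letters; each letter is a "word" of dots and dashes.
MORSE = {"E": ".", "T": "-", "A": ".-", "N": "-.", "S": "...", "O": "---"}

# Relative sounding duration: a dash lasts three dots (gaps are ignored in this sketch).
UNITS = {".": 1, "-": 3}


def to_morse(word):
    """Rewrite each letter of the word as its sequence of dots and dashes."""
    return [MORSE[letter] for letter in word.upper()]


def sound_units(code):
    """Total sounding time of one coded letter, measured in dot durations."""
    return sum(UNITS[signal] for signal in code)


coded = to_morse("SOS")
elementary_alphabet = set("".join(coded))
print(coded)
print([sound_units(code) for code in coded])
```

Whatever letters are coded, the carrier of the resulting text stays the same two elementary signs, which is what makes this alphabet context-independent.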
Ultimate context dependence is present in the homonyms of natural language. For example, the Russian word
"kosa" has four different meanings. It is believed that a fifth of the vocabulary of the English language is occupied by
homonymy (dear/deer: a respected person or an animal).
1.1.5. Repetitions
A text is a character sequence with at least one repetition. A vocabulary is a character sequence without repetitions.
The presence of repetitions makes it possible to reduce the number of character-words used (to reduce the vocabulary).
But then the repeated signs can differ in sense. The sense of a word depends on the words around it. Here, the sense is
understood as a part of the connections (relations) of a sign with other signs. All possible relations of a given sign
to other signs are called the meaning of the sign. The sense of a sign-word is some part of its meaning. The
similarities and distinctions of the concepts "sign", "meaning" and "sense" were investigated by G. Frege, as already
mentioned, by means of the semantic triangle. Problems of understanding a word or a text arise because this part of the
meaning (the sense) is defined or guessed subjectively and ambiguously. Different readers and listeners understand
the same word in different senses.
When words are repeated in a text, there are preferable, in the opinion of the author of the text, connections-relationships
of the repeated word with other words. These relations are recorded as a new meaning of the repeated word. The new
meaning manifests itself in the allocation of all meanings-properties of the object-words that are essential in this part
(fragment) of the text. The use of repetition is the main method of establishing patterns, the meaning of objects and
their relationships.
When teaching, repetition is used especially explicitly ("repetition is the mother of learning"). In music, the author’s
repetitions (characteristic musical fragments) are called patterns. But each repetition can lead to a change in the meaning
of the text. For context-independent sign sequences, the repetition of a sign does not change the meaning of that sign.
For example, no matter how many times a "dash" is repeated in texts using Morse code, the dash’s duration will still
remain equal to the three durations of a "dot".
1.1.6. Meaningful markup
If a particular text does not have explicit repetitions, it does not mean that they are not hidden in semantic (contextual)
form. The meaning can be repeated, not only the character-sign (word) denoting it. The context here is the fragment
of text between the repeating words. This is a narrowing of the meaning of the canonical concept of context. This
use of the concept of "context" will be needed to calculate the coordinates of a text fragment when relating it to the
context deﬁnition of a word. For the canonical deﬁnition of "context" the context boundary can be assigned only
expertly (subjectively). That is why context dictionaries, dictionaries of synonyms and antonyms are compiled expertly
and intuitively. The above deﬁnition of context gives a chance to compute semantic dictionaries (context, synonyms,
antonyms and homonyms), because the concept of "context" is formalized by indicating its boundaries between recurrent
words.
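The narrowed definition of context as the fragment between repeating words lends itself to direct computation. The sketch below handles only the simplest case of consecutive repetitions of one target word; the function name `contexts_between_repeats` and the sample sentence are hypothetical.

```python
def contexts_between_repeats(words, target):
    """Fragments of the text lying between consecutive occurrences of the target word."""
    positions = [index for index, word in enumerate(words) if word == target]
    return [words[start + 1:end] for start, end in zip(positions, positions[1:])]


text = "the cat saw the dog and the bird".split()  # illustrative text
contexts = contexts_between_repeats(text, "the")
print(contexts)
```

A word that occurs only once yields no context under this definition, which is why such a word would have to borrow its context from elsewhere.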
Moreover, a context is not only the fragment between two consecutive repetitions of a word. The closeness of words by
location is more important for a person than semantic closeness that is perhaps truer but more distant in the sign
sequence. This phenomenon can be explained by the innate mental limitation of any human, who is unable to hold more
than seven entities on average in consciousness simultaneously to recognize the meaning of objects and their relations.
The meaning or context of a word can be pointed to by referring to any suitable fragment of text, not necessarily located
in the immediate vicinity of the repeated word. In this case, the text loses its linear order, similar to composing words
from letters. If there are no such fragments, the meaning of the word is borrowed by reference to a suitable context from
another text of the corpus (library).
Meaningful markup of ordinary one-dimensional, linear texts is already possible in a text editor through links,
bookmarks, footnotes, notes, and hidden fragments. But it will still be an imitation of a solution. Reading such
graphically complex text takes some getting used to. The future multidimensional semantic format of linguistic text has
yet to be created. A prototype of such a solution could be the familiar musical texts on a 5-dimensional staff. In this
case, the coordinatization of the text will be not two-dimensional but multidimensional (tensor). Each coordinate-index
will indicate the meaning of words or fragments of the text as connections to other contexts. A five-dimensional
meaning space is optimal for the physiologically limited mental capacity of any human being.
Inversely, if somehow the contexts of diﬀerent words are similar, then diﬀerent words are similar in the sense of their
common contexts, the word-signs of those contexts. Contexts are similar if they have at least one common sign-word.
The measure of contextual proximity of words can be determined, for example, by well-established measures of similarity
of genetic sequences.
Common words in contexts, in turn, also have their contexts - the notion of a reﬁned word context arises. This allows
for meaningful translation of texts. For example, one can translate the text of the "Creed" into the language of Hegel’s
logical categories - "I recognize the axioms of equivalence: the universal and singular cause is the singular necessity/
The ﬁnite subject is the inﬁnite object/ The subject of the formed content is inﬁnite in time and space".
1.1.7. Coordinating rules
In a finite sign sequence
$x_1, x_2, \ldots, x_n$ (1.1.7.1)
each sign has a unique number that determines the place of the sign in the sequence. No two signs can be in the
same place. But the text requires another index indicating the repetition of a sign in the text. This second index creates
equivalence relations on the finite set (1.1.7.1). The elements of (1.1.7.1) with one index are
all different. If some signs in (1.1.7.1) coincide, the second index must indicate this. Thus, it is advisable to
match the sign $x_i$ with some two-index object $E_{i,j}$ (e.g., some matrix)
$x_i \to E_{i,j}$ (1.1.7.2)
The first index of the matrix $E_{i,j}$ indicates the number of the character in the sequence. The second index is the number
of that character first encountered in the sequence. Then the repetitions of characters have the following form, for
example, for a sequence of seven characters
$E_{1,1}, E_{2,1}, E_{3,3}, E_{4,4}, E_{5,1}, E_{6,3}, E_{7,4}$ (1.1.7.3)
In the text (1.1.7.3) the sign (now a matrix) $E_{1,1}$ is repeated in the second and fifth places, the sign $E_{3,3}$ is repeated in the
sixth place, and the sign $E_{4,4}$ in the seventh. The signs $E_{2,2}$, $E_{5,5}$, $E_{6,6}$, $E_{7,7}$ do not exist. The carrier (dictionary) of the
sequence (1.1.7.3) is the part of its signs with equal indices
$E_{1,1}, \ldots, E_{3,3}, E_{4,4}, \ldots, \ldots, \ldots$ (1.1.7.4)
The dictionary (1.1.7.4) of the character sequence (1.1.7.3) is the sequence (1.1.7.3) itself, but with deleted repetitions,
which are marked as «...». Then the second coordinate in (1.1.7.3) can be considered a word number in the dictionary.
Missing word numbers in the dictionary can be eliminated by continuous numbering.
Text coordinating rules:
Rule 1. The first index of the text coordinating object $E_{i,j}$ is the ordinal number of the word in the text; the second
index is the ordinal number of the same word first encountered in the text. If the word has not been encountered before,
the second index is equal to the first index.
Rule 2. The dictionary is the original text with deleted repetitions. It is possible to order the dictionary with exclusion
of gaps in word numbering.
Rule 3. For two or more texts which are not a single text, the word order in each text is independent. In two texts the
initial words are equally ﬁrst. Just as in two books, the beginning pages begin with one.
Rule 4. The common dictionary of a set of texts is the dictionary of all texts after their concatenation. It is possible to
arrange the dictionary with deletion of gaps in the numbering of words.
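Rules 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `coordinate`, `dictionary` and `compact` are hypothetical and not from the book, and the seven-sign sequence is an arbitrary one chosen so that it reproduces the index pattern of (1.1.7.3).

```python
def coordinate(words):
    """Rule 1: pair each 1-based position i with the position j of the first occurrence."""
    first_seen = {}
    coords = []
    for i, word in enumerate(words, start=1):
        j = first_seen.setdefault(word, i)  # j == i when the word is new
        coords.append((i, j))
    return coords


def dictionary(coords):
    """Rule 2: the dictionary is the text with repetitions deleted (pairs with i == j)."""
    return [(i, j) for i, j in coords if i == j]


def compact(coords):
    """Rule 2, continuous numbering: renumber second indices 1, 2, 3, ... without gaps."""
    renumber = {}
    for i, j in coords:
        if i == j:
            renumber[j] = len(renumber) + 1
    return [(i, renumber[j]) for i, j in coords]


signs = ["a", "a", "b", "c", "a", "b", "c"]  # assumed sample with the repetition pattern of (1.1.7.3)
coords = coordinate(signs)
print(coords)              # index pairs (i, j) of E_{i,j}
print(dictionary(coords))  # first occurrences only
print(compact(coords))     # same text, gap-free dictionary numbers
```

Keeping the position `i` untouched and renumbering only `j` mirrors the remark that missing word numbers in the dictionary can be eliminated by continuous numbering.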
1.2. Examples
1.2.1. Similarity and sameness
One of the mathematical models of sign sequences with repetitions (texts) can be a multiset. The multiset was defined
by D. Knuth in 1969 and was later studied in detail by A. B. Petrovsky.
The universal property of a multiset is the existence of identical elements. A set is a limiting case of a multiset with
unit multiplicities of elements. A set with unit multiplicities corresponding to a multiset is called its generating set or
domain (dictionary). A set with zero multiplicity is an empty set.
The problem in this case is to determine the sameness of elements. The sameness depends on the considered properties
of these elements. Cucumbers and watermelons are externally identical in color, but it is diﬃcult to call them identical
in gastronomic use, although the botanical description is much the same.
According to G. Frege any object having relations with other objects and their combinations has as many properties
(meanings) as these relations. The part of the values taken into account is called the sense by which the object is
represented in a given situation. The naming of an object by a number, symbol, word, picture, sound, gesture to describe
it brieﬂy is called the sign of the object (this is one of the meanings).
Each of the possible parts of an object's meanings (each element of the Boolean, the set of all subsets of meanings) corresponds to one sign. This is the main
problem of meaning recognition, but it is also the basis for making do with minimal sets of signs. It is impossible to
assign a unique sign to each subset of values. The objects of information exchange are minimal sets of signs (notes,
alphabet, language dictionary). The meaning of signs is usually not calculated, but is determined by the contexts.
1.2.2. Example on an abacus
A solution to the problem of sign ambiguity is the semantic markup of text. The semantic markup can be explained
using the example of extreme unambiguity. On a Russian abacus the text is a sequence of identical signs (knuckles). The
vocabulary of such a text consists of a single word. This is even stronger than in Morse code, where the dictionary
consists of two words. Without semantic markup, it is impossible to use such texts. Therefore, the vocabulary changes
and the characters are divided into groups - ones, tens, hundreds, etc. These group names (numbers) become unique
word numbers. The vocabulary $D$ consists of the digits from 0 to 9. Each knuckle can also be represented by an as yet undefined
matrix $E_{i,j}$ from (1.1.7.2) on such a Cartesian abacus. For example, the number 2021 on the matrix abacus is the sum
of four matrices
$E_{1000,2} + E_{100,0} + E_{10,2} + E_{1,1}$,
where the lower indices are the Cartesian coordinates of the matrix word (a number in this case). The transformation
of identical objects into similar ones has taken place. The measure of similarity is the values of word coordinates. In
addition to positional, repetitions of dictionary digits occur when arithmetic operations are performed. Equivalence
relations are established
$E_{i,0} \sim E_{i+1,1}$
If after the arithmetic operation the number 9+1 is obtained, then 0 appears in this position and 1 is added to the next
digit. On the abacus all knuckles are shifted to the initial (zero) position, and one is added in the next digit (wire). On
the matrix abacus the transformation
$E_{i+1,1} = E_{i+1,i} E_{i,0} E_{0,i}$
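The positional reading of the abacus can be illustrated with a minimal sketch. It assumes, as in the 2021 example, that the first index is the place value of the wire and the second is the digit on it; the helper names `abacus_terms` and `add_one` are hypothetical, and the carry is done on plain index pairs rather than through the matrix transformation given above.

```python
def abacus_terms(number):
    """Return (place value, digit) index pairs, highest place first."""
    digits = [int(ch) for ch in str(number)]
    return [(10 ** (len(digits) - 1 - k), d) for k, d in enumerate(digits)]


def add_one(terms):
    """Add one knuckle on the units wire, carrying whenever 9 + 1 is reached."""
    digits = dict(terms)
    place = 1
    while digits.get(place, 0) == 9:
        digits[place] = 0   # all knuckles on this wire return to the zero position
        place *= 10         # the one is passed to the next wire
    digits[place] = digits.get(place, 0) + 1
    return sorted(digits.items(), reverse=True)


print(abacus_terms(2021))           # index pairs of the four matrices in the sum
print(add_one(abacus_terms(2029)))  # a single carry
print(add_one(abacus_terms(99)))    # a carry that opens a new wire
```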
If we define a measure of the sameness of the signs, then the relation of tolerance (similarity) can again be turned into a
relation of equivalence (sameness) according to this measure, for example by rounding numbers. One can recognize the
difference between tolerance and equivalence by the violation of transitivity: for tolerance relations it can be violated.
For example, let an element $A$ be similar to $B$ in one sense. If the sense of $B$ does not coincide with the sense of element $C$, then
$A$ can be similar to $C$ only in the part of intersection of their senses (part of properties). Transitivity of relations
is restored (closed), but only for this common part of sense. After the sameness is achieved, $A$ will be equivalent to $C$.
For example, the above transformation (closure) on some coordinates provides arithmetic operations on a matrix abacus.
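The difference between tolerance and equivalence can be checked mechanically. The sketch below is an assumption-laden illustration, not the author's procedure: the tolerance is taken to be "two numbers differ by at most 0.5", the measure of sameness is rounding (the example mentioned in the text), and the second tolerance is the one defined earlier for contexts (at least one common sign-word). The function names are hypothetical.

```python
def is_transitive(items, rel):
    """True if rel(x, y) and rel(y, z) always imply rel(x, z) on the given items."""
    return all(rel(x, z)
               for x in items for y in items for z in items
               if rel(x, y) and rel(y, z))


def close_numbers(x, y):
    """Tolerance (similarity): the numbers differ by at most 0.5."""
    return abs(x - y) <= 0.5


def same_after_rounding(x, y):
    """Equivalence (sameness) by a measure: the numbers round to the same integer."""
    return round(x) == round(y)


def share_a_word(a, b):
    """Tolerance for contexts: at least one common sign-word."""
    return bool(set(a) & set(b))


numbers = [0.6, 1.0, 1.4]
contexts = [{"set", "object"}, {"object", "polynomial"}, {"polynomial", "monomial"}]

print(is_transitive(numbers, close_numbers))        # similarity: transitivity is violated
print(is_transitive(numbers, same_after_rounding))  # sameness: transitivity holds
print(is_transitive(contexts, share_a_word))        # contexts: transitivity is violated
```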
1.2.3. Chess example
Another example of contextual dependence of signs is chess. It is even stronger in double chess. In this modification
of chess, a finite number of double moves are allowed to be made during the game at any given moment. The game
remains non-contradictory. The rest of the rules are the same as in standard chess except for two: the first move is single
and castling at check is allowed. The author of the variant in which all moves are double is Professor G. A. Zaitsev.
For chess the vocabulary of the matrix text of the game is the numbers of one of the pieces of each color and the separator
of moves (from 1 to 11). The word of the chess text is also a kind of matrix $E_{i,j}$. The first coordinate $i$ is unique and
is the cell number on the chessboard (from 1 to 64). The second coordinate $j$ is the number from the dictionary. The
chess matrix text at any moment of the game is the sum of matrices, each showing a piece on the corresponding place
on the chessboard. The repetitions in the text appear both because of duplication of pieces and because of constant
transitions during the game from similarity to sameness and vice versa for all pieces except the king. The game consists
in implementing the most eﬀective such transitions and the actual classiﬁcation of the pieces. Pawns that are identical in
the beginning then become similar only by the move rule, and sometimes a pawn becomes identical with a queen.
The tool of matrix text analysis is a transitivity control to check the diﬀerence between similarity and sameness. Lack
of transitivity control is an algebraic explication of misunderstanding for language texts, loss in chess, or errors in
numerical calculations.
Relational transitivity is a condition for turning a set of objects into a mathematical category. The semantic markup of a
text can become the computation of its categories by means of transitive closure. The category objects are the contexts
of matrix words, the morphisms are the transformation matrices of these contexts.
1.2.4. Example of a language text
Example text:
A set is an object that is a set of objects. A polynomial is a set of monomial objects,
which are a set of multiplier objects. (1.2.4.1)
You can write the text (1.2.4.1) in lemmatized (normal) form. It is not necessary, but it significantly reduces the size of
the dictionary. In Russian, the following morphological forms are considered normal: nouns - nominative case, singular;
adjectives - nominative case, singular, masculine; verbs, participles, adverbial participles - the verb in the infinitive (indefinite
form) of the imperfective aspect.
The text (1.2.4.1) in normal form:
set this object being set object «dot» polynomial this set object
monomial being set object multiplier «dot» (1.2.4.2)
The text in normal form is coordinated according to rule 1:
(set)_{1,1} (this)_{2,2} (object)_{3,3} (being)_{4,4} (set)_{5,1} (object)_{6,3} («dot»)_{7,7}
(polynomial)_{8,8} (this)_{9,2} (set)_{10,1} (object)_{11,3} (monomial)_{12,12} (being)_{13,4}
(set)_{14,1} (object)_{15,3} (multiplier)_{16,16} («dot»)_{17,7}
(1.2.4.3)
In (1.2.4.3) the fifth and sixth words are already present in the text under numbers 1 and 3. Therefore the words
(...)_{5,5}, (...)_{6,6} are not in the text (1.2.4.3).
The dictionary of the text (1.2.4.3) is the text (1.2.4.1) itself, but without repetitions
(set)_{1,1} (this)_{2,2} (object)_{3,3} (being)_{4,4} ... ... («dot»)_{7,7}
(polynomial)_{8,8} ... ... ... (monomial)_{12,12} ... ... ... (multiplier)_{16,16} ... (1.2.4.4)
Or, after the removal of the gaps marked by dots,
(set)_{1,1} (this)_{2,2} (object)_{3,3} (being)_{4,4} («dot»)_{7,7}
(polynomial)_{8,8} (monomial)_{12,12} (multiplier)_{16,16}
(1.2.4.5)
The dictionary (1.2.4.5) has gaps in numbering. By rule 2 the dictionary then takes the form
(set)_{1,1} (this)_{2,2} (object)_{3,3} (being)_{4,4} («dot»)_{5,5}
(polynomial)_{6,6} (monomial)_{7,7} (multiplier)_{8,8}
(1.2.4.6)
For such a compacted dictionary (1.2.4.6) the indexed text (1.2.4.3) changes to
(set)_{1,1} (this)_{2,2} (object)_{3,3} (being)_{4,4} (set)_{5,1} (object)_{6,3} («dot»)_{7,5}
(polynomial)_{8,6} (this)_{9,2} (set)_{10,1} (object)_{11,3} (monomial)_{12,7} (being)_{13,4}
(set)_{14,1} (object)_{15,3} (multiplier)_{16,8} («dot»)_{17,5}
(1.2.4.7)
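The two indexings of this example, with the sparse dictionary and with the compacted one, can be reproduced by a short sketch. The word list is the normal form of the example text with the sentence separator written as the word `dot`; the variable names are hypothetical.

```python
text = ("set this object being set object dot "
        "polynomial this set object monomial being set object multiplier dot").split()

first_seen = {}    # word -> position of its first occurrence (rule 1)
dense = {}         # word -> gap-free dictionary number (rule 2)
sparse_index = []  # (word, i, j) with j taken from first_seen
dense_index = []   # (word, i, j) with j taken from the compacted dictionary

for i, word in enumerate(text, start=1):
    if word not in first_seen:
        first_seen[word] = i
        dense[word] = len(dense) + 1
    sparse_index.append((word, i, first_seen[word]))
    dense_index.append((word, i, dense[word]))

for word, i, j in dense_index:
    print(f"({word})_{{{i},{j}}}", end=" ")
print()
```

The set of second indices in `sparse_index` is the dictionary with gaps, and the values of `dense` are the continuous numbering.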
Coordination of the text (1.2.4.2) is its indexing, (1.2.4.3) or (1.2.4.7), together with matching the indexed words with matrices $E_{i,j}$.
The words in parentheses in (1.2.4.3) and (1.2.4.7) can, by (1.1.7.2), be matched with as yet undefined coordinating
two-index matrix objects $E_{i,j}$.
For (1.2.4.3):
Table 1: Relationship word-coordinating object
Word                  Object      Word                 Object      Word                Object
(set)_{1,1}           E_{1,1}     (this)_{2,2}         E_{2,2}     (object)_{3,3}      E_{3,3}
(being)_{4,4}         E_{4,4}     (set)_{5,1}          E_{5,1}     (object)_{6,3}      E_{6,3}
(«dot»)_{7,7}         E_{7,7}     (polynomial)_{8,8}   E_{8,8}     (this)_{9,2}        E_{9,2}
(set)_{10,1}          E_{10,1}    (object)_{11,3}      E_{11,3}    (monomial)_{12,12}  E_{12,12}
(being)_{13,4}        E_{13,4}    (set)_{14,1}         E_{14,1}    (object)_{15,3}     E_{15,3}
(multiplier)_{16,16}  E_{16,16}   («dot»)_{17,7}       E_{17,7}
For (1.2.4.7):
Table 2: Relationship word-coordinating object
Word                  Object      Word                 Object      Word                Object
(set)_{1,1}           E_{1,1}     (this)_{2,2}         E_{2,2}     (object)_{3,3}      E_{3,3}
(being)_{4,4}         E_{4,4}     (set)_{5,1}          E_{5,1}     (object)_{6,3}      E_{6,3}
(«dot»)_{7,5}         E_{7,5}     (polynomial)_{8,6}   E_{8,6}     (this)_{9,2}        E_{9,2}
(set)_{10,1}          E_{10,1}    (object)_{11,3}      E_{11,3}    (monomial)_{12,7}   E_{12,7}
(being)_{13,4}        E_{13,4}    (set)_{14,1}         E_{14,1}    (object)_{15,3}     E_{15,3}
(multiplier)_{16,8}   E_{16,8}    («dot»)_{17,5}       E_{17,5}
The text (1.2.4.3), coordinated by matrices $E_{i,j}$ according to Table 1, is
$E_{1,1} E_{2,2} E_{3,3} E_{4,4} E_{5,1} E_{6,3} E_{7,7} E_{8,8} E_{9,2} E_{10,1} E_{11,3} E_{12,12} E_{13,4} E_{14,1} E_{15,3} E_{16,16} E_{17,7}$
(1.2.4.8)
The dictionary of the text (1.2.4.8), with deleted repetitions:
$E_{1,1} E_{2,2} E_{3,3} E_{4,4} E_{7,7} E_{8,8} E_{12,12} E_{16,16}$ (1.2.4.9)
The text (1.2.4.7), coordinated by matrices $E_{i,j}$ according to Table 2, is
$E_{1,1} E_{2,2} E_{3,3} E_{4,4} E_{5,1} E_{6,3} E_{7,5} E_{8,6} E_{9,2} E_{10,1} E_{11,3} E_{12,7} E_{13,4} E_{14,1} E_{15,3} E_{16,8} E_{17,5}$
(1.2.4.10)
The dictionary of the text (1.2.4.10), with continuous numbering without gaps:
$E_{1,1} E_{2,2} E_{3,3} E_{4,4} E_{5,5} E_{6,6} E_{7,7} E_{8,8}$ (1.2.4.11)
1.2.5. An example of a mathematical text
As an example of a mathematical text, the formulas for the volumes of a cone $V_K$, a cylinder $V_C$ and a torus $V_T$ are chosen:
$V_K = \frac{1}{3} \pi R_1^2 H_1, \quad V_C = \pi R_2^2 H_2, \quad V_T = \pi^2 (R_3 + R_4) r$ (1.2.5.1)
Formulas (1.2.5.1) are treated as texts. This means that the characters included in the texts are not mathematical objects and
there are no algebraic operations for them. For example, $R_1^2$ is the sign sequence $R_1 R_1$, and $\pi R_1$ is not the product of two numbers, but simply
a sequence of two signs.
Signs in (1.2.5.1): $R_1$ and $H_1$ are the base radius and height of the cone, $R_2$ and $H_2$ are the base radius and height of the cylinder,
$R_3$ is the inner radius of the torus, $R_4$ is the outer radius of the torus, $r$ is the radius of the generating circle of the torus, and $\pi$ is the number π.
For the semiotic analysis of formulas as texts, it is important to have repetitions of signs. The repetitions determine
the patterns. In formulas (1.2.5.1) there are actually more sign repetitions than the explicit repetitions of the sign $\pi$. The signs
$R_1$, $R_2$, $R_3$, $R_4$, $H_1$, $H_2$ and $r$ are lengths of segments. Then one of the signs, for example $r$, is prime (a length
reference), and the other signs are composite (composed of prime signs): $R_1 = ar$, $R_2 = br$, $R_3 = cr$, $R_4 = dr$,
$H_1 = er$, $H_2 = fr$. Then the right-hand sides of formulas (1.2.5.1) are rewritten as
(1/3) π a r a r e r
π b r b r f r
π π (c+d) r r
(1.2.5.2)
The form (1.2.5.2) can be called the normal form of mathematical texts.
Then (1.2.5.1) is represented by the rule 1 in index form in uniform numbering, as if it were not three texts, but one:
(1/3)_{1,1} (π)_{2,2} (a)_{3,3} (r)_{4,4} (a)_{5,3} (r)_{6,4} (e)_{7,7} (r)_{8,4}
(π)_{9,2} (b)_{10,10} (r)_{11,4} (b)_{12,10} (r)_{13,4} (f)_{14,14} (r)_{15,4}
(π)_{16,2} (π)_{17,2} (c+d)_{18,18} (r)_{19,4} (r)_{20,4}
(1.2.5.3)
Dictionary for (1.2.5.3) by rule 2
(1/3)_{1,1} (π)_{2,2} (a)_{3,3} (r)_{4,4} (e)_{7,7} (b)_{10,10} (f)_{14,14} (c+d)_{18,18} (1.2.5.4)
Dictionary (1.2.5.4) in continuous numbering without gaps
(1/3)_{1,1} (π)_{2,2} (a)_{3,3} (r)_{4,4} (e)_{5,5} (b)_{6,6} (f)_{7,7} (c+d)_{8,8} (1.2.5.5)
Then for the dictionary (1.2.5.5) the indexed (coordinated) text (1.2.5.3) changes to
(1/3)_{1,1} (π)_{2,2} (a)_{3,3} (r)_{4,4} (a)_{5,3} (r)_{6,4} (e)_{7,5} (r)_{8,4}
(π)_{9,2} (b)_{10,6} (r)_{11,4} (b)_{12,6} (r)_{13,4} (f)_{14,7} (r)_{15,4}
(π)_{16,2} (π)_{17,2} (c+d)_{18,8} (r)_{19,4} (r)_{20,4}
(1.2.5.6)
The coordinated text for (1.2.5.3), written through the matrices $E_{i,j}$, has the form
$E_{1,1} E_{2,2} E_{3,3} E_{4,4} E_{5,3} E_{6,4} E_{7,7} E_{8,4} E_{9,2} E_{10,10} E_{11,4} E_{12,10} E_{13,4} E_{14,14} E_{15,4} E_{16,2} E_{17,2} E_{18,18} E_{19,4} E_{20,4}$
(1.2.5.7)
The dictionary of the matrix text (1.2.5.7) has the form
$E_{1,1} E_{2,2} E_{3,3} E_{4,4} E_{7,7} E_{10,10} E_{14,14} E_{18,18}$ (1.2.5.8)
The coordinated text for (1.2.5.6), written through the matrices $E_{i,j}$, has the form
$E_{1,1} E_{2,2} E_{3,3} E_{4,4} E_{5,3} E_{6,4} E_{7,5} E_{8,4} E_{9,2} E_{10,6} E_{11,4} E_{12,6} E_{13,4} E_{14,7} E_{15,4} E_{16,2} E_{17,2} E_{18,8} E_{19,4} E_{20,4}$
(1.2.5.9)
The dictionary of the matrix text (1.2.5.9) has the form
$E_{1,1} E_{2,2} E_{3,3} E_{4,4} E_{5,5} E_{6,6} E_{7,7} E_{8,8}$ (1.2.5.10)
All three texts (1.2.5.2) are independent of each other. The properties of the texts of this example therefore allow us to apply
coordination rule 3. The indexed texts of formulas (1.2.5.2), written immediately for the dense dictionary (1.2.5.5), have the form
(1/3)_{1,1} (π)_{2,2} (a)_{3,3} (r)_{4,4} (a)_{5,3} (r)_{6,4} (e)_{7,5} (r)_{8,4}
(π)_{1,2} (b)_{2,6} (r)_{3,4} (b)_{4,6} (r)_{5,4} (f)_{6,7} (r)_{7,4}
(π)_{1,2} (π)_{2,2} (c+d)_{3,8} (r)_{4,4} (r)_{5,4}
(1.2.5.11)
Each formula in (1.2.5.11) is numbered independently of the other formulas, and the first sign of each formula has a first
index of one.
The coordinated text for (1.2.5.11), written through matrices $E_{i,j}$, has the form
$E_{1,1} E_{2,2} E_{3,3} E_{4,4} E_{5,3} E_{6,4} E_{7,5} E_{8,4} E_{1,2} E_{2,6} E_{3,4} E_{4,6} E_{5,4} E_{6,7} E_{7,4} E_{1,2} E_{2,2} E_{3,8} E_{4,4} E_{5,4}$
(1.2.5.12)
The matrix dictionary for (1.2.5.12) is the same as the dictionary (1.2.5.10) for (1.2.5.9).
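Rules 3 and 4 together (independent numbering inside each text, one common gap-free dictionary) can be sketched as follows. The three sign lists are the right-hand sides in normal form; the function names `common_dictionary` and `coordinate_independently` are hypothetical.

```python
formulas = [
    ["1/3", "π", "a", "r", "a", "r", "e", "r"],  # cone
    ["π", "b", "r", "b", "r", "f", "r"],         # cylinder
    ["π", "π", "c+d", "r", "r"],                 # torus
]


def common_dictionary(texts):
    """Rule 4: gap-free dictionary of all texts, numbered in order of first occurrence."""
    numbers = {}
    for text in texts:
        for sign in text:
            numbers.setdefault(sign, len(numbers) + 1)
    return numbers


def coordinate_independently(texts):
    """Rule 3: the first index restarts from one in every text."""
    numbers = common_dictionary(texts)
    return [[(i, numbers[sign]) for i, sign in enumerate(text, start=1)]
            for text in texts]


for indexed in coordinate_independently(formulas):
    print(indexed)
```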
1.2.6. Example of Morse-Vail-Gerke code
This example is chosen because of the extreme brevity of the dictionary. In Morse code, the character sequences of 26
Latin letters can be considered as texts consisting of words - dots and dashes. The order of words (dots and dashes) is
extremely important in each individual text (alphabet letter). In linguistic texts, the order is also important ("mom's
dad" is not "dad's mom"), but there are exceptions ("languid evening" and "evening languid").
The dictionary and carrier of Morse code is a sequence of two sign-words ("dot" and "dash") that coincides
with the letter A. The order of the signs in the dictionary or the carrier is no longer important. Therefore, the
carrier may also be the letter N. One letter is the carrier (dictionary), the remaining 25 letters are code texts. Defining
the 26 letters of Morse code as texts of words is unusual for linguistic texts. In linguistic texts, words are composed of
letters. But for codes, as relations of signs, making letters (cipher) out of words is natural.
Table 3: Morse code: Latin letters as sign sequences (texts)
   Letter Text     Letter Text     Letter Text     Letter Text
 1 A      ·−     8 H      ····  15 O      −−−   22 V      ···−
 2 B      −···   9 I      ··    16 P      ·−−·  23 W      ·−−
 3 C      −·−·  10 J      ·−−−  17 Q      −−·−  24 X      −··−
 4 D      −··   11 K      −·−   18 R      ·−·   25 Y      −·−−
 5 E      ·     12 L      ·−··  19 S      ···   26 Z      −−··
 6 F      ··−·  13 M      −−    20 T      −
 7 G      −−·   14 N      −·    21 U      ··−
Each letter-text in Morse code has no more than four words (dots or dashes). The pattern of the letter looks like
Word 1 Word 2 Word 3 Word 4 (1.2.6.1)
Each codeword (of dots and dashes), as some object, has two coordinates. The ﬁrst coordinate is the number of the word
in this letter (from one to four). The second coordinate is the number in the dictionary (1 or 2). The dictionary is the
same for all 26 texts.
All the 26 texts (Latin letters) are independent of each other: the presence of dots or dashes in one text (as letters) and
their order have no eﬀect on the composition of the other text (another letter). Therefore the numbering of the ﬁrst
character in Morse code in all the letters begins with one according to the coordinating rule 3.
Each letter in Morse code is coordinated by matching each dot or dash of which the letter consists with a coordinating
object, a matrix $E_{i,j}$, taking into account their order according to coordinatization rule 3.
For example, the letter A is:
· −
Then the coordinated letter A is:
$E_{1,1} E_{2,2}$
Here the first indices in $E_{1,1}$ and $E_{2,2}$ mean that the template (1.2.6.1) is filled only in the first two cells, and the first
indices are the numbers of these cells. The second indices are the numbers of the words in the Morse code dictionary. The dot
is number one, the dash is number two.
The letter B is:
− · · ·
The coordinated letter B is
$E_{1,2} E_{2,1} E_{3,1} E_{4,1}$
In the general case, the Morse code table coordinated by objects $E_{i,j}$ has the form shown in Table 4.
Table 4: Morse code: letters as objects made up of $E_{i,j}$
   Letter Code   Object                               Letter Code   Object
 1 A      ·−     E_{1,1} E_{2,2}                   14 N      −·     E_{1,2} E_{2,1}
 2 B      −···   E_{1,2} E_{2,1} E_{3,1} E_{4,1}   15 O      −−−    E_{1,2} E_{2,2} E_{3,2}
 3 C      −·−·   E_{1,2} E_{2,1} E_{3,2} E_{4,1}   16 P      ·−−·   E_{1,1} E_{2,2} E_{3,2} E_{4,1}
 4 D      −··    E_{1,2} E_{2,1} E_{3,1}           17 Q      −−·−   E_{1,2} E_{2,2} E_{3,1} E_{4,2}
 5 E      ·      E_{1,1}                           18 R      ·−·    E_{1,1} E_{2,2} E_{3,1}
 6 F      ··−·   E_{1,1} E_{2,1} E_{3,2} E_{4,1}   19 S      ···    E_{1,1} E_{2,1} E_{3,1}
 7 G      −−·    E_{1,2} E_{2,2} E_{3,1}           20 T      −      E_{1,2}
 8 H      ····   E_{1,1} E_{2,1} E_{3,1} E_{4,1}   21 U      ··−    E_{1,1} E_{2,1} E_{3,2}
 9 I      ··     E_{1,1} E_{2,1}                   22 V      ···−   E_{1,1} E_{2,1} E_{3,1} E_{4,2}
10 J      ·−−−   E_{1,1} E_{2,2} E_{3,2} E_{4,2}   23 W      ·−−    E_{1,1} E_{2,2} E_{3,2}
11 K      −·−    E_{1,2} E_{2,1} E_{3,2}           24 X      −··−   E_{1,2} E_{2,1} E_{3,1} E_{4,2}
12 L      ·−··   E_{1,1} E_{2,2} E_{3,1} E_{4,1}   25 Y      −·−−   E_{1,2} E_{2,1} E_{3,2} E_{4,2}
13 M      −−     E_{1,2} E_{2,2}                   26 Z      −−··   E_{1,2} E_{2,2} E_{3,1} E_{4,1}
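The rows of Table 4 follow mechanically from the code of each letter. A minimal sketch, assuming the dictionary numbering stated above (dot is 1, dash is 2) and using ASCII `.` and `-` for the two signs; only a few letters are listed, and the names `MORSE`, `DICTIONARY` and `coordinate_letter` are hypothetical.

```python
DICTIONARY = {".": 1, "-": 2}  # dot is word number one, dash is word number two

MORSE = {  # a sample of the international Morse alphabet
    "A": ".-", "B": "-...", "E": ".", "N": "-.", "Q": "--.-", "T": "-", "Y": "-.--",
}


def coordinate_letter(code):
    """Rule 3: index pairs (cell number in the template, dictionary number)."""
    return [(i, DICTIONARY[sign]) for i, sign in enumerate(code, start=1)]


for letter, code in MORSE.items():
    objects = " ".join(f"E_{{{i},{j}}}" for i, j in coordinate_letter(code))
    print(letter, code, objects)
```

Because every letter is an independent text, the first index never exceeds four, the length of the template (1.2.6.1).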
1.3. Requirements for coordinate objects
Coordination of texts consists in matching text words with some "number-like objects" $E_{i,j}$ satisfying three general
requirements:
Objects $E_{i,j}$ must be individual, like numbers;
Objects $E_{i,j}$ must be abstract (the volume of the concept is maximal, the content of the concept is minimal);
Algebraic operations (addition, multiplication, comparison) can be performed over objects $E_{i,j}$.
Two-index matrix units $E_{i,j}$ are appropriate algebraic objects for texts:
They are individual: all matrix units are different as matrices.
An arbitrary matrix of order $n$ can be represented through a decomposition by matrix units. Matrix units are the
basis of the full matrix algebra and the matrix ring. This means that the requirement of the maximum
volume of the concept is fulfilled. Each matrix unit contains only a single one, so the content is minimal.
All algebraic operations necessary for the coordinate object can be performed with matrices.
Thus, the matrix units fully satisfy the three necessary requirements to the objects of algebraization of texts. Further it
will be shown that it is possible to turn texts and any sign sequences into mathematical objects with the help of square
matrix units.
Matrix units $E_{i,j}$ are matrices in which a one is at the intersection of row $i$ and column $j$, and the remaining matrix elements
are zero. For example, for square matrices of dimension 2
$E_{1,2} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad E_{2,1} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$ (1.3.1)
In $E_{1,2}$ the one, as the only nonzero element of the matrix, is at the intersection of the first row and the second column.
Chapter 2
Matrix units
In this section the algebraic systems necessary for converting coordinated texts into matrix ones are constructed and
investigated on the basis of matrix units (hyperbinary numbers). The matrix representation of texts makes it possible to recognize
and create the meaning of texts with the help of mathematical methods.
2.1. Deﬁnition
Matrix units $E_{i,j}$ are matrices in which a one is at the intersection of row $i$ and column $j$, and the remaining matrix
elements are zero. In the following, only square matrix units are considered.
For example, all four square matrix units of dimension 2 have the form
$E_{1,1} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad E_{1,2} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad E_{2,1} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad E_{2,2} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$ (2.1.1)
The number of all square matrix units of dimension $n$ (the full set) is $n^2$. This is the total number of elements of a square
matrix.
In what follows, matrix units are treated as a matrix generalization of the integers 0 and 1. The main diﬀerence between
such hyperbinary numbers and integers is the noncommutativity of their product.
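The definition can be made concrete with a small sketch in plain Python (lists of rows, 1-based indices). It builds the full set of $n^2$ matrix units and checks that an arbitrary square matrix is recovered from its decomposition by matrix units. The helper names `matrix_unit`, `full_set`, `decompose` and `recompose` are hypothetical.

```python
def matrix_unit(i, j, n):
    """Return the n x n matrix unit E_{i,j} (1-based indices) as a list of rows."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(1, n + 1)]
            for r in range(1, n + 1)]


def full_set(n):
    """All n**2 square matrix units of dimension n, keyed by their index pair."""
    return {(i, j): matrix_unit(i, j, n)
            for i in range(1, n + 1) for j in range(1, n + 1)}


def decompose(matrix):
    """Nonzero coefficients a_{i,j} of the decomposition A = sum of a_{i,j} E_{i,j}."""
    return {(i, j): a
            for i, row in enumerate(matrix, start=1)
            for j, a in enumerate(row, start=1) if a != 0}


def recompose(coeffs, n):
    """Sum the matrix units weighted by their coefficients."""
    total = [[0] * n for _ in range(n)]
    for (i, j), a in coeffs.items():
        unit = matrix_unit(i, j, n)
        for r in range(n):
            for c in range(n):
                total[r][c] += a * unit[r][c]
    return total


A = [[5, 0], [7, 2]]  # arbitrary sample matrix
print(matrix_unit(1, 2, 2))
print(len(full_set(3)))
print(recompose(decompose(A), 2) == A)
```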
2.2. The product
2.2.1. The deﬁning relation
The general relation defining the product of any square matrix units of their complete set is of the form (Free Rings)
$E_{i_1,i_2} E_{i_3,i_4} = \delta_{i_2,i_3} E_{i_1,i_4}$, (2.2.1.1)
where $\delta_{i_2,i_3}$ is the Kronecker symbol, equal to one if the indices $i_2$, $i_3$ are the same and to zero if they are different.
The relation (2.2.1.1) allows us to obtain the result of multiplying two matrix units from their indices alone, without
using the explicit form of the matrices, e.g. (2.1.1), and without computing their matrix product. The product of matrices $AB$
consists of all products of the row vectors of matrix $A$ and the column vectors of matrix $B$. Matrix calculations are considered
to be the most suitable for modern classical computers and matrix (tensor) processors, as well as for future quantum
computers, but with (2.2.1.1) the calculations can be greatly simplified.
Since matrix unit indices are coordinates of words in a text, (2.2.1.1) makes it possible to perform algebraic operations
as symbolic calculations (as in computer algebra systems) without explicit representation of hyperbinary
numbers by matrices. A similar defining relation for the sum of matrix units will be proposed later on.
It follows from (2.2.1.1) that the product of matrix units $E_{i_1,i_2} E_{i_3,i_4}$ is different from zero (the zero matrix) only if the
internal indices of the product are equal ($i_2 = i_3$). Then the product is a matrix unit with indices $i_1$, $i_4$. For
example,
$E_{1,2} E_{2,1} = E_{1,1}$.
If $i_2 \neq i_3$, then the product $E_{i_1,i_2} E_{i_3,i_4}$ is always equal to the zero matrix. For example:
$E_{1,2} E_{1,2} = 0$
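The index-only multiplication (2.2.1.1) can be sketched and compared with the ordinary matrix product. In this illustration a matrix unit is represented by its index pair and `None` stands for the zero matrix; the names `multiply`, `matmul` and `agrees_with_matrices` are hypothetical.

```python
def multiply(a, b):
    """Product of two matrix units given by index pairs; None is the zero matrix."""
    (i1, i2), (i3, i4) = a, b
    return (i1, i4) if i2 == i3 else None  # Kronecker delta on the inner indices


def matrix_unit(i, j, n):
    """Explicit n x n matrix unit E_{i,j} with 1-based indices."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(1, n + 1)]
            for r in range(1, n + 1)]


def matmul(x, y):
    """Ordinary row-by-column product of two square matrices."""
    n = len(x)
    return [[sum(x[r][k] * y[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def agrees_with_matrices(n):
    """Check the index rule against explicit matrices for all pairs of units."""
    pairs = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]
    zero = [[0] * n for _ in range(n)]
    for a in pairs:
        for b in pairs:
            symbolic = multiply(a, b)
            expected = matrix_unit(*symbolic, n) if symbolic else zero
            if matmul(matrix_unit(*a, n), matrix_unit(*b, n)) != expected:
                return False
    return True


print(multiply((1, 2), (2, 1)))  # indices of E_{1,2} E_{2,1}
print(multiply((1, 2), (1, 2)))  # inner indices differ
print(agrees_with_matrices(3))
```

The symbolic product costs one comparison of indices, whereas the explicit product of two n x n matrices needs on the order of n cubed multiplications.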
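The matrix representation of hyperbinary numbers can, of course, also be used. Elements $E_{i_1,i_2}$ and $E_{i_3,i_4}$ in
(2.2.1.1) are multiplied by the rule of square matrix multiplication. For example,
$E_{1,1} = E_{1,2} E_{2,1} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad E_{2,2} = E_{2,1} E_{1,2} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ (2.2.1.2)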
Matrices $E_{1,2}$ and $E_{2,1}$ can be called simple matrix units by analogy with prime numbers, and the matrices $E_{1,1}$ and
$E_{2,2}$ are composite matrix units, since they are products of simple ones.
The complete set of matrix units (2.1.1) can be obtained from the simple matrix units $E_{1,2}$ and $E_{2,1}$, which generate the complete
set.
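For example, simple matrix units of dimension 3 have the form
$E_{1,2} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad E_{2,1} = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad E_{2,3} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \quad E_{3,2} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$ (2.2.1.3)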
The remaining ﬁve matrix units of the full set of dimension 3 are composite.
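That the four simple units of (2.2.1.3) generate the remaining five can be checked by closing their index pairs under the product rule (2.2.1.1). A minimal sketch, with hypothetical names `multiply` and `closure`, and with `None` standing for the zero matrix:

```python
def multiply(a, b):
    """Product of matrix units by indices (2.2.1.1); None is the zero matrix."""
    (i1, i2), (i3, i4) = a, b
    return (i1, i4) if i2 == i3 else None


def closure(generators):
    """All matrix units obtainable as products of the given ones."""
    units = set(generators)
    while True:
        new = {multiply(a, b) for a in units for b in units} - units - {None}
        if not new:
            return units
        units |= new


simple = {(1, 2), (2, 1), (2, 3), (3, 2)}  # index pairs of the units in (2.2.1.3)
full = closure(simple)
composite = sorted(full - simple)
print(len(full))   # size of the generated set
print(composite)   # index pairs of the composite units
```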
Matrix units are considered ex