A dataset on human capital in the former Soviet Union area; Sources, methods, and first results
ABSTRACT To date, the rise and fall of the (former) USSR has triggered a lot of research. Many have focused on the accumulation of physical capital, growth, and consumption. Recently, also the accumulation of human capital has increasingly been incorporated in this picture. However, few datasets exist that cover this crucial variable for this vast area. Therefore, our main objective is to introduce a new dataset that contains human capital related time series for the USSR (and the Newly Independent States (NIS) after its dissolution), constructed mostly on an annual basis. These data were drawn from various primary sources, available datasets and secondary literature where our focus was on constructing a dataset as clear, transparent and consistent as possible. It is our hope that, by supplying these data in electronic format, it will significantly advance quantitative economic history research on Russia and all over the former Soviet Union area (FSU) and will inspire further research in various new fields relating to intellectual production. The data presented in this paper follow after the discussion of the information value of the primary sources utilised, and the various problems that arose when linking and splicing the data from various sources. After constructing series of human capital indicators we perform a time-series and spatial analysis in order to identify the long-term trends of education penetration and of the human capital development in the FSU area with a strong emphasis on inequality issues between the NIS. Applying these results in a simple growth accounting framework provides us with some preliminary insights on the role of human capital in economic development in the FSU area.
CGEH Working Paper Series
A dataset on human capital in the former Soviet Union area
Sources, methods, and first results
Dmitry Didenko, Vnesheconombank
Peter Foldvari, Debrecen University
Bas van Leeuwen, Utrecht University
Working paper no. 35
© 2012 by Authors. All rights reserved. Short sections of text, not to exceed two paragraphs, may be
quoted without explicit permission provided that full credit, including © notice, is given to the source.
Bas van Leeuwen3
Keywords: human capital, education, book production, economic development, socialism,
JEL Codes: P23, P24, E24, N14, N15
Corresponding author: Dmitry Didenko, Didenko_D_V@veb.ru
Acknowledgements: The findings, interpretations, and conclusions are the authors’ own views
which may not be shared by the institutions of their affiliation. The authors acknowledge the
financial support from the Netherlands Organisation for Scientific Research (NWO) under the
1 Senior Analyst, State corporation ‘Bank for Development and Foreign Economic Affairs (Vnesheconombank)’
2 Associate Professor, University of Debrecen (Hungary), post-doc researcher, Utrecht University (the Netherlands).
3 Postdoc Researcher, Utrecht University (the Netherlands).
It is undisputed that human capital plays an important role in economic growth and human
development. It is seen as indicative of long-run growth, reduction in corruption, participation in
decision making, etc. (e.g. Lucas 1988; Romer 1990; Perotti 1996; Alesina and Perotti 1996).
However, especially for the former socialist countries, very little information on this variable is
available. Recently, some papers on long run development of human capital and growth have
appeared dealing with China and Eastern Europe (e.g. Foldvari and Van Leeuwen 2009; 2011; Van
Leeuwen and Foldvari 2011; Van Leeuwen, Van Leeuwen-Li, and Foldvari 2011), but research on
how it affects economic development in these countries is still in its infancy.
This is especially true for the former Soviet Union area (FSU)4 where the standard datasets
hardly ever include human capital. For example, the dataset ‘Soviet Economic Statistical Series’
constructed by the Slavic Research Center at Hokkaido University is primarily focused on external
trade, while Easterly and Fischer (1995) do not include human capital as a monetary measure. Even
the big international datasets from Cohen and Soto (2007) and Morrisson and Murtin (2009) do not
include estimates for the USSR (although Morrisson and Murtin in their paper do make some
In Section 2 we develop a new and consistent dataset on human capital and related measures
for the USSR and the Newly Independent States after its dissolution. We constructed data series
for various human capital indicators (in both natural and monetary units), mostly on an annual
basis, stretching back in most cases to the 1920s and in some instances even to the 19th-century
Russian Empire. To this dataset we added population (a crucial variable in many human
capital estimates) broken down by age cohort, as well as comparable macroeconomic indicators such as
GDP, fixed (physical) capital stock, the size of general government expenditures, and the total wage
bill. These data were drawn from various primary and secondary sources (including available
datasets and literature), with a focus on constructing a dataset that is as clear, transparent, and
consistent as possible. Section 3 discusses the construction of the human capital indicators as well as
their spread throughout the FSU area, while Section 4 deals with economic development and the spatial
growth of human capital in the FSU, comparing it with China. We end with a brief conclusion.
2. Primary and secondary sources, description, and data discussion
2.1 General description of the sources
The starting point in constructing the dataset consisted of the official statistics, available datasets
and the research literature based on them (Table 1). The official statistical data are the easiest to access.
Indeed, as pointed out in Davis and Wheatcroft (1994) as well as in other literature going back at least
to Gerschenkron (1947), the information in the Soviet official series was at least not
intentionally falsified in a straightforward way: the government statistical offices preferred either
not to publish unpleasant data or to adjust the methodology to let the resulting figures look
The basic official publication used for this study is the statistical yearbook “The national
economy of the USSR”. In addition, the USSR statistical office also published topical volumes like
“Labour”, “Construction of culture”, “Culture, education and science”, “Females and children”,
4 ‘The former Soviet Union’ (the FSU or ex-USSR) is the most common term used hereinafter for all time periods and
for the entire territorial coverage: the Russian Empire, the Soviet states after its fall, the USSR, and the Newly Independent
States after its collapse. The terms ‘USSR’ or ‘Soviet Union’ are used only for the period 1922-1991, when this state
existed, within its actual borders. The term ‘Newly Independent States’ refers to the multiple existing states on the
territory of the former USSR, both in the period after its dissolution and in the period when they were Soviet
republics, basically within their current borders. ‘Russia’ refers to the territory basically within the borders of the
contemporary Russian Federation, in various periods.
Table 1: Basic human capital related indicators for the FSU area available in the dataset
Category Indicator Period Basic Sources and Literature
HSE IDEM (2011),
Mironov (1985, 1991, 1994, 2003)
Except the NIS
other than Russia
on distribution of
1-year cohorts of
population at age
on inputs. Except
the NIS other than
Russia for 1990-
Age heaping 1897-
Russian Empire Statistical Office
(Troinitskii, N.A., ed., 1905), SRSO,
HSE IDEM (2011),
Poliakov, ed. (1992, 1999, 2007)
Soviet and Russian Ministries of Finance
(NarKomFin, MinFin, Kaznacheistvo
SU–HSE (2005, 2007, 2010, 2010a),
UIS UNESCO (2011),
De Witt (1961), Noah (1966), Plotnikov
(1954), Subbotina (1965)
Except the NIS
other than Russia
SRSO, CIS Statistical Committee
Chapman (1963), Zaleski (1980)
For the entire
USSR and for
Soviet Statistical Office,
Krumin, ed. (1923, 1924)
Andreev et al. (1993, 1998), Gel'fand
(1992), Maddison (2010), Volkov (1930) Except the NIS
HSE IDEM (2011),
Poliakov, ed. (1992, 1999, 2007)
Becker (1969), Bergson (1961), Gregory
(1982), Easterly and Fischer (2001),
Harrison (1998), Maddison (2010),
Markevich and Harrison (2011),
Ponomarenko (2002), Steinberg (1990)
other than Russia
Size of the
Khanin (1991), Steinberg (1990)
For the entire
Category Indicator Period Basic Sources and Literature Notes
Easterly and Fischer (2001), Moorsteen
and Powell (1966)
Gross stock, until
ca. 1990 includes
the NIS other than
Russia for 1990-
World Bank (2011),
Bergson (1961), Moorsteen and Powell
(1966), Steinberg (1990)
World Bank (2011),
Becker (1969), Bergson (1961), Steinberg
World Bank (2011),
Chapman (1963), Gregory (1982)
* Soviet and Russian Statistical Offices – respectively of the USSR and Russia5.
since the end of the 1950s, normally once per decade. Besides these publications, the government financial
office (the Ministry of Finance since 1946) published national budget execution reports on a 5-yearly
basis from 1962 (providing annual historical data for the latest 5-year period and data back to
1940 at 10- and 5-year intervals). Such publications had not been regular before. In the late
1980s such reporting was launched on an annual basis. Prior to the mid-1930s the budget reporting was
organised by ministry (as in the Imperial period), which is not comparable with the later
publications that preferred a functional (by topic) structure. The financial office also published
topical volumes on educational, cultural-services, and research expenditures twice (in 1939 and
The population data were obtained from the published census data. There were
9 comparable censuses in the FSU: 1897, 1920, 1926, 1937, 1939, 1959, 1970, 1979 and 1989.
Almost all of their aggregate data were officially published some years after the respective censuses,
except for those of 1937 and 1939. However, the questions varied from census to census, and so did the depth of
coverage in the age and regional breakdowns. The population censuses covered the whole territory of the
country within its actual borders, except for the 1920 census, which included the civilian and military
population of the European part of Russia and the national regions controlled by autonomous
communist governments but did not cover even the entire Russian territory (i.e. it excluded most of
the Caucasus and Central Asia). Finally, we used some official volumes (e.g. the 1975 and 1983 editions of
“Labour in the USSR”) which were not available to scholars at the time of their publication
but have been disclosed since the collapse of the Soviet Union.
2.2. Population size and literacy
Most of the aggregated census data are available in the electronic publication produced by the
Institute of Demography at the National Research University Higher School of Economics (HSE
IDEM). Besides the published data, it also includes additional information from archived
records. Therefore HSE IDEM (2011) provides more detailed information for some years than the
5 Tsentral’noe statisticheskoe upravlenie – TsSU (1918-1922 in Russia, 1923-1930 both in the USSR and Russia),
Tsentral’noe upravlenie narodnokhoziaistvennogo uchota – TsUNKhU (1931-1948 in the USSR), Upravlenie
narodnokhoziaistvennogo uchota RSFSR – UNKhU (1931-1948 in Russia), Tsentral’noe statisticheskoe upravlenie –
TsSU (1948-1987 both in the USSR and Russia), Gosudarstvennyi komitet po statistike – GKS (1987-1991 in the
USSR, 1988-2004 in Russia), Federal State Statistics Service – Rosstat (2004-present in Russia).
published census volumes do. This primarily relates to the data on the age distribution of the male and
female population and its literacy, which are generally available with a 1-year breakdown in HSE
IDEM. That allows us to calculate the first indicator of literacy and numeracy: age heaping, which
requires single-year observations. Innumeracy (age heaping) is measured as the excess of people
reporting ages ending in 0 or 5 (i.e. 25, 30, 35, etc.). This measure is then
converted into the ABCC index, proposed by A’Hearn et al. (2009), which captures the percentage
of persons correctly reporting their ages. Since numeracy measurement is based on indirect
responses, it captures functional skills rather than formal ones. Moreover, it was a less politically
sensitive topic and is therefore probably less upward-biased than literacy.
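The conversion from age heaping to the ABCC index can be sketched as follows. This is a minimal illustration, not the exact routine used for the dataset: it assumes the conventional Whipple index over the age window 23-62 and the linear transformation ABCC = (1 - (W - 100)/400) x 100. The age window, the function names and the sample counts are assumptions introduced here for illustration only.

```python
def whipple_index(age_counts, lo=23, hi=62):
    """Whipple index: persons reporting ages ending in 0 or 5, relative to
    the one-fifth share expected without heaping (100 = no heaping)."""
    window = {a: n for a, n in age_counts.items() if lo <= a <= hi}
    total = sum(window.values())
    heaped = sum(n for a, n in window.items() if a % 5 == 0)
    return 100.0 * heaped / (total / 5.0)


def abcc_index(age_counts, lo=23, hi=62):
    """Linear transformation of the Whipple index onto a 0-100 scale."""
    w = whipple_index(age_counts, lo, hi)
    return (1.0 - (w - 100.0) / 400.0) * 100.0


# Hypothetical single-year age counts: 100 persons per age,
# plus 50 extra persons at every age ending in 0 or 5.
counts = {age: 100 for age in range(23, 63)}
for age in range(25, 61, 5):
    counts[age] += 50

print(round(abcc_index(counts), 1))  # 90.9
```

With no heaping the index equals 100; if every person reported an age ending in 0 or 5 it would equal 0.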
Our second measure of education (i.e. literacy) is better recorded in the printed volumes.
These volumes use various age-group breakdowns, as a rule no narrower than 5-year cohorts, with 1-year
groups for typical schooling ages. In order to check their reliability, we compared the data from HSE
IDEM (2011) after their aggregation with those from the published census volumes for selected
years (1897, 1926, 1937 and 1939 (the Russian Empire / USSR as a whole)). The discrepancies that
were found (mainly for 1937) do not seem to be significant. The data for the 1959, 1970, 1979 and
1989 censuses were assessed as even more reliable and, therefore, we decided not to make a check
for their consistency.
For both age heaping and literacy, the official census publications contained rather
detailed information on the whole country, except for those of 1920, 1937 and 1939. The censuses
generally counted both the available and the permanent population. However, until 1970 detailed figures were
published for the available population only. As regards the 1979 census we have both the
data on 5-year age cohorts of the permanent population from the official publication (GKS, 1989) and
more detailed data (1-year age groups) on the available population from HSE IDEM (2011).
Therefore we chose to use the census data on the available population for the total FSU. The
difference between the two counts is small for the USSR as a whole and somewhat larger (though still not
substantial) for its constituent republics. However, in the 2002 census the Russian
statistical office switched to counting the population by permanent residents only, thus making the
historical age-cohort data not fully comparable with the latest ones. The same principle was applied
in the 2010 Russian census.
For 1920 the data on total population were not comprehensive, as the civil war was going on
and some territories were not controlled by the central government in Moscow. In addition, some
data were lost during storage and processing. As a result, the final 1920 census data
broken down by age cohort are available only for 44 regions of the European part of Russia (43.3% of
the total population within the 1925-1939 borders of the USSR as estimated by Andreev et al.,
1993). To arrive at the 1920 literacy level we therefore used the assessments from either
later official publications or the research literature. We did the same for the earlier period as
regards the European part of Russia, taking the data from Mironov (1985, 1996, 2003) and applying
both time-series interpolation and retropolation where data were either missing or inconsistent
with the other estimates.
The 1937 census was judged to have been improperly conducted and was cancelled by government
order soon after the preliminary calculation of its outcomes, as they appeared to be below the
government’s expectations. The main outcomes of the 1939 census were originally published in the
official media, but the information was clearly insufficient. More or less detailed information on this
census was published only in the 1990s and 2000s in academic volumes based on the sources extracted from
For the total population series, with adjusted census data in the respective census years and in the years
between the censuses, we used the start-of-year data from Volkov (1930) and Andreev et al. (1993)
as well as the data from Maddison (2010). The latter were taken as the average of the figures for two
adjacent years, as the original figures were mid-year estimates.
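The conversion of mid-year estimates into start-of-year figures can be illustrated with a short sketch. It assumes that the start-of-year value for year t is approximated by the mean of the mid-year estimates for years t-1 and t; this pairing of years, the function name and the sample figures are assumptions made for illustration and are not taken from the sources.

```python
def start_of_year(midyear):
    """midyear: dict mapping year -> mid-year population estimate.
    Returns dict mapping year -> start-of-year approximation, computed as
    the mean of the mid-year figures for the previous and current year."""
    return {
        year: (midyear[year - 1] + midyear[year]) / 2.0
        for year in sorted(midyear)
        if year - 1 in midyear  # the first year has no predecessor
    }


# Hypothetical mid-year figures (millions), for illustration only.
midyear = {1950: 100.0, 1951: 104.0, 1952: 106.0}
print(start_of_year(midyear))  # {1951: 102.0, 1952: 105.0}
```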
As for the availability of literacy data for the republics (NIS), most of the data first appeared in 1926.
For the NIS total population series we used the data from Maddison (2010) as averages of the
figures for two adjacent years (as for the USSR). The resulting figures were close to those
from HSE IDEM (2011), which compiled its series from the national and supra-national statistical
offices (CIS Statistical Committee, Eurostat).
The official data on literacy, of course, had their
intrinsic shortcomings. In all the FSU censuses literacy was defined as the ability to read in at least
one language. Hence, writing skills were not taken into account at all. However, the 1926 census
instructions stated that the ability to write one’s name was not enough for a person to be considered
literate. It is quite possible, however, that by 1939 such ability was often sufficient for a
person to be counted as literate. In our opinion, the conventional measurement of literacy based on
direct questions left much room for reading proficiency criteria to be eased, especially in the adverse
environment of mass terror.
While the 1897 and 1926 data contain the best age distribution of literate population (1-year
age groups) the later data become much less detailed. The 1939 census was the last one with official
publication of literacy data for the total male and female population with breakdown between age
cohorts of 9-49 years and 50+ years. For 1959 the age distribution of the overall literate population
is available in HSE IDEM (2011). None of the later census publications provides comprehensive
data: literacy was disclosed only for the age cohort of 9-49 years and for the total population of 15 years and
older in GKS RF (1992). We can arrive at the level of literacy for the total population of age 9 and
over if we assume that the percentage of literate in the age of 9-14 was the same as in the age of 9-
50 (that was already close to 100%). Our reconstruction of the literacy level for the age cohort of
age 50+ was based on exponential-function interpolation of it for all the ages (male and female
population separately). The results show that illiteracy had not been eradicated completely even by
the fall of the Soviet era: in 1989 almost 8% of the USSR female population of 50 and older could
neither read nor write at all.
2.3. Educational attainment
Our third educational variable (besides age heaping and literacy) concerns educational attainment.
We expressed educational attainment for the male, female and total population separately in
6 ISCED levels, which the national systems of the Russian Empire (less closely) and of the Soviet Union and
the NIS after its dissolution (more closely) generally fit.
A question on educational attainment was asked in the 1897 census but was not included in the
1920 and 1926 questionnaires. In the other censuses the education-level grouping varied from census to
census. It was the most detailed in 1959 and the least detailed in 1937 and 1939. The data on
educational attainment for the NIS started to appear in 1939 but without any breakdown by age
group (except for Russia). The official publications started to disclose such data only from
1959. The 1937 census had the most detailed and comprehensive age structure; all later censuses were
more aggregated. In the case of the 1970 census we had to choose people aged 10-15 as the first age
group, which corresponds neither to the first category in other censuses nor to the other
categories of the same census, which end in either 4 or 9. In the 1989 census no data were available for
the age category of 10-14 years, while better education-level coverage was provided.
Following the previous cross-country datasets on educational attainment (Barro and Lee,
2010; Cohen and Soto, 2007) and the age structure of the published FSU census data, we chose
as a balanced solution 5-year intervals for our age groups, starting with 10 years and
ending with 70+ years.
A duration was attached to each broad educational group
(complete lower secondary, incomplete and complete upper secondary, incomplete and complete
vocational non-tertiary and tertiary, for example). For incomplete levels of education as reported
in the census, we assigned the average value of the adjacent completed ones. However, the
distribution of people between the smaller categories was not equal. E.g. for 1939 all the levels of
education from complete ISCED 2 to incomplete ISCED 5 are merged together, with an average
duration of 9.92 years of schooling. This does not seem too far from reality at first glance, but after looking
at the structure of previous enrolment it becomes evident that this average duration is upward-biased:
most of the people in the merged category probably belonged to those who had completed ISCED
2 and no more (with only 7 years of schooling).
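The direction of this bias can be illustrated with a small sketch that compares an equally weighted average duration with a population-weighted one. All sub-category names, durations (other than the 7 years for complete ISCED 2) and population shares below are hypothetical and chosen only to show the mechanism; they are not the values underlying the 9.92-year figure.

```python
def mean_years(shares, durations):
    """Population-weighted mean years of schooling across categories."""
    total = sum(shares.values())
    return sum(shares[k] * durations[k] for k in shares) / total


# Hypothetical durations (years) for the sub-categories of a merged group.
durations = {
    "isced2_complete": 7.0,
    "isced3_incomplete": 8.5,
    "isced3_complete": 10.0,
    "isced4_incomplete": 11.0,
    "isced4_complete": 12.0,
    "isced5_incomplete": 13.0,
}

# Equal weights treat every sub-category as equally populated.
equal = {k: 1.0 for k in durations}

# Hypothetical skewed shares: most people only completed ISCED 2.
skewed = {
    "isced2_complete": 0.60,
    "isced3_incomplete": 0.10,
    "isced3_complete": 0.10,
    "isced4_incomplete": 0.10,
    "isced4_complete": 0.05,
    "isced5_incomplete": 0.05,
}

print(mean_years(equal, durations))   # equally weighted average
print(mean_years(skewed, durations))  # lower once shares are skewed
```

When most of the merged group sits in the lowest sub-category, the weighted mean falls well below the equally weighted one, which is the upward bias described above.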
Hence, without knowledge of the previous life-time enrolment structure and completion
rates for various age groups it was not possible to define the weights for the smaller categories.
Moreover, it is likely that the duration of primary and lower secondary education changed over time,
though not significantly (±1 year). In most cases we assigned to each education level the
duration that was normatively prescribed as of the census date. This led to a slight
overestimation in 1970 and 1979, when a significant part of the population had obtained their lower
secondary education at a time when its duration was 7 years (instead of 8 years later), while the
proportion of people who had obtained only primary education under the older rules (its duration was reduced
from 4 to 3 years) was evidently smaller. In earlier years the actual length of schooling at each level tended to be
shorter than the normatively prescribed one. To take this into account we used the evidence from Allen
(2003) and Mironov (1991, 1994).
The ISCED 4 and ISCED 5 data for 1939 and especially for subsequent census years are definitely
upward-biased, as the share of part-time (after-work evening and correspondence) study was growing.
Persons with correspondence education at ISCED 5 level were included starting from 1939. As
follows from TsSU (1971, 1977), in the 1960s part-timers accounted for almost half of all ISCED 5
enrolment and up to 20% of ISCED 4 enrolment. Though the period of
correspondence study was 0.5-1 years longer, it evidently failed to compensate for the lack of learning
time for part-time students relative to full-time ones. And since we do not include ISCED 6
graduates, their small numbers relative to those of ISCED 5 do not compensate for the upward bias
2.4. Educational enrolment
The Soviet-era official publications start their enrolment series with the 1914/15 school year (which
could already have been somewhat negatively affected by WWI), using one of the last years of the
‘old regime’ as a base for comparison with the Soviet era. After a 6-year gap the data on all education levels
become available from 1920/21, normally on an annual basis.
We included the data for selected years of the 19th and early 20th centuries from
Johnson (1950) to better highlight the place of the Imperial and the Soviet periods in education
growth in the FSU. These data cannot be compared directly without taking into account decreases
in population resulting from the loss of territory and wars. The actual enrolment data used were
combined with the attainment data from the censuses to estimate educational attainment in the years
between censuses (see section 3). Therefore we express educational enrolment in ISCED levels,
as we do educational attainment. As noted above, the ISCED classification is generally comparable to
the USSR/NIS national ones. However, we had to pay attention to some special cases.
The first case was the pre-tertiary education institutions that operated in 1922-1940 as ‘rabfaki’
(‘faculties for workers’ in direct translation) and generally provided evening classes. These
institutions served as a channel of educational advancement for working and socially active but poorly educated people,
allowing them to become eligible for tertiary education institutions without taking a full-time
secondary school course. Taking all these features into consideration, we assigned ISCED
3 to these institutions.
Another special case was the various institutions of lower vocational education. The
composition of such institutions and their level of general education varied significantly over time.
Before the 1970s not all of them provided their trainees with general education at ISCED 3 level, as the
contemporary ‘PTU’ and ‘TU’ institutions did. The latter gradually replaced other institutions of
lower level. In the 1920s-1940s many lower vocational training institutions were basically courses
of on-the-job training with added school hours for elementary theoretical knowledge. We assume
that the average level of general education of their graduates was ISCED 1 in the 1920s-1940s, ISCED
2 in the 1950s-1960s and ISCED 3 in the 1970s-1980s.
We should acknowledge the very good quality of the series as regards post-secondary education
(ISCED 4 and 5 levels). Post-graduate education (ISCED 6) was restored in 1928-29 and has good
coverage despite a few breaks. The major problem in working with the Soviet-era enrolment series
was their lack of comprehensiveness as regards primary and secondary schools (ISCED 1-3 levels). The
official statistics provided continuous series for the ordinary types of education
institutions at various levels but often omitted the series for special ones, such as general education for adults, schools
with specialised classes for children and schools for handicapped children. Fortunately, there
are reliable series on total enrolment that can be compared with those for the various levels to restore
and allocate the residual quantity. We use both the incomplete series and those on total enrolment to
predict the complete series by education level for the USSR as a whole, predominantly for the
pre-WWII period (1920/21-1940/41 school years), but also for some subsequent ones (1946/47-1955/56,
1962/63-1968/69, 1988/89-1990/91). It should be noted that data availability
worsened significantly when major school reforms were launched and their implementations were
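The restoration of the residual from total enrolment can be sketched as follows. The allocation rule is not spelled out here, so the sketch assumes, purely for illustration, that the residual is distributed across levels in proportion to the reported enrolment (or to user-supplied weights). The function name, the weights and the sample figures are hypothetical.

```python
def allocate_residual(total, reported, weights=None):
    """Distribute the gap between total enrolment and the sum of the
    reported level series across levels.

    total    -- reliable total enrolment for one school year
    reported -- dict mapping level -> enrolment in the incomplete series
    weights  -- optional dict mapping level -> allocation weight;
                defaults to the reported enrolment (proportional rule,
                an assumption of this sketch)
    """
    residual = total - sum(reported.values())
    if weights is None:
        weights = reported
    weight_sum = sum(weights.values())
    return {
        level: reported[level] + residual * weights[level] / weight_sum
        for level in reported
    }


# Hypothetical figures (thousands of pupils), for illustration only.
reported = {"ISCED1": 600.0, "ISCED2": 200.0, "ISCED3": 100.0}
completed = allocate_residual(1000.0, reported)
print(completed)
```

By construction the completed series sum to the total enrolment figure.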
The gender composition of students is covered much less well in the official publications,
primarily due to the later start of the coverage (1927/28, except for ISCED 4, which starts in 1921/22) and the larger
intervals between data points (10-15 years at most as regards primary and secondary schools,
3-5 years for the higher levels). However, as approximate gender parity was achieved there by 1940,
we found it possible to make interpolations for primary and secondary schools until 1970/71,
treating the actual data points as fluctuations around the 1940-1971 sideways trend and extrapolating
the latter to the rest of the period covered (1972-1989). Our approximations for
post-secondary non-tertiary and tertiary education are thought to fit reality better owing to the
availability of more intermediate data points. Our interpolations for the period 1955-1984 for
post-graduate education (ISCED 6), which never reached gender parity, seem to be less
precise, owing to larger intervals between data points, but are also close to reality in the sense that the
share of women, after its significant decline in the post-WWII years, was gradually increasing since
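Filling the years between sparse data points can be sketched with a simple interpolation routine. It assumes linear interpolation between observed points and holds the first and last observed values constant outside the observed range; both choices, the function name and the sample shares are assumptions made for illustration and need not match the procedure applied to each education level.

```python
def interpolate_share(points, years):
    """Linearly interpolate a share (in %) between observed data points.

    points -- dict mapping year -> observed share
    years  -- iterable of years to fill
    Outside the observed range the nearest observed value is held
    constant (an assumption of this sketch).
    """
    known = sorted(points)
    filled = {}
    for year in years:
        if year <= known[0]:
            filled[year] = points[known[0]]
        elif year >= known[-1]:
            filled[year] = points[known[-1]]
        else:
            for left, right in zip(known, known[1:]):
                if left <= year <= right:
                    t = (year - left) / (right - left)
                    filled[year] = points[left] + t * (points[right] - points[left])
                    break
    return filled


# Hypothetical female shares of enrolment (%), for illustration only.
observed = {1950: 30.0, 1960: 40.0}
print(interpolate_share(observed, [1948, 1955, 1960, 1965]))
```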
2.5. Financial data on human capital expenditures
One way to value human capital is to estimate expenditure on education (creating a cost-based
measure of human capital). To do so, however, we require estimates of government expenditure on
education. Unfortunately, the data on education expenditures for the period prior to 1917 are at most
indicative and not sufficient for analysis with sophisticated methods. Most probably they are
not comprehensive and may be revised upwards. We provide them only for the purpose of
comparing their relative level with that of the subsequent period, to give a general idea of the
process. As for the Soviet period, the financial data are more or less reliable only from the 1923/24
fiscal year (October to September), the first complete one after the new, relatively firm
ruble was introduced and the USSR was established in December 1922.
The USSR national government consolidated budget (‘svodny biudzhet’ or ‘gosudarstvenny
biudzhet SSSR’) included all levels of the state finances: the USSR central government (‘soiuzny
biudzhet’), the union republican governments (‘respublikanskii biudzhet’) as well as regional and
local governments (‘mestnyi biudzhet’). The government budgets of the autonomous republics that
were subordinate to the union republics, as well as those of other similar administrative units, were
considered regional government budgets.
Educational institutions of ISCED 1-3 levels were generally financed from local budgets
with some co-financing from regional ones. The institutions of ISCED 4-6 levels were
predominantly financed from the union republican and the USSR central government budgets,
depending on their size and perceived significance. However, most of the funds for ISCED 5-6
level institutions were supplied by the USSR central government budget.
In the official financial reporting of the USSR the most commonly used term was
‘prosveshchenie’ (‘enlightenment’ in direct translation). However, the meaning of this term
changed over time. Besides education proper it included other items like cultural services
6 We tried as much as possible to take into account those changes in duration of various schooling levels that took
effect over time. However, the period prior 1930s could be subject to some revisions in this aspect.
and, in certain periods, it also included expenditures on research. In certain years (1989 and probably
1990) ‘enlightenment’ also included some items not covered by educational, cultural services
and research expenditures.
The expenditures on education proper consisted of two major groups: general education
(‘obshchee obrazovanie, vospitanie’) and vocational education (‘podgotovka kadrov’). The former
generally included kindergartens (ISCED 0), schools of various types for the general education of both
children and adults (ISCED 1-3), as well as homes for orphaned children, additional after-class
services and certain types of courses for children’s moral upbringing, while the latter encompassed
vocational non-tertiary and tertiary education and adult training. There was no division of
general education financing between the levels (most often they were in the same school and the
same teachers could give classes to both ISCED 2 and 3 pupils). This classification was adopted
in the 1930s, but in the subsequent official publications some data were recalculated backwards to the
end of the 1920s.
We used primarily the data from the public reporting of the USSR Financial Office.7 In
addition, we made use of the data from the USSR Statistical Office when they proved to be more
compatible with the other values or when the data from the former institution were missing. The
statistical office more frequently reported the expenditures inclusive of nongovernmental
institutional sources. For the early years (1920s-1940s) we often preferred to take the revised national
budget expenditure data from Plotnikov (1954) and Subbotina (1965) because of their later
reclassification of the expenditure categories. We compared the aggregate data from Plotnikov
(1954) for 1928/29-1954 with the alternative data from the Russian émigré scholar Kovankovski
(1956). The differences prior to 1945 could be explained by the subsequent reclassification of the
budget expenditures that Plotnikov, as an insider scholar, should have taken into account more
precisely. Evidently, he had no interest in any upward revision of the earlier data that would
make the subsequent growth appear more modest. Small differences for 1941-1945 arose
because of the rounding in Kovankovsky (1956).
As in the case of enrolment, we assigned some special-case educational institutions to the
recipients of the respective level of financing. These were various institutions of lower vocational
education (‘ISCED 1-3 vocational’ as a special sub-category) and certain institutions of pre-tertiary
academic education for adults (‘rabfaki’) in the 1920s-1930s (inside the subcategory ‘Other ISCED 1-6
The official expenditure figures included both current expenditures (on wages, scholarships and
stipends, books etc.) and capital expenditures (on construction and renovation, equipment purchase
and repairs). The latter accounted for about 8-10% of overall expenditures on educational, cultural
services and research. The official publications provided not only government expenditures from the
budget but also those from various institutional sources (that were basically under government
control). They also captured the part of private expenditures that entered union republican budget
revenues as tuition fees in upper secondary school grades, vocational non-tertiary and tertiary
education. Our assessment of the size of the fees is based on MinFin (1957), with our assumption that
republican budgets received 90% of the fees and the USSR central government received the remaining
10%. These fees were introduced in 1940 and abolished in 1956. Very approximate estimates of the
other private expenditures were taken from Noah (1966) for selected years in the 1950s and from Rogovin (1982) for
The educational financial data were much better represented for the USSR as a whole than
for its constituent republics. Therefore we used the former to estimate the latter when
necessary. Another approach was to estimate the share of a republic in total expenditures and then
convert it into absolute numbers. Logarithmic transformation was sometimes used to estimate
the data in periods of high inflation (end of the 1920s-1930s, 1990s). We made allowance for the border
changes in 1929, when Tajikistan split off from Uzbekistan, and in 1936, when Kazakhstan and
Kyrgyzstan split off from Russia, becoming republics of the USSR.
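The share-based estimation described above can be illustrated with a minimal sketch. It assumes that a republic's share is interpolated linearly between two benchmark years and that "logarithmic transformation" means interpolating nominal values linearly in logs; both are our reading of the text rather than a documented procedure. The function names and all numbers are hypothetical.

```python
import math


def interpolate_share(year, y0, s0, y1, s1):
    """Linearly interpolate a republic's share of the USSR total between two benchmark years."""
    w = (year - y0) / (y1 - y0)
    return s0 + w * (s1 - s0)


def log_interpolate(year, y0, v0, y1, v1):
    """Interpolate a nominal value linearly in logs, i.e. at a constant growth rate."""
    w = (year - y0) / (y1 - y0)
    return math.exp(math.log(v0) + w * (math.log(v1) - math.log(v0)))


# Hypothetical benchmark data (illustrative only, not taken from the dataset)
ussr_total_1932 = 1000.0
share_1932 = interpolate_share(1932, 1930, 0.10, 1934, 0.12)
republic_1932 = share_1932 * ussr_total_1932  # share converted into an absolute number

# In a high-inflation period a missing nominal value is interpolated in logs
republic_1932_log = log_interpolate(1932, 1930, 100.0, 1934, 400.0)
```

Interpolating in logs keeps the implied growth rate constant between benchmarks, which is less distorting than a straight line when nominal values multiply within a few years.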
7 Narodnyi Komissariat Finansov SSSR – NarKomFin in 1930s, Ministerstvo Finansov SSSR – MinFin in 1950-1980s.
For the Soviet era we used our own assumption for allocating the Union budget residual
(the consolidated USSR budget minus the sum of all the republican budgets, effectively the USSR
central government budget) between the republics. The size of the consolidated budget of a particular
republic was chosen as the single criterion defining its weight among the republics in
the expenditures of the USSR central government.
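A minimal sketch of this allocation rule follows. The function name and the figures are hypothetical; the only element taken from the text is that each republic's weight equals its consolidated budget divided by the sum of all republican budgets.

```python
def allocate_union_residual(consolidated_total, republic_budgets):
    """Split the Union residual between republics in proportion to their own budgets."""
    republics_sum = sum(republic_budgets.values())
    # Residual = consolidated USSR budget minus the sum of republican budgets
    residual = consolidated_total - republics_sum
    return {
        name: residual * budget / republics_sum
        for name, budget in republic_budgets.items()
    }


# Hypothetical figures (illustrative only)
budgets = {"A": 300.0, "B": 100.0}
allocated = allocate_union_residual(1000.0, budgets)
# Total expenditure attributed to each republic: own budget plus allocated residual
attributed = {name: budgets[name] + allocated[name] for name in budgets}
```

By construction the attributed totals sum to the consolidated USSR budget, so no expenditure is lost or double-counted.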
Information on the execution of the consolidated budget of the Russian Federation has been
provided by the Treasury since 2003 (Federal’noe Kaznacheistvo – Kaznacheistvo Rossii, 2011).
For the earlier period (1995-2002) it is reported in various topical volumes of the State (currently
‘National’) Research University Higher School of Economics (SU–HSE). The latter institution has also
provided an assessment of institutional and private education expenditures in Russia since 1995.
2.6. Book production
Besides literacy, age heaping, enrolment and government finance, another indicator of education is
book production per annum. The two indicators of book production (number of titles, number of
copies) capture codified knowledge production (the former more, the latter less) and consumption
(the latter more, the former less) in terms of physical output. They may be considered a reliable
proxy for human capital in the long run before the ICT revolution (i.e. for the entire Soviet period
until the 1990s). However, they fail to capture the quantity of information, and we have no data on text
volume in the books published for an extended period. The evidence provided in Mironov (2003)
suggests that the share of brochures was significantly higher in the FSU than in other countries.
Official publications and propaganda texts are also included in the Soviet-era book statistics, while
in other countries they are normally omitted.
Another feature of the book production indicators is that they are sensitive to unfavourable
changes in macroeconomic environment that accompany wars and economic crises. These
indicators have a more rapid and more significant response to such shocks than enrolment and
Nevertheless, books may be considered a useful proxy. Hence, we included them in our
datasets within current country borders. This means that we made allowance for the border changes
in 1929 (Tajikistan split off from Uzbekistan) and in 1936 (Kazakhstan and Kyrgyzstan split off
from Russia), in the same way as for educational finance. Similarly, our approach to interpolation or
retropolation of the data consisted in estimating the share of a republic in the USSR total and
then converting it into absolute numbers.
2.7. Labour market (employment and wages)
So far, we discussed literacy and age heaping, attainment, enrolment, and book production. These
data may be considered first order human capital indicators in the sense that, besides corrections for
statistical problems, they do not require further calculations. However, from these underlying data,
we may be able to calculate average years of education (based on population, attainment and
enrolment), and cost- and income-based human capital measures (based on expenditure on
education and wages respectively). Since we already discussed expenditure on education, here we
turn to wages.
The Soviet labour market was strictly regulated throughout the whole period, from
the early (1920s) to the late (1966-1991) Soviet era. The most severe restrictions were in effect
from 1940 to 1956, when not only farmers but also all other employees could not change their
employment without permission from the management. For collective farm employees such
restrictions were lifted in 1965. However, excluding the period of mass compulsory labour during,
and for some time before and after, World War II, a typical Soviet worker (both blue- and white-collar)
had relative freedom of choice as to what education to obtain and what occupation to choose.
Moreover, the available evidence suggests that many of the formal restrictions were not effective
obstacles to a high degree of social mobility.
At the same time, in the centrally-planned Soviet economy wage proportions were defined
and set by the government. However, they were set to address the shortage or abundance of
particular skills and therefore affect their supply and demand. The government planners had to set
the qualification tariffs, industries’ and enterprises’ wage bill limits in such a way as to provide
greater or lesser incentives for present and prospective employees working in a particular field. In
certain periods a great deal of power was delegated by the central planners to the enterprise
management to define the remuneration of individual employees or groups of employees within
defined limits. Therefore it is possible to argue that wage distribution was a part of the Soviet
economy (almost totally regulated by the government) that experienced the outcome of market
forces, i.e. the supply of and demand for labour. Evidence of their feedback reactions (based on
cross-correlation analysis) is provided in Didenko (2006).
The most significant structural shortcoming of the available official statistics on wages
(including salaries) is their lack of an intra-industry dimension, so it is not possible to study wage
differentials at the level of employees’ occupations or educational attainment. The major exception
was the wages of blue- and white-collar workers in industry, construction and agriculture. The
Soviet ruling elite considered the industrial sector as the key one in the national economy. That is
why the relation between the wages of blue- and white-collar industrial workers may be considered
as the core of the overall income distribution and, hence, as a reliable proxy for the trends of human
capital private returns. Hence, our assumption is that the visible and non-visible (i.e. not reflected in
official data) income relation was the same for the blue- and white-collar industrial workers in any
There were four major periods in the methods of grouping wage data in Soviet and Russian
official publications: 1913-1918, 1923-1938, 1940-2004, and since 2005. We tried to splice the data for
these periods and made the adjustments we found necessary to make them as
comparable as possible. However, data relating to different periods should be compared bearing
in mind some imprecision that may arise from this. We believe that these discrepancies are not so
large as to cast doubt on the direction of the indicator trends.
The figures for the years 1913-1917 are taken from early Soviet-era publications. These
data were based on the Industrial census of 1918, which covered a sample of 3043 enterprises that
operated throughout the period 1913-1918 and were located on the territory controlled by the
Communist (Bolshevik) government in Moscow. In that period not only official figures but also
some independent estimates could be published. We believe that the independent calculations
in Krumin, ed. (1924) and the quasi-official ones in TsBST (1924) were more reliable. Therefore we
corrected the 1918 census data by the respective coefficient for the year 1913, for which all the
sources have wage data for blue- and white-collar employees.
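The benchmark correction just described can be sketched as a simple ratio adjustment. This assumes the "coefficient" is the ratio of the preferred 1913 value to the census 1913 value, applied uniformly to every year of the census series; the function name and figures are hypothetical.

```python
def rescale_to_benchmark(series, benchmark_year, benchmark_value):
    """Scale a whole series so that it matches a preferred value in the benchmark year."""
    coefficient = benchmark_value / series[benchmark_year]
    return {year: value * coefficient for year, value in series.items()}


# Hypothetical census-based wages and a preferred 1913 benchmark (illustrative only)
census_wages = {1913: 200.0, 1914: 210.0, 1916: 300.0}
corrected = rescale_to_benchmark(census_wages, 1913, 220.0)
```

The adjustment changes the level of the series but leaves its year-to-year growth rates untouched.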
The official series for the Soviet era start from 1922. This was the year when monetary wages
were replacing the predominantly in-kind remuneration of industrial employees: the monetary share
jumped from 25-30% to 77% of their total wage during 1922 (TsBST, 1924). The reason
was that the monetary reform was developed and launched with the introduction of a new ruble
in December 1922. This relatively firm currency circulated in parallel with the previous one (subject to
hyperinflation) until February 1924. On the political side the labour market gained support as the
USSR was officially established in December 1922 after the central government in Moscow gained
control over the territory within its borders (except some parts of Central Asia).
The wage data prior to 1940 were based on sample surveys of enterprises, predominantly
large-scale ones. Enterprises with either at least 30 employees, or at least 16 employees and any
engine-powered equipment, were considered large-scale in the 1920s and 1930s. In 1928 they included
about 72% of the blue-collar labour force. Before the Bolshevik Revolution large-scale enterprises differed
substantially from the artisan industry (‘kustarnaia i remeslennaia promyshlennost’) in terms of
wages. It follows from TsSU (1924) that in 1913 the average annual wage of artisan blue-collar
workers was 73.64 rubles versus 291.5 rubles in large-scale enterprises. This spread narrowed
during early Soviet times. According to TsSU (1929), the average wage in small-scale enterprises was 33%
lower than in large-scale ones in the period 1925/26-1928/29. In the same period large-scale
industry wages were on average 4.1% higher than in industry as a whole. However, we lack
more extensive data on early Soviet small-scale industry. Nor do the sources provide data on
blue- and white-collar wages within small-scale industry.
We did not calculate wages for the period of the demonetarised and hyper-inflationary economy
before a relatively firm currency was introduced, since most remuneration then consisted of in-kind
payments. However, when estimating the white/blue-collar wage differential for 1918-1922 we
relied on retro- and extrapolations derived from the time series of monetary wages in
the periods that preceded and followed.
We also avoided using the monetary data on wages for 1941-1944, owing both to their poor
statistical quality and to their lack of economic meaning. Although the monetary system appeared to be
more stable this time than during the Civil War, the major part of consumer goods was again sold
for non-monetary means of payment.
Our average wage figures include various types of monetary and in-kind remuneration of
employees. However, they do not include the cost of subsidies for the various social services they
consumed. As such subsidies could not normally be substituted at the employees’ choice (i.e. used as a
means of exchange), they cannot be considered marketable remuneration. We bear in
mind, though, that access to such services and subsidies affected employees’ preferences for
a particular workplace and position. It follows from TsSU (1983) that the share of such subsidies from
the public welfare funds (‘obshchestvennye fondy potrebleniia’) in overall employees’ income
increased from 18% in 1940 to 28% in the early 1980s.
The reported wages are not net of compulsory and quasi-compulsory deductions (direct
taxes, allegedly voluntary cash contributions and government quasi-bond subscriptions). This
issue was explored in Chapman (1963) only for selected years: the discrepancy between the
officially reported figures and the actual incomes that employees could use at their own
discretion was not significant in 1928 and 1937 (3-4%), was extremely high during WWII
(40%), and moderated thereafter (from 16% in 1948 to 12% in 1954). As the period explored in
Chapman (1963), 1928-1954, was harsher than the 1960s-1980s, when the government
tended to be less persistent in restricting personal consumption, we believe this discrepancy
diminished further over time. If we assume that these deductions were equally distributed between
white- and blue-collar industrial workers, then their wage differential should not be affected.
However, all the evidence indicates that this was not the case for the distribution between farm and
non-farm employees. The former had lower wages and were sometimes stripped of income to below their
subsistence level, so that a higher potential for additional extraction remained in the non-farm sector.
Therefore an upward bias (about half the size of the official/actual income differential pointed out
above) arises in our average wage in the national economy (including agricultural non-state
enterprises) for the 1930s-1950s.
The official statistics provide much more frequent data on employment (the number of
workers) than on wages. But the employment data have their own weak points, which result in
an upward bias in the respective data on average wages in the national economy. This bias
increases going back in time and becomes especially significant in republics with large rural
sectors such as Kazakhstan.
In terms of employment the structure of data from official sources is definitely biased towards
the industrial manufacturing sector. Annual data on the agricultural sector include only state-owned
enterprises (‘sovkhozy’, MTS, RTS), which in the 1920s-1960s accounted for just a minor part of the rural
labour force. The service sector is poorly represented in the early period (before the 1930s), when a
substantial part of it was private. However, these data are consistent with the respective average wage
data, while the more comprehensive employment data from Harrison (1998) are not. Average wage data
are biased upwards (especially from the 1930s to the 1960s) as wages in agriculture were significantly
lower than in urban-based sectors of the economy. As the share of rural employment decreased over
time (as did the share of non-state enterprises in agricultural employment), the older the data, the
more they are downward-biased in employment and upward-biased in average wage.
Only from 1940 do scarce official data appear on collectively owned agricultural enterprises
(‘kolkhozy’), which constituted the major part of the sector until the 1970s, so that direct calculation of
an unbiased average wage becomes possible for selected years. But for the FSU republics other than
Russia we have unbiased average wage data only from the mid-1980s. To address this problem we used
a retropolation correcting for the change in the urban/rural population ratio. This corrected average
wage series allows us to calculate our unbiased income-based human capital measure in Section 4.
Based on Steinberg (1990) we found that the coverage of the total wage bill exceeded that of
employment at least since 1965. Therefore the average wage in the national economy (including
agricultural non-state enterprises) for the 1960s-1980s, calculated on the basis of Soviet official
data, has an upward bias diminishing from 12% in 1965 to 3% in 1985. This bias seems to be smaller
in the pre-WWII years, as extremely low-paid farm employment had a substantially larger share in
overall employment than it did in the 1960s-1980s.
Some data on blue- and white-collar workers were omitted in the above sources. We
interpolated them based on total employment and the average wage in the state-owned sector. In
some cases (mainly for the 1920s) we used time-series retropolation. The data for the last year of the
USSR were predicted using the StatKom SNG (1992) data. These were compiled after the dissolution
of the USSR (December 1991) and covered the CIS countries only, i.e. the USSR republics
excluding Georgia, Latvia, Lithuania and Estonia. In 1990 the CIS countries accounted for more than
95% of the USSR population. The StatKom SNG (1992) figures were more comprehensive as they
included all sectors of the economy, while the other official figures did not.
2.8. National accounts (GDP, fixed capital) and their price indices
Obviously, any analysis of human capital is severely limited if we cannot calculate its relationship
with per capita income. Therefore, we also include series of per capita national income in the
USSR. However, initially, the structure of the national income of the former USSR was quite
different from that in most Western economies. The epistemological foundation of national
income calculations under Soviet-type socialism was the belief that no new value added may be
created outside the sectors of material production. Industries that produced intangibles (e.g.
knowledge-producing ones) were classified as intermediate consumption and as non-productive.
Therefore the Soviet official Net Material Product (NMP) figures omitted most services until
the mid-1980s. For the period 1985-1990 the USSR GNP data (in established prices) were calculated
by the late Soviet statistical office (Goskomstat) with IMF and World Bank assistance and
were published in IMF, WB, OECD, EBRD (1991). Applying the same methodology,
Steinberg (1990) calculated the USSR GNP for 1965-1984.
For the USSR period we constructed the series of NMP (based primarily on official figures
for material production sector), GNP (based on the research literature for the overall economy) in
established current prices, its deflators and gross fixed capital series (also in current prices) based
on the available data. The figures applied to the territory within actual USSR borders. For the
WWII period the USSR territory as of 1940 temporarily occupied by the Nazi troops was also
We checked the various estimates of USSR GNP in current prices taken from the previous
literature against monetary indicators that were originally expressed in current prices: the total wage
bill and total national budget expenditures. The same procedure was applied to our own estimates. We
chose to link together those series that were based on generally the same concepts and had close values
at adjacent time points. We also used the series of both NMP and GDP in current prices for their
For the period 1885-1913 we used the data from Gregory (1982) on Net National Product
(calculated by final use) to define the nominal growth rate of GDP relative to 1913, for which we had both
the NNP from Gregory (1982) and the GDP from Markevich and Harrison (2011). The data of the latter were
also used for the period 1913-1928 after conversion from constant 1913 prices to current prices
employing our preferred deflators for that period. It follows from Markevich and Harrison (2011)
that their Net (or Real) National Income calculated by sector of origin (more precisely, by net value
added, which excludes intermediate consumption but was the source of gross investment) was
assumed to be equal to Gross Domestic Product as in the framework of the UN System of National
Accounts. We assumed that it was approximately equal to GNP, bearing in mind that this was closer to
reality after 1917, when capital flows to and from abroad were strongly limited.
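The two steps in this paragraph, linking a proxy series to a benchmark level and converting constant prices to current prices, can be sketched as follows. This is only an illustration under our own assumptions: the proxy is taken to share the growth rates of the target series, and the deflator is normalised to 1.0 in the base year. All names and figures are hypothetical.

```python
def backcast_level(proxy, base_year, base_level):
    """Carry a benchmark level backwards using the growth rates of a proxy series."""
    factor = base_level / proxy[base_year]
    return {year: value * factor for year, value in proxy.items()}


def to_current_prices(constant_price_series, deflator):
    """Convert constant-price values into current prices (deflator = 1.0 in the base year)."""
    return {year: constant_price_series[year] * deflator[year]
            for year in constant_price_series}


# Hypothetical figures (illustrative only)
nnp_current = {1885: 8.0, 1900: 12.0, 1913: 20.0}   # proxy in current prices
gdp_nominal = backcast_level(nnp_current, 1913, 22.0)

gdp_1913_prices = {1913: 22.0, 1925: 20.0}          # series in constant 1913 prices
deflator = {1913: 1.0, 1925: 2.0}                   # price level relative to 1913
gdp_current = to_current_prices(gdp_1913_prices, deflator)
```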
For later periods we took basically the current price data points and series from Bergson
(1961) for 1928-1955, Becker (1969) for 1958-1964 and Steinberg (1990) for 1965-1990 of his own
calculation (not replicating the late Soviet official methodology).
The series of USSR GNP from Harrison (1998) and Easterly and Fischer (2001) were
expressed in constant prices but seem to be quite artificial in monetary terms. For example, they spliced
data from the literature expressed in various denominations (pre-1961 and 1961) without making a
conversion. However, their data reveal the short-term trends of GNP dynamics, and we used them
for interpolation of the appropriate data in current prices.
Our gross investment series were basically taken from the same sources as GNP, except for
Bergson (1961), because we preferred the series from Moorsteen and Powell (1966) for 1928-1957
as more complete and based on a stricter methodology. For 1958-1961 we used
averages of the estimates from Moorsteen and Powell (1966) and Becker (1969). Gross fixed
investment values did not include investment in livestock and inventories but did include investment
in residential housing and capital repairs in construction and installation services. In contrast to many
countries, much of the residential housing was on corporate balance sheets, especially in the 1960s-1990s.
Our gross fixed capital stock estimate (in current prices) is based on the gross fixed capital to
GNP (at factor cost) ratio derived from Easterly and Fischer (2001), assuming that this relationship
is correct for a particular year regardless of its monetary expression. Easterly and Fischer (2001) used
series based on Western estimates that were generally the same as those we used for our GNP and
gross investment values, at least for the period prior to 1956. Our calculation of the implied retirement
rate showed that either a) the fixed capital series from Easterly and Fischer (2001) made a very
modest allowance for retirement, or b) the gross investment values from our sources were
underestimated, or c) our GNP deflator was lower than that of the fixed capital stock. The last seems
quite improbable, as Soviet-era prices for consumer products generally outpaced those for
investment goods.
The principal difference in assessments of FSU economic growth rates arises
from the application of different measurements of inflation, both the indicators and their size. Therefore
finding an appropriate price index to evaluate FSU human capital in monetary units is a rather
complicated but very important issue. Our preferred inflation indicator was the GNP deflator, as it is
the most comprehensive price index, covering the entire economy. It includes not only consumer
goods and services but also government consumption and capital assets. However, we used
consumer price indices as a cross-check where possible.
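The capital stock estimate and the implied retirement rate mentioned above can be illustrated with a short sketch. It assumes a standard perpetual-inventory identity, K(t) = K(t-1) + I(t) - R(t), with the retirement rate defined as R(t) / K(t-1); the text does not state the exact formula used, so this is an assumption. Years are assumed to be consecutive, and all names and figures are hypothetical.

```python
def capital_stock_from_ratio(gnp, capital_to_gnp):
    """Current-price capital stock as GNP times the capital/GNP ratio of the same year."""
    return {year: gnp[year] * capital_to_gnp[year] for year in gnp}


def implied_retirement_rates(capital_stock, investment):
    """Retirement rate implied by K(t) = K(t-1) + I(t) - R(t), as a share of K(t-1)."""
    years = sorted(capital_stock)
    rates = {}
    for prev, cur in zip(years, years[1:]):
        retired = capital_stock[prev] + investment[cur] - capital_stock[cur]
        rates[cur] = retired / capital_stock[prev]
    return rates


# Hypothetical figures (illustrative only)
gnp = {1950: 100.0, 1951: 110.0}
ratio = {1950: 2.0, 1951: 2.0}
investment = {1951: 24.0}

stock = capital_stock_from_ratio(gnp, ratio)
rates = implied_retirement_rates(stock, investment)
```

A very low or negative implied rate is the symptom that points to one of the three explanations a) to c) listed in the text.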
For 1885-1913 our GNP deflator was taken as the average of the two indices derived from
Gregory (1982), who applied both the Podtiagin wholesale price index of Russian regional markets and
the combined retail price index of the Russian capital cities St. Petersburg and Moscow. The former
was a price basket of 66 commodities, the latter was based on 38 commodities excluding housing
For 1913-1928 the General retail weighted-average price index (‘Biudzhetnyi indeks
TsBST, obshchetovarnyi srednevzveshennyi’) was chosen as our preferred one. It was constructed
after the Bolshevik Revolution by the Central Bureau for Labour Statistics (TsBST) that was the
joint body of the Soviet official Labour union organisation (VTsSPS), the official Central Statistical
Board (TsSU) and the Government labour office (NKT). This index had the longest and the most
detailed record among the other price indices published by the official statistical office. The other
retail price indices had values close to it.
For the period 1928-1955 we constructed our own Chain Deflator Index (hereinafter referred
to as CDI), as none of the available price indices for this period provided a satisfactory tool for
capturing the structural changes in the Soviet economy.
First of all, application of any price indices to a centrally-planned economy requires some
aspects to be taken into account. One of the basic features of such an economy is the government’s