Years of abundance and years of famine: The dark story of oil

Seven years of great abundance are coming throughout the land of Egypt, but seven years of famine will follow them. (Genesis 41:29-30)

The book of Genesis provides a good description of the ‘second dimension’ that I have been using to analyze words in the New Year addresses. This dimension distinguishes years of abundance from years of famine; in other words, it separates economically robust years from economically challenging ones. In Figure 1 (which repeats Figure 1 from the previous post), the second dimension is the vertical axis of the map: years nearer the top are years of abundance, and years nearer the bottom are years of famine. The Soviet years on the left are mostly grouped together in the middle of the figure. The Perestroika years in the center mostly lie closer to the bottom than the Soviet years, because those years of reform were marked by economic difficulties. The post-Soviet years on the right-hand side of the graph are more spread out than the other two clusters. Three years in this group, 1991, 1996, and 1999, are outliers at the bottom of the map. All three were years of major upheaval in Russian history, both politically and economically.

Figure 1. New Year addresses by year: Soviet period (red), Perestroika (purple), and post-Soviet period (blue).

The year 1991 was the year of the dissolution of the Soviet Union, which occurred in part because of an economic crisis. The year 1996 was a presidential election year, and at that time many workers had not been paid for months. One of the largest miners’ strikes took place in 1996 because miners were owed $200 million in unpaid wages.[1] The year 1999 immediately followed the 1998 economic crisis; it saw apartment bombings in Moscow and the start of the Second Chechen War, and it was the year Yeltsin resigned as President. The years at the top of the map, by contrast, are years of relative economic prosperity.

The prosperity and economic stability of Russia depend strongly on oil, Russia’s most important export commodity. Dimension 2 is correlated with the price of a barrel of oil[2] (r = 0.42, p = 0.004). This dependency is not at all surprising: most economic crises in the USSR and Russia, and the non-economic crises that accompanied them, were related to falling oil prices. As early as 1979, the Soviet economy was struggling with the costs of its planned economy. This crisis is summarized in the 1979 report by N. Kirillin, Assistant to the Chairman of the Council of Ministers of the USSR, but the report was ignored at the time because high oil prices provided a readily available source of revenue.[3] The Soviet Union’s ‘business’ model at the time was to extract and sell oil. Fuel and energy accounted for 16 percent of exports in 1970, but already for 54 percent in 1985. The money received from oil sales was used to buy food: grain imports grew from 2.2 million tonnes in 1970 to 45.5 million tonnes in 1985, and meat imports grew from 165 thousand tonnes to 857 thousand tonnes over the same period.[4]
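The reported r and p-value can be cross-checked against each other. The sketch below is a minimal illustration, not the analysis actually run for this post: it assumes the correlation was computed over the 46 addresses mentioned in the method post and that the p-value comes from the standard two-sided t-test for a Pearson correlation, where t = r·√((n − 2)/(1 − r²)) follows a t distribution with n − 2 degrees of freedom. The function names are illustrative, and the tail area is integrated numerically with Simpson’s rule so that no statistics library is needed.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """Test statistic for H0 'no correlation'; t-distributed with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the central area between -|t| and |t|.
    The area from 0 to |t| is computed with Simpson's rule (steps must be even)."""
    t = abs(t)
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area *= h / 3
    return 1 - 2 * area

# Values reported above; n = 46 addresses is an assumption.
r, n = 0.42, 46
t = t_statistic(r, n)  # about 3.07
print(f"t = {t:.2f}, p = {two_sided_p(t, n - 2):.4f}")
```

With these assumptions the p-value comes out close to the reported 0.004, so the two numbers are mutually consistent.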

Oil prices are known to reflect important historical events, especially events in oil-producing regions. For example, oil prices peaked during the Yom Kippur War (1973) and the Gulf War (1990-1991). Oil prices also rose during the Iranian Revolution (1979) and at the beginning of the Iran/Iraq War (1980). However, these increases were followed by the most significant decrease in oil prices in the second half of the 20th century: the adjusted price of a barrel of oil fell from $115.62 in April 1980 to $22.33 in March 1986. When the price of oil fell, changes in the way the Soviet economy functioned became inevitable, and Perestroika began.

The economy of modern Russia depends on oil and gas even more strongly than the Soviet economy did. Today, oil and natural gas sales account for about 70 percent of Russia’s export revenues,[5] so it is not surprising that every significant change in oil prices affects the Russian economy. Note the years 2001 and 2009 at the bottom of the blue cloud of years in Figure 1. Both years are associated with significant decreases in oil prices.

Figure 2. Words used in the New Year addresses: Dimensions 1 and 2.

In Figure 2 (which repeats Figure 2 from the previous post), we can see which words lie toward each end of Dimension 2. At the bottom are the words that occur more frequently in years of famine. The most prominent is the pronoun ja ‘I’ in the bottom right corner of the map. Alongside ja ‘I’, we see segodnja ‘today’, put’ ‘way, track’, and delo ‘work’. In times of crisis, leaders often talk about circumstances that lie behind us or ahead of us, and about work that has already begun or must still be done. Consider two characteristic quotes from a speech by Yeltsin delivered in 1991:

(1) Не вина России, что ее столкнули с этого пути, превратили в испытательный полигон коммунизма. Сегодня мы избавились от этого наваждения.

‘It is not Russia’s fault that it was pushed off this track and turned into a testing ground for communism. Today we are free from this nightmare.’

(2) Мы вместе решились на перемены, вместе начали дело, которое, может быть, станет главным для России XX века.

‘Together we decided on change; together we began the work that may become the most important for twentieth-century Russia.’

At the top of the map we see words such as pust’ ‘let’, rebenok ‘child’, novogodnij ‘New Year’s’, drug ‘friend’, vmeste ‘together’, prazdnik ‘holiday’, sem’ja ‘family’, which together create a picture of a large family enjoying a New Year’s Eve meal. During economically robust years, leaders of Russia are inclined to talk about family values. A quote from Medvedev’s 2010 address in (3) and a quote from Putin’s 2012 address in (4) illustrate this tendency.

(3) И всё, что мы делаем, мы делаем для наших детей – для того, чтобы они были здоровы, чтобы у них в жизни всё получалось, чтобы они жили в безопасной, благополучной и счастливой стране

‘Everything that we do, we do for our children – so that they will be healthy, so that they will succeed in life, so that they will live in a safe, successful, happy country…’

(4) В эти минуты мы особенно остро чувствуем, как летит время, как быстро растут наши дети, как дорожим мы своими родными и близкими, как любим их.

‘In these moments we feel especially keenly how time flies, how fast our children grow, how much we cherish our friends and family, how we love them.’

Thus, the 100 most frequent words in the New Year addresses capture differences in rhetoric associated with years of hardship and years of abundance. The second dimension also correlates with the adjusted price of a barrel of oil, which is unsurprising given that oil is Russia’s most important export commodity and its price strongly influences the Russian economy. The next post will be about the dimension that correlates with democratic freedom.

[1] Alessandra Stanley. Russian Miners Strike, Defying Yeltsin. The New York Times. February 2, 1996. http://www.nytimes.com/1996/02/02/world/russian-miners-strike-defying-yeltsin.html

[2] Adjusted for inflation.

[3] See Gajdar, Egor. 2005. Dolgoe vremja. Moscow: Delo, p. 337.

[4] Ibid. p. 340.

[5] Oil and natural gas sales accounted for 68% of Russia’s total export revenues in 2013. U.S. Energy Information Administration. July 23, 2014. http://www.eia.gov/todayinenergy/detail.cfm?id=17231. Accessed May 27, 2016; Will Russia Survive the Oil & Gas Downturn? Oil&Gas 360. October 15, 2015. http://www.oilandgas360.com/will-russia-survive-the-oil-gas-downturn/. Accessed May 27, 2016.

About Julia Kuznetsova

Dimension 1: Political era

In this post I explore the first dimension of the multidimensional space produced by the 100 most frequent words in the New Year addresses. This first dimension divides the addresses into three groups: those presented before 1985, those presented between 1985 and 1990, and those given after 1990. These groups are clearly visible in Figure 1.

Figure 1. New Year addresses mapped by year: Soviet period (red), Perestroika (purple), and post-Soviet period (blue).

These clusters are easy to interpret. The years in blue to the right are the years after the fall of the Soviet Union, i.e., the years when the addresses were given by the presidents of Russia and addressed to the citizens of Russia. The years in purple and red to the left are the years of the Soviet Union; more specifically, the years in purple are the years of Perestroika, with all its new lexicon of change.

One might think that these clusters simply reflect chronology, with the earlier years on the left, the middle years in the center, and the more recent years on the right. However, I can show that the political system, rather than the chronological year, is the better predictor of these clusters. To do so, I fit two linear regressions. The first predicts the Dimension 1 coordinate from the year of the address; the second predicts it from the political era (Soviet, Perestroika, or post-Soviet). Both regressions help predict the Dimension 1 coordinates, but the two models differ significantly in how well they fit: according to adjusted R-squared, the first explains 75% of the variance in the outcome, whereas the second explains 96%. Thus, political era is a considerably better predictor of the clusters in Figure 1 than chronological year.
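The model comparison can be sketched as follows. This is a minimal illustration on synthetic, made-up coordinates (not the actual Dimension 1 values), assuming ordinary least squares: year enters as one numeric predictor, and era enters as a three-level categorical predictor (two dummy variables), whose fitted values are simply the era means. Adjusted R-squared, 1 − (1 − R²)(n − 1)/(n − p − 1), penalizes the era model for its extra parameter. All function names are hypothetical.

```python
from statistics import mean

def r_squared(y, fitted):
    """Share of the variance in y explained by the fitted values."""
    y_bar = mean(y)
    ss_tot = sum((v - y_bar) ** 2 for v in y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Penalize R-squared for p predictors fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def fit_year_model(years, y):
    """OLS with year as a single numeric predictor: y = a + b * year."""
    x_bar, y_bar = mean(years), mean(y)
    b = (sum((x - x_bar) * (v - y_bar) for x, v in zip(years, y))
         / sum((x - x_bar) ** 2 for x in years))
    a = y_bar - b * x_bar
    return [a + b * x for x in years]

def fit_era_model(eras, y):
    """OLS with era as a categorical predictor; fitted values are era means."""
    groups = {}
    for era, v in zip(eras, y):
        groups.setdefault(era, []).append(v)
    era_means = {era: mean(vs) for era, vs in groups.items()}
    return [era_means[era] for era in eras]

def era_of(year):
    if year < 1985:
        return "Soviet"
    if year <= 1990:
        return "Perestroika"
    return "post-Soviet"

# Synthetic coordinates: one plateau per era plus a small deterministic wiggle.
years = list(range(1970, 2016))
eras = [era_of(yr) for yr in years]
level = {"Soviet": -1.0, "Perestroika": 0.0, "post-Soviet": 1.0}
coord = [level[e] + 0.1 * ((yr % 3) - 1) for yr, e in zip(years, eras)]

n = len(years)
adj_year = adjusted_r_squared(r_squared(coord, fit_year_model(years, coord)), n, p=1)
adj_era = adjusted_r_squared(r_squared(coord, fit_era_model(eras, coord)), n, p=2)
print(f"year model: {adj_year:.2f}, era model: {adj_era:.2f}")
```

On data shaped like three plateaus, a straight line in year cannot follow the jumps at 1985 and 1991, so the year model’s adjusted R-squared stays lower even though it uses one parameter fewer.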

Why does the political era affect the most frequent words used in a New Year address? The answer becomes clear in Figure 2, which places the words on the same map as the years in Figure 1. On the right are words that have a negative correlation with Dimension 1. These words mark addresses given by the presidents of Russia: prezident ‘president’, Rossija ‘Russia’, and graždanin ‘citizen’, which is used to address Russian listeners instead of the Soviet form of address, tovarišč ‘comrade’. These words never appeared in the Soviet New Year discourse and are clear indicators of post-Soviet times.

Figure 2. Words used in the New Year addresses: Dimensions 1 and 2.

In the middle of Figure 2, at the same location as the Perestroika years in Figure 1, are words such as čelovek ‘human’, mir ‘peace’, and put’ ‘way’, which are clearly associated with disarmament and humanization, the new ideas introduced during Perestroika.

The left side of Figure 2 holds a dense, illegible cloud of words, which is shown in detail in Figure 3. Here we see words that could have appeared only in New Year addresses of the Soviet era: tovarišč ‘comrade’ (the Soviet way to address listeners); rabočij ‘worker’ (an important social class in the socialist years); words pertaining to the political system, such as socializm ‘socialism’, socialističeskij ‘socialist’, kommunističeskij ‘communist’, and leninskij ‘Lenin’s’; and words associated with Soviet institutions, such as central’nyj ‘central’, verxovnyj ‘supreme’, sovet ‘soviet’, KPSS ‘CPSU’ (Communist Party of the Soviet Union), SSSR ‘USSR’, partija ‘party’, and komitet ‘committee’.

Figure 3. Words associated with Soviet years: Dimensions 1 and 2.

The personal pronouns among the most frequent words also deserve attention. Figure 4 shows the same map as Figure 2, but with only a few words selected. Personal pronouns such as ja ‘I’, my ‘we’, and vy ‘you (plural)’ are highlighted in blue. All of these pronouns are gathered on the right side of the map, the side associated with the post-Soviet years. By contrast, the Soviet side contains only one word close in meaning to the personal pronouns: narod ‘people’, highlighted in red. This grouping suggests a shift from the more collective thinking characteristic of the Soviet era to the more personal interactions characteristic of the post-Soviet era.

Figure 4. Personal pronouns and the word narod ‘people’: Dimensions 1 and 2.

Thus, we can easily deduce the political era in which a New Year address was given using only the 100 most frequent words. Political era is the most important dimension in the matrix and explains most of the variation in the 100 most frequent words. My next post will describe the second dimension, which is related to the economic situation.

Tea, Lemonade, and Correspondence Analysis

This post is about the method that I use to analyze the New Year addresses. In short, I extract one hundred words that are the most frequent among all of the addresses and then apply correspondence analysis (CA) to their frequencies. People who use CA and similar methods typically don’t discuss in detail how such methods actually work and simply proceed directly to the pretty pictures that those methods produce. However, I believe that every result is more meaningful if we can envision how it was obtained, so I will try to explain how CA works using a simple imaginary example that is straightforward and easy to grasp intuitively.

At first, I tried to come up with a sophisticated story about why a person might need to apply CA, but my ideas were too complicated and unrealistic. One of them, for example, involved a company that unexpectedly offered to pay for its employees’ beverages on the workdays of one summer week. An employee found several receipts from various weeks and two frequent-buyer cards in her pocket, and the task was to use all that information to work out which of the drinks had been purchased on summer workdays. After concocting this scenario, I decided it was easier to skip the realistic backstory and present a simple illustrative case that shows how the method works. Please keep in mind that, although this case is so straightforward that no one would actually need statistical analysis for it, the problem becomes much more complicated when more variables are involved, as in the case of the New Year addresses.

Imagine that Alice found her receipts from Starbucks from a week in January and a week in July. Alice had purchased one drink on each of those days. In the winter it was cold, so she was more interested in hot drinks, whereas in the summer it was hot, and she bought iced coffee and lemonade. Throughout the year, she drank coffee on weekdays on her way to work to jump-start her workday, but at weekends she felt that she did not need to concentrate that much, so she drank tea or lemonade instead of coffee. Table 1 presents the list of the drinks that Alice bought during those two weeks.

Table 1. Drinks that Alice purchased on different days.

Day  Weekday  Month    Drink
1    Mon      January  coffee
2    Tue      January  coffee
3    Wed      January  coffee
4    Thu      January  coffee
5    Fri      January  coffee
6    Sat      January  hot tea
7    Sun      January  hot tea
8    Mon      July     iced coffee
9    Tue      July     iced coffee
10   Wed      July     iced coffee
11   Thu      July     iced coffee
12   Fri      July     iced coffee
13   Sat      July     iced lemonade
14   Sun      July     iced lemonade

If Alice analyzed her receipts based only on the drinks she had purchased, she would be able to tell the summer receipts from the winter receipts, and the workday receipts from the weekend receipts. So far the example is trivial, but it will serve to illustrate how CA works.

First, let’s transform the data in the following way. A new table (Table 2) contains a row for each of the fourteen days for which Alice has receipts. All the words that appear in those receipts – coffee, tea, hot, iced, lemonade – are headings for the columns. Each cell of the table contains a ‘1’ if the word at the top of the column appears in the receipts on that day and ‘0’ if it does not.

The word coffee distinguishes workdays from weekends: every day whose receipt contains coffee is a workday. The word iced identifies the month: every day that contains iced is in July, and every day that does not is in January. The other words are less useful for determining the day of the week or the month. The words tea, hot, and lemonade appear only on weekends, but none of them appears on every weekend day in the dataset. Moreover, hot and tea appear on exactly the same days, so once the word tea is in the model, the word hot adds no new information.

Table 2. Words that appear in the receipts on different days.

Day  coffee  tea  lemonade  hot  iced
1    1       0    0         0    0
2    1       0    0         0    0
3    1       0    0         0    0
4    1       0    0         0    0
5    1       0    0         0    0
6    0       1    0         1    0
7    0       1    0         1    0
8    1       0    0         0    1
9    1       0    0         0    1
10   1       0    0         0    1
11   1       0    0         0    1
12   1       0    0         0    1
13   0       0    1         0    1
14   0       0    1         0    1
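The transformation from Table 1 to Table 2, and the observations about coffee, iced, tea, and hot, can be reproduced in a few lines. This is a minimal sketch, assuming that a drink name splits into words on spaces; the helper names are hypothetical.

```python
# Table 1: (day, weekday, month, drink)
TABLE1 = [
    (1, "Mon", "January", "coffee"),
    (2, "Tue", "January", "coffee"),
    (3, "Wed", "January", "coffee"),
    (4, "Thu", "January", "coffee"),
    (5, "Fri", "January", "coffee"),
    (6, "Sat", "January", "hot tea"),
    (7, "Sun", "January", "hot tea"),
    (8, "Mon", "July", "iced coffee"),
    (9, "Tue", "July", "iced coffee"),
    (10, "Wed", "July", "iced coffee"),
    (11, "Thu", "July", "iced coffee"),
    (12, "Fri", "July", "iced coffee"),
    (13, "Sat", "July", "iced lemonade"),
    (14, "Sun", "July", "iced lemonade"),
]
WORDS = ["coffee", "tea", "lemonade", "hot", "iced"]

def indicator_matrix(receipts, words):
    """One row per day: 1 if the word occurs on that day's receipt, else 0."""
    return [[1 if w in drink.split() else 0 for w in words]
            for _, _, _, drink in receipts]

def column(matrix, j):
    return [row[j] for row in matrix]

def identical_columns(matrix, words):
    """Pairs of words that occur on exactly the same days (redundant information)."""
    return [(words[i], words[j])
            for i in range(len(words)) for j in range(i + 1, len(words))
            if column(matrix, i) == column(matrix, j)]

table2 = indicator_matrix(TABLE1, WORDS)
is_workday = [0 if wd in ("Sat", "Sun") else 1 for _, wd, _, _ in TABLE1]
is_july = [1 if month == "July" else 0 for _, _, month, _ in TABLE1]

print(identical_columns(table2, WORDS))                  # [('tea', 'hot')]
print(column(table2, WORDS.index("coffee")) == is_workday)  # True
print(column(table2, WORDS.index("iced")) == is_july)       # True
```

Checking columns one by one like this works for five words, but it does not scale to 100 words and dozens of addresses, which is where CA helps.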

Now let’s input Alice’s data into CA. CA reads Table 2 as five points (one per word) in a 14-dimensional space (one dimension per day). For example, coffee becomes a point with the following fourteen coordinates: (1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0). CA then finds the line that lies closest to all five points. Imagine a tube around this line: most or all of the points would fall inside it. This line becomes the first new dimension. Next, CA finds a second line, perpendicular to the first, such that the plane spanned by the two lines lies as close as possible to all the points. CA continues in the same way, one perpendicular line at a time; for a table with 14 rows and 5 columns, this yields at most four dimensions, one fewer than the smaller of the two table sizes.
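The geometric picture above is deliberately informal. In the textbook formulation, CA compares row and column profiles using chi-squared distances, and the ‘closest lines’ come from a singular value decomposition (SVD) of the standardized residuals from independence. The sketch below assumes NumPy and uses this standard formulation; it is an illustration, not the software actually used for the New Year addresses, and the function name is hypothetical. It runs CA on Table 2 and returns principal coordinates for the days (rows) and the words (columns).

```python
import numpy as np

WORDS = ["coffee", "tea", "lemonade", "hot", "iced"]
# Table 2: one row per day (1..14), one column per word.
TABLE2 = np.array([
    [1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0], [1, 0, 0, 0, 0],                    # days 1-5
    [0, 1, 0, 1, 0], [0, 1, 0, 1, 0],                    # days 6-7
    [1, 0, 0, 0, 1], [1, 0, 0, 0, 1], [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1], [1, 0, 0, 0, 1],                    # days 8-12
    [0, 0, 1, 0, 1], [0, 0, 1, 0, 1],                    # days 13-14
], dtype=float)

def correspondence_analysis(table):
    """Return (singular values, row coordinates, column coordinates).

    Coordinates are principal coordinates: column k of each array is
    dimension k, ordered from most to least inertia."""
    P = table / table.sum()              # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)            # what P would be under independence
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]
    return sv, rows, cols

sv, rows, cols = correspondence_analysis(TABLE2)
inertia = sv ** 2
print("share of inertia per dimension:", np.round(inertia / inertia.sum(), 3))
for word, (x, y) in zip(WORDS, cols[:, :2]):
    print(f"{word:9s} {x:7.3f} {y:7.3f}")
```

Because the tea and hot columns are identical, they receive identical coordinates on every dimension that carries inertia, and days with identical receipts land on the same point. The signs of the axes are arbitrary, so a map drawn from this output may be a mirror image of the figure.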

The first few dimensions best describe what is happening in the data: they point to the factors that shape the distribution most strongly. In the receipts example, these factors are the day of the week and the season. Figure 1 shows the distribution of the fourteen days on the map formed by the first two dimensions. Days 1 through 5 are grouped together; these are the days when Alice drank non-iced coffee. Days 8 through 12, when Alice drank iced coffee, also form a group. Both groups lie far from Days 6 and 7 on the right and Days 13 and 14 in the top left corner. On Days 6 and 7, Alice drank hot tea; on Days 13 and 14, she drank iced lemonade.

Figure 1. Groups of days according to the drinks purchased on each day.

CA thus provides more information than could be gleaned from simply counting words. From Figure 1 we immediately see that the words hot and tea contribute exactly the same information. We can also see that lemonade, hot, and tea provide similar information and lie far from coffee. If we examined the information contributed by the words column by column, we would have to draw these conclusions ourselves, but CA does this work for us. This capability becomes extremely advantageous when we move from a simple case with a 5 x 14 matrix to a much more complicated case with a larger matrix.

We can map the two major factors, day of the week and season, as shown in Figure 2. The figure is divided by two lines, purple and green. The purple line separates the seasons: the winter days are below it and the summer days above it. For this factor, the words iced and lemonade strongly increase the probability of summer, whereas hot and tea decrease it. The word coffee, however, appears with the same probability in summer and winter receipts, so it is not a good predictor of the season. The green line separates the workdays, below it, from the weekend days, above it. For this factor, coffee is a strong predictor of a workday, whereas lemonade, hot, and tea strongly predict a weekend day; iced is not helpful, because it occurs both on workdays and at weekends. Finally, because tea and hot always appear together, hot adds nothing once tea is in the model, so one of these two words (for example, hot) can be excluded from the analysis.

Figure 2. Major factors in a day’s distribution.

In the case of the 100 most frequent words found in the New Year addresses we will have a similar matrix, but a much larger dataset. There will be 100 words instead of 5, and 46 years instead of 14 days. Nonetheless, the underlying idea is similar. I analyzed the matrix using words as the headings of the rows and years as the headings of the columns. Each cell of the matrix contained the frequency of the word in the New Year address given that year, measured in items per million (ipm). Items per million is traditionally used in corpus linguistics as a measure of frequency that does not depend on the size of the document. For example, the word bol’šoj ‘big’ appears three times in the New Year address delivered in 1970. The length of the New Year address that year was 605 words. Therefore, in the cell at the intersection of the row for the word bol’šoj ‘big’ and the column for 1970, we have 4958.68 = (3/605) * 1,000,000. Measuring the frequency of words in ipm is necessary because the lengths of the addresses vary notably – from 1445 words in 1993 to 194 words in 2012.
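The ipm calculation and the words-by-years table can be sketched as below. This is a minimal illustration, assuming each address has already been tokenized and lemmatized into a list of words (so that all forms of bol’šoj count as one item) and that the 100 words are chosen by raw counts over all addresses; the function names and the toy data are hypothetical.

```python
from collections import Counter

def ipm(count, total_words):
    """Frequency in items per million (ipm) words."""
    return count / total_words * 1_000_000

def most_frequent_words(addresses, k):
    """The k most frequent words over all addresses (raw counts)."""
    totals = Counter()
    for tokens in addresses.values():
        totals.update(tokens)
    return [word for word, _ in totals.most_common(k)]

def ipm_matrix(addresses, vocabulary):
    """Rows = words, columns = years, cells = ipm of the word in that year's address."""
    counts = {year: Counter(tokens) for year, tokens in addresses.items()}
    return {word: {year: ipm(counts[year][word], len(tokens))
                   for year, tokens in addresses.items()}
            for word in vocabulary}

# The cell for bol'šoj in 1970: 3 occurrences in a 605-word address.
print(round(ipm(3, 605), 2))  # 4958.68

# Toy usage: addresses maps a year to its list of lemmas.
toy = {1970: ["mir", "trud", "mir"], 1971: ["mir", "narod"]}
table = ipm_matrix(toy, most_frequent_words(toy, 1))
```

The resulting words-by-years table of ipm values is the kind of matrix that CA is then applied to.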

Unfortunately, unlike in the simplified case of Alice’s Starbucks receipts where the main factors, seasons and workdays versus weekends, are already known to us, in the case of the New Year addresses, we do not know the main factors that contribute to the distribution. In the following posts I will investigate the important dimensions for the New Year addresses and interpret those dimensions. For each dimension, I will show what it correlates with in the real world. I will start with the most important dimension – the political system – in the next post.

Gathering leaves

New Year 1971 in Moscow on a USSR postage stamp, circa 1970, showing New Year symbols.

Have you ever wondered what will happen in Russia in the upcoming year? How can we get clues? A fascinating resource that unites historical and linguistic information in condensed form becomes available each year to provide some answers and foresight: the collection of New Year’s Eve addresses presented by Soviet and Russian leaders over nearly fifty years. This blog is about how we can deduce linguistic, social, and even economic information from the New Year’s Eve addresses and what these speeches can tell us about historical events.

The tradition of the New Year’s Eve address started on December 31, 1970 with the speech given by Brezhnev. The addresses soon became an established part of the traditions associated with New Year celebrations in the Soviet Union, and later in Russia, together with the New Year tree, gifts, and traditional food. I have gathered all the addresses that have been given since 1970 and provide them on this website (in Russian). They are divided into 20th century addresses and 21st century addresses.

This first post describes how and why I collected these addresses. I first became interested in these unique speeches when I attended a conference talk by Fidler and Cvrcek, who had analyzed the New Year addresses given by the leaders of Czechoslovakia. I began to realize that New Year addresses could provide a wealth of information. They are delivered every year, they are influenced by historical events, and they have a codified form, so historical information can easily be found in these texts. Moreover, they reflect the socio-economic and cultural climate of the time, not only through the historical information they hold, but also through linguistic clues.

My first task was to collect all the addresses. Luckily, during Soviet times the New Year’s Eve addresses were printed on New Year’s Day on the front page of Pravda (‘Truth’), the most important daily newspaper in the Soviet Union. An electronic version of all the Pravda issues is available. However, I quickly found that although I could search through the newspapers easily enough, the quality of the scanned pages was poor and I was not always able to read the text. So this part of my journey led me to the basement of the local university library, where the microfilms of old newspapers were stored.

There is a certain irony in the fact that microfilm was once widely seen as the technology of the future and is now associated only with the past. For me, this was my first experience working with microfilm. To access the collection, I had to take a designated elevator to the basement of the building, which only people who wanted to access microfilm were allowed to use. The staff members there were delighted to see anyone at all and seemed to enjoy teaching me how to view microfilm on the screen of the reader (similar to the one shown to the right) and how to convert the images on the screen to PDF files. I typed in all the New Year’s Eve addresses from 1970 to 1991 from those copies.

Unfortunately, the practice of publishing the New Year addresses, which has proved so helpful to later historians, suddenly stopped in 1992. Until 1992, the front page of the newspaper clearly indicated that Pravda was the main media outlet of the Communist Party, but on January 1, 1992, the New Year address was missing, and in its place on the front page was a horoscope: a clear sign of change.

In addition to the New Year addresses, the overall appearance of the New Year’s Day front page provides a considerable amount of cultural and historical information. For example, the pictures in the January 1 editions tell their own story. In the newspapers from 1971 to 1984, the New Year’s Eve addresses were sometimes accompanied by a picture of the leader; most often, however, they were printed next to an impersonal picture, such as a photograph of the Spasskaya tower with the Kremlin clock or a drawing of a hammer and sickle in front of the clock. The year 1985 was the first year since 1971 in which ordinary people appeared in the front-page photo of Pravda on January 1. That year’s front-page photo shows traditional participants of the New Year celebration in front of the Kremlin Palace of Congresses: Ded Moroz (a fictional character similar to Father Christmas), Snegurochka (the Snow Maiden), and a boy who represents the New Year. Later that year (1985), Gorbachev came to power and started a process that eventually led to “socialism with a human face”, but apparently something was already in the air on the eve of 1985, inadvertently evidenced by Pravda’s front-page photo. This change opened the door to other changes, which eventually led to the New Year address no longer appearing in Pravda seven years later.

Most of the more recent New Year addresses are available online. However, I couldn’t find the New Year address given by Yeltsin on December 31, 1997. My brother-in-law, a journalist, suggested that I contact the archive of the Yeltsin Center, which stores all the documents pertinent to Yeltsin’s rule. I wrote to the Center, explained that I was conducting a study of the New Year addresses and had collected all but the one from 1997, and asked whether they could provide the missing address. They were very kind and promptly sent me a PDF file of what appears to be the original document from which Yeltsin read the address. To my surprise, some of the words in this document are in bold print. It seems that these were the words meant to carry intonational focus when spoken aloud. I now wonder whether this use of bold type to stress particular words was standard practice for written speeches or an accommodation made specifically for Yeltsin.

The curators of the Yeltsin Center foundation have written to me occasionally since that initial contact and have asked if I’ve published any further research about the New Year addresses. This blog is my first step towards that effort. My next post will explain how I deduced information from the most frequent words found in the New Year addresses.
