Date: 2020-12-17

Petr Plecháč, completing a Ph. D. at Charles University, made world headlines with his analysis of Shakespeare’s Henry VIII. It was long accepted that the play was co-authored by playwright John Fletcher, but Plecháč’s study – using machine learning – analysed word frequency patterns and rhythms to provide further evidence that the play was a collaborative effort. Henry VIII was not written by Shakespeare alone.

Telling the two authors apart required a granular approach. Plecháč explained the various aspects of the project, including why Shakespeare was so well suited to such an investigation:

“My Ph. D. thesis explored the possibility of using versification features – such as rhythm or rhyme – for authorship recognition. I chose Shakespeare’s Henry VIII for a number of reasons. For one, there was a lot of training data upon which to build, namely all of the plays by Shakespeare, John Fletcher and Philip Massinger (also posited in the past as a possible co-author), and many lines written in iambic pentameter.

“Second, a strong hypothesis already existed that Henry VIII was a collaborative work. It was long understood that this play was not solely by Shakespeare, so that made it a good choice. Finally, there is the fact that all of Shakespeare’s work has been digitised. Those were all factors that led me to focus on this play.”



In practice, this meant feeding a machine learning system not only the play itself but also an extensive body of work by both playwrights (as well as by Massinger, seen as a less likely candidate for co-authorship), so that it could learn each author’s individual style.



“The general principle was that I collected different works by the playwrights from around the same period in which Henry VIII was written, and I trained the algorithm to recognise their styles. The trained model was then applied to Henry VIII. The first thing you do when you train the model is test how it performs on known data, and you perform cross-validation. That means you leave out one play by Shakespeare, train on the rest and then run the model on the one that was taken out. And you do the same with Fletcher. The model labels the data in cases where you already know the correct answer. This way, you can estimate the accuracy of the model and how reliable it is. In my case it was very reliable: in 99 percent of cases it was able to determine authorship correctly.
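The leave-one-play-out procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the play names and feature values are invented, and a simple nearest-centroid classifier stands in for the model used in the study, so that the example needs nothing beyond the standard library.

```python
from collections import defaultdict
from math import dist

# Synthetic toy data: (author, play, feature vector). Invented for illustration only.
SAMPLES = [
    ("Shakespeare", "S-play-1", [0.90, 0.10]),
    ("Shakespeare", "S-play-1", [0.80, 0.20]),
    ("Shakespeare", "S-play-2", [0.85, 0.15]),
    ("Shakespeare", "S-play-2", [0.95, 0.05]),
    ("Fletcher", "F-play-1", [0.20, 0.80]),
    ("Fletcher", "F-play-1", [0.10, 0.90]),
    ("Fletcher", "F-play-2", [0.15, 0.85]),
    ("Fletcher", "F-play-2", [0.25, 0.75]),
]


def centroids(train):
    """Average the feature vectors of each author (stand-in for real training)."""
    by_author = defaultdict(list)
    for author, _play, vec in train:
        by_author[author].append(vec)
    return {
        author: [sum(col) / len(vecs) for col in zip(*vecs)]
        for author, vecs in by_author.items()
    }


def predict(model, vec):
    """Return the author whose centroid lies closest to the sample."""
    return min(model, key=lambda author: dist(model[author], vec))


def leave_one_play_out(samples):
    """Hold out each play in turn, train on the rest, score on the held-out play."""
    plays = sorted({play for _author, play, _vec in samples})
    correct = 0
    for held_out in plays:
        train = [s for s in samples if s[1] != held_out]
        test = [s for s in samples if s[1] == held_out]
        model = centroids(train)
        correct += sum(predict(model, vec) == author for author, _play, vec in test)
    return correct / len(samples)


accuracy = leave_one_play_out(SAMPLES)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

Holding out whole plays, rather than random samples, keeps every sample from the tested play out of training, which is the point of the procedure: the accuracy estimate then reflects how the model behaves on a text it has never seen.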



“When we talk about versification, it means focusing on stress patterns – bit strings that represent stressed and unstressed syllables. It’s much easier today because we have very reliable software libraries that offer all kinds of machine learning, so the machine learning part can be implemented in a few lines of code. What is more difficult is extracting these bit strings, these stress patterns, from a text, but I was lucky that I was able to use software that had been developed at Stanford University by Ryan Heuser and that had even been tested on Shakespeare’s work. There was evidence it would perform pretty well.”
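To make the idea of stress patterns as bit strings concrete, here is a minimal sketch. The stress strings are written by hand for illustration and are not the output of any scansion tool; the choice of three-syllable patterns and the definition of a feminine ending (an eleven-syllable line ending on an unstressed syllable) are assumptions of this example, not details taken from the study.

```python
from collections import Counter

# Hand-written stress strings, one per verse line: "1" = stressed, "0" = unstressed.
# Illustrative only; a real pipeline would obtain these from scansion software.
LINES = [
    "0101010101",   # regular iambic pentameter
    "0101010101",
    "1001010101",   # inverted first foot
    "01010101010",  # extra unstressed syllable at the end
]


def pattern_frequencies(lines, n=3):
    """Relative frequency of every stress n-gram across all lines."""
    counts = Counter(
        line[i:i + n] for line in lines for i in range(len(line) - n + 1)
    )
    total = sum(counts.values())
    return {pattern: count / total for pattern, count in counts.items()}


def feminine_ending_rate(lines):
    """Share of lines with 11 syllables that end on an unstressed one."""
    return sum(len(line) == 11 and line.endswith("0") for line in lines) / len(lines)


freqs = pattern_frequencies(LINES)
fem_rate = feminine_ending_rate(LINES)
print(sorted(freqs.items()))
print(f"feminine endings: {fem_rate:.2f}")
```

Frequencies like these, computed per scene, are the kind of numeric features a classifier can compare across authors.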

Those patterns are far too numerous and interconnected to be mapped by hand – for a human reader, an effectively impossible and largely fruitless task. That is why machine learning proved so useful, especially since Plecháč could adapt existing software rather than build everything himself.

“It is possible to tweak existing software a little bit or to tune its parameters, this is annotation versus versification level. First, you need to label the data: you need to tokenise it, to split the text into words. You also need to mark the stresses, which syllable carries the stress and which does not. Then comes the analysis – training the machine to recognise the work – and that is where my coding began. Of course, I didn’t code it from scratch, and a mistake in some of the articles that came out about my work was the claim that I used neural networks – I did not. It was a very popular technique in authorship recognition called a support vector machine. And that is not my invention. I used general machine learning software libraries that were designed for this purpose.”
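For readers unfamiliar with support vector machines, the following is a bare-bones linear version trained by subgradient descent on the hinge loss. It is a teaching sketch only: the study relied on general machine learning libraries, not hand-written code like this, and the two-dimensional feature vectors, the labels (+1 and -1 for two hypothetical authors) and the hyperparameters are all invented for the example.

```python
# Toy 2-D feature vectors; +1 and -1 stand for two hypothetical authors.
X = [[2.0, 1.5], [1.5, 2.5], [2.5, 2.0], [-2.0, -1.5], [-1.5, -2.5], [-2.5, -2.0]]
y = [1, 1, 1, -1, -1, -1]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * |w|^2 + mean hinge loss by batch subgradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularisation term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin, so the hinge loss is active
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify by the side of the separating hyperplane the point falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

The classifier looks for a boundary that separates the two groups with as wide a margin as possible; a new scene is then attributed according to the side of the boundary on which its feature vector falls.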



In the end, his findings further cemented the view that Henry VIII was a collaborative work, as first posited by James Spedding in 1850.

“There are tiny, tiny differences when it comes to the frequency of words: someone uses the definite article far more often, someone will use by, another nearby, and language is unconscious. Stylometry focuses on these tiny differences that are very hard for a human to detect. If you or I read Henry VIII, we can focus on different aspects, who uses ye more often than you, or feminine endings in lines, but this would occupy all of our time. As humans we cannot focus on hundreds of different details simultaneously. In the end, each scene in each play is represented by one thousand variables; each scene is represented as a vector in a 1,000-dimensional vector space.”
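The step from a scene to a vector can be illustrated with word frequencies alone. In this sketch the four feature words, the tokenising rule and the sample sentence are all made up for the example; the study’s actual feature set, described above as roughly a thousand variables per scene, is not reproduced here.

```python
import re
from collections import Counter

# A tiny, hypothetical feature list standing in for the full feature set.
FEATURE_WORDS = ["the", "by", "ye", "you"]


def tokenise(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def scene_vector(text, features=FEATURE_WORDS):
    """Relative frequency of each feature word in one scene."""
    tokens = tokenise(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in features]


# Invented sample text, not a quotation from any play.
scene = "Ye know the way by heart, and you know the hour."
vector = scene_vector(scene)
print(vector)
```

With a thousand such features instead of four, each scene becomes one point in a high-dimensional space, and attributing it to an author amounts to asking which author’s known scenes it sits closest to.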



Plecháč says he never expected his work to cause the sensation it did, and that the attention is good news for the field of versification studies, which has a tradition in Slavic countries like the Czech Republic but is virtually unknown elsewhere.



“Not that long ago, few people in English-speaking countries had heard of versification studies, so for many readers it was the first time they had heard about something like that. I never expected the article to cause such a sensation. I had gone to San Francisco for a conference and then learned from a friend that a major Hungarian paper had reported on my work, which was followed by CNN and then article after article around the globe. That kind of thing really doesn’t happen very often to people specialising in versification. I received emails from some really big names, asking me to collaborate on future projects. So that has been very positive and the interest overall has been very important for the field.”

Petr Plecháč, Ph. D., completed his first doctorate in theory of literature at Palacký University in Olomouc and his second, in mathematical linguistics, at the Faculty of Arts at Charles University. He made world headlines with his research into authorship in Shakespeare’s Henry VIII. Plecháč works at the Institute of Czech Literature at the Czech Academy of Sciences and at the University of Basel.