On the R Track

Hacking with R

R is a programming language traditionally used for statistical computing. Load a large amount of data into R, usually through RStudio (a user-friendly interface for working with R), and you can analyze it, plot it, and look for correlations. Typically, the scholars who use this tool are statisticians, but humanities scholars are slowly warming up to this method of analysis. Close reading and human synthesis remain incredibly important to these scholars, but with R we can analyze not only one text’s specific word choice but also the entire canon at once.



Learning R took weeks in our Hack Lab. We were lucky enough to be provided with an early version of Matthew L. Jockers’s book Text Analysis with R for Students of Literature, but even then we fought against the program. The promise of analyzing hundreds of books this way was enticing, but we could hardly handle just one text. Many of us ended up simply plugging in the information without actually understanding how or why we were doing it. I can’t say much about my own experience, simply because I didn’t really understand it. However, that’s not to say I didn’t learn anything.


R was a great lesson in cleaning data. We used Project Gutenberg to acquire our texts, which come with a large amount of boilerplate and metadata. If you don’t get rid of it, the table of contents and the copyright page will be analyzed along with the actual text. It was an excellent introduction to finding free texts online and learning how to use them. This also helped us conceptualize copyright laws and intellectual property rights. A debate about whether or not we have a right to Hemingway’s texts emerged, even though none of us had ever thought about it in a classroom before. R continuously fed us great discussion topics, even though we hardly moved past the first few chapters of Jockers.
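To give a sense of what that cleanup can look like, here is a minimal sketch in R. It is not the method from Jockers’s book, just one way to do it under some assumptions: the sample lines are made up, strip_gutenberg is a hypothetical helper name, and the "*** START OF" and "*** END OF" marker lines are assumed to be present (their exact wording varies between Project Gutenberg files).

```r
# Made-up sample standing in for a downloaded Project Gutenberg file.
# With a real file you would load the lines with readLines() instead.
raw_lines <- c(
  "The Project Gutenberg eBook of An Example Novel",
  "This eBook is for the use of anyone anywhere.",
  "*** START OF THE PROJECT GUTENBERG EBOOK AN EXAMPLE NOVEL ***",
  "CHAPTER 1",
  "Call me Example. Some years ago I went to sea.",
  "*** END OF THE PROJECT GUTENBERG EBOOK AN EXAMPLE NOVEL ***",
  "Updated editions will replace the previous one."
)

# Hypothetical helper: keep only the lines between the START and END markers.
strip_gutenberg <- function(lines) {
  start <- grep("^\\*\\*\\* ?START OF", lines)
  end   <- grep("^\\*\\*\\* ?END OF", lines)
  # If the markers are missing or out of order, leave the text untouched
  if (length(start) == 0 || length(end) == 0 || end[1] - start[1] < 2) {
    warning("Markers not found; returning text unchanged")
    return(lines)
  }
  lines[(start[1] + 1):(end[1] - 1)]
}

body_lines <- strip_gutenberg(raw_lines)

# Lowercase everything, split on non-word characters, drop empty strings
words <- unlist(strsplit(tolower(paste(body_lines, collapse = " ")), "\\W+"))
words <- words[words != ""]
```

This only removes the license text at the top and bottom of the file. A table of contents usually sits inside the markers, so it would need a second trim, for example by starting from the first chapter heading.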


R was also a great lesson in patience. If one letter was off in RStudio, nothing would work. This required a great deal of combing through the code to find the small mistake. Often we had to read each other’s code, just to get fresh eyes on it. This made the lab much more collaborative than previous ones, and we often worked together to produce the correct material.


Our discussions on the broader implications of this tool were by far the most important thing to come out of the Hack Lab. With R, we are no longer limited to experiencing texts one-on-one. We are now able to analyze the entire canon. Is the word choice of a male New England writer similar to that of a female one? How similar are the Brontë sisters? How does vocabulary change by location? By time period? By person? How much did people care about the future in the sixteenth century? This opens up a huge number of research questions based on macroanalysis. R can help settle many debates around word choice and open us up to a whole world of analysis.
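Questions like these usually come down to comparing how often each word appears in one text versus another. The following is a rough sketch of that idea, not a finished method: the two "texts" are single made-up sentences standing in for whole novels, and relative_freqs and lookup are hypothetical helper names.

```r
# Two tiny made-up "texts" standing in for full novels
text_a <- "The moor was wild and the wind was wild and cold"
text_b <- "The house was quiet and the garden was green"

# Relative frequency of each word: its count divided by the total word count
relative_freqs <- function(text) {
  words <- unlist(strsplit(tolower(text), "\\W+"))
  words <- words[words != ""]
  counts <- table(words)
  setNames(as.numeric(counts) / sum(counts), names(counts))
}

# Look up each vocabulary word; a word that never appears gets 0
lookup <- function(freqs, vocab) {
  out <- unname(freqs[vocab])
  out[is.na(out)] <- 0
  out
}

freqs_a <- relative_freqs(text_a)
freqs_b <- relative_freqs(text_b)

# Line the two texts up on their combined vocabulary
vocab <- sort(union(names(freqs_a), names(freqs_b)))
comparison <- data.frame(
  word = vocab,
  a = lookup(freqs_a, vocab),
  b = lookup(freqs_b, vocab)
)

# Words with the biggest gap in usage come first
comparison$diff <- comparison$a - comparison$b
comparison <- comparison[order(-abs(comparison$diff)), ]
```

Using relative rather than raw counts matters because it lets a short book be compared fairly with a long one. The same table could be built for dozens of authors at once, which is where the macroanalysis questions start to become answerable.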
