When I was writing my undergraduate thesis, I often came across advice about how to look at sources similar to that given in “Tooling Up for Digital Humanities”:
“Scholars once accustomed to studying a handful of letters or a couple hundred diary entries are now faced with massive amounts of data that cannot possibly be analyzed in traditional ways.”
When working on my thesis, I only had a handful of letters, because I was working with the very limited sources left behind by Catholic women in Elizabethan England. However, the fact remains that we now live in a time when there is more information available than ever before. Thankfully, we also have more ways of analyzing this information than ever before. One of these is Voyant, a tool that analyzes texts to show which words are most common and where in the text they occur.
In examining some texts using Voyant, I realized that there was truth to another statement in “Tooling Up for Digital Humanities,” that simply looking at the words in a vacuum can lead to a simplistic understanding of the text. Knowing these texts closely helped me to avoid some simplistic assumptions, but with unfamiliar texts, such close reading would be more difficult. For example, I ran an analysis on Wuthering Heights and found that the words “Catherine” and “Cathy” both appeared frequently. Knowing the book well, I knew that these names referred to two characters; without this familiarity, I might not have understood why the two terms appeared as they did.
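The kind of count behind that result can be sketched in a few lines of Python. This is only an illustration of word counting in general, not a description of how Voyant works internally; the function name word_counts and the sample sentence are my own inventions for the example.

```python
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text and count every run of letters or straight apostrophes."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Invented sample sentence, not a quotation from Wuthering Heights.
sample = "Catherine spoke to Cathy. Cathy laughed, and Catherine smiled at Cathy."
counts = word_counts(sample)

# The counter reports the two spellings separately and says nothing
# about which character each occurrence refers to.
names = {name: counts[name] for name in ("catherine", "cathy")}
print(names)
```

The count tells me how often each name appears, but deciding who is meant by each one still takes a reader who knows the book.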
I also found some problems with the text files from Project Gutenberg. I had to remove the beginning and end of each text, which explain what the text is and what restrictions are placed on it; otherwise, Voyant treated “Gutenberg” as a popular word in every text analyzed. I also had to check whether there was an introduction; including the introduction to Edgar Allan Poe’s poems created a very different chart than the one that used only Poe’s own words. If I were to analyze a poem to see whether it was written by Poe, checking for his common words, such as “heart,” “like,” “love,” and “night,” would be a good first step. That comparison would be skewed if I included words written by other authors.
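I removed the opening and closing sections by hand, but the same cleanup could be scripted. The sketch below assumes the file marks its boundaries with lines beginning “*** START OF THE PROJECT GUTENBERG EBOOK” and “*** END OF THE PROJECT GUTENBERG EBOOK”; that wording is an assumption to check against the actual file. The function name strip_gutenberg_boilerplate and the sample text are hypothetical. It does not remove an editor’s introduction, which still has to be found by reading.

```python
import re

# Assumed marker wording; confirm it against the file being cleaned.
START_RE = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)

def strip_gutenberg_boilerplate(raw):
    """Return only the text between the START and END marker lines.

    If a marker is missing, that side of the text is left untouched.
    """
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw)
    return raw[begin:finish].strip()

# Invented miniature file for illustration.
sample = (
    "The Project Gutenberg eBook of Example\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Once upon a midnight dreary.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Project Gutenberg license text.\n"
)
body = strip_gutenberg_boilerplate(sample)
print(body)
```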
I next ran an analysis on Jane Eyre. I was most interested in the word “little,” because my favorite line in this book is “Do you think, because I am poor, obscure, plain, and little, I am soulless and heartless? You think wrong! — I have as much soul as you, — and full as much heart!” Using the “Word Tree” tool, I could even see a little bit of the context. However, because the punctuation had not been scrubbed, some obvious problems remained; I am not interested in “little” being followed by a semicolon, but rather in what word came next. If I were to run a proper analysis, I would need to tokenize the text.
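Tokenizing here means splitting the text into words and discarding the punctuation, so that the word after “little” is always a word. A minimal sketch of that step, using my own helper names (tokenize and words_after) and an invented sample rather than the text of Jane Eyre, might look like this. It only recognizes straight apostrophes, so curly ones in a real file would need to be normalized first.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep only runs of letters and straight apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def words_after(tokens, target):
    """Count the token that immediately follows each occurrence of target."""
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == target)

# Invented sample for illustration.
sample = (
    "Poor, obscure, plain, and little; I am little, I know. "
    "A little bird told the little girl."
)
tokens = tokenize(sample)
following = words_after(tokens, "little")
print(following.most_common())
```

With the semicolon and commas gone, the words that follow “little” can be counted directly instead of being hidden behind punctuation marks.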
Using Voyant showed me both the advantages and the disadvantages of text analysis tools: it surfaces patterns quickly, but the results depend on clean input and on knowing the text well enough to interpret them.
Works Cited