When analyzing text as a humanities scholar, diving deep into a key phrase or passage is often the best way to develop a hypothesis or summarize the literary significance of a work. Think of phrases that have become shorthand micro-summaries or prompts for a literary work’s key theme. On seeing the words “To be or not to be,” most scholars immediately think of Hamlet and the start of the soliloquy that questions existential choices. When one reads “It was the best of times, it was the worst of times…,” the extremes of human experience made concrete in Charles Dickens’ “A Tale of Two Cities” could not be stated more plainly or more poetically.
By contrast, with a digital tool for text analysis, the process could not be more different from traditional humanities analysis, and neither could the results. This act of “distant reading” can reveal patterns that our human minds are not adept at spotting across such a large volume of words, but in the process I find myself feeling that a layer of the humanity that makes literature worth examining is lost. I see benefit in how digital tools can investigate how a text was structurally created or reveal how an author’s writing changed over time. But will the data that digital tools spit out truly capture what makes literature speak to the truth of the human condition? I suspect not.
As one way of testing this, I used a word cloud created on WordClouds.com to run a text analysis of the American literary classic The Great Gatsby.
As beautiful as the resulting word cloud was, I found it devoid of meaning and even misleading, because much of what makes the book so interesting is what is implied between the lines of the relationships rather than what is directly said by the web of key characters. Additionally, as much fun as it was to play with the tool’s customization features, I realized that seemingly frivolous aesthetic choices about colors, fonts, shapes, and so on made some words literally easier to read than others, which could skew the conclusions a reader draws from this “analysis.” For example, the Excel sheet that accompanied the visual word cloud tabulated the most commonly used words, yet the second most frequent word, “Gatsby” (used 189 times throughout the book), was not easily visible in the word cloud, whereas the next most frequent word, “Tom” (used 175 times), was front and center. How could that be, given the tool’s own quantitative data? To make amends to the key character, Gatsby, I made the word cloud itself into the shape of the letter “G,” a visual reminder of the ubiquity of Gatsby’s presence throughout the story even when he was not directly in a scene. As with the line Mark Twain popularized about there being “lies, damned lies, and statistics,” qualitative data is no less subject to manipulation. As digital tools make more compelling infographics for communicating data, I worry that the attractive visual aids may delude us into thinking we have facts in front of us. The evidence we need to draw accurate conclusions will not always be pretty.
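For readers curious about where a tally like the one in that spreadsheet might come from, here is a minimal Python sketch of word-frequency counting, the kind of count a word cloud is typically built on. It is an illustration under my own assumptions, not a description of how WordClouds.com actually works: the function name `word_frequencies`, the tiny stopword list, and the sample sentence are all invented for this example, and the sample is not a quotation from the novel.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# real tools usually ship much longer lists.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "at", "was", "it"}

def word_frequencies(text, stopwords=STOPWORDS):
    """Count how often each word appears, ignoring case and stopwords."""
    # Lowercase the text, then keep runs of letters (allowing internal apostrophes).
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(word for word in words if word not in stopwords)

# Invented sample text, standing in for the full text of a book.
sample = (
    "Gatsby looked at Tom. Tom looked at Daisy, "
    "and Daisy looked at Gatsby. Gatsby smiled."
)

counts = word_frequencies(sample)

# Print the most frequent words first, as a frequency table would list them.
for word, n in counts.most_common():
    print(word, n)
```

Even in this small sketch, the choices are not neutral: which words count as stopwords, and whether “Gatsby” and “Gatsby’s” are treated as the same word, change the numbers before any color or font is chosen.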