21 Feb 2021
Voyant Tool: Week Three Practicum
In the class reading Tooling up for Digital Humanities, several sections connected to what I found with the tools. The section Authorship Attribution discusses identifying who wrote a text by comparing how often common function words are used. A tool called Document Terms on Voyant shows you a word cloud of the most frequently used terms. For my particular book, The Wizard of Oz, the most frequently used words were the characters’ names: Dorothy, Scarecrow, and Woodman.
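To picture what comparing function words could look like, here is a rough Python sketch. It is only a simplified stand-in, not the method from the reading or anything Voyant does: the short `FUNCTION_WORDS` list and both helper functions are made up for illustration, and a real study would use a much longer word list and a more careful distance measure.

```python
import re
from collections import Counter

# Small illustrative list (an assumption); real studies use far more function words.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in"]

def function_word_profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def profile_distance(text_a, text_b):
    """Sum of absolute differences between two profiles (smaller means more alike)."""
    a = function_word_profile(text_a)
    b = function_word_profile(text_b)
    return sum(abs(x - y) for x, y in zip(a, b))
```

The idea is that two texts by the same author would tend to give a smaller distance than two texts by different authors, although a sketch this small would not be reliable on its own.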
In another section of the same reading, Term Frequency, it states that the most fundamental form of content-based text analysis is word counting, meaning how often words and phrases appear in a text. The Text Revival section of this reading relates to the Phrases tab in Voyant because both involve searching for particular words and phrases. Since the most frequent words were character names, I think these results show that the story is very character driven. It’s crazy how you are able to draw conclusions about a text simply from this tool.
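The word counting described above can be sketched in a few lines of Python. This is not how Voyant works internally; the `term_frequencies` helper, the sample sentence, and the tiny stopword set are all assumptions made up for illustration.

```python
import re
from collections import Counter

def term_frequencies(text, stopwords=frozenset(), top_n=3):
    """Return the top_n most common words in text, skipping stopwords."""
    # Lowercase the text and keep only runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    kept = [w for w in words if w not in stopwords]
    return Counter(kept).most_common(top_n)

# Made-up sample sentence, not a quote from the book.
sample = "Dorothy met the Scarecrow. Dorothy and the Scarecrow met the Woodman."

print(term_frequencies(sample))                             # "the" comes out on top
print(term_frequencies(sample, stopwords={"the", "and"}))   # names rise once common words are skipped
```

Without a stopword list, common words like "the" crowd out everything else, which is why skipping them lets the character names stand out.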
The next reading we had for class was titled When OCR Goes Bad: Google’s Ngram Viewer & The F-Word, and it discussed optical character recognition (OCR). The Ngram Viewer takes the word you input and searches for it across the five million books Google has scanned. It does not always recognize words correctly, and its searches are case sensitive. One thing I found interesting was how the tool could show the relationships between different words across different time periods. It then lets you search the books where these results are found, which can give great insight into how those relationships changed over time. I have included one of my search results, which compares different animals across time periods. I was not surprised by the results, but it was interesting to see when domesticated animals became common compared with owning a horse over the decades.
The final class reading, Optical Character Recognition, explained how OCR provides a high degree of recognition accuracy. In both activities, the pattern recognition component of OCR was the part I used most, and it showed up in different ways. My takeaway was learning how to read, analyze, and use these patterns to make some observations about texts.