Voyant Tool: Week Three Practicum

Several sections of the class reading Tooling up for Digital Humanities connected directly to what I found with the tools. The section on Authorship Attribution discusses identifying who wrote a text by comparing the use of common function words. A tool called Document Terms on Voyant shows you a word cloud of the most frequently used terms. For my book, The Wizard of Oz, the most frequently used words were the characters' names: Dorothy, Scarecrow, and Woodman.

Another section of the same reading, Term Frequency, states that the most fundamental form of content-based text analysis is word counting, that is, measuring how often words and phrases appear in a text. The Text Revival section of this reading corresponds to the Phrases tab in Voyant because it searches for particular words and phrases. Because the characters' names dominate the counts, I think these results show that the story is very character driven. It is striking how many conclusions and assumptions you can make about a text from this tool alone.
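The word counting described above can be sketched in a few lines of Python. This is only a rough illustration, not how Voyant itself works: the stopword list, the function name term_frequencies, and the sample sentence are all made up for the example, and the sentence is not a quotation from the book.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; real tools use much longer lists.
STOPWORDS = {"the", "and", "a", "at", "to", "of", "in"}

def term_frequencies(text, top_n=3):
    """Return the top_n most frequent words in text, ignoring stopwords."""
    # Lowercase the text and pull out runs of letters (and apostrophes) as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(top_n)

# Invented sample text, not taken from The Wizard of Oz.
sample = (
    "Dorothy looked at the Scarecrow. The Scarecrow looked at Dorothy, "
    "and Dorothy smiled at the Woodman."
)

print(term_frequencies(sample))
```

Even on a tiny sample like this, the character names rise to the top once the common function words are filtered out, which is the same pattern the Voyant results showed for the full text.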

The next reading for class, When OCR Goes Bad: Google’s Ngram Viewer & The F-Word, discussed optical character recognition (OCR). The Ngram Viewer takes the word you enter and searches for it across the five million books Google has. It does not always recognize words correctly, and it is case sensitive. One thing I found interesting was how the tool can show the relationships between different words over different time periods. It then lets you search the books in which these results are found, which can give great insight into those relationships at different times. I have included one of my search results, which compares different animals across time periods. I was not surprised by the results, but it was interesting to see when domesticated animals became common compared with owning a horse over the decades.

The final class reading, Optical Character Recognition, explained that OCR provides a high degree of recognition accuracy. In both activities, the pattern recognition component of OCR was the most frequently used and appeared in different ways. My takeaway was learning how to read, analyze, and use these patterns to make observations about texts.

Zotero Practicum

I inserted screenshots of both the bibliography and the ten Zotero links because they did not paste in the correct format on my site, but I have included them anyway. The categorization of materials is useful to digital scholarship because it helps researchers access, store, and share data with others easily and reliably. I have found it particularly helpful for storing different types of media in an organized way that is easy to access later. The three texts we read this week provided insight into the positives and negatives of digital humanities, the different applications and rules that make it successful, and simple ways to keep things from going wrong.
Categorization of materials is useful to digital scholarship because it sorts data so that users can search, save, and find it more easily. The reading Classification and Its Structures addresses the many different ways things can be classified: one-dimensional and N-dimensional classifications, classification schemes, a priori systems, and several rules of classification to follow. The closer the purpose of the classification is to the central problem of the research, the more likely it is that a custom-made classification scheme will be necessary. This reading also highlights a growing emphasis on image-based computing for the humanities and how this body of material is growing. However, there are difficulties with categories for humanities research. The reading discusses how hard it is to agree on and maintain consistent keyword-based classifications or descriptions of images because graphic images are so similar to one another.
The second reading, Databases, addresses how databases can be problematic for some humanities research and how to minimize these errors. Humanists realized that databases could create intellectual opportunities, such as the mapping of relationships among entities, the visualization of information patterns, and methodologies worth studying further. This chapter focuses on the design and implementation of relational databases in a way that removes problematic technical and conceptual details. The most common issues involve redundancy, which can be resolved by giving each record a primary key and applying normal forms. Transaction management and Collaborative Database Collections can often lead to fragmented data that does not give consistent results. However, if implemented successfully, databases could expand the possibilities of knowledge representation considerably.
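To make the point about redundancy and primary keys more concrete, here is a small sketch using Python's built-in sqlite3 module. It is not taken from the reading: the table names, column names, and the helper functions build_catalog and titles_by_author are my own assumptions, and the two sample rows are only there for illustration. The idea is that the author's name is stored once and each book points to it through a key, instead of the name being retyped on every row.

```python
import sqlite3

def build_catalog():
    """Create a tiny in-memory catalog with authors and books in separate tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    # Each author is stored exactly once and identified by a primary key.
    conn.execute(
        "CREATE TABLE authors ("
        "author_id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL UNIQUE)"
    )
    # Books refer to the author by key rather than repeating the name.
    conn.execute(
        "CREATE TABLE books ("
        "book_id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL, "
        "author_id INTEGER NOT NULL REFERENCES authors(author_id))"
    )
    # Illustrative sample rows (hypothetical catalog contents).
    conn.execute("INSERT INTO authors (author_id, name) VALUES (1, 'L. Frank Baum')")
    conn.executemany(
        "INSERT INTO books (title, author_id) VALUES (?, ?)",
        [("The Wonderful Wizard of Oz", 1), ("The Marvelous Land of Oz", 1)],
    )
    return conn

def titles_by_author(conn, name):
    """Look up book titles by joining books to authors on the key."""
    rows = conn.execute(
        "SELECT b.title FROM books b "
        "JOIN authors a ON a.author_id = b.author_id "
        "WHERE a.name = ? ORDER BY b.book_id",
        (name,),
    ).fetchall()
    return [row[0] for row in rows]

conn = build_catalog()
print(titles_by_author(conn, "L. Frank Baum"))
```

If the author's name ever needed correcting, it would only have to be changed in one place, which is the kind of consistency that removing redundancy is meant to provide.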
Finally, in The Order of Things, Foucault questions what determines how things are classified and what justifies their categorization. This seems to be the overarching issue he is addressing, along with how it relates to our thinking about categorization in the humanities. He brings up several examples that serve as good tools for thinking about categories and groupings of items.
Citations:

Foucault, M. (2005). The order of things. doi:10.4324/9780203996645
