Letter to Future Student

Dear Future Student,

Welcome to Digital Humanities! I must admit I’m jealous – you are just starting an experience that I really enjoyed. I hope you are ready to think deeply about technology and the internet. It’s a fascinating subject, and Professor Remy will lead you to some really meaty articles. But you will probably learn the most this semester from your classmates. This class is full of interesting discussions, and you will hear perspectives that you haven’t heard before. Don’t be afraid to come forward with your own experiences. Whether it’s something you love, hate, or are indifferent to, your perspective is what makes this class great.

This class asks what it means to study the humanities using technology. I took this class during the Coronavirus lockdown, so it was entirely remote. This was a great time to study this subject, because I had examples right in front of me of how the internet makes things more accessible for everyone. However, we also saw some of the downsides – learning how to operate technologies remotely can be difficult. If you ever had to take classes remotely due to lockdown (or for other reasons), I encourage you to think about your perspective. What positives and negatives do you see in remote and in-person classes? What about other online communications? As a society, more and more of our culture has moved online. Taking this class is a great opportunity for you to think critically about what this means for you.

The most important technology I learned about this semester was Zotero, which is a program to create bibliographies and citations quickly. I wish I had found this program earlier! But the most important lesson of this class is that everyone can accomplish a lot with technology. Even after taking this class, I do not feel like a coder. But I did do a little coding, and I learned a lot from the experience. You don’t have to throw yourself into things and become an expert; just try your best, and accept that learning is why you’re here. Make an attempt, and reflect on your experiences, and you will do well. It is your reflections and thoughts that matter, not whether you succeed at every single activity. Don’t be intimidated.

I will be bringing Zotero out of this class, but I will also be bringing the confidence to try new technologies, and a perspective on why accessibility matters. These will be great benefits for the rest of my education, and my life. You made a good choice to take this class; it will be useful whatever your major is.

I also bring the deep-rooted fear that comes whenever I think about the lack of privacy online. This isn’t a new thing for me, but you might gain some existential dread with the greater understanding of how technologies work. Don’t be afraid to think about the negatives of technology, as well as the positives. Having a well-rounded understanding of things will allow you to make informed decisions.

Have a great semester. I hope you’re able to meet with your classmates in person – I never appreciated that enough before this year.

VR: A Reality that can’t be Escaped

Virtual reality is a spectrum, and elements of it can be useful in a variety of situations. “Surround Sound” is something that is possible even on mobile devices, as headphones can now play different sounds in the right and left ears. Special seats that simulate movement have been a mainstay of theme park rides, from “Soaring over California” to the majority of the rides at Universal Studios. Augmented reality, such as phone apps, has been made mainstream by Pokémon Go, and even before that was normalized to an extent by Google Maps. These are also part of a long history of tricking the senses to make someone think they are elsewhere; as Robert Cable points out, theater and the arts have been creating immersive realities for people for hundreds of years (Cable). My own experiences with VR headsets have felt similar to hypnosis, when your senses give conflicting information. Hypnosis is an old practice, which has been formally studied since the 18th century (Orne and Hammer). Thus, treating VR as some unprecedented, frightening thing is unreasonable. Elements of VR have been present for many years; what is newest are VR goggles and the proliferation of internet communication.

VR headsets do carry significant problems, which I believe need to be discussed. Screen time has the potential to damage vision (“Developer Warns VR Headset Damaged Eyesight”). As someone who grew up being told not to sit directly in front of the television, I am bemused to be told to place a screen an inch from my eyes. Headsets also cause nausea, which makes VR something that is not accessible to everyone. The benefits of VR – increased communication, “travel,” and educational opportunities – are locked behind a barrier for people who become nauseous when using VR headsets. The headsets are also expensive, creating further barriers, and rely on visual stimulus, locking out the visually impaired. And, as with any technology, I worry that as it becomes prevalent, the choice to “opt out” may no longer be available. If someone has their own health reasons not to use VR, and then classes require virtual trips to a museum, that person is put in an impossible position. Similarly, societal pressures are very strong; if the only way to see friends is to purchase a $300 headset, people will have to use VR whether they like it or not. Finally, VR as it currently stands is a threat to privacy. Oculus requires a Facebook login. In the words of Rory Mir and Katitza Rodriguez, “With this lack of choice, users can no longer freely give meaningful consent and lose the freedom to be anonymous on their own device” (Mir and Rodriguez). Using one of these devices – much like a Google Home or an Alexa – gives up some of your privacy.

That said, the internet communication elements of VR are important. Being able to talk with people far away is a blessing, as the last year has taught us. And using the internet to “visit” museums can share and spread cultural artifacts, without locking them behind the barrier of plane ticket and entry costs (although the barrier of paying for internet access and a computer is still present) (Robson et al.). Many of these projects, such as the Virtual Studiolo, can be viewed from anywhere with an internet connection, and without sacrificing privacy to view (MacNeil). It must be acknowledged that these virtual walkthroughs of museums are often clunky and imperfect. The last year has shown us that physically going somewhere is different from visiting it online. Part of the “immersion” of a museum can’t be replicated – even by the very best of VR that we have available. However, I find that this is something that can be explored more. I dislike VR headsets, but I think there is a place for Augmented Reality and virtual visits to sites of cultural heritage – to preserve the sites that would be destroyed by constant tourism and make visiting them more available to everyone. Unfortunately, I think making museums virtual will be largely a labor of love, since the profits from selling VR headsets for videogames are much higher, so I anticipate VR will move in that direction.

 

Works Cited

Cable, Robert. What Is Virtual Reality? | Stanford Humanities. 8 Feb. 2019, https://shc.stanford.edu/news/stories/what-virtual-reality.

“Developer Warns VR Headset Damaged Eyesight.” BBC News, 10 June 2020. www.bbc.com, https://www.bbc.com/news/technology-52992675.

MacNeil, Anne. “The Virtual Studiolo.” Storymaps, 12 Feb. 2021, https://storymaps.arcgis.com/stories/54830bcfdb9f4c878d05d8cbe21cf4c3.

Mir, Rory, and Katitza Rodriguez. “If Privacy Dies in VR, It Dies in Real Life.” Electronic Frontier Foundation, 25 Aug. 2020, https://www.eff.org/deeplinks/2020/08/if-privacy-dies-vr-it-dies-real-life.

Orne, Martin T., and A. Gordon Hammer. “Hypnosis | Definition, History, Techniques, & Facts.” Encyclopedia Britannica, https://www.britannica.com/science/hypnosis. Accessed 7 May 2021.

Robson, Stuart, et al. “3D Recording and Museums.” Digital Humanities in Practice, https://eds-a-ebscohost-com.libproxy.chapman.edu/eds/ebookviewer/ebook?sid=c6ee6c9e-ba7a-4988-9374-92ea77b3e148%40sdc-v-sessmgr01&ppid=pp_91&vid=0&format=EB. Accessed 7 May 2021.

Snip Snip

I have been crocheting since I was fourteen, and I know: every project requires the destruction of a pair of scissors.

The thing is, crocheting is making something. I take the ball of yarn, and I build it up in neat rows and patterns. But to build it, I have to break the project down, undo stitches right and left, and at the end, I need my pair of scissors to cut and tie off the yarn. My flowers, hats, scarves, and blankets were all made by ripping patterns apart and cutting yarn up, just as much as they were made by building. As Mark Sample says, destruction is necessary to learn (Sample).

In a similar vein, working on the code for Monkeys writing Shakespeare required that I break the thing to understand how it works. I already know enough code from previous experiences to understand how to troubleshoot some basic C++ and HTML, so I focused more on a problem I ran into with the code: Why couldn’t I get apostrophes to work?

I eventually realized that the text assigned to the variable “original_text” was enclosed in apostrophes (single quotes), so inserting one in the middle of the line ended the text early and broke the code. Fortunately, I could insert the HTML code for an apostrophe, “&#x27;”, to add apostrophes to my text. I didn’t know how to fix this problem until I looked online and found the webpage “How to create the apostrophe symbol in HTML”. Fixing this was a collaborative effort, even if my teammate at educative.io will never know me.
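
The same fix can be sketched in a few lines of Python. This is only an illustration, not the project's own code: the sample sentence and the JavaScript-style assignment it builds are assumptions. It relies on the standard library function html.escape, which replaces each apostrophe with the entity &#x27; so that the text can sit safely between single quotes.

```python
import html

# Hypothetical sample text; any sentence containing apostrophes would do.
text = "Don't burn the toast; it's the first step."

# With quote=True, html.escape turns each apostrophe into the entity &#x27;
escaped = html.escape(text, quote=True)

# Build a single-quoted assignment like the one described above.
# The "var ... ;" syntax is an assumption about how the variable is declared.
# Only the two outer quotes remain, so the text no longer ends early.
line = f"var original_text = '{escaped}';"
print(line)

# html.unescape reverses the substitution and recovers the original text.
restored = html.unescape(escaped)
```

When the escaped text is displayed as HTML, the browser shows each entity as an ordinary apostrophe.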

I was able to add in apostrophes using the internet’s help, so I could destroy Dolly Parton’s song while maintaining proper grammar.

I don’t consider myself a coder. I don’t find that joy in making code work that I think is a requirement for really loving coding. I also don’t actually speak any coding languages; I can stumble along if I have a book next to me, but actually writing code isn’t something I have ever done well.

However, I’m not afraid of coding. And the way that coding is taught and thought about currently inspires fear. It is similar to the way math is taught; the idea that there are “math people” and “not math people” is something I have always disliked. Most people can grasp the concepts of math and coding, but these things are taught as if they were the entrance to a secret high society that not everyone deserves to join.

These gates that bar the way to coding are bars of privilege. Women are kept out of coding because at the first sign of trouble, they’re told “well not everyone is good at this…” and they start to believe that a B+ in class is a sign that they don’t belong. Meanwhile, boys with a C in class feel like they’re doing fine, because they are not told with every glance that they do not belong. Miriam Posner writes about this, saying:

“But it also makes you extremely conscious of your mistakes, confusion, and skill level. You are there as a representative of every woman. If you mess up or need extra clarification, it’s because you really shouldn’t — you suspected this anyway — you shouldn’t be there in the first place.” (Posner, Think).

I cannot speak to the experience of people of color, but I wouldn’t be surprised if it was similar. Even when there are coding classes and math classes that work to overcome the barriers that keep coding so white and male, the pervasive idea that people are “coders” or “not” will continue to keep the underprivileged firmly in the “not” category.

I think, instead, we need to think about coding similarly to how we think about cooking. Not everyone is a cook. But anyone can make toast, if they’re given a toaster, some bread, and some easy-to-follow instructions. We have to remove the economic barriers keeping people from toasters, and also (to strain the metaphor) remove the socialized idea that a certain group of people will burn toast every time. Let people burn their toast and break their code; it’s the first step to a great meal and a good coder.

My first crochet projects were a mess, and I certainly wasn’t branching out into creating my own patterns. But I was a crocheter as soon as I picked up a hook and started enjoying myself. Through talking with others, tearing projects apart, and practice, I got a lot better. But what mattered was the enjoyment. If a person enjoys Digital Humanities, they are a Digital Humanist – even if only through creating messy projects using software others have already made.

The only cuts that hurt more than they help, when learning something, are the ones that cut people out of the group.

 

Works Cited

“How to Create the Apostrophe Symbol in HTML.” Educative: Interactive Courses for Software Developers, https://www.educative.io/edpresso/how-to-create-the-apostrophe-symbol-in-html. Accessed 30 Apr. 2021.

Posner, Miriam. Some Things to Think about before You Exhort Everyone to Code. http://miriamposner.com/blog/some-things-to-think-about-before-you-exhort-everyone-to-code/. Accessed 30 Apr. 2021.

Posner, Miriam. “Think Talk Make Do: Power and the Digital Humanities.” Journal of Digital Humanities, http://journalofdigitalhumanities.org/1-2/think-talk-make-do-power-and-the-digital-humanities-by-miriam-posner/. Accessed 30 Apr. 2021.

Sample, Mark. “Notes towards a Deformed Humanities.” @samplereality, 2 May 2012, https://samplereality.com/2012/05/02/notes-towards-a-deformed-humanities/.

Shaffer, Kris. Monkeys Writing Shakespeare. https://kshaffer.github.io/monkeyswritingshakespeare/. Accessed 30 Apr. 2021.

Scalar Practicum

In class, I made a small Scalar project on crocheting. However, I immediately learned the truth of Jessica Bocinski’s words: you need to know what you’re doing ahead of time, before starting a Scalar project.

Scalar’s page renaming system does not change what the page is called in the “recent” tab. This made my in-class project very difficult to work with, because I had accidentally made two pages called “home” and they both appeared in the “recent pages” tab. This was the most frustrating thing I found about Scalar: finding pages or media once they were uploaded. Therefore, I created a new project from scratch for the practicum, where I planned ahead of time what I would need. This is a Scalar project retelling the Agatha Christie novel “The Mysterious Affair at Styles”. I learned that Scalar is much easier to use when things are planned out ahead of time. My main difficulty in this project was ensuring the links to media all worked correctly. I learned a lot about how to create paths, which I found interesting. The pages are far from perfect, but it was fun to set up the project. I learned the difference, for example, between the image carousel and the card widget, and why each would be useful in different circumstances.

Much like the widgets, Scalar, WordPress, and Storymaps are all useful, but in different circumstances. Scalar is a great tool for telling stories with linked pages, or for museums. Storymaps and Adobe Spark, however, are good for presentations with a single page. My Storymap presentation raised a lot of interesting research questions for me, which making a similar Scalar project would not do; however, trying to tell a murder mystery through Storymaps would be a frustrating experience. WordPress’s tagging system is superior, but it is fundamentally a blogging site, without the paths that are so useful on Scalar. I am glad that we are learning all of these different tools, because each has its own use for different kinds of projects.

Scalar link again: https://scalar.chapman.edu/scalar/the-mysterious-affair-at-styles-/index?path=emily-inglethorp-has-died

Center for American War Letters Digitization Proposal

Chapman has collected a unique archive in the Center for American War Letters (CAWL). These letters were given to Chapman with the understanding that the letters would be used for historical scholarship. In order to fulfil the trust of the donors, Chapman has an obligation to make these letters as accessible to scholars as possible. As shown in the last year with Coronavirus, accessibility often means making things available online. Even in normal times, digitization makes documents more accessible to people with low vision and to people who cannot travel to California. For these reasons, the Center for American War Letters ought to be digitized.

Digitizing these letters is not simply a matter of scanning them and running Optical Character Recognition (OCR). OCR is not a perfect transcription method even for printed books (Sullivan). The handwritten letters discussed here would need to be transcribed by people. This doesn’t necessarily mean the transcription would need to take place at Chapman, however; we could follow the example of Transcribe Bentham, and attempt a crowdsourced transcription. This would require high quality photos of the letters, and a way for people online to submit their transcriptions (Ross). These requirements are very achievable, and many projects have met them successfully.

Once the letters were transcribed, they would have to be organized. We must not underestimate the task of classification; organizing by year, war, or topic must all be considered carefully (Sperberg-McQueen). We must also consider the legal ramifications, and the ongoing costs, associated with digitization. As Roy Rosenzweig writes, a process must be developed on how to treat legal matters; should letters simply be purged if there are complaints? (Rosenzweig). Chapman also must allocate money for the upkeep of the project, or else it will decay (Nowviskie and Porter).

CAWL is one of the things that makes Chapman unique; increasing its digital footprint will increase Chapman’s visibility. It would be similar to the University of Michigan’s “Michigan in the World” historical project (Michigan in the World | U-M LSA History). The investment needed to digitize these letters would be worth it. On a personal note, when writing my undergraduate thesis, I could not access in person a document that was held in England and written in Latin. The digitization of this document saved my thesis. I know that digitization is difficult and expensive, but it is an important part of being a research institution at this time.

 

Works Cited
Michigan in the World | U-M LSA History. https://lsa.umich.edu/history/history-at-work/michigan-in-the-world.html. Accessed 14 Mar. 2021.
Nowviskie, Bethany, and Dot Porter. “The Graceful Degradation Survey: Managing Digital Humanities Projects Through Times of Transition and Decline.” Literary and Linguistic Computing, vol. 24, no. 2, June 2009, pp. 225–33. DOI.org (Crossref), doi:10.1093/llc/fqp009.
Rosenzweig, Roy. “Scarcity or Abundance? Preserving the Past in a Digital Era.” American Historical Review, vol. 108, no. 3, June 2003, pp. 735–62.
Ross, Claire. “Social Media for Digital Humanities and Community Engagement.” Digital Humanities in Practice, by Claire Warwick et al., Facet Publishing, 2012, pp. 23–45, https://eds-b-ebscohost-com.libproxy.chapman.edu/eds/ebookviewer/ebook?sid=5322476a-83c0-4a5d-a546-0dd1c3ab2a02%40pdc-v-sessmgr04&ppid=pp_23&vid=0&format=EB. eBook Collection (EBSCOhost).
Sperberg-McQueen, C. M. “Classification and Its Structures.” Companion to Digital Humanities (Blackwell Companions to Literature and Culture), by Susan Schreibman et al., Hardcover, Blackwell Publishing Professional, 2004, http://www.digitalhumanities.org/companion/.
Sullivan, Danny. “When OCR Goes Bad: Google’s Ngram Viewer & The F-Word.” Search Engine Land, 19 Dec. 2010, https://searchengineland.com/when-ocr-goes-bad-googles-ngram-viewer-the-f-word-59181.

Scarcity or Abundance: The Choice Must be Ours

In the book Matched, by Ally Condie, only 100 stories, songs, and poems were preserved by the totalitarian government. Naturally, the heroine discovers a poem that the government had deemed too revolutionary to be preserved, and she is inspired by it to fight back (Condie). This book was one of the many dystopian young adult books that were popular during my childhood, but the idea that a government would intentionally preserve some art and destroy the rest was a concept that remained with me long after I forgot the love triangle that was at the center of the book series. The difficulty of intentional preservation is that no one is without bias, leading to an abundance of proof for one side of history, and a scarcity of all else.

I ran into the problems of preservation when writing my thesis on Catholic women in the time of Elizabeth I. The records of women were often nonexistent because of the patriarchal society, and the records of Catholics were often intentionally destroyed by the Catholics themselves, seeking to avoid persecution. This was, in some ways, a blessing; I had the opportunity to look at just about every record that has been preserved. As Roy Rosenzweig writes, “the injunction of traditional historians to look at “everything” cannot survive in a digital era in which “everything” has survived” (Rosenzweig). An important part of being a historian was working in that empty space where not everything has survived. Scarcity does not frighten me; although I wished many times in my research for more documents, I was able to focus on what did exist, and use it to make my own arguments. And this scarcity had a purpose. If every document I wished for had survived, many more people might have been murdered for their religious beliefs. The control of the preservation was in individuals’ hands, and they made the choice to destroy things to protect themselves and their families.

The abundance of the internet frightens me far more, because the people who decide what to preserve often do not have the best interests of the people whose data they are preserving at heart. Some governments have considered laws on “the right to be forgotten,” which awaken a host of worries about censorship and privacy (Fleischer). Opening people’s mail without their knowledge is a federal offense; but activists worry constantly that their text messages or other internet communication will be used against them in court.

Of course, the potential scarcity of the internet worries me as well. I remember in 2017, when the Trump administration began deleting news articles about Climate Change. Rosenzweig’s comment “Future historians may be unable to ascertain not only whether Bert is evil, but also which undersecretaries of defense were evil” reminded me of this (Rosenzweig). The destruction of research for a political agenda is frightening, and the destruction on the internet carries with it an element of gaslighting. Burning a book leaves ash and a space on the shelf; editing an internet page may leave no record at all, outside of people’s memories.

My issue with the abundance of the internet is that control is taken out of the creator’s hands, and given to conglomerates who want to gather and sell data to the highest bidder. Therefore, I believe that the Archive projects we examined in class are a better way of remembering the past. They are by nature “opt-in,” as people have to send in photos and memories themselves. The curators of these projects have a responsibility to be considerate in examining what posts to save, if space is a concern. As for long-term preservation of these materials, I do not have an answer. Relying on government bodies, like the Library of Congress, as the September 11 Digital Archive does, comes with concerns about the government’s goals (Home · September 11 Digital Archive). Although the loss of data is a huge concern, humanity has survived the burning of the Library of Alexandria. I am more concerned that we will not survive the complete loss of anonymity and personal control that the internet threatens.

 

Works Cited
Condie, Ally. Matched. Penguin, 2010.
Fleischer, Peter. “Foggy Thinking about the Right to Oblivion.” Peter Fleischer: Privacy…?, 9 Mar. 2011, http://peterfleischer.blogspot.com/2011/03/foggy-thinking-about-right-to-oblivion.html.
Home · September 11 Digital Archive. https://911digitalarchive.org/. Accessed 3 Apr. 2021.
Rosenzweig, Roy. “Scarcity or Abundance? Preserving the Past in a Digital Era.” American Historical Review, vol. 108, no. 3, June 2003, pp. 735–62.

Midterm Lightning Talk: A Dangerous Experiment

For the midterm, I reviewed “A Dangerous Experiment”: Women at the University of Michigan. This project was very interesting to me because I attended Notre Dame for my undergraduate degree, which did not admit women until 1972. I wanted to learn more about the history of women’s education over the last hundred years, which this project covered.

I found Zotero very helpful for my essay, and I learned a lot about both the University of Michigan and about Digital Humanities projects through this activity.

My slides can be viewed here: Laura Neis Lightning Talk

StoryMaps Practicum

In her essay “Humanities Approaches to Graphical Display,” Johanna Drucker writes:

“the rendering of statistical information into graphical form gives it a simplicity and legibility that hides every aspect of the original interpretative framework on which the statistical data were constructed. The graphical force conceals what the statistician knows very well — that no “data” pre-exist their parameterization. Data are capta, taken not given, constructed as an interpretation of the phenomenal world, not inherent in it.”

This is to say, raw data loses something when it is interpreted and presented. No interpretation is without bias, and every interpretation simplifies what was originally there for the sake of legibility.

Using “Storymaps” showed me that when data is assembled for easy access, a lot of information is lost. I used the sites of death of the 40 martyrs of England and Wales for my presentation, which can be viewed here: StoryMap. I was familiar with this dataset because I used it for my undergraduate thesis. However, if I had not been familiar with it, the maps I made would have been less meaningful. As Richard White writes, creating these maps is a “means of doing research” – they were not inherently useful without the interpretation needed to make sense of them.

The maps also didn’t clarify some parts of the data, which needed to be specified elsewhere. For example, I had to add a paragraph about the nature of the dataset, and how the prevalence of priests in the 40 Martyrs of England and Wales shows that it is not a representative sample.

I ran into a few issues when uploading the data. For example, at first the file I was using didn’t have the names of the columns at the top. This meant ArcGIS was unable to read it. Even once I was able to upload the file, for some reason the names had question marks in them. I was unable to fix this problem.
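
The missing column names could be handled along these lines before uploading. This is a minimal Python sketch, not the method I actually used: the column names and the coordinates in the sample row are hypothetical placeholders, since my original file is not reproduced here. It does not address the question marks, whose cause I never identified.

```python
import csv
import io

# Hypothetical column names; the real dataset's columns may differ.
EXPECTED_HEADER = ["name", "site", "latitude", "longitude", "monarch"]

def add_header_if_missing(csv_text, header=EXPECTED_HEADER):
    """Return csv_text with a header row prepended, unless one is already there."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Compare the first row with the expected names, ignoring case and spaces.
    if rows and [cell.strip().lower() for cell in rows[0]] == header:
        return csv_text
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()

# One sample row; the coordinates are approximate placeholders.
raw = "Richard Reynolds,Tyburn,51.5133,-0.1603,Henry VIII\n"
fixed = add_header_if_missing(raw)
print(fixed)
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to apply to a file whether or not the header is already present.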

Adding in the monarchs as an attribute came with a problem: some sites had multiple monarchs, but only one is visible. For example, people died at Tyburn under every monarch, but the color is just orange for James I. It is unclear why, considering James was neither the first nor the most prolific monarch. I also couldn’t add the actual date of death into the ArcGIS function, because there were too many dates, and so had to simplify by sorting by monarch.

A close-up view of a data point in ArcGIS

Although Richard Reynolds was killed in 1535, under Henry VIII, his data point is marked orange for James I due to many people dying at Tyburn.
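
To make the problem concrete, here is a small Python sketch that groups records by site and lists the monarchs at each one. It is an illustration only and is not how ArcGIS works internally. The first record comes from the example above; the other two are well-known martyrs added for illustration, and the three together are not my actual file.

```python
from collections import defaultdict

# A tiny illustrative subset of (name, site, monarch), not the full dataset.
records = [
    ("Richard Reynolds", "Tyburn", "Henry VIII"),
    ("Edmund Campion", "Tyburn", "Elizabeth I"),
    ("Margaret Clitherow", "York", "Elizabeth I"),
]

# Collect every monarch under whom someone died at each site.
monarchs_by_site = defaultdict(set)
for name, site, monarch in records:
    monarchs_by_site[site].add(monarch)

# A site with more than one monarch cannot be described by a single color.
ambiguous = {
    site: sorted(monarchs)
    for site, monarchs in monarchs_by_site.items()
    if len(monarchs) > 1
}
print(ambiguous)
```

Any site that appears in the result would need either one point per monarch or a separate note explaining which monarch its color stands for.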

On their own, these maps are interesting, but background knowledge is needed to interpret them in a sensible way. For example, my research covered Elizabeth specifically, so I was not surprised that she oversaw more executions than any other monarch. I knew that she specifically targeted priests, who were more likely to be canonized, while other monarchs focused more on taxing Catholic laypeople.

A breakdown of deaths under various monarchs. Elizabeth I has 20, the next highest is Charles II with 6

Elizabeth I oversaw half of the martyrdoms canonized by the Catholic Church.

This was a very interesting experience, and I was glad to be able to use these programs. A large part of my job as a research assistant is to represent knowledge in an understandable way, while noting the data losses that occur in this process.

 

Works Cited

Catholic Forum. “Patron Saints Index Definition: Forty Martyrs of England and Wales.” Wayback Machine, https://web.archive.org/web/20130313083422/http://www.catholic-forum.com/saints/martyr02.htm. Accessed 6 Mar. 2021.

Drucker, Johanna. “Humanities Approaches to Graphical Display.” Digital Humanities Quarterly, vol. 005, no. 1, Mar. 2011.

White, Richard. “What Is Spatial History?” 1 February 2010, http://web.stanford.edu/group/spatialhistory/cgi-bin/site/pub.php?id=29. Accessed 6 Mar. 2021.

StoryMap Link: https://storymaps.arcgis.com/stories/a73d9ad02a154f0a907d07aafc7a2d80

Topic Modeling and Algorithms

I ran the Complete Grimms Fairy Tales through the TMT to generate topic models. I tried it with various numbers of words per topic. I thought it was interesting because it created categories for what kind of stories were being told.

Topic modeling of "The Complete Grimms Brothers" with 5 words. Many of the words are "rquote" or other meaningless terms

Running it with 5 words created a problem; because I had not tokenized my text, many of the topics included words that were not useful. However, Topic 4 is clearly stories about children, topic 8 about princess stories, and topic 10 probably about the “animal” type of stories.
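
The cleanup step I skipped could look something like the following Python sketch. It is a simplified illustration, not what the TMT does internally: the stopword list is a short hypothetical one, and the sample sentence is invented. It splits the text into lowercase words and drops common words and debris such as “rquote” before any topic modeling happens.

```python
import re
from collections import Counter

# A short, hypothetical stopword list; "rquote" is the debris seen in my topics.
STOPWORDS = {"the", "and", "a", "to", "of", "in", "was", "he", "she", "it", "rquote"}

def tokenize(text):
    """Split text into lowercase words, dropping stopwords and very short words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS and len(w) > 2]

# Invented sample sentence standing in for a line of the fairy tales.
sample = "rquote The princess went to the castle, and the princess was happy. rquote"
tokens = tokenize(sample)
counts = Counter(tokens)
print(counts.most_common(3))
```

With the filler words removed, the words left over are the ones that actually say something about the story.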

Topic modeling of "The Complete Grimms Brothers" with 10 words

Running with 10 words removed some of the problems of using non-tokenized texts. For example, now topic 1 looks to be about people who “set off” from home to find castles, topic 2 about adventure stories, topic 7 about princess stories, and so on.

Topic modeling of "The Complete Grimms Brothers" with 25 words

But expanding the number of words is not always useful; using 25 words was too many. It began putting words like “snowdrop” and “dog” together, which might be because of too many words per topic.

I then ran Paradise Lost, which I have only read parts of, because I was interested to see if I could parse the topics if I only knew the text vaguely. Sure enough, I could see the correlation of Topic 1: Paradise, and topic 2: Lost, pretty quickly. Topic 5 also spoke to me as being about Love. If I am to use this tool on texts that I have not close read, I’m glad I can still parse the text somewhat.

Topic modeling of "Paradise Lost" with 10 words

I used the “Talk to Transformer” tool to see what it came up with for “Digital Humanities”. It gave me either definitions, quotes from academic articles, or a breakdown of how students did on a quiz based on their major.

Text generated by "Talk to Transformer", which includes the text "European (i.e. non British) students score better in several subjects than “British” students"

I don’t think this software will be writing blog posts any time soon. However, it is scary that information on the internet is so readily available. Several of my classmates found the Transformer tool could find information about them and their families with just a little input.

One could argue that the tools have to be trained on something. If information is available to be used to train these algorithms, is it really a problem if some of that information is a little sensitive? I say yes, absolutely, because the main reason we are developing algorithms to deal with large amounts of data is because companies are collecting large amounts of data. They don’t always know what to do with it, but they are causing a demand for software that can deal with huge amounts of data, by collecting this data.

I have never been able to opt out of this process. My classes began using Facebook as a method of communication when I was a senior in high school and did not have a Facebook account. This put me at a disadvantage with my peers, and I ended up having to go online more – using dial-up, because Wi-Fi is expensive and my family couldn’t spare the money – just to keep up. I have to use my Gmail to sign into things all the time, signing me up for spam emails for the rest of my life. I have had to write letters of recommendation for my friends where their prospective bosses ask all sorts of personal questions that have nothing to do with their ability to do a job well. A friend had to take a personality test as part of the school application process.

Does this data collection help anyone? It certainly hurts the underprivileged, who can’t afford Wi-Fi, or who have difficult family situations, as Danah Boyd points out. You can’t not be online – no job will hire you without a cell phone number to call. Currently, the choices are social isolation or giving up privacy. As the last year has shown us, social isolation is not a choice anyone can make. We need to be protected from these businesses that gather data without permission. If that means we can only train AI on “Paradise Lost” – so be it.

 

Works Cited

Blevins, Cameron. “Topic Modeling Martha Ballard’s Diary.” Wayback Machine, 16 Nov. 2016, https://web.archive.org/web/20161116080309/http://www.cameronblevins.org/posts/topic-modeling-martha-ballards-diary/.

Boyd, Danah. Data, Algorithms, Fairness, Accountability. http://www.danah.org/papers/talks/2016/CDAC.html. Accessed 23 Feb. 2021.

Brett, Megan R. “Topic Modeling: A Basic Introduction.” Journal of Digital Humanities, http://journalofdigitalhumanities.org/2-1/topic-modeling-a-basic-introduction-by-megan-r-brett/. Accessed 23 Feb. 2021.

Grimm, Jacob, and Wilhelm Grimm. Fairy Tales, by The Brothers Grimm. Translated by Edgar Taylor and Marian Edwardes, https://www.gutenberg.org/files/2591/2591-h/2591-h.htm. Accessed 26 Feb. 2021.

Milton, John. Paradise Lost. 1991. Project Gutenberg, https://www.gutenberg.org/ebooks/20.

Remy, Emma, et al. “How Does a Computer ‘see’ Gender?” Pew Research Center, https://www.pewresearch.org/interactives/how-does-a-computer-see-gender/. Accessed 23 Feb. 2021.

Using “Voyant” to analyze texts

When I was writing my undergraduate thesis, I often came across advice about how to look at sources similar to that given in “Tooling Up for Digital Humanities”:

“Scholars once accustomed to studying a handful of letters or a couple hundred diary entries are now faced with massive amounts of data that cannot possibly be analyzed in traditional ways.”

When working on my thesis, I only had a handful of letters, because I was working with the very limited sources left behind by Catholic women in Elizabethan England. However, the fact remains that we now live in a time when there is more information available than ever before. Thankfully, we also have more ways of analyzing this information than ever before. One of these is Voyant, a tool that analyzes texts to show which words are most common and where in the text they occur.

In examining some texts using Voyant, I realized that there was truth to another statement in “Tooling Up for Digital Humanities,” that simply looking at the words in a vacuum can lead to a simplistic understanding of the text. Having a close understanding of these texts helped me to avoid some simplistic assumptions, but in unfamiliar texts, such close reading would be more difficult. For example, I ran an analysis on Wuthering Heights and found that the words “Catherine” and “Cathy” both appeared frequently. Knowing the book well, I knew that these names referred to two characters; without this familiarity, I might not have understood why the two terms appeared as they did.

Use of the words “Catherine” and “Cathy” in the book Wuthering Heights. It is not immediately clear whether they refer to one or two characters.
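To make the idea behind this kind of chart concrete, here is a minimal Python sketch that counts how often each name appears in successive slices of a text. It is not Voyant’s own code; the function name `term_counts_by_segment` and the default of ten segments are my own assumptions, chosen only for illustration.

```python
import re

def term_counts_by_segment(text, terms, segments=10):
    """Count each term in equal-sized slices of the text (hypothetical helper)."""
    # Lowercase and keep only runs of letters, so "Cathy," and "Cathy" match.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    # Ceiling division, so every token falls into one of the segments.
    size = max(1, -(-len(tokens) // segments))
    counts = {term.lower(): [0] * segments for term in terms}
    for i, tok in enumerate(tokens):
        if tok in counts:
            counts[tok][i // size] += 1
    return counts

sample = "Catherine Catherine walked. Cathy ran and Cathy laughed"
print(term_counts_by_segment(sample, ["Catherine", "Cathy"], segments=2))
```

The counts show where each name clusters, but nothing in them says whether “Catherine” and “Cathy” are one person or two. That still takes knowing the book.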

I also found some problems with the texts from Project Gutenberg. I had to remove the beginning and end of each text, which explain what the text is and what restrictions are placed on it, or else Voyant mistakenly treated “Gutenberg” as a popular word in every text analyzed. I also had to check whether there was an introduction; including the introduction to Edgar Allan Poe’s poems created a very different chart than the one that used only Poe’s own words. If I were to analyze a poem to see whether it was written by Poe, checking for his common words, such as “heart,” “like,” “love,” and “night,” would be a good first step. This check would be skewed if I included words written by other authors.

If the preface is included, the word “Poe” is one of the most common – at least, in the beginning of the book

With the preface removed, the most common words become Poe’s own. Even here, though, the prominence of the word “poem” suggests that I left in an afterword that I would have to remove if I wanted to truly examine this document.
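I removed the Project Gutenberg header and footer by hand, but the same step could be sketched in Python. This sketch assumes the file contains the “*** START OF THE PROJECT GUTENBERG EBOOK” and “*** END OF THE PROJECT GUTENBERG EBOOK” marker lines found in Project Gutenberg plain-text files (some older files say “THIS” instead of “THE”). The function name is my own invention.

```python
import re

# Marker lines that bracket the actual book in Project Gutenberg plain text.
START_RE = re.compile(
    r"^\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)
END_RE = re.compile(
    r"^\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)

def strip_gutenberg_boilerplate(text):
    """Return only the text between the START and END marker lines."""
    start = START_RE.search(text)
    end = END_RE.search(text)
    begin = start.end() if start else 0        # no marker: keep from the top
    finish = end.start() if end else len(text)  # no marker: keep to the end
    return text[begin:finish].strip()

sample = (
    "The Project Gutenberg eBook of Poems\n\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK POEMS ***\n\n"
    "Once upon a midnight dreary\n\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK POEMS ***\n"
    "License text mentioning Project Gutenberg again\n"
)
print(strip_gutenberg_boilerplate(sample))
```

This only handles the license header and footer. A preface or afterword written by an editor sits inside the markers, so it would still need to be found and removed by reading the file.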

I next ran an analysis on Jane Eyre. I was most interested in the word “little,” because my favorite line in this book is “Do you think, because I am poor, obscure, plain, and little, I am soulless and heartless? You think wrong! — I have as much soul as you, — and full as much heart!” Using the “Word Tree” tool, I could even see a little bit of the context. However, because the punctuation had not been scrubbed, there were some obvious problems; I am not interested in “little” being followed by a semicolon, but rather in what word came next. If I were to run a proper analysis, I would need to tokenize the text.

The word “Little” in Voyant’s “Word Tree” tool for Jane Eyre
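As a rough illustration of what tokenizing would fix, the Python sketch below strips punctuation before looking at which word follows “little.” The function names `tokenize` and `words_after` are hypothetical, and the tokenizer is deliberately simple (lowercase letters only, keeping apostrophes inside words); a real analysis might use a dedicated library instead.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters, allowing inner apostrophes."""
    return re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)*", text.lower())

def words_after(tokens, target):
    """Count which word immediately follows each occurrence of target."""
    target = target.lower()
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == target)

line = ("Do you think, because I am poor, obscure, plain, and little, "
        "I am soulless and heartless?")
tokens = tokenize(line)
print(words_after(tokens, "little"))
```

With the comma gone, “little” is followed by “i” rather than by a punctuation mark, which is the kind of result I was hoping to see in the Word Tree.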

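Using Voyant was very interesting, and revealed some of the advantages and disadvantages of text analysis tools: they surface patterns quickly, but the results depend on cleaning the text first and on knowing it well enough to interpret them.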

 

Works Cited

Brontë, Charlotte, and F. H. (Frederick Henry) Townsend. Jane Eyre: An Autobiography. 1998. Project Gutenberg, https://www.gutenberg.org/ebooks/1260.
Brontë, Emily. Wuthering Heights. 1996. Project Gutenberg, https://www.gutenberg.org/ebooks/768.
“Optical Character Recognition.” Wikipedia, 28 Jan. 2021, https://en.wikipedia.org/w/index.php?title=Optical_character_recognition&oldid=1003359641.
Poe, Edgar Allan. The Complete Poetical Works of Edgar Allan Poe Including Essays on Poetry. Edited by John Henry Ingram, 2003. Project Gutenberg, https://www.gutenberg.org/ebooks/10031.
“Text Analysis.” Tooling Up for Digital Humanities, 2 Mar. 2017, https://web.archive.org/web/20170302102313/http://toolingup.stanford.edu/?page_id=981.
“When OCR Goes Bad: Google’s Ngram Viewer & The F-Word.” Search Engine Land, 19 Dec. 2010, https://searchengineland.com/when-ocr-goes-bad-googles-ngram-viewer-the-f-word-59181.