Monthly Archives: October 2014

Week Eleven Reading Blog: Does Advancing Digital Scholarship Mean that the Book has to Die?

The “book is dead.” This rather stunning statement by Tim Hitchcock in his Journal of Digital Humanities article titled “Academic History Writing and its Disconnects” did not sit well with me. As a lead-in argument to a host of online readings in support of born-digital scholarship and open peer review, Hitchcock’s assertion seemed oddly out of place and rankled me to no end. Granted, the “digital turn” is akin to an industrial-revolution-style sea change in how historians research, write, and present history, and we have an obligation to get on board — like it or not. However, the book, a roughly 1,200-year-old means of communicating causal and critical thinking in nearly all languages, still has relevance. Why does the advent of digital scholarship presume the death of a worldwide literary form that has proven effective for centuries and continues to do so in the present? Digital scholarship can and should supplement the book form, be it digital or in hard copy, in as many ways as possible without shutting down the most effective means in human history of communicating ideas and arguments. I have yet to visit a digital-history Web site that uses hyperlinks, visualizations, and the like to advance, effectively and concisely, an argument demonstrating critical and causal thinking. William G. Thomas III mused openly about this problem, posing questions about, and recommending some solutions for, the challenges inherent in presenting a cogent argument in the form of a born-digital online article. Thus, in my mind, rumors of the book’s death as a form of advancing ideas and arguments have been greatly exaggerated.

Many of the authors in this week’s readings advocate very strongly for open peer review and the academy’s recognition of digital history as a legitimate form of historical scholarship. These arguments resonate strongly with me, because so many of these new digital tools, tools that take most of us (especially me!) outside our comfort zones, can clearly do so much to advance our knowledge of the past, particularly in the visual realm. In particular, the “social interaction” stemming from online peer review, as discussed by Alex Sayf Cummings and Jonathan Jarrett, is a great enhancement to the often close-minded peer-review processes that scholars face today. Most importantly, these technological advances allow historians to reach a wider audience online, especially scholars and general readers who feel that, by commenting on a draft digital article, they can help advance the argument and thus have some “skin in the game” — somewhat like the folks who contribute to Wikipedia but in a more substantive way. The success Melissa Terras enjoyed in using social media to publicize her journal articles is a case in point. The spike in downloads after she blogged about her articles was amazing and a testimony to her ingenuity.

Open peer review as a working concept has a great deal of merit. Even blogging and its attendant informality can do more to advance an article’s argument than the closed-off, formalized peer reviews that most journal articles employ as scholarly filters today. I have never been a fan of anonymous peer reviews, mainly because they allow the nameless reviewers to take cheap shots or to grandstand their own theories at the expense of another person’s scholarship. I have published two books with peer-reviewed academic presses and one journal article, and I found the peer-review process in each case to be useful in some instances and just plain aggravating in others. The most useful feedback emphasized tangible ways to clarify an argument or a certain point in the body of the work. The least useful feedback amounted to arguments for why my book or article should conform more closely to the reviewer’s own scholarship in the same area. Thankfully, most of my editors were quick to see through those “grandstanders” and dismiss their criticisms as “off the mark.” But those anonymous comments always left me feeling that some fellow scholars tended to behave too jealously or guardedly toward new scholarship in their respective fields. In one case, I pulled an article from a well-known, peer-reviewed journal because the feedback from one reviewer was unbelievably mean-spirited and grossly unfounded. The feedback had almost nothing to do with the subject of my article but everything to do with that reviewer’s own ideas about a tangential aspect of the topic. Ultimately, I withdrew the article when the editor told me that I should “consider” revising the article completely (!) to satisfy that particular unnamed reviewer’s comments. Clearly, the editor was afraid to run afoul of that reviewer; and, as a result, I walked. Thanks, but no thanks. And because the reviewer was anonymous, I could not interact with him or her to learn the true motives behind the comments. Thus, I tend to agree with Tim Hitchcock’s implied message that peer-filtered journals and books, by virtue of the very formats they employ and protect ruthlessly, can represent a type of “fascist authority” unto themselves. Open, attributed, online peer reviews, if patrolled properly, can be very liberating and eminently useful. And the historian gains the ability to interact with the reviewer — a big plus for me.

Although I am clearly advocating for open scholarship online, I am most concerned with the proprietary aspects of one’s work and its overall permanence online. Edward Ayers is correct that producing digital scholarship is a risky venture in the online world of hackers and cyber-attackers. Since I have two published books on the shelves that represent decades of research, travel, and writing (and lots of dollars tied up in those efforts), I feel less inclined to risk losing my proprietary rights to that scholarship simply to fulfill an altruistic impulse to get on board with the digital world and make everything available online. Frankly, a well-packaged, copyrighted monograph produced by an academic publishing house is akin (in my mind) to a safe-deposit box for one’s scholarship. The publisher copyrights it for the author, obtains an ISBN, catalogs it with the Library of Congress, and so on. Thus, my rights as an author and the owner of intellectual property are, at least in theory, secure — even with the one version of my book that exists digitally for Kindle users. And a hard-copy version on the shelves further gives me a sense of permanence, a concern that Cummings and Jarrett underscore when they write that nothing is “safely online in the long-term.” Even their assurance that not much is lost, either, doesn’t make me feel much better. While William Thomas sees virtue in exposing the inner workings of one’s scholarship online, I see some risk. I prefer that a strong first draft appear online for peer review after the historian has done much of his or her work offline; and, likewise, I prefer a way to upload a finished product somewhat permanently, well after the online peer reviews have helped generate a final product. If only scraps and bits of hard-won research information appear online without some safeguarding of the scholarship, then Ayers’s fears will be realized: historians won’t risk it. Thus, I think that the successful 2012 appeal by Alex Galarza and company to the American Historical Association for guidelines on the value of, and attribution for, digital scholarship, both within the academy and for tenure-track credit, is extremely important.

But in spite of my eagerness for online peer review and digital history more broadly, I think that the book as a form through which to communicate ideas and arguments does not need to disappear into the stratosphere. Books can simply take on new forms in a digital world, forms that expand their impact and their ability to reach — and draw in — a much larger, interested audience. Electronic (or digital) books like those developed for Kindle readers are a good start, but the basic linear form of organizing an argument and presenting it with all its attendant evidence has no substitute — at least for the moment. I’m open to new ways of re-imagining the book in the digital world, but I think we have to use the book format as the basic building block of any attempt to present an original historical argument online. If we don’t teach new historians how to use that form now, how can they hope to make sense of the thousands of tomes that already exist and upon which these same historians will have to rely as secondary sources well into the future?

Steve Rusiecki

Week Ten Reading Blog: Gaming the System

“Our consciousness of the past is inextricably bound by pictures.” I concur fully with Joshua Brown’s assertion about the inherent visual quality of history and the impact that historical visualization has on our understanding of the past. In my experience, the power of history rests not only in well-crafted prose that critically examines historical themes and events but also in one’s ability to experience that history visually and through other senses. Thus, I appreciate fully the digitally immersive, highly visual quality of Brown’s The Lost Museum Web site and his historiographical argument for why and how people of the past tended to resist such visual innovations in the media of their day. That same resistance seems ever-present today as we negotiate the new era of the “digital turn.” Brown’s argument is most convincing to me as an account of the inherent resistance people often bring to new visualizations of the past, but I don’t see how it supports the core subject of this week’s readings, gaming, or addresses my own skepticism about this form of digitally visual history.

The article by Laura Zucconi, Ethan Watrall, Hannah Ueno, and Lisa Rosner about the development of their interactive historical game, Pox and the City, gets to the heart of my concerns regarding digital games as a viable means of teaching the meaning behind historical events. Granted, as the authors assert, games “work best when they are visually stimulating,” but the visualizations and attendant algorithms that players must negotiate don’t seem to me to be the best way to do history — at least history that has any true meaning. My deeply entrenched skepticism on this topic stems from watching my own son’s addictive interplay with video and digital games starting in the late 1980s. He played a variety of historically themed games that I found to be amusing and, in some cases, quite accurate in terms of period dress, equipment, and architecture. Specific games I recall him playing were computer-based titles such as Civilization (the same one by Sid Meier that Adam Chapman analyzes, I believe) and The Sims, along with console-based fantasy games with historical components, such as Zelda. But he always seemed most interested in mastering the algorithms that allowed him to proceed to the next level, a behavior in keeping with Zucconi and company’s statement that “games work best when they are open-ended, allowing players a set of choices without pre-determined outcomes.” But in seeking to master these algorithms, my son often made anachronistic choices that essentially created counter-factual history, “a kind of historical fiction rather than historical fact,” as the authors readily admit. When I told my son that certain choices he made digitally would not have been viable or possible in the game’s actual historical setting, he simply stated, “Well, that’s how it is here.” Thus, I was never quite certain that he learned the proper lessons from his digital immersion in these historical games. I would like to believe that he took away some important themes, such as social lessons on how people lived in the past and the limitations of their existence. But, in the main, he seemed to treat these historical games more as problem-solving tutorials (not really a bad thing) than as instructive historical media. And, worst of all, he grew up hating history. Go figure.

In many ways, the games my son played in the past and contemporary examples such as Pox and the City have more in common with modeling simulations like Elijah Meeks’s and Karl Grossner’s ORBIS than they do with Brown’s The Lost Museum. Simulations can help us predict what might happen using historical data as the basis for their construction. Thus, they have their own utility in this regard. But I think that players can get something more out of games if the scenarios are coded to avoid or eliminate anachronistic choices by the players; in other words, if the available algorithm steers them toward proper (my code word for “realistic”) historical situations, then the game can have greater value.

Adam Chapman’s approach is probably the best way to assess the historical utility of these games. He contends that if we are to understand video games as a digital “form” that conveys a type of historical meaning rather than as one that captures “content” with historical fidelity, then we must examine them in the context of the video-game medium itself. Like film, as Chapman explains, games can “function as a mode of historical expression” if we choose to view them in the context of the medium within which they exist and, perhaps, as their own analytical category. I agree with Chapman that these video games “are, like all histories, mimetic cultural products”; but, like all things that people construct, even historical monographs, they have to be considered effective only in the context of their capabilities and limitations. But I depart from Chapman in his contention that a game like Sid Meier’s Civilization “is history [emphasis added] [simply] because it is a text that allows playful engagement with, connects to and produces discourse about the past.” I agree that playing can equal learning in some settings, but when someone “flips the mental switch” from the learning mode to the entertainment mode, I think more is lost than gained. And, yes, I concede Trevor Owens’s point that games reach a broader audience. However, I disagree with Owens that games are “particularly good” at “articulating causal models for why something turned out the way it did.” Those causal models are most likely carefully programmed algorithms that may not make sense in the real world.

Ultimately, I think games can be one way to allow an interested public to explore history in both a visual and an immersive sense — as long as we supplement those games with other effective forms of historical discourse, such as interactive maps; qualitative and quantitative models; high-resolution photographs and lithographs; and, yes, dare I say it, monographs. I see more power in many forms working in unison with each other rather than privileging one form, such as games, over all others.

Steve Rusiecki

Week Eight Practicum Blog: Omeka is Very Nifty!

I must admit that Omeka proved to be a very engaging, user-friendly tool for assembling a variety of digital artifacts into a reasonably clear storyline. In keeping with the topic of my dissertation, D-Day, I selected jpeg images of two D-Day-related primary-source documents and four photographs to provide a brief storyline for the events on that day. Uploading each item and completing the meta-data fields was a breeze, even though I was not certain what information to include in certain fields for selected items. In those cases, I just defaulted to “D-Day, 6 June 1944.”
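
As an aside for anyone who wants to batch-prepare items rather than typing each metadata field by hand, Omeka offers a CSV Import add-on that maps spreadsheet columns to item fields. Here is a minimal Python sketch of how I might generate such a spreadsheet, defaulting any uncertain field to “D-Day, 6 June 1944” just as I did in the dashboard; the file name, column labels, and item list below are my own illustrative assumptions, not Omeka documentation.

    import csv

    DEFAULT = "D-Day, 6 June 1944"

    # Hypothetical item list: (image file, title, description).
    items = [
        ("order_of_the_day.jpg", "Eisenhower's Order of the Day", ""),
        ("omaha_beach.jpg", "Troops wading ashore at Omaha Beach", ""),
        # ... the remaining four D-Day items ...
    ]

    with open("dday_items.csv", "w", newline="") as f:
        writer = csv.writer(f)
        # Column labels are illustrative; match them to the import tool's settings.
        writer.writerow(["File", "Title", "Description", "Date"])
        for filename, title, description in items:
            # Any field I was unsure about falls back to the same default.
            writer.writerow([filename, title or DEFAULT,
                             description or DEFAULT, "1944-06-06"])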

Setting up the site’s appearance from the Omeka dashboard was not a problem, but I did not really like the few available appearance themes. I would have preferred more information about how to customize the appearance of my public site.  I relied heavily on Omeka’s online how-to page (http://info.omeka.net) in order to build my exhibit and upload items, but I was not able to find anything in the guide that explained how I could revise the site’s appearance.

The biggest problem I encountered with creating the page was developing an exhibit after I had entered my six items. Again, the Omeka how-to page was not very effective in describing how to proceed. I was not able to find a direct pathway to creating an exhibit through the various tabs and menus. In the end, I managed to assemble my exhibit mostly through dumb luck and by “test-clicking” the various tabs on the site.  I found the page that allowed me to choose the “gallery” option for my items, which is how I had wanted to portray the images on the public site. My intent was to present the user with a grouping of panels in a specific order that would facilitate a sequential, linear narrative of D-Day; and, somehow, I was able to do it.  Unfortunately, the exhibit did not “pop up” immediately after clicking on my public site’s link. Instead, the gallery exhibit was buried under the “browse exhibits” link. But, when I clicked on that link, the gallery I wanted to see was there. So much for small victories!

I was impressed by Omeka’s ability to sustain the overall quality of the images, even after uploading. Some of the jpeg images were small, and I was afraid of significant distortion once Omeka converted them into thumbnails that, when clicked upon, would increase in size. But, thankfully, my fears went unrealized, and the storyboard effect of the gallery was all the more appealing because of each image’s crispness.

The best layout for the kinds of stories I would tell on Omeka is the gallery version. A visitor to the site gets an immediate sense of the storyline’s scope and scale by seeing in one collage all the images associated with the exhibit. He or she can then follow the storyline by clicking sequentially on each image and reading the description from the meta-data page, or jump to whichever images he or she finds most interesting.

Overall, I enjoyed my first experience with Omeka. Often, I find myself blogging about numerous problems I’ve encountered negotiating different digital tools, mainly because I seem to lack that “computer sense” so many others seem to have. But, in this case, I spent the bulk of my time actually creating something with Omeka that I enjoyed doing and that visitors to the site with an interest in World War Two might enjoy. Thus, this blog will probably be my shortest one of the semester, since I invested the balance of my practicum time in using the digital tool in question. The modest product of my humble effort with Omeka appears at the following Web link:

http://ranger394.net/Omeka/exhibits/show/d-day–6-june-1944/d-day-in-documents-and-photogr

Steve Rusiecki


Week Nine Reading Blog: Crowdsourcing Might Get a Little Cramped

Of all the readings we’ve done this semester, I’m most ambivalent about the topic at hand, Crowdsourcing History, and its potential impact, both positive and negative, on the quality of digital history. In my more than 30 years in the Army, I’ve had to conform to a very useful, productive approach to problem-solving and product development, in which teamwork and collaboration proved essential in most cases. But I’ve also seen aberrations of this same approach, which I have pejoratively deemed “the group-think,” scuttle many efforts and produce less-than-optimum outcomes. In other words, too many “fingers in the pot,” and too many “chiefs” attached to those fingers, have watered down potentially excellent results to mediocrity in order to appease some larger group. In turn, these products, when presented to senior leaders for a decision, often obscured self-imposed disadvantages behind the fact that everyone had a hand in their creation and “could live” with the final outcome. I don’t want to belittle the power of collaboration in achieving excellent results, but I also want to ensure that the outcomes produced by singular personalities adhering to equally rigid standards don’t get kicked to the curb in favor of a particular methodology for producing digital history.

Roy Rosenzweig’s article on Wikipedia cuts to the core of my concerns, despite his enthusiasm for a medium that allows for the construction of historical knowledge based on a lot of “fingers in the pot.” Although I find Wikipedia to be a useful tool, and I fully support “democratized” history online, I don’t share Rosenzweig’s complete enthusiasm for the site. The question that haunts me most is: If anyone can add to or revise historical entries on Wikipedia with the same authority as a trained historian, then why have trained historians? Rosenzweig specifically takes aim at the fact that current historical scholarship is “characterized by possessive individualism” (Rosenzweig, 117). Frankly, I don’t see a problem with a historian whose life’s work is defined by a specific historical topic, genre, or event. Without experts in specific subjects, who can police a crowdsourced site like Wikipedia effectively in order to strip out the errors and blatant misinterpretations? Rosenzweig makes good points when he contends that most facts on Wikipedia, according to his own sampling, are generally correct and that Wikipedia’s real virtue rests in its capacity for public revision. And yes, Wikipedia is an online encyclopedia that is not intended to replace individual historical scholarship; even so, I think that Wikipedia’s prohibition on original historical scholarship making its way onto the site is a pipe dream. I’ve seen it happening already on certain topic entries for which I have specific expertise, such as the Battle of the Bulge during World War II. Overall, I think we need to view how collaborative historical scholarship occurs online in a very specific way. We still need the “possessive individualism” that comes with a scholar taking ownership of his or her historical work and then using that special background to help ensure that others “get it right” online — or at least to make sure that those who do add their two cents do so in an informed manner. We need that individual historical expertise out there to advocate both for the academy and for democratizing history online in order to sustain the balance necessary to doing good history, regardless of the medium.

But one theme surfaced in the readings regarding Wikipedia that left me flat-footed: the demographics of the site’s contributors. Perhaps the most puzzling aspect of Wikipedia’s participatory demographic is that it consists mostly of educated white men from the West. According to Leslie Madsen-Brooks, a 2011 study identified only 8.5 percent of the site’s editors as women. By contrast, she states, most users of genealogical Web sites (around 65 percent) are women. These statistics puzzle me. I’m not sure why such a gender gap exists between these two kinds of sites. Does the gap suggest that American (or perhaps Western) society has quietly, through overt practice, accepted strictly defined gender roles in preserving some forms of history, with women, say, as the keepers of the family history? Does Wikipedia attract more men because men feel that they want to safeguard a historical record that perpetuates a certain gender-defined image of themselves — for personal or political reasons? Madsen-Brooks doesn’t seem to have an answer, and neither do I. Frankly, this whole discussion surprised me and has left me hanging. I’m stumped.

I am most enthusiastic, though, about the kind of “crowdsourcing” that Trevor Owens, Tim Causer, Justin Tonra, and Valerie Wallace describe. The fastest route to democratizing history through the digital medium is to get the primary sources out there on the Web as effectively, accurately, and efficiently as possible. I think the efforts to involve the public in this enterprise are on target, particularly in the case of the Jeremy Bentham archive. Causer and company, while outlining the pitfalls and frustrations inherent in getting amateurs and volunteers to transcribe primary sources into digitized text, are employing perhaps the best method for making the Bentham archive available. I happen to own hundreds of original Civil War letters (my hobby is collecting original World War II and Civil War documents), and I’ve transcribed all of them into Word documents. The process is tedious and aggravating, especially when the handwriting is poor or the text is faded.  I routinely had a fellow historian check my work to make corrections or to reinterpret some of the more difficult handwriting issues. But the fact that I had already done the “heavy lifting” by transcribing the bulk of the document with reasonable accuracy allowed him to concentrate on  those finer points requiring correction.

Granted, as Causer and company point out, the quality-control process did not really “[quicken] the pace of transcription” (Causer et al., 130). But what’s more important? Getting it right the first time, or getting it quickly? My vote is for getting it right the first time. Sheila A. Brennan and T. Mills Kelly faced this fact when they realized that, much to their chagrin, they had to allow time for Hurricane Katrina survivors to heal emotionally for a year or two before they (the survivors) could provide testimony that was of high quality and useful. Thus, fast is not always good, and “crowdsourcing,” while effective, may not equal speed of output. Most importantly, getting the public to help in producing digitized archives not only empowers the average person with the ability to help preserve for all time the intellectual fruits of his or her past, but it also helps to “democratize” history by making members of the public “intimate” with the primary sources. What better way to energize the public’s imagination and awareness of history than through the collective preservation of its treasured sources? I’m all-in for this type of crowdsourcing effort.

Steve Rusiecki

Week Seven Practicum Blog: Google-Mapping the Civil War

My first foray into the world of the Google Map Engine was very informative and productive. For this exercise, I captured, using single-point graphics, the various locations where one Civil War soldier, the oddly named Consider Flowers, and his regiment, the 1st Michigan Cavalry, traveled during the Civil War. Typically, the regiment’s activities consisted of simple movements from one place to another in an effort to locate and engage the Confederate forces or to conduct raids along major logistical lines of communication (main roads, waterways, etc.). In some cases, the regiment fought pitched battles in places such as Winchester and along the Totopotomoy Creek during 1864 and 1865, the years that defined the time limits of this exercise.

Overall, I found the Google Map Engine to be a very user-friendly tool. The speed of the zoom-in and zoom-out features was remarkable and allowed me to double-check my search locations to ensure that I was not adding to the map similarly named locations in different parts of the country. I chose as my base map the colorized, highly detailed terrain map that depicts foliage and elevations clearly. As a former infantryman, I am highly sensitive to the importance of these features on a map and how they affect an army’s ability to move quickly and efficiently across the battlefield space. Thus, I think the full impact of this seemingly ubiquitous cavalry regiment’s travels becomes much starker when considered in the context of the difficult terrain the troopers had to traverse, often at great speed.

In order to distinguish between events occurring in several of the same places during both 1864 and 1865, I selected one layer for each of the two years in question. I opted to use named balloon icons to represent the general movement locations; and, for the first layer, I allowed the Google Map Engine to select varying colors for each icon. As I progressed in my data entry, I grew to dislike these auto-generated colors, but I left them in place anyway. They were too light in shade to stand out effectively, but I wanted to retain them in order to compare them with the different color scheme I planned to use for the second layer. The only time I changed the balloon icon was to represent locations where I knew from memory or from the available data that a battle or skirmish had actually occurred (I’m sure I missed a couple, though). In these cases, I selected an icon indicating, for lack of a better term, an “explosion.” I tested yellow as the “explosion” color, but it did not stand out as well as the shade of blue that I also tested. Thus, I went with the blue.

For my second layer, I used the same approach for the icons; but, in this case, I colored the balloon icons red. When compared on the map simultaneously with the icons from the first layer, the differences jumped out easily. The only icons I changed were for two events: the surrender at Appomattox and the Grand Review in Washington, DC. For the Appomattox surrender, I selected a horse icon, mainly because of the image I have of General Lee riding off with great dignity following the ceremony. I used a “sun” icon to portray the Grand Review in DC in May 1865, primarily as a metaphor for the postwar “dawn of a new day.” I would have preferred the ability to use flag or soldier icons to portray the activities of the different armies, but I made do with what the program offered.

Many of the locations listed in the regiment’s history had no accompanying references on the online map. Unsurprisingly, locally named references to certain crossroads, ferry crossings, and the like have long since vanished. Therefore, in order to compensate for the “fuzziness” of my data, I used Google to pull up sites discussing the Civil War and, using online maps (and even Mapquest in one case), I located the 1864 and 1865 locations and then identified the closest named town or road intersection still identifiable on the Google base map. I then used that spot as the closest reference to the Civil War-era location and, after adding it to the map, renamed it for the wartime location. I tried to get as close as possible to the original location, and I refused to settle on anything that was more than three or four hundred meters off. For example, the map engine did not recognize Milford, Virginia, but my Google research showed me that it was located within around four hundred meters of Bowling Green, Virginia. The map engine recognized Bowling Green, so I used that balloon icon to represent Milford. In some cases, I never found a reference to a wartime location on any map that would correlate with today’s Google Map. For example, I never managed to locate Mallory’s Crossroads or Jones’s Bridge.
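
For anyone curious how this renaming workflow might be captured in data rather than by hand, here is a hedged Python sketch that records each vanished wartime place name against its nearest modern reference point and writes out a KML file of the kind that many mapping tools (Google’s included, as I understand it) can import. The place names come from my exercise, but the coordinates are rough, illustrative values, and the file name is my own invention.

    # Vanished wartime names keyed to the nearest modern reference point.
    # Coordinates are approximate, for illustration only.
    relocated = {
        "Milford (wartime)": ("Bowling Green, VA", 38.05, -77.35),
        # Mallory's Crossroads and Jones's Bridge were never found,
        # so they stay out of the map data entirely.
    }

    placemarks = []
    for wartime_name, (modern_ref, lat, lon) in relocated.items():
        placemarks.append(
            "<Placemark><name>{}</name>"
            "<description>Nearest modern reference: {}</description>"
            "<Point><coordinates>{},{},0</coordinates></Point>"
            "</Placemark>".format(wartime_name, modern_ref, lon, lat)  # KML wants lon,lat
        )

    kml = ('<?xml version="1.0" encoding="UTF-8"?>'
           '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
           + "".join(placemarks) + "</Document></kml>")

    with open("civil_war_locations.kml", "w") as f:
        f.write(kml)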

I know a great deal about the Civil War and record-keeping during that period. Yes, the armies generated a lot of paper, but many of the journal entries for troop locations were nothing more than a “swag” at best — unless the place was a large town or city like Winchester or Richmond. Maps were scarce and seldom accurate. Local people tended to tell the troops where they were (and not always precisely), and the soldiers recorded these names as ground truth. Few soldiers had ventured far from home before the war, and most did not know the geography of Virginia or other parts of the country intimately. Thus, my efforts to reconcile the “fuzziness” of my data may have either clarified or compounded the inherent “fuzziness” of the primary-source data.

In any case, the Google Map Engine proved to be a phenomenal tool. The point balloon icons worked well, and the labeling application was very easy to populate and save. I experimented with linking locations using lines, but those lines only seemed to confuse the visual depiction. Overall, I enjoyed using this tool, and I plan to use it again in the future.

Steve Rusiecki


Week Eight Reading Blog: The Digital Face of Public History

The status of public (or popular) history when compared to academic history has always intrigued me. At what point does the academy accept history generated by the non-academician? Or, for that matter, history produced for a non-academic audience? I think the question posed by Carl Smith in his 1998 article “Can You Do Serious History on the Web?” cuts to the heart of this debate. If the Web is open to all comers, then Smith’s question suggests that a rift exists between academic, or “serious,” history and its popular counterpart. Thus, if “serious” history is academic, as Smith implies, then public or popular history is the exact opposite — “unserious.” Given what the readings for this week have described, I think history developed for, and presented on, the Web for a broader public can be just as (if not more) “serious” than what the academy produces. And, in many ways, that Web-based history can touch many more lives and influence the present more dramatically than if historical debates remained the exclusive domain of the academy’s so-called “ivory tower.”

For me, history serves a purpose. I craft the military history I write for a specific audience — soldiers and, yes, the general public. The experiential lessons gleaned from past conflicts consistently inform our application of military power in the present and future. Moreover, the public’s understanding of warfare develops a broader appreciation for the sacrifices men and women have made on behalf of the nation over time. But in a broader sense, history helps us stake out a way ahead and, ideally, prevents us from repeating the same mistakes over and over again. Only by making history more universal can we hope to fulfill such an ambitious charter. Thankfully, the rapidly developing digital world is pushing us inexorably toward this very goal.

Carl Smith’s discussion of the online project he curated, The Great Chicago Fire and the Web of Memory, provides an excellent example of an academically managed, digitally constructed site of historical knowledge targeted toward a wider, non-academic audience. The beauty of this public Web site is that it remains visually immersive without losing the authority of the trained historian. The site’s various snippets of carefully crafted, tightly packaged historical narratives, all based on reams of primary-source material that the average user can also browse and evaluate, lend the site a remarkable power. And yes, the narrative is there if you want to follow it (an important feature for me in particular). I appreciate the fact that Smith recognized the limitless capacity of his digital medium; he included and then managed over 300 different pages on the site, each with its own wealth of material in the form of facsimile representations of original documents, lithographs, and photographs. The analog world would never let him get away with something so cost-intensive. In my view, Smith has given the public more than it needs and can possibly hope to absorb; but, in doing so, he has increased dramatically the possibility that something on the site will appeal to a more robust audience of varied tastes and interests. And in appealing to that broader public, someone is likely to take away a powerful lesson about what the Great Chicago Fire means to us today and how its memory can influence that city, and other cities, in the future.

Most importantly, public history on the Web enables powerfully immersive visual and sensory experiences that have largely been missing from the history I tend to experience. I agree with Mark Tebeau that the voice of someone who lived through a historical event describing what happened and what it meant to him or her is powerful. Such voices, he rightly contends, “call forth memory, time, and context” (Tebeau, 28).  Can a monograph achieve that end? Perhaps … in some cases. But the experience is not the same. Even the virtual tours of American heritage sites like Monticello that Anne Lindsay described can create a sensory experience of history that the printed monograph cannot achieve.

I often recall the ridiculously cerebral character in the 1984 movie Ghostbusters, Egon Spengler (played by the late actor Harold Ramis), responding to another character’s question about what books he liked to read. Egon simply deadpanned the following line: “Print is dead.” Granted, the remark was written for comic relief, but it has haunted me for 30 years, because I’m someone who loves to immerse himself in the power of the written word. It also made me aware that not everyone experiences the written word in the same way and that historians can and should pursue other possibilities for an immersive historical experience to enhance the power of history. In other words, in keeping with Egon’s assertion, we need to find ways to bring “print,” and history, to life in order to immerse people in the experiences of the past. Today’s digital world allows us to achieve much of that goal by letting us experience the past visually and aurally, a capability that makes history all the more compelling to many people.

Bruce Wyman, Scott Smith, Daniel Myers, and Michael Godfrey argue collectively that “people are becoming different types of learners” and require new ways to experience history (Wyman et al., 462). I agree fully. The more we can do to immerse an interested public directly in a multi-sensory historical experience, the more that history will mean to them in the context of their own lives. For my first book, I walked the very ground in Belgium where the battle I was researching took place, and I had the ability to interview scores of surviving veterans from both sides about their experiences in that same battle. These remarkably immersive experiences were life-changing for me, but such opportunities faded quickly as the veterans passed on and as former battlefields became private property. Such experiences are rare for historians today — and even rarer for the interested public. But, as Wyman and company have testified, museums are a great place to leverage the emerging sensory capabilities of the digital world in order to replicate for future generations what I experienced in the early 1980s researching that book. Wyman and company’s strategic thoughts for an immersive, interactive historical experience in a museum are on target. In fact, the most important guideline they proffer, in my opinion, is to “[r]emove barriers to content and experience” (Wyman et al., 467). The average museum visitor should not need a PhD in history or computer science in order to wade through layers of technological fanfare and dense content just to experience the past interactively. A clear goal and a consistent content approach, as the authors contend, are crucial to the success of any interactive, immersive experience.

Furthermore, Melissa Terras has described the vastly different historical interests people have exhibited based upon her analysis of the most commonly accessed digital archives at places like Oxford and Cambridge. Thus, the immersive experience must cater to these wide-ranging interests. I can see the point behind Roy Rosenzweig’s ambition to make history more democratic. History must be useful, but it has to mean something to us first so that we can use it. Thus, history must be accessible to all — not a select few. I think the digital world of today can get us there.

Steve Rusiecki

Week Six Practicum Blog: A Tangled Web of Networks

The two networking tools I used for this practicum — Palladio and RAW — each offered something distinct. And, despite using the same Civil War data set (battle and unit) for each one, I managed to glean different things from the same data by viewing it through the two different tools.

Palladio proved extremely easy to use. The drag-and-drop feature allowed me to load an Excel spreadsheet with the Battle and Unit data very quickly. The data box populated rapidly, and a single click produced a graph. I’m not sure that the floating nature of the graph was of much help to me, though. Granted, I could expand and contract its various nodes and edges quickly, but the graph was most useful to me in its static form.

Interestingly, the Palladio graph allowed me to see very quickly that the various units represented in the data, such as the 1st Michigan Cavalry and the 136th New York Infantry, generally fought the war in the same region. The 1st Michigan Cavalry stayed principally in northern Virginia, as indicated by its visual linkage to battles such as Old Church, Winchester, Centreville, and Brentsville. By contrast, the 136th New York Infantry spent most of its time in the South, fighting at Atlanta, Chattanooga, and Stone Mountain. Yet the graph indicated that at some point both units participated in the battle of Gettysburg, suggesting that the “regionalization” of various Union regiments did not mean that the Army’s senior leaders could not call upon them to move and fight elsewhere. But the most significant thing I took away from the network visualization of these units and the battles in which they participated was that, for the most part, many of them fought in one general region within the United States and seldom moved from that area. Perhaps one explanation was the difficulty inherent in moving a foot-borne army from one place to another quickly. Locomotives offered limited support, and damaged rail networks throughout the South likely complicated train traffic.
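
This reading of the graph can also be double-checked without any visualization at all. As a hedged sketch (assuming a CSV export of my spreadsheet with “Battle” and “Unit” columns; the file and column names are mine, not Palladio’s), the bipartite network that Palladio draws can be rebuilt with the networkx library to list exactly which battles drew more than one regiment:

    import csv
    import networkx as nx  # third-party library: pip install networkx

    G = nx.Graph()
    units = set()
    with open("battle_unit.csv") as f:  # hypothetical export of the spreadsheet
        for row in csv.DictReader(f):
            units.add(row["Unit"])
            G.add_edge(row["Unit"], row["Battle"])  # one edge per participation

    # Battle nodes touching more than one regiment (e.g., Gettysburg).
    shared = sorted(n for n in G.nodes if n not in units and G.degree(n) > 1)
    print("Battles with multiple participating units:", shared)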

My only difficulty with Palladio was not with the program but with my ability to figure out how to import a screenshot of my graph into the body of my blog. I’m still figuring out how to do it. But, in the meantime, a pdf version of that screenshot appears at the following hyperlink: Palladio

Like Palladio, RAW was easy to use. The drag-and-drop upload feature resembled that of Palladio. The data uploaded quickly, and I was able to generate networking diagrams almost instantly using the Battle and Unit data set. I began with an Alluvial Diagram, which was very difficult to use, even after I adjusted the height and width repeatedly. The data lines were not easy to follow, but the tightly packed lines suggested, contrary to my Palladio graph, that many units fought in some of the same battles. For example, the varying thicknesses of the bars to the left of each battle name seemed to suggest a hierarchy of common battle participation among units. If I read it correctly, then this information was more useful than the Palladio graph, which did not really capture those commonalities in a clear, comprehensive manner. In addition, I was not sure what the various colors assigned to each unit were telling me, aside from possibly serving as a visual guide leading me to certain units in the network graph more efficiently. Here is the Alluvial Diagram I generated; a quick numerical check on its counts follows it.

[Alluvial diagram: each battle label appeared flanked by the number of participating units. Most battles involved a single unit from the data set; Gettysburg and Bull Run involved four; Chancellorsville and Rappahanock Station involved three; and Aldie, Brentsville, Cold Harbor, Cross Keys, Culpepper Court House, Front Royal, Middleburg, Middletown, Opequon, Shepherdstown, Smithfield, Trevilian Station, and Wilderness involved two. The regiment labels carried each unit’s total number of battle participations: 4th New York Cavalry (54), 1st Michigan Cavalry (28), 44th New York Infantry (22), 136th New York Infantry (14), and 29th New York Infantry (6).]


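Incidentally, the bar thicknesses that the alluvial diagram encodes are just participation counts, so they can be verified directly from the spreadsheet. A small sketch, under the same assumed CSV layout as above:

    import csv
    from collections import Counter

    unit_totals, battle_totals = Counter(), Counter()
    with open("battle_unit.csv") as f:  # same hypothetical export as before
        for row in csv.DictReader(f):
            unit_totals[row["Unit"]] += 1
            battle_totals[row["Battle"]] += 1

    print(unit_totals.most_common())  # the 4th New York Cavalry should lead
    print(sorted(b for b, n in battle_totals.items() if n > 1))  # shared battles
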
The next network I generated was a Circular Dendrogram (whatever that is), which tended to reinforce the “regionalization” impression I got from my Palladio graph. In this case, though, the resulting graph was much, much clearer and easier to follow. Yet unlike in Palladio, I was able to see more examples of units fighting in multiple regions. For example, the network diagram confirmed that the 4th New York Cavalry was strictly a regionally aligned regiment, while the 136th New York Infantry, as suggested by the Palladio graph, fought in northern Georgia and later at Gettysburg. But the Dendrogram helped me see that the 136th also fought cross-regionally throughout northern Virginia, most notably at places like Chancellorsville. This particular network diagram did the most to convince me that not all Union regiments were wedded to one region, further suggesting a higher degree of deployment capability and mobility than one might expect of a horse-drawn army. Here is how my Dendrogram looked:

[Circular dendrogram: each regiment radiated out to the battles it fought. The 136th New York Infantry linked to Chancellorsville, Gettysburg, and its Tennessee, Georgia, and Carolinas engagements (Wauhatchie through Turner’s Ferry); the 1st Michigan Cavalry to its Virginia and Maryland engagements (Brentsville through Willow Springs); the 29th New York Infantry to Bull Run, Cross Keys, Groveton, Chancellorsville, and Gettysburg; the 44th New York Infantry to its eastern-theater engagements (Yorktown through Poplar Springs); and the 4th New York Cavalry to its many engagements (Rappahanock Station through Jefferson).]


Overall, I found both tools somewhat useful in providing clues to the regional mobility of various Union regiments during the Civil War. But I’m not certain that these tools told me anything that I couldn’t have gleaned from the Excel spreadsheet itself. Frankly, the hours of data construction that went into the spreadsheet were the real work. Uploading it and creating the network graphs was a breeze. At some point, my guess is that the data compiler could have come to the same conclusions about mobility and regionalization without graphically representing the data. Still, I prefer the visualization and its impact, an impact made all the more effective by the general ease of reading the results of both programs.


Steve Rusiecki


Week Seven Reading Blog: ‘Historicizing’ Geography is Long Overdue

Perhaps no other visualization technique in the realm of digital history excites me more than the ability to link an event to the place where it happened — in both a geographic and a geo-digital sense. My previous blogs have allowed me to articulate the importance I place on history as a highly visual academic endeavor. History happened to real people living in or traversing real places, and I think the static representations in hard-copy books (maps, still photographs, etc.) have finally gone the way of the Dodo bird thanks to digital mapping and modeling resources such as Google Maps and ORBIS. Showing where history happened in a spatial sense — and linking large corpora of data to specific geographical locations, as described by Tim Hitchcock and Stephen Robertson — will not only lead to expanded historical knowledge but will also allow the average person to interact with data represented on a map, to recognize spatial patterns from that interaction, and to replay events digitally (at varying levels of fidelity) in order to take away useful lessons that may influence the present and future.

Needless to say, much of what I am discussing is in the context of maps used in military history. War and its attendant battles all happened in both time and space. The principal object of war, according to Clausewitz and almost anyone who has experienced war, is to destroy an enemy army at a specific geographical location at some point in time. Thus, the only way to communicate the importance of a particular battle is to show how it unfolded on the very terrain where it occurred. The readings opened up mapping possibilities for military history that exceeded my wildest expectations. The ability to use programs like GIS to link a textual analysis or narrative of an event directly to the place where it happened, as suggested by Hitchcock, is incredible.

Not surprisingly, since my other blogs have testified to D-Day as my current subject of historical inquiry, the first thought that came to mind was a digital representation of the Allied landing on Omaha Beach on 6 June 1944. Historians are still debating today how the American forces managed to get off the beach under such strong German opposition and chaotic conditions. I can’t help but imagine a mapping approach that mimics Hitchcock’s and Robertson’s use of GIS and Google Maps as a way to portray geographically the individual squad and platoon actions that carried the day on Omaha Beach. Where Robertson was able to combine data and location to reconstruct a Harlem street corner’s inherent social make-up and frictions, the same could occur with a battle map of Omaha Beach. An interested user could click on any part of the beach at any point in the landings to identify the landing unit, its casualties, and its actions on the beach. In this way, the user could scroll through the entire map, select what he or she considers key terrain, and match the Allied advance on the beach to specific units, perhaps developing a fresh perspective on how and why the Allies prevailed that day. Did individual small-unit actions really carry the day? Did the planning and rehearsals for the invasion really coalesce on the beach as intended to allow for a tactical success by day’s end? Digital mapping can add a new perspective to address those questions. I think this interplay of digital mapping and data gets to the core of what Edward L. Ayers and Scott Nesbit see as “deep contingency,” which for them is “inherently spatial” (Ayers and Nesbit, 9-10). In other words, true agency is most evident when engaging not just space but scale. For example, small-unit actions on Omaha Beach, when considered in the context of the soldiers’ collective or individual agency, did not occur in separate “silos” but potentially complemented the actions of others along that mile-long beach, resulting in a known outcome but not necessarily a clear understanding of how that outcome came to be. Just imagining how I could have applied these tools to the maps from my own two books intrigues me to no end.

I am less taken with modeling programs like Elijah Meeks’s and Karl Grossner’s ORBIS, which, according to reviewer Stuart Dunn, aimed “to model the costs and times of travel between different points within the Roman Empire, over land or by sea or river.” Even though ORBIS, according to Meeks, is wildly popular and well grounded in historically accurate data, it seems like an exercise in counterfactual history. For example, someone could use it to test the cost and travel times from a randomly selected Point A to a randomly selected Point B within the Roman Empire just to see what the program might spit out. Granted, the results may inform our understanding of travel limitations in the Roman Empire in a broader sense. But the problem for me is that that particular journey may never have happened. Thus, it becomes less history and more a simulation of what might have been. I agree with Dunn that programs like ORBIS, when placed in the context of history, push historians into “perilous territory.” Does something like ORBIS make us historians or prognosticators? Or, worse, do we become revisionists based on what might have been and not what really happened? I readily acknowledge that simulation modeling can be very effective. The U.S. Army has been using it for decades, specifically with programs configured with known weapons capabilities, predictive doctrinal behaviors of potential enemies, and so on. These programs help to model what might happen, not what happened; and, most importantly, they are great training tools. In any case, I am happy to know that ORBIS is popular, because any interest in history is a great thing to me. But Dunn is equally on target when he states his concern that future works of history, as articulated through the prism of a program like ORBIS, might play principally to a “populist rather than academic” audience. I think what Dunn really means is that history might become nothing more than a game to be reworked and revised in hindsight, subject to the influence of ahistorical, man-made algorithms that appeal mostly to the “gamers” of society.

In the main, I am very impressed with the potential that geospatial mapping offers to our understanding of historical events. The examples provided by Hitchcock and Robertson are incredibly tantalizing, but we have to keep in mind that just because the results are generated digitally, our analysis of those results may not keep pace with the outputs. Robertson admitted to spending six years studying Harlem through the prism of these geospatial tools in order to recover the everyday lives of black people living there in the early 20th century. In my mind, the most worthwhile things are seldom easy to produce, and I would strongly consider investing six years in a data-linked, geospatial representation of Omaha Beach, particularly if that representation could teach our junior leaders in the Army today the critical importance of their individual decisions and actions on the wider battlefield. That level of “deep contingency” in the context of a spatially defined battlefield matters most to me — because I’ve experienced it personally.

Steve Rusiecki

Week Five Practicum Blog: The Power of Text Mining

My initial foray into text mining with Google’s Ngram Viewer proved rather exciting. For the first time, I was able to generate a highly useful visualization of one topic’s frequency over time from a large online corpus of information — Google Books. I appreciated the program’s accessibility and ease of use for the average user. Most importantly, the distribution graph that the viewer generated was easy to interpret and made sense at first glance.

I selected as my search topic the phrase “invasion of Europe” for the 100 years between 1900 and 2000. I chose English as the preferred language of my book corpus and “3” as the smoothing setting. The viewer instantly generated an easy-to-follow distribution graph that clearly showed the expected spike in frequency for the phrase “invasion of Europe” between the years 1940 and 1944. The values on the left showed an initial low frequency of use followed by a remarkable five-fold jump during the World War II years. The viewer even tracked two variations of the phrase — one with the word “Invasion” capitalized and the other with all letters in the phrase capitalized. Each of these variations had a separate graph that sat well below the frequency of my initial search term, most likely because the variations captured by the Ngram Viewer reflected the less frequent use of the phrase as a title, while my version, with the lower-case “i,” suggested greater use in the body text of the books themselves.
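As I understand it, the smoothing setting simply turns each year’s frequency into an average of that year and its neighbors — a smoothing of 3 averages a seven-year window. Here is a minimal sketch of that idea in Python (my own illustration of the concept, not Google’s actual code):

```python
# Toy illustration of Ngram-style smoothing: each year's value becomes
# the average of itself and up to `smoothing` years on either side.
def smooth(frequencies, smoothing=3):
    smoothed = []
    for i in range(len(frequencies)):
        lo = max(0, i - smoothing)
        hi = min(len(frequencies), i + smoothing + 1)
        window = frequencies[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed

# A sharp single-year spike gets spread into a gentler, more readable curve.
raw = [0.1, 0.1, 0.1, 1.0, 0.1, 0.1, 0.1]
print(smooth(raw, smoothing=3))
```

The effect is to trade sharp single-year spikes for a readable trend line, which is exactly what my graph showed.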

The Viewer’s most useful feature was the ability to hover the mouse over the distribution graph and view the actual frequency numbers at specific points in time. Below the graph, hyperlinks for year groups (such as 1940-1944) led me directly to the online documents behind the numbers that produced the graph. The graphing tool and the accompanying date links made sorting the relevant texts from the irrelevant ones rather easy. The books came up as thumbnails of their covers, which made for quick scrolling and recognition. Since my topic focuses on the invasion of Europe in the context of World War II, I was able to locate quickly the original digital scans of Life magazines from 1940 to 1944 while moving past “false hits,” such as a book about the 1853 Turkish invasion of Europe or a 1903 economics treatise chronicling the “commercial” invasion of Europe at that time.

Overall, this foray into text mining using the Ngram Viewer was very productive for me. The results of my first search are as follows:

Next, I sampled Bookworm using the “Chronicling America” corpus. Like the Ngram Viewer, Bookworm proved very easy to use and navigate. Unfortunately, due to copyright issues, the corpus of newspapers in this database does not extend beyond 1920. Thus, it falls outside the time period that interests me — 1939 to 1945. And, unlike Ngram, Bookworm will work only with a single word and not a phrase, a significant limitation for me. In any case, I decided to use the term “invasion” to see what it would yield between the available years of 1840 and 1920. The results are at the following link: Bookworm Chart.
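Bookworm’s “Chronicling America” corpus is the same collection that the Library of Congress exposes through a public search API, which suggests one way around the single-word limitation: count the yearly page “hits” yourself. A rough sketch, assuming the endpoint and parameters in the published API documentation still hold:

```python
# Rough sketch: count Chronicling America page "hits" per year for a single
# word via the Library of Congress search API. Endpoint and parameters
# follow the published API documentation and may need adjusting.
import requests

def hits_per_year(word, start, end):
    counts = {}
    for year in range(start, end + 1):
        resp = requests.get(
            "https://chroniclingamerica.loc.gov/search/pages/results/",
            params={
                "andtext": word,
                "date1": str(year),
                "date2": str(year),
                "dateFilterType": "yearRange",
                "format": "json",
                "rows": 1,  # only the total matters, not the page records
            },
            timeout=30,
        )
        counts[year] = resp.json().get("totalItems", 0)
    return counts

print(hits_per_year("invasion", 1840, 1845))
```

Unlike Bookworm’s graph, these would be raw page counts rather than words per million, so they would still need normalizing against the size of each year’s corpus.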

As with Ngram, the distribution graph came up quickly and was easy for me to read and interpret. Remarkably, I identified several spikes along the x/y axes (words per million / publication year) for the word “invasion.” As with Ngram, I could scroll along the distribution graph and see boxes briefly describing the articles per year that represented “hits” for my text search. But what I found most useful was that I could click on the graph and go directly to the OCR version of the newspaper that registered the “hit.” Although the earlier 19th Century newspapers were difficult to read without an extreme close-up view, they all had a small arrow icon that popped up near the margin to direct me to the line or lines where the word “invasion” appeared. Very cool.

I decided to check the three biggest spikes (words per million) against the newspaper publication years to see what was actually happening that required journalists to use the word “invasion.”  The greatest spike was for 1840, and the context used for “invasion” centered on discussions of America’s various militias and those militias’ Constitutional role as defenders against invasion. The next spike came in 1861 in the context of the North’s invasion of the South during the Civil War, and the final big spike came in 1898 during the Spanish-American War and the U.S. invasion of Cuba.

Once again, I was very pleased with how this program functioned and with the graph it produced.  I would have found the program more useful if I could have searched with word pairs, thus perhaps narrowing my search even further. Using more than one word in Bookworm automatically creates a flat-line result on the graph. The results of my tinkering with Bookworm appear at the following hyperlink:  http://bookworm.culturomics.org/ChronAm/#?%7B%22search_limits%22%3A%5B%7B%22word%22%3A%5B%22invasion%22%5D%7D%5D%7D

The third text-mining viewer I sampled was the NYT Chronicle, which includes all editions of The New York Times from just before the Civil War up through 2010. Once again, I found the software easy to use and the rapidly generated distribution graph easy to understand. And, like the Google Ngram Viewer, I could search the NYT corpus using my complete phrase of choice — “invasion of Europe.” The graph produced the expected spike over the war years, but the fact that the spike extended out to 1952 suggests that D-Day remained a topic of discussion in the Times more than eight years after the invasion occurred. In the context of my research, this remarkable bit of evidence demands some scrutiny, since (as the results suggest) the invasion seemingly took on such a powerfully iconic image in the minds of Americans that it potentially came to embody all that was good about the country — sacrifice and justice in the face of an evil adversary — and thus remained a topic worth emphasizing to the Times’ readers well after the invasion and the war. Granted, that assertion is a significant leap based on what is essentially a ‘distant reading’ of the texts, but the result really sparked my analytical imagination. Text mining clearly has possibilities for my research.
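For anyone wanting the underlying numbers rather than the Chronicle’s graph, the Times also offers an Article Search API (it requires a free developer key). A hedged sketch of how one might tally yearly hits for my phrase — the field names follow the public API documentation and may need adjusting:

```python
# Sketch: tally yearly article hits for a phrase with the NYT Article
# Search API (requires a free developer key from developer.nytimes.com;
# field names follow the public documentation and may need adjusting).
import requests

API_KEY = "YOUR_KEY_HERE"  # placeholder

def nyt_hits(phrase, year):
    resp = requests.get(
        "https://api.nytimes.com/svc/search/v2/articlesearch.json",
        params={
            "q": f'"{phrase}"',
            "begin_date": f"{year}0101",
            "end_date": f"{year}1231",
            "api-key": API_KEY,
        },
        timeout=30,
    )
    return resp.json()["response"]["meta"]["hits"]

for year in range(1944, 1953):
    print(year, nyt_hits("invasion of Europe", year))
```

One caveat: the Chronicle normalizes to a percentage of all articles per year, so raw hit counts like these would overstate trends in later, larger volumes of the paper.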

In addition to the excellent distribution graph, I appreciated the scroll-over technique reminiscent of Ngram and Bookworm, but I really liked the direct link from the graph to scanned OCR versions of the newspapers’ original pages. Additionally, the scroll-over feature on the distribution graph provided the percentage of articles per year, a feature not available in Bookworm’s “Chronicling America” (as far as I could tell). Yet I experienced the most difficulty with the program when attempting to access specific copies of newspapers and trying to save my graph results for importing into the blog. First, I was not able to figure out how to get past the pay-wall for accessing the newspapers. Granted, GMU allows such access, but I could not find a way to log in and obtain it. Next, I had a heck of a time trying to save an image of my graph. My computer’s screenshot feature only allowed me to save the file in Microsoft Memo or some other goofy program. Finally, I gave up and just grabbed the link as follows: http://chronicle.nytlabs.com/?keyword=abolition.invasion%20of%20Europe. It actually worked when I exited my browser (Firefox) and pasted it into another browser (Internet Explorer). Go figure.

My final experiment was with Voyant, which would not allow me to upload plain-text files through Internet Explorer. Very frustrating. But I took Prof. Robertson’s advice, downloaded Firefox, and set it as my default browser. After that change, everything worked like a charm. But I must admit that I found this program to be highly confusing at first blush. I uploaded the “magazine” and Oscar Wilde’s “novel” from Prof. Robertson’s Dropbox and, given that Wilde had a penchant for flowers (a small point I recalled from my English literature days), I decided to mine both documents (the novel was actually Wilde’s The Picture of Dorian Gray) for the word “rose.” I typed “rose” into the search box below the text corpus, and it produced color-coded search results in the left margin of the magazine (upper level) and the novel (lower level). But only in one or two cases did I find the word highlighted. For both documents, the Words in the Entire Corpus feature showed 38 hits for “rose” and another 22 for “roses,” a surprisingly small result given that one of the metaphors Wilde often used for the effects of a decadent lifestyle was the withering rose (or at least some other type of flower).

But when scrolling along the colorized search “hits” beside the text, I could not see any digital enablers (save for an occasional highlighted “rose”) that led me quickly to the “hits” for “rose.” I had to squint my way through several pages before finally finding the identified “hit.” I thought the screen with the Summary title and its attendant word-frequency analysis for oft-used words in the two documents was extremely interesting. Likewise, the “Words in the Entire Corpus” window, which made selecting and exploring specific words possible at a click, was quite remarkable. At first, the “word cloud” in the Cirrus window did not make any sense to me until I scrolled over selected words and found the same frequency-of-use data available in the other windows I described.

Although interesting as an art form, I found the Cirrus feature’s word cloud to be a bit over the top.  The Word Trends feature was very useful; the distribution graph clearly showed that the frequency of the word “rose,” in whatever context it was used (verb or noun), trended higher in the novel than in the magazine.

Likewise, the Words in Documents feature separated the hits for “rose” by document (23 in the novel and 15 in the magazine), a feature that would certainly figure prominently in my own research.
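Conceptually, these per-document tallies are just term frequencies, something easy to approximate outside Voyant. A toy sketch of the idea (the filenames are hypothetical, and Voyant’s actual tokenizing is far more sophisticated):

```python
# Toy approximation of Voyant's per-document counts (filenames are
# hypothetical; Voyant's own tokenizing is far more sophisticated).
import re
from collections import Counter

def count_in_document(path, word):
    with open(path, encoding="utf-8") as f:
        tokens = re.findall(r"[a-z']+", f.read().lower())
    return Counter(tokens)[word]

for doc in ["dorian_gray.txt", "magazine.txt"]:  # hypothetical filenames
    print(doc, count_in_document(doc, "rose"))
```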

After toying with Voyant for several hours over two days, I did not feel that it was as user-friendly as Ngram or NYT Chronicle. Frankly, I was not able to keep up with the in-class explanation of Voyant because I was too busy wrestling with the “upload” feature. Therefore, I went online and printed a “Getting Started” guide that helped me understand what the different features in Voyant were telling me and how they functioned. The guide proved very useful and led me to experiment with pasting a URL for a corpus source into Voyant to see what happened. I linked the 1 October 1914 issue of the Arizona Republican newspaper from “Chronicling America” and hit “reveal.” The results were very good; the text populated the corpus feature very well, and my sample search for “Germans” (World War I was in full swing when this edition was published) worked nicely.
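The URL experiment also hints at a way to build a corpus programmatically: Chronicling America serves plain-text OCR for each page at a predictable address. A sketch of that approach, with a placeholder LCCN standing in for the Arizona Republican’s actual identifier:

```python
# Sketch: fetch the plain-text OCR for a single Chronicling America page.
# The URL pattern follows the site's persistent-link scheme; the LCCN
# below is a placeholder, not the Arizona Republican's actual identifier.
import requests

def page_text(lccn, date, edition=1, seq=1):
    url = (
        f"https://chroniclingamerica.loc.gov/lccn/{lccn}/"
        f"{date}/ed-{edition}/seq-{seq}/ocr.txt"
    )
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.text

text = page_text("sn00000000", "1914-10-01")  # placeholder LCCN
print(text.count("Germans"))
```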

My standing concern (call it a “fear”) is that the newspaper databases that I want to use for my class project — specifically papers dating from 1940 to 1944 — are behind pay-walls, are password protected, or are filtered in some way that won’t make them accessible via this URL feature. Instead, I will have to locate Web sites with the newspapers I need and “snatch” individual examples in order to create my own corpus. I find that prospect very daunting. I tested this approach by grabbing three OCR versions of newspaper snippets from ProQuest’s “Historical Newspapers” database and uploading them as PDF files. The results were poor. The text was garbled and unclear, and the various features in Voyant simply registered that gibberish. I’m not sure whether the quality of the OCR was the reason, but my confidence in building my own newspaper corpus for Voyant is all but nil.
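One way to diagnose the garbling before blaming Voyant would be to inspect what text layer the ProQuest PDF actually contains. A small sketch, assuming the pypdf library and a hypothetical filename:

```python
# Inspect the text layer embedded in a downloaded PDF snippet before
# blaming Voyant (pypdf is one option among many; filename is hypothetical).
from pypdf import PdfReader

reader = PdfReader("proquest_snippet.pdf")
text = "".join(page.extract_text() or "" for page in reader.pages)
# Gibberish here means the PDF's OCR layer, not Voyant, is the problem.
print(text[:500])
```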

Ultimately, Voyant offers much more than the other text-mining programs, but I just need to learn how to use all of the viewer’s features and find ways that it can help me in my research.  Using Voyant with the databases I need seems to be the biggest challenge I have to overcome.

Steve Rusiecki


Week Five Discussion Assessment

Week Five Discussion Questions

1.  Describe text mining in the context of the readings. What are its possibilities for historians? What are its pitfalls?
2.  Frederick W. Gibbs and Daniel J. Cohen believe that text mining is more relevant to open-ended questions, in which “the results of queries should be seen as signposts toward further exploration rather than conclusive evidence” (Gibbs and Cohen, 74).  Explain what the authors mean by this statement.
3. Ted Underwood contends that historians must overcome two obstacles before engaging in text mining: (1) getting the data you need, and (2) getting the digital skills you need. What digital skills does Underwood feel that historians should develop?
4.  According to Cameron Blevins, literary scholar Franco Moretti developed the digital method of “distant reading.”  Describe the concept of distant reading. How is distant reading different from text mining?  How is distant reading useful for historians?
5.  Cameron Blevins argues that the promise of digital history is “to radically expand our ability to access and draw meaning from the historical record” (Blevins, 146).  Do you agree? What other possibilities might Blevins be overlooking?
6. What is “topic modeling?” How does it relate to text mining and distant reading? How is it useful for historians?
7. According to Ted Underwood, an Internet search is a form of data mining. But it is only useful if you already know what you are expecting to find.  Do you agree? What is Underwood’s remedy for seeking the unknown and the unexpected from the digital record?

Week Five Discussion Assessment

The seven questions I crafted for the Week Five Discussion certainly generated a productive discussion, much of which added clarity to the concepts of Text Mining, Distant Reading, and Topic Modeling. However, we ran down a few “rabbit holes” almost immediately due to some confusion between the concept of an Internet search and Text Mining.  The overall flow of the discussion suggested that the sequencing of the questions helped to elicit responses from the class members, but I quickly recognized that I should have re-ordered the questions to define and compare up front the digital concepts of Text Mining, Distant Reading, and Topic Modeling. In the main, the discussion flowed fairly well (albeit a bit sporadically at times), and the different perspectives on each of the three concepts helped the class members to wrestle with and refine their own understandings of these digital approaches and their potential uses for further research.

The one area of confusion seemed to stem from a statement that Ted Underwood made in his article “Theorizing Research Practices We Forgot to Theorize Twenty Years Ago.” Underwood’s discussion of “[s]earch [as] a form of data mining” (Underwood, 3) called into question the real difference between a Full-Text Internet Search and Text Mining. I think we missed part of Underwood’s point and talked past the idea that a Search is radically different from the quantitative concept of Text Mining. My own narrow distinction between the two — that one mines the entire Internet while the other mines a selected corpus of material — certainly contributed to the confusion. But as the discussion progressed, the distinction between the two became clearer. The Search simply found things, while Text Mining counted things — specifically, words. This part of the discussion proved to be the most fruitful in laying the groundwork for exploring and comparing the other concepts. But I was disappointed in my failure to steer the conversation toward a discussion of Ted Underwood’s six potential uses for text mining and what he described as the underlying theory behind Text Mining — Bayesian statistics. Specifically, I wanted to leverage the digital expertise of the class members to explore this statistical theory in greater detail to enhance my own understanding of it.
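Put crudely, the distinction we finally reached might look like this in code — a toy illustration of my own, not anything drawn from the readings:

```python
# Toy contrast: a search *finds* documents; text mining *counts* across them.
docs = {
    "doc1": "the invasion began at dawn",
    "doc2": "trade and commerce expanded",
    "doc3": "fears of invasion spread through the press",
}

# Search: which documents contain the word?
found = [name for name, text in docs.items() if "invasion" in text]

# Text mining: how often does the word occur across the whole corpus?
count = sum(text.split().count("invasion") for text in docs.values())

print(found)  # ['doc1', 'doc3']
print(count)  # 2
```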

The discussion also helped me to crystallize the distinction between a digital tool and a digital methodology. Several members of the class rightly stated that tool and methodology are often not one and the same, though they can be in some cases. The most enlightening aspect of this discussion centered on Cameron Blevins’s article and how he used what Underwood described as “noun entity extraction” to sift meticulously through selected newspapers and tally the frequency of various city and town names in order to argue for the spatial construction of specific regions. In this case, everyone agreed that Blevins had used a digital methodology to test his hypothesis and that the results of this approach could stand on their own as sufficient evidence to serve his greater argument. But I think everyone agreed, even the one or two digital “resisters” among us, that Text Mining remains primarily a research tool that, according to Gibbs and Cohen, does not exclude more traditional methods such as close readings of the sources.

My efforts to explore Franco Moretti’s concept of Distant Reading did not produce the response I had intended. My own sense of this Text-Mining derivative is that it is useful in locating macro- and micro-patterns, but I struggled to envision what some of those patterns might actually look like in practice. I wanted to develop this part of the discussion further for my own benefit, but I sensed that several of the class members did not understand Distant Reading and did not see it as a useful approach. Many of them seemed to have already assessed the potential pitfalls of Distant Reading and decided that, as an abstract concept lacking practical examples, it was not useful for their specific research needs.

I wanted to close the dialogue with a vigorous discussion of Topic Modeling and to explore how, as Underwood asserted, it differs from Text Mining by allowing historians to locate the unknown and the unexpected in the digital record. All of us seemed to struggle with this stated purpose for Topic Modeling. However, before we could progress to a discussion of what Underwood really meant, I got the “hook.” Thus, this part of my discussion went unfulfilled, and the door remains open for further discussions of Topic Modeling.

Overall, I was pleased with the level of engagement by my fellow class members. I think we all struggled at times to come to grips with Text Mining and Topic Modeling, but I feel confident that we collectively advanced our understanding of what these tools are, what they can do for us, what they can’t do for us, and how they may actually work. The discussion at least provided us with a broader understanding of Text Mining before we dove headlong into the various Text-Mining programs as part of the practicum.  Lastly, I am grateful for everyone’s participation and for their indulgence in allowing a digital amateur to steer an advanced discussion of digital tools with some extremely bright and gifted historians. My thanks to all.

Steve Rusiecki