Category Archives: Reading Blog

Week Three Reading Blog: The Pitfalls of Digitally Converted Sources

If one message emanated loudly from this week’s readings on digitization, it was: “Be careful!”  Each author consistently offered a cautionary tale about the potential pitfalls of digitizing analog sources for use by historians and other scholars while simultaneously (but somewhat hesitantly) conveying enthusiasm for the vast opportunities offered by such digitization efforts. Even though Dan Cohen and Roy Rosenzweig proclaimed rather proudly that the “past was analog” and the “future is digital” (Cohen and Rosenzweig, 80), they readily admitted that the digitization path they advocated was truly ‘undiscovered country’ fraught with potential problems and unintended effects.  For me, the digitization of sources, particularly newspapers, represents a windfall in accessibility; but, as Ian Milligan cautioned, historians must know the strengths and limitations of the digitized sources they are using, must be transparent about these sources’ digital nature, and must understand how they were constructed from their analog source (Milligan, 566-567).

A great deal of skepticism surrounded one technique in particular: Optical Character Recognition (OCR). Issues of accuracy, cost, and digital portrayal seem to mark OCR as perhaps a useful tool for some very specific purposes (such as data mining) but not as a substitute for the source itself. I agree fully. My limited exposure to OCR-digitized newspapers in ProQuest’s Historical Newspaper Database suggests that although OCR-captured works may help locate broad trends in selected words and topics, there is no substitute for the historian viewing the original newspaper as it physically existed for its contemporary readers. This aspect of how people experienced the news is important to me because my dissertation concerns how radio and newspaper media constructed a collective American memory of D-Day (6 June 1944) at the time the event was occurring. What excites me about OCR-digitized newspapers is not only the accessibility I mentioned earlier but also the ability to run keyword searches across larger corpuses of selected American newspapers to gauge the degree to which those papers discussed or referenced the coming invasion in the six months leading up to 6 June 1944. In fact, my brief tinkering with data mining in ProQuest’s newspaper archive turned up what may be a new feature (at least since some of the readings were published): it steered me from the “hits” in my search to actual facsimile scans of the newspaper articles. Unfortunately, the articles were isolated from the paper’s original layout, so I could not experience the text in the same context as its contemporary readers.

In the main, I see great possibilities for digitization, as long as digitized sources supplement other ways that historians locate and interact with source material. As someone who has spent long hours in the National Archives pulling box after musty box of World War II documents from the shelves, I have always found great value in seeing and holding original documents with all their inherent imperfections and unique graphics. As Simon Tanner pointed out through numerous examples, OCR does a poor job of capturing text from complex layouts that mix text with graphics and other images. Newspapers fall into this category, and seeing original papers, with their often varying photographic quality (and even their use of unbleached pulp paper), can reveal a great deal about how people might have remembered what the paper was reporting and how its images shaped their memory. I know that Sarah Werner facetiously characterized this approach as “nostalgic fetishizing,” but I think this type of historical research still has a place in the world of digital sources, especially since scholars like Marlene Manoff are starting to see electronic objects as material objects themselves (Manoff, 312). The danger here is that some electronic permutation of a document or photograph, enhanced or refashioned, becomes an actual material substitute for the original analog version. That concept gives me pause, and some reason for concern. Will historians find themselves relying almost exclusively on digitally enhanced sources that did not exist in that form when they were created? Isn’t that notion the very idea behind the term “ahistorical”?

In any case, and despite my long-standing penchant for using original analog documents as sources, I am most excited about the possibility of accessing a broader range of material online through digitization. The seven or eight major newspapers now fully digitized in ProQuest’s database are a great source for testing some quantitative theories about how heavily D-Day weighed on the public mind in the months leading up to the invasion and how the media might have portrayed the invasion as an “America-only show.” However, I am painfully conscious that the data corpuses of many other newspapers are limited and, in some cases, skewed toward certain populations. For example, I am still hunting for good representative samplings of newspapers that targeted specific audiences, such as African Americans. These newspapers’ perspectives on D-Day are highly relevant to my research. Personally, I have no problem digging for those tough-to-find sources in hard-copy archives or wherever they may be, but I share Ian Milligan’s fear that lazy historians won’t do the legwork needed to dig up obscure sources and will instead rely solely on what is available digitally. I found Milligan’s contention that only the Canadian newspapers that had been digitized were figuring prominently as sources in recent dissertations a bit shocking. His point dredged up my fear that digitized sources, and only digitized sources, might mark the left and right boundaries of research for many up-and-coming historians. Milligan’s concern stands in contrast to Sean Takats’s fear that an abundance of digital sources is the real danger. Frankly, I think the biggest problem will be what gets digitized and when; abundance probably won’t be the real issue. In my own work in years past, I reveled in finding some previously undiscovered source, that “gem” that added a unique insight or, dare I say it, “flavor” to the history I was writing.
I would hate to see future historians discouraged from doing the legwork necessary to find those obscure sources that can enrich and extend existing historical arguments.

Ultimately, digital sources will be (and already are) a boon to all historians, professional and amateur alike. The challenge will not be identifying what Bob Nicholson called “the digital turn” (we’re already there) but learning the strengths and weaknesses of the digital sources we use and employing them accordingly in the service of our historical arguments. But a digitized source should be only one type of source, not the only source upon which we as historians rely. We still need to dig in those musty old archives to find the undiscovered “gems” awaiting the light of day.

Steve Rusiecki

 

Week Two Reading Blog: Struggling to Define Digital History

The seemingly exhaustive exchange between and among the eight scholars participating in the 2008 JAH “Interchange” highlights for me the central question that comes to mind when I think of digital history: Is digital history a method or a medium? I’m inclined to agree in part with William Thomas’s view that it can be both (JAH, 454), but I can’t help thinking that this duality is itself problematic.

Historians consider method to be a combination of primary-source types and a theoretical construct. For example, a method might be a qualitative analysis of working-class labor patterns in 1890s Baltimore based on time cards and wage rates, all viewed through the theoretical prism of Marxism. In its broadest sense, this approach would be labor history from a Marxist perspective. Other methods include social, cultural, military, and gender history, among others, each defined both by its genre and by how it selects and approaches primary sources. I have trouble fitting digital history into this definition, unless digital history is an approach to examining how historians of all stripes have used technology over time to present their arguments. In other words, digital history may simply be a sub-genre of history.

I am most comfortable seeing digital history as a medium through which to present a historical argument, much as we use printed monographs and journal articles to present historical arguments today. I am still a strong believer in causal thinking and in an organized, linear flow for presenting an argument and its attendant evidence. Edward Ayers feels that the digital medium has not affected the writing of history at all but has instead simply opened up new possibilities for portraying the argument visually and spatially (Ayers, third paragraph). My sense is that Ayers believes only the mode of portraying the narrative historical argument has changed, not the narrative mode itself. Yet when I reviewed Ayers’s The Valley of the Shadow website a few years ago, I had some difficulty teasing out his argument from the various hyperlinks scattered throughout the site. Clearly, Ayers’s approach was social and cultural history delivered through a digital medium, but simply overlaying the narrative form onto the medium without mastering how the medium can enhance the argument is fraught with problems. Thus, I would counter Ayers’s point by arguing that the writing of history, when applied to the digital realm, must adapt to the requirements and possibilities of the digital medium rather than simply cobbling the text together through a series of disparate hyperlinks to other bits of text. That approach is akin to taking a published monograph and flipping through it randomly: you can miss the main point very quickly.

In the main, my limited exposure to digital history has, at this juncture, left me more convinced that it is a medium, one we must master in the service of our historical arguments in much the same way that we have mastered the book over time as a means of organizing and presenting historical arguments. I agree fully with Amy Murrell Taylor’s point in the JAH Interchange that historians have to think “in bold and creative ways about how this technology can serve the interests of history” and how “students can create a truly ‘new’ history as a result” (JAH, 459). I don’t mean to slight Ayers’s early digital efforts; his groundbreaking The Valley of the Shadow site has made fantastic strides over the years in presenting its argument visually and spatially. But some of the old problems I encountered with his site years ago remain. Thus, I think digital history is more about the medium than the method, unless the method speaks to a genre that is the history of digital history itself.

Another theme that arose from the readings was the double-edged sword of the abundant source material now available to all historians. Daniel Cohen, Mike O’Malley, and Sean Takats each discussed the abundance of sources in the digital world, something that for me has both positive and negative implications. O’Malley suggests, and Takats seemingly laments, that the wealth of information technology now makes available means that historians will have to look at everything available to them and select only the very best evidence as examples. While O’Malley sees a silver lining in the elimination of superfluous evidence in favor of only the best examples, Takats seems concerned that peer expectations will demand exhaustive reviews of enormous corpuses of material instead of being satisfied with carefully selected samplings. I agree with Takats. My fear is that scholars who find such data corpuses too time-consuming and daunting to tackle will seek out only topics based on smaller data corpuses. James McPherson, in his book For Cause and Comrades (1997), argued his point effectively using a carefully selected sampling of Civil War soldiers’ letters. He explained that he selected around 1,800 letters, written only by soldiers in combat-engaged units, to make his argument about why they fought. Any amateur Civil War historian knows that the extant Civil War letters in personal and public archives number in the hundreds of thousands. Does McPherson’s 1997 argument now fall flat because he relied on hand-picked evidence and sampling? Imagine McPherson spending a lifetime going through every surviving Civil War letter to support just one monograph, and the stifling effect that would have on his ability to generate new knowledge through other projects.

The McPherson example leads to another concern that emerged from my analysis of the readings: the relative impermanence of an argument conveyed through digital means. If more and more digital sources become available each year, when should historians consider an argument closed for their purposes? William Thomas suggested that historians publishing in the digital medium will be tempted to keep ‘intervening’ by constantly “editing, adding, annotating, and refining” their online work (JAH, 457). My concern is that historians will become wedded to a single lifetime project that they constantly refine and revise, perhaps only to defend the integrity of their scholarship as new sources become available or as peers raise counterpoints. I see value in “locking down” one’s argument after a certain point, both to allow new scholars to build upon and refine it in later years and to allow historians to do what they do now: venture into multiple projects over the course of a career.

And so what did the readings tell me about the state of digital history today? The strongest message was that we as historians are on a fast-moving train when it comes to digital history. We have to embrace it for what we can do with it and not what it can do to us. Miriam Posner was right in saying that the first big step is to develop an online presence. And now we’re off to the races.

Steve Rusiecki