Category Archives: Practicum Blog

Week Eight Practicum Blog: Omeka is Very Nifty!

I must admit that Omeka proved to be a very engaging, user-friendly tool for assembling a variety of digital artifacts into a reasonably clear storyline. In keeping with the topic of my dissertation, D-Day, I selected jpeg images of two D-Day-related primary-source documents and four photographs to provide a brief narrative of the events on that day. Uploading each item and completing the meta-data fields was a breeze, even though I was not always sure what information belonged in some fields for particular items. In those cases, I just defaulted to “D-Day, 6 June 1944.”

Setting up the site’s appearance from the Omeka dashboard was not a problem, but I did not really like the few available appearance themes. I would have preferred more information about how to customize the appearance of my public site.  I relied heavily on Omeka’s online how-to page (http://info.omeka.net) in order to build my exhibit and upload items, but I was not able to find anything in the guide that explained how I could revise the site’s appearance.

The biggest problem I encountered with creating the page was developing an exhibit after I had entered my six items. Again, the Omeka how-to page was not very effective in describing how to proceed. I was not able to find a direct pathway to creating an exhibit through the various tabs and menus. In the end, I managed to assemble my exhibit mostly through dumb luck and by “test-clicking” the various tabs on the site. I found the page that allowed me to choose the “gallery” option for my items, which is how I had wanted to portray the images on the public site. My intent was to present the user with a grouping of panels in a specific order that would facilitate a sequential, linear narrative of D-Day; and, somehow, I was able to do it. Unfortunately, the exhibit did not “pop up” immediately when I clicked on my public site’s link. Instead, the gallery exhibit was buried under the “browse exhibits” link. But, when I clicked on that link, the gallery I wanted to see was there. So much for small victories!

I was impressed by Omeka’s ability to sustain the overall quality of the images, even after uploading. Some of the jpeg images were small, and I was afraid of significant distortion once Omeka converted them into thumbnails that, when clicked upon, would increase in size. But my fears went unrealized, and the storyboard effect of the gallery was all the more appealing because of each image’s crispness.

The best layout for the kinds of stories I would tell on Omeka is the gallery version. A visitor to the site gets an immediate sense of the storyline’s scope and scale by seeing in one collage all the images associated with the exhibit. He or she can then follow the storyline by clicking sequentially on each image and reading the description from the meta-data page, or jump straight to the images he or she finds most interesting.

Overall, I enjoyed my first experience with Omeka. Often, I find myself blogging about numerous problems I’ve encountered negotiating different digital tools, mainly because I seem to lack that “computer sense” so many others seem to have. But, in this case, I spent the bulk of my time actually creating something with Omeka that I enjoyed doing and that visitors to the site with an interest in World War Two might enjoy. Thus, this blog will probably be my shortest one of the semester, since I invested the balance of my practicum time in using the digital tool in question. The modest product of my humble effort with Omeka appears at the following Web link:

http://ranger394.net/Omeka/exhibits/show/d-day–6-june-1944/d-day-in-documents-and-photogr

Steve Rusiecki

Week Seven Practicum Blog: Google-Mapping the Civil War

My first foray into the world of the Google Map Engine was very informative and productive. For this exercise, I captured, using single-point graphics, the various locations where one Civil War soldier, the oddly named Consider Flowers, and his regiment, the 1st Michigan Cavalry, traveled during the Civil War. Much of the regiment’s activity consisted of simple movements from one place to another in an effort to locate and engage the Confederate forces or to conduct raids along major logistical lines of communication (main roads, waterways, etc.). In some cases, the regiment fought pitched battles in places such as Winchester and along the Totopotomoy Creek during the years 1864 and 1865, the years that defined the time limits of this exercise.

Overall, I found the Google Map Engine to be a very user-friendly tool. The speed of the zoom-in and zoom-out features was remarkable and allowed me to double-check my search locations to ensure that I was not adding to the map similarly named locations in different parts of the country. I chose as my base map the colorized, highly detailed terrain map that depicts foliage and elevations clearly. As a former infantryman, I am highly sensitive to the importance of these features on a map and how they impact an army’s ability to move quickly and efficiently across the battlefield space. Thus, I think the full impact of this seemingly ubiquitous cavalry regiment’s travels becomes much starker when considered in the context of the difficult terrain the troopers had to traverse, often at great speed.

In order to distinguish between events occurring in several of the same places during both 1864 and 1865, I selected one layer each for the two years in question. I opted to use named balloon icons to represent the general movement locations; and, for the first layer, I allowed the Google Map Engine to select varying colors for each icon. As I progressed in my data entry, I grew to dislike these auto-generated colors, but I left them in place anyway. They were too light in shade to stand out effectively, but I wanted to retain them in order to compare them against the different color scheme I planned to use for the second layer. The only time I changed the balloon icon was to represent locations where I knew from memory or from the available data that a battle or skirmish had actually occurred (I’m sure I missed a couple, though). In these cases, I selected an icon indicating, for lack of a better term, an “explosion.” I tested yellow as the “explosion” color, but it did not stand out as well as the shade of blue that I also tested. Thus, I went with the blue.

For my second layer, I used the same approach for the icons; but, in this case, I colored the balloon icons red. When compared on the map simultaneously with the icons from the first layer, the differences jumped out easily. The only icons I changed were for two events: the surrender at Appomattox and the Grand Review in Washington, DC. For the Appomattox surrender, I selected a horse icon, mainly because of the image I have of General Lee riding off with great dignity following the ceremony. I used a “sun” icon to portray the Grand Review in DC in May 1865, primarily as a metaphor for the postwar “dawn of a new day.” I would have preferred the ability to use flag or soldier icons to portray the activities of the different armies, but I made do with what the program offered.

Many of the locations listed in the regiment’s history had no accompanying references on the online map. Unsurprisingly, locally named references to certain crossroads, ferry crossings, and the like have long since vanished. Therefore, in order to compensate for the “fuzziness” of my data, I used Google to pull up sites discussing the Civil War and, using online maps (and even Mapquest in one case), I located the 1864 and 1865 locations and then identified the closest named town or road intersection still identifiable on the Google base map. I used that spot as the closest reference to the Civil War-era location and, after adding it to the map, renamed it for the wartime location. I tried to get as close as possible to the original location, and I refused to settle on anything more than three or four hundred meters off. For example, the map engine did not recognize Milford, Virginia, but my Google research showed me that it was located within around four hundred meters of Bowling Green, Virginia. The map engine recognized Bowling Green, so I used that balloon icon to represent Milford. In some cases, I never found a reference to a wartime location in the context of a map that would correlate with today’s Google Map. For example, I never managed to locate Mallory’s Crossroads or Jones’s Bridge.
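
For anyone who wants to script this “closest recognized place” workaround instead of hunting by hand, here is a minimal sketch in Python. It assumes the third-party geopy library and its Nominatim geocoder, neither of which I actually used in the practicum; the Milford-to-Bowling Green pairing comes from my map work above.

    # Sketch: plot a vanished wartime place name at the coordinates of the
    # closest modern place a geocoder still recognizes (assumes geopy).
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent="civil-war-map-sketch")  # hypothetical app name

    # Wartime names the basemap no longer recognizes, each paired with the
    # closest modern reference verified through other sources.
    substitutes = {"Milford, Virginia": "Bowling Green, Virginia"}

    for wartime_name, modern_name in substitutes.items():
        location = geolocator.geocode(modern_name)
        if location is None:
            print(f"No match for {modern_name}; leave {wartime_name} unplotted")
            continue
        # Use the modern coordinates but keep the wartime name as the label.
        print(f"{wartime_name}: {location.latitude:.4f}, {location.longitude:.4f}")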

Actually, I know a great deal about the Civil War and record-keeping during that period. Yes, the armies generated a lot of paper, but many of the journal entries for troop locations were nothing more than a “swag” at best — unless the place was a large town or city like Winchester or Richmond. Maps were scarce and seldom accurate. Local people tended to tell the troops where they were (and not always precisely), and the soldiers recorded these names as ground truth. Few soldiers had ventured far from home before the war, and most did not know the geography of Virginia or other parts of the country intimately. Thus, my efforts to reconcile the “fuzziness” of my data may have either clarified or compounded the inherent “fuzziness” of the primary-source data.

In any case, the Google Map Engine proved to be a phenomenal tool. The point balloon icons worked well, and the labeling application was very easy to populate and save. I experimented with linking locations using lines, but those lines only seemed to confuse the visual depiction. Overall, I enjoyed using this tool, and I plan to use it again in the future.

Steve Rusiecki

Week Six Practicum Blog: A Tangled Web of Networks

The two networking tools I used for this practicum — Palladio and RAW — each proved distinctive in its own way. And, despite using the same Civil War data set (battle and unit) for each one, I managed to glean different things from the same data by viewing it through the two different tools.

Palladio proved extremely easy to use. The drag-and-drop feature allowed me to load an Excel spreadsheet with the Battle and Unit data very quickly. The data box populated rapidly, and I was able to produce a graph with a simple “click.” I’m not sure that the floating nature of the graph was of much help to me, though. Granted, I could expand and contract its various nodes and edges quickly, but the graph was most useful to me in its static form.

Interestingly, the Palladio graph allowed me to see very quickly that the various units represented in the data, such as the 1st Michigan Cavalry and the 136th New York Infantry, generally fought the war in the same region. The 1st Michigan Cavalry stayed principally in  northern Virginia as indicated by its visual linkage to battles such as Old Church, Winchester, Centreville, and Brentsville. By contrast, the 136th New York Infantry spent most of its time in the South, fighting at Atlanta, Chattanooga, and Stone Mountain. Yet the graph indicated that at some point, both units participated together in the battle of Gettysburg, suggesting that the “regionalization” of various Union regiments did not mean that the Army’s senior leaders could not call upon them to move and fight elsewhere. But the most significant thing I took away from the network visualization of these units and the battles in which they participated was that, for the most part, many of them fought in one general region within the United States and seldom moved from that area. Perhaps one explanation was the difficulty inherent in moving a foot-borne Army from one place to another quickly. Locomotives offered limited support, and damaged rail networks throughout the South likely complicated train traffic.

My only difficulty with Palladio was not with the program but with my ability to figure out how to import a screenshot of my graph into the body of my blog. I’m still figuring out how to do it. But, in the meantime, a pdf version of that screenshot appears at the following hyperlink: Palladio

Like Palladio, RAW was easy to use. The drag-and-drop upload feature resembled that of Palladio. The data uploaded quickly, and I was able to generate networking diagrams almost instantly using the Battle and Unit data set. I began with an Alluvial Diagram, which was very difficult to read, even after I adjusted the height and width repeatedly. The data lines were not easy to follow, but the tightly packed lines suggested, contrary to my Palladio graph, that many units fought in some of the same battles. For example, the varying thicknesses of the bars to the left of each battle name seemed to suggest a hierarchy of common battle participation among units. If I read it correctly, then this information was more useful than the Palladio graph, which did not really capture those commonalities in a clear, comprehensive manner. In addition, I was not sure what the various colors assigned to each unit were telling me, aside from possibly serving as a visual guide to lead me to certain units in the network graph much more efficiently. Here is the Alluvial Diagram I generated.

[Alluvial Diagram: each battle appears on the left with the number of participating units from my data set (Gettysburg and Bull Run lead with four apiece; Chancellorsville and Rappahanock Station follow with three), and each regiment appears on the right with its total battle count: 4th New York Cavalry (54), 1st Michigan Cavalry (28), 44th New York Infantry (22), 136th New York Infantry (14), and 29th New York Infantry (6).]

The next network I generated was a Circular Dendrogram (whatever that is), which tended to reinforce the “regionalization” impression I got from my Palladio graph. In this case, though, the resulting graph was much, much clearer and easier to follow. Yet unlike with Palladio, I was able to see more examples of units fighting in multiple regions. For example, the network diagram confirmed that the 4th New York Cavalry was strictly a regionally aligned regiment while the 136th New York Infantry, as suggested by the Palladio graph, fought in northern Georgia and later at Gettysburg. But the Dendrogram helped me see that the 136th also fought cross-regionally throughout northern Virginia, most notably at places like Chancellorsville. This particular network diagram did the most to convince me that not all Union regiments were wedded to one region, further suggesting a higher degree of deployment capability and mobility than one might expect of a horse-drawn army. Here is how my Dendrogram looked:

[Circular Dendrogram: each of the five regiments forms the hub of its own branch of battles; the 136th New York Infantry, for example, branches to Chancellorsville, Gettysburg, Wauhatchie, Chattanooga, Resaca, Cassville, Dallas, Kenesaw Mountain, Peach tree Creek, Atlanta, Stone Mountain, Averasboro, Bentonville, and Turner’s Ferry.]

Overall, I found both tools somewhat useful in providing clues to the regional mobility of various Union regiments during the Civil War. But I’m not certain that these tools told me something that I couldn’t have gleaned from the Excel spreadsheet. Frankly, the hours of data construction that went into the spreadsheet were the real work. Uploading it and creating the network graphs was a breeze. The visualizations were certainly intriguing; but, at some point, my guess is that the data compiler could have come to the same conclusions about mobility and regionalization without graphically representing the data. Still, I prefer the visualization and its impact, an impact made all the more effective by the general ease of use involved in reading the results of both programs.

Steve Rusiecki

Week Five Practicum Blog: The Power of Text Mining

My initial foray into text mining with Google’s Ngram Viewer proved rather exciting. For the first time, I was able to generate a highly useful visualization of one topic’s frequency over a period of time from a large online corpus of information — Google Books. I appreciated the program’s accessibility and ease of use for the average user. Most importantly, the distribution graph that the viewer generated was easy to interpret and made sense at first glance.

I selected as my search topic the phrase “invasion of Europe” for the 100 years between 1900 and 2000. I chose English as the preferred language of my book corpus and “3” as the smoothing factor. The viewer instantly generated an easy-to-follow distribution graph that clearly showed the expected spike in frequency for the phrase “invasion of Europe” between the years 1940 and 1944. The values on the left showed an initial low frequency of use followed by a remarkable five-fold jump during those World War II years. The viewer even tracked two variations of the phrase — one with the word “Invasion” capitalized and the other with all letters in the phrase capitalized. Each of these variations had a separate graph that sat well below the frequency of my initial search term, most likely because the variations captured by the Ngram Viewer highlighted the less frequent use of the phrase as a title while my version, with the lower-case “i,” suggested greater use in the body text of the books themselves.
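
Under the hood, the whole search reduces to a handful of parameters in the viewer’s URL (content, year_start, year_end, corpus, smoothing). Here is a minimal Python sketch that rebuilds my query as a link; the numeric corpus identifier is my assumption, since Google has renumbered its corpora over time.

    # Sketch: reconstruct the Ngram Viewer query as a URL.
    from urllib.parse import urlencode

    params = {
        "content": "invasion of Europe",
        "year_start": 1900,
        "year_end": 2000,
        "corpus": 15,   # assumption: the id of the English corpus I chose
        "smoothing": 3,
    }
    print("https://books.google.com/ngrams/graph?" + urlencode(params))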

The Viewer’s most useful feature was the ability to scroll the mouse over the distribution graph and view the actual frequency numbers at specific points in time. Below that graph, hyperlinks for year groups (such as 1940-1944) led me directly to the online documents reflected in the numbers that produced the graph. The graphing tool and the accompanying date links made sorting through the relevant and irrelevant texts rather easy. They came up as thumbnails of the covers, which made for quick scrolling and recognition. Since my topic focuses on the invasion of Europe in the context of World War II, I was able to quickly locate original digital scans of Life magazines from 1940 to 1944 while moving past “false hits,” such as a book about the 1853 Turkish invasion of Europe or a 1903 economics treatise chronicling the “commercial” invasion of Europe at that time.

Overall, this foray into text mining using the Ngram Viewer was very productive for me. The results of my first search are as follows:

Next, I sampled Bookworm using the “Chronicling America” corpus. Like the Ngram Viewer, I found this program very easy to use and navigate. Unfortunately, due to copyright issues, the corpus of newspapers in this database does not extend beyond 1920. Thus, it falls outside the time period that interests me — 1939 to 1945. And, unlike Ngram, Bookworm will work only with a single word and not a phrase, a significant limitation for me. In any case, I decided to use the term “invasion” to see what it would yield for me between the available years of 1840 to 1920.  The results are at the following link: Bookworm Chart.

Like Ngram, the distribution graph came up quickly and was easy for me to read and interpret. Remarkably, I identified several spikes along the x / y axes (publication year / words per million) for the word “invasion.” Like Ngram, I could scroll along the distribution graph and see boxes describing briefly the articles per year that represented “hits” for my text search. But what I found most useful was that I could click on the graph and go directly to the OCR version of the newspaper that registered the “hit.” Although the earlier 19th Century newspapers were difficult to read without an extreme close-up view, they all had a small arrow icon that popped up near the margin to direct me to the line or lines where the word “invasion” was mentioned. Very cool.

I decided to check the three biggest spikes (words per million) against the newspaper publication years to see what was actually happening that required journalists to use the word “invasion.”  The greatest spike was for 1840, and the context used for “invasion” centered on discussions of America’s various militias and those militias’ Constitutional role as defenders against invasion. The next spike came in 1861 in the context of the North’s invasion of the South during the Civil War, and the final big spike came in 1898 during the Spanish-American War and the U.S. invasion of Cuba.

Once again, I was very pleased with how this program functioned and with the graph it produced.  I would have found the program more useful if I could have searched with word pairs, thus perhaps narrowing my search even further. Using more than one word in Bookworm automatically creates a flat-line result on the graph. The results of my tinkering with Bookworm appear at the following hyperlink:  http://bookworm.culturomics.org/ChronAm/#?%7B%22search_limits%22%3A%5B%7B%22word%22%3A%5B%22invasion%22%5D%7D%5D%7D
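
The single-word limitation is visible in the hyperlink itself: the query travels as URL-encoded JSON with the search term held in a “word” field. A short Python sketch that decodes the fragment from the link above makes the structure plain.

    # Sketch: decode the Bookworm URL fragment to expose the query structure.
    import json
    from urllib.parse import unquote

    fragment = ("%7B%22search_limits%22%3A%5B%7B%22word%22%3A"
                "%5B%22invasion%22%5D%7D%5D%7D")
    print(json.loads(unquote(fragment)))
    # {'search_limits': [{'word': ['invasion']}]}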

The third text-mining viewer I sampled was the NYT Chronicle, which included all newspaper editions of The New York Times from just before the Civil War up through 2010. Once again, I found this software easy to use and the rapidly generated distribution graph easy to understand. And, like the Google Ngram Viewer, I could search the NYT corpus using my complete phrase of choice — “invasion of Europe.” The graph produced the expected spike over the war years, but the fact that the spike extended out to 1952 suggests that D-Day was a topic of discussion in the Times more than eight years after the invasion occurred. In the context of my research, this remarkable bit of evidence demands some scrutiny: as the results suggest, the invasion seemingly took on such a powerfully iconic image in the minds of Americans that it potentially came to embody all that was good about America — sacrifice and justice in the face of an evil adversary — and thus remained a topic worth emphasizing to the Times’ readers well after the invasion and the war. Granted, that assertion is a significant leap based on what essentially is a ‘distant reading’ of the texts, but the result really sparked my analytical imagination. Text mining clearly has possibilities for my research.

In addition to the excellent distribution graph, I appreciated the scroll-over technique reminiscent of Ngram and Bookworm, but I really liked the direct link from the graph to scanned OCR versions of the newspapers’ original pages. Additionally, the scroll-over feature on the distribution graph provided the percentage of articles per year, a feature not available in Bookworm’s “Chronicling America” (as far as I could tell). Yet I experienced the most difficulty with the program when attempting to access specific copies of newspapers and trying to save my graph results for importing into the blog. First, I was not able to figure out how to get past the pay-wall for accessing the newspapers. Granted, GMU allows such access, but I could not find a way to log in and obtain it. Next, I had a heck of a time trying to save an image of my graph. My computer’s screenshot feature only allowed me to save the file in Microsoft Memo or some other goofy program. Finally, I gave up and just grabbed the link as follows: http://chronicle.nytlabs.com/?keyword=abolition.invasion%20of%20Europe. It actually worked when I exited my browser (Firefox) and pasted it into another browser (Internet Explorer). Go figure.

My final experiment was with Voyant, which would not allow me to upload plain-text files through Internet Explorer. Very frustrating. But I took Prof. Robertson’s advice, downloaded Firefox, and set it as my default browser. After that change, everything worked like a charm. But I must admit that I found this program to be highly confusing at first blush. I uploaded the “magazine” and Oscar Wilde’s “novel” from Prof. Robertson’s Dropbox and, given that Wilde had a penchant for flowers (a small point I recalled from my English literature days), I decided to mine both documents (the novel was actually Wilde’s The Picture of Dorian Gray) for the word “rose.” I typed “rose” into the search box below the text corpus, and it produced color-coded search results in the left margin of the magazine (upper level) and novel (lower level). But only in one or two cases did I find the word highlighted. For both documents, the Words in the Entire Corpus feature showed 38 hits for “rose” and another 22 for “roses,” a surprisingly small result given that one of the metaphors Wilde often used for the effects of a decadent lifestyle was that of the withering rose (or at least some other type of flower).

But when scrolling along the colorized search “hits” beside the text, I could not see any digital enablers (save for an occasional highlighted “rose”) that led me quickly to the “hits” for “rose.” I had to squint my way through several pages before finally finding the identified “hit.” I thought the screen with the Summary title and its attendant word-frequency analysis for oft-used words in the two documents was extremely interesting. Likewise, the Words in the Entire Corpus window, which made selecting and exploring specific words possible with a single click, was quite remarkable. At first, the “word cloud” in the Cirrus window did not make any sense to me until I scrolled over selected words and found the same frequency-of-use data available in the other windows I described.

Although interesting as an art form, the Cirrus feature’s word cloud struck me as a bit over the top. The Word Trends feature was very useful; the distribution graph clearly showed that the frequency of the word “rose,” in whatever context it was used (verb or noun), trended higher in the novel than in the magazine.

Likewise, the Words in Documents feature separated the hits for “rose” by document (23 in the novel and 15 in the magazine),  a feature that would certainly figure highly in my own research.

After toying with Voyant for several hours over two days, I did not feel that it was as user-friendly as Ngram or NYT Chronicle. Frankly, I was not able to keep up with the in-class explanation of Voyant because I was too busy wrestling with the “upload” feature. Therefore, I went online and printed a “Getting Started” guide that helped me understand what the different features in Voyant were telling me and how they functioned. The guide proved very useful and led me to experiment with pasting a URL for a corpus source into Voyant to see what happened. I linked the 1 October 1914 issue of the Arizona Republican newspaper from “Chronicling America” and hit “reveal.” The results were very good; the text populated the corpus feature very well, and my sample search for “Germans” (World War I was in full swing when this edition was published) worked nicely.
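
If I understood this URL feature correctly, the public Voyant site accepts a source address through an “input” parameter, so the experiment could be scripted along these lines. The Chronicling America address below is an illustrative guess at the pattern, not the exact link I used.

    # Sketch: point Voyant at a remote text by URL (assumes the public
    # instance's "input" parameter; the source address is illustrative).
    from urllib.parse import urlencode

    source = ("https://chroniclingamerica.loc.gov/lccn/sn84020558/"
              "1914-10-01/ed-1/seq-1/ocr.txt")
    print("https://voyant-tools.org/?" + urlencode({"input": source}))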

My standing concern (call it a “fear”) is that the newspaper databases that I want to use for my class project — specifically papers dating from 1940 to 1944 — are behind pay-walls, are password protected, or are filtered in some way that makes them inaccessible via this URL feature. Instead, I will have to locate Web sites with the newspapers I need and “snatch” individual examples in order to create my own corpus. I find that prospect to be very daunting. I tested this approach by grabbing three OCR versions of newspaper snippets from ProQuest’s “Historical Newspapers” database and uploading them as pdf files. The results were poor. The text was garbled and unclear, and the various features in Voyant simply registered that gibberish. I’m not sure that the quality of the OCR was the reason, but my confidence level in building my own newspaper corpus for Voyant is all but nil.

Ultimately, Voyant offers much more than the other text-mining programs, but I just need to learn how to use all of the viewer’s features and find ways that it can help me in my research.  Using Voyant with the databases I need seems to be the biggest challenge I have to overcome.

Steve Rusiecki

Week Four Practicum Blog: Will Digital Sources Scuttle a Journal Article?

My survey of the main articles published in the last three years of the Journal of the Civil War Era suggests that until precisely one year ago, scholars submitting articles for publication avoided citing Web sites or any other digital sources. Could it be that citing such sources, or even using a database, in this particular journal was the kiss of death until this past year? These results intrigued me, so I carefully surveyed the footnotes in each journal article beginning with Volume 4, Number 3 (September 2014) and worked backward to Volume 2, Number 1 (March 2012). Overall, the dearth of digital sources, or at least the admission by an author that he or she used a Web site to locate a cited source, was striking, particularly since we know with some certainty that many historians routinely shop for source material on the Web, especially for journal articles.

Each edition of the Journal of the Civil War Era carried an average of three articles in addition to standard features such as editorials and book reviews. I focused solely on the articles and visually scanned each footnote section for any hint of a database in use or the mention of a Web site. The most recent edition published this month (September 2014) yielded no hits. But with the June 2014 issue I seemed to hit the jackpot: an article by Chandra Manning, whose monograph titled What This Cruel War Was Over (2008) just happens to sit near the top of my personal hierarchy of masterful scholarship. In this particular article, Manning cited Web-based or digitized sources in six separate footnotes. Yet she did not discuss any specific way of using each Web site other than as a repository for some specific information. In one case, she argued quite pointedly that “antebellum state constitutional conventions demonstrated that a citizen was someone seen by others in the community as independent, self-reliant, and capable of contributing to ‘the harmony, well-being, and prosperity of the community.’” Her source for this assertion was a database titled “Debates and Proceedings of the Convention for the Revision of the Constitution of the State of Indiana, 1850” (http://indiamond6.ulib.iupui.edu/cdm4/document.php?CISOROOT=/ISC&CISOPTR=6357&REC=10). She even offered up in the same footnote a database of state constitutions titled “The NBER / Maryland State Constitutions” (http://www.stateconstitutions.umd.edu/index.aspx) in case anyone wanted further examples. Unfortunately, she provided no insight into how she used these databases other than as repositories for specific historical documents. Given the methodological transparency of her 2008 monograph, I was surprised that she was not more specific in describing how she used these digitized sources.

My sense of “hitting the jackpot” diminished quickly as I continued my backward trek through the journals. Manning would turn out to be the most prolific “citer” of digital sources whom I would encounter in the practicum. In the March 2014 issue, three authors each cited only one digital source to support a specific assertion. For example, Nicholas Marshall relied on a digital database titled “Dyer, A Compendium” (http://www.civilwar.net/searchstates.asp?searchstates=Ohio, http://www.civil-war.net/searchstates.asp?searchstates=Massachusetts) to distinguish war-related deaths from those that occurred due to illness or accidents. In that same issue, Sarah Bischoff Paulus used a database named “America’s Historical Newspapers” (http://www.readex.com/content/america%E2%80%99s-historical-newspapers-college-edition-1690-1922) to locate April editions for the years 1854, 1855, and 1856 in order to illustrate the admiration many Americans felt for the late Henry Clay through news reports of annual observances of his birthday. At least in Paulus’s case, her use of the database seemed obvious: she searched for newspapers published in mid-April for a specific year and then selected the editions that mentioned celebrations of the late Henry Clay’s birthday.

Only a couple of articles in each of the September and December 2013 editions yielded any evidence of digital sources in the footnotes. In her article in the September issue, Beth Barton Schweiger queried the “UNESCO Institute for Statistics” database (http://stats.uis.unesco.org/unesco/TableViewer/document.aspx?ReportId=121&IF_Language=eng&BR_Country=7160&BR_Region=40540) for the 2010 adult literacy rate in Zimbabwe (92.2 percent) to contend that high literacy rates did not necessarily equate to economic prosperity. In her article in the December issue, Thavolia Glymph used the “National Register Properties in South Carolina” database (http://www.nationalregister.sc.gov/berkeley/S10817708014/index.htm) to illustrate the historical significance of an antebellum inland planter-class settlement in South Carolina named Pineville Village. In both cases, the authors neither elaborated on their search techniques nor explained how they manipulated the databases they cited. My own impression was that they simply used these databases as encyclopedic repositories for information they could obtain from a basic search.

The next five editions of the journal represented a remarkable dry spell in digital source citations. Not a single article mentioned any type of digital source, even though several articles relied upon newspapers that could only have come from an online database search. The final digital entry I located came from the very last edition in my survey population, the March 2012 issue. In that edition, Matthew C. Hulbert used the Missouri Division’s “Sons of Confederate Veterans” Web site (http://www.missouridivision-scv.org/littledixie.htm) to argue for Missouri’s cultural link to the Old South and the Confederate cause. Once again, in the absence of any further explanation, I can only assume that he accessed the database strictly to see the information posted there.

After this detailed survey of three complete volumes (12 editions) of the Journal of the Civil War Era, one message echoed very loudly: citing digital sources for an article published in this particular journal was not (until very recently) a widely accepted — or perhaps tolerated — practice. More than half of the editions surveyed did not list a single hyperlink in any of the articles’ footnotes, despite citing some sources the authors almost certainly found online. I am hesitant to jump to conclusions based on such limited evidence, but this exercise has suggested to me that until very recently (and I mean within the last 12 months), transparency in the use of digital sources has remained elusive in the world of academic historical journals. The Academy seems to be undergoing its own “cultural turn” as digital sources claw their way to relevance — but ever so slowly. Few scholars seem as bold as Chandra Manning in identifying those sources for what they are – digitized products housed in online databases. From my perspective, digital sources are fully acceptable as long as the methodology remains clear. The potential pitfalls I have already witnessed in my brief exposure to digital history suggest that we can’t accept all digital products at face value. We have to know the strengths and weaknesses of our digitized sources and explain to our readers how we compensated for those factors when we employed these sources in the service of our arguments. That standing theme of cautious optimism when using digital sources dominates the existing landscape of digital history — and for good reason.

Steve Rusiecki

Week Three Practicum Blog: OCR and My Exercise in Frustration

Good Lord. Until this practicum, I thought I was fairly adept at manipulating computer programs. But to make parts of this exercise work, I felt like the proverbial one-legged man in an ass-kicking contest.  I feel more traumatized now than when I experienced ground combat for the first time. Just kidding — maybe. First, let me address my experience with the Google Drive OCR program.

I will admit that once I was able to open the older version of Google Drive OCR, the program proved rather easy to use. I uploaded my assigned document from Prof. Robertson’s file (number 7) and ran it through the program. The instructions were easy to follow and worked perfectly. But I encountered two significant problems, one a matter of practical computing (how to save the jpeg I created to my desktop) and the other the nature of the OCR results. First, let me address the OCR results, beginning with the source image. As you can see from the image below (sorry, but I could not figure out how to align it in the WordPress program), the typewritten script is very clear. Despite my efforts to crop the image and increase font size, I could not make the typewritten words any clearer than when they appeared in their original form.

[Image: scan of the original typewritten document]

Then came the results. Wow. The digitized text following remediation appeared to be written in Klingon, Pig Latin, or some other unintelligible jib-jab. I expected a far better result given the clarity of the typewritten page. Here’s what the program gave me:

ma charge of the translation. I admit Burns r0001″! money. I admit ho return! some monoy, and 1t Isl returned been“. by all and”.1 30 air, I didn’t.

A No sir. I think I got my information from tho audit department. I could explain to you how tint cam.

HR. BOARDMAR: is hero and be tho but proof of that. I think I will sustain the objection.

Q In your discussion with reference to thin lunatigaticn that yen wore making was there anything othor than innltigatm the birth of this 011116?

A Yu, than m the minus back of the pushing of the 0mg” against Thu. That m about all, an! to fin! out who :11 nor. inhrutoa Ln pronoun“; Thaw uni that their mutt?“ wort.

Q D16 you in flat connutlcn make my innitigatlon to “M out in or around I!” York in” how Ml “cap. I” uooonplilhoa and anything in commotion with um and how they In” trying to 301: Mm back?

As you can see, very few sentences proved coherent. In fact, without reading the original primary-source document, I could barely discern the topic of the interrogatory. I even manipulated the text after I imported it into the blog so that it would at least resemble the original in terms of format. The error rate was well above 70 percent in this case, which shocked me. The digitized newspapers archived on the “Chronicling America” Web site fared much better, but I will discuss those results below. The impact of this error rate suggests that the OCR-generated text is marginally useful at best — and then perhaps only for indexing. Frankly, my confidence level in OCR dropped markedly after this exercise. I was unable to compare my results with Andrew’s because he stated that his computer could not support the exercise.

The second problem I encountered with Google Drive OCR was trying to save the jpeg version of my OCR results to my desktop or to some other location that would allow me to import it into my practicum blog. I could neither drag nor copy it to the desktop from Google Drive. In fact, I could not locate a “save as” feature anywhere. Eventually, I copied the image and text results into a Word document and then used that file to import the image and text into WordPress. Once again, I struggled to perform what should have been a couple of routine computer tasks due to my inability to intuit the functions within the program.

In my review of the newspapers in “Chronicling America,” I chose to search for 19th Century American newspapers that mentioned the phrase “Lincoln assassinated.” I set my parameters for 1865 to 1865 (one year only), and I received numerous hits from both Northern and Southern newspapers beginning in April 1865 and later. I carefully reviewed the pdf scans of the original newspapers and the OCR-generated text behind them. Unlike my experience with Google Drive’s OCR results, the error rate was much, much lower. Like the document I used above, the newspaper text was generally clear save for a handful of letters and words that appeared smudged or that did not strike the paper firmly enough during the original printing process. I engaged in some back-of-the-napkin quantitative analysis with a few paragraphs in each newspaper and discovered, after counting both correctly captured words and those that were incorrect, an error rate of roughly 12 incorrect words for every 100 scanned. This 12-percent error rate was much more manageable and gave me greater confidence that the newspapers would yield useful, accurate results for historians engaging in data mining, topic modeling, or indexing. Many of the incorrect words resulted from poor printing quality in the original; in one case, the tightness of the letters in relation to one another changed the word “bouquet” to “bonqnet.”
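
For the curious, that napkin math is easy to reproduce in code: align the OCR output against a hand-corrected transcript and count word-level mismatches. Here is a minimal Python sketch with made-up sample strings echoing the “bouquet” example above.

    # Sketch: word-level OCR error rate against a hand-corrected transcript.
    import difflib

    def word_error_rate(truth, ocr):
        truth_words, ocr_words = truth.split(), ocr.split()
        matcher = difflib.SequenceMatcher(a=truth_words, b=ocr_words)
        matched = sum(block.size for block in matcher.get_matching_blocks())
        return 1 - matched / len(truth_words)

    truth = "she laid a bouquet of white roses upon the coffin"
    ocr = "she laid a bonqnet of white roses upon the coffin"
    print(f"{word_error_rate(truth, ocr):.0%}")  # 10% for this sample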

Since most of the primary sources I will use for my dissertation will be U.S. newspapers from June 1944, I looked for an online archive other than “Chronicling America” that digitized newspapers using OCR. Newspapers.com fit the bill, and I signed up for a free seven-day trial so that I could gain full access to the archive. I conducted an initial search for “6 June 1944” and immediately saw scores of digitized front pages from D-Day. The thumbnails made scanning for examples extremely easy. I selected three front pages to test for OCR quality: the Daily Boston Globe, the Las Cruces Sun-News, and the Bakersfield Californian. The scan quality varied significantly from paper to paper, most likely because of the quality of the microfilm from which the newspapers were digitized. For example, the image quality of the Daily Boston Globe was so poor that I was unable to read it even after zooming in as close as the program would allow. By contrast, the other two papers displayed much better digitized quality. After about 30 minutes of effort, I could not figure out how to view the OCR text behind each front page. Therefore, I decided to test the OCR quality by conducting a search of a very prevalent word on each front page: “France.” Even though I could make out numerous uses of the word “France” in the Boston paper, the search only highlighted one “hit,” clearly a result of the digitized original’s poor quality. The Bakersfield and Las Cruces papers fared much, much better. In each case, the search feature “hit” each instance of “France” that I could discern – a perfect record!

In the end, my experience with ProQuest Historical Newspapers and Newspapers.com restored my faith in OCR. I am still not certain why the Google Drive OCR program produced such poor results. Was I simply too inept to use it properly? Should I have somehow tried to improve the source image’s quality? Perhaps some skill is necessary in ensuring the proper digitization of documents using OCR. What is most important to me now is knowing that OCR-digitized newspaper archives exist that will prove both useful and reliable as I press on with my dissertation research. In the meantime, I need to get on the stick and figure out how to use this software properly. I wish we had had more time in class for the practicum so that we could have learned from each other’s mistakes.

Steve Rusiecki

Week Two Practicum Blog: Locating Digital History Sites on D-Day and the Media

A quick Google search of my dissertation topic on D-Day and the media nearly overwhelmed me with hits about the basic facts behind the Normandy invasion on 6 June 1944. Unfortunately, nothing I found suggested to me an ongoing scholarly conversation about my topic. However, I found numerous examples of sites hosting digitized primary sources — specifically American newspapers from across the country and actual radio broadcasts of the invasion — that I know will prove extraordinarily useful to my research.

I began by searching for anything related to “D-Day and the Media.” Surprisingly, the first result was for an organization, D-Day Media Group, which promotes music, film, art, and literature on behalf of the African diaspora. I could not discern why the group chose to name itself “D-Day,” but my first guess would be to suggest a “global invasion” of African-produced and inspired art. But the many, many links that followed this first search result highlighted numerous American and international news Web sites reporting on the recent 70th anniversary of D-Day. Clearly, the timing of my search had everything to do with elevating the status of these search hits to the first 20 or so on Google. The commemorations of D-Day by these sites took many forms. The BBC, for example, had actor Benedict Cumberbatch reading aloud the actual transcripts of BBC radio newsflashes while Quinnipiac University’s Web site advertised how the school had “Tweeted” facsimile images of D-Day front pages. While my research is clearly focused on the news media’s reporting at the time of the invasion, these commemorative efforts for the 70th anniversary essentially represented, as Roy Rosenzweig and Daniel Cohen might have suggested, Web-based examples of public and community history at work. But these sites were extremely useful in supporting my broader contention (dare I say “hypothesis”) that D-Day remains today a seminal American memory of the war due largely to the media. Most notably, the use of period newspapers and old radio sound clips to reinforce America’s (and the world’s) existing memory and perspective of D-Day was quite striking to me.

Despite altering the text of my search phrases to “D-Day and Newspapers” and “D-Day and Radio,” I still found no evidence of an ongoing scholarly conversation about how the media portrayed the invasion and that portrayal’s effect on American memory. A few hits highlighted the term “propaganda,” but only in its most extreme, negative sense (mainly Wikipedia). Actually, news reportage during D-Day was much more nuanced in this regard.

What I found to be most useful, or potentially useful, were the online archives that hosted complete audio files of the radio coverage on D-Day and troves upon troves of newspapers reporting the invasion in one way or another. Two of the most intriguing sites were “paywalled” and thus, for the moment, out of reach: www.archives.com and www.newspaperarchive.com. Both sites emphasized newspapers as a genealogy source, but they also carefully organized their archives by state, city, and township, which will prove very useful to me as I attempt to discern a “regional” (if it existed) flavor to the nature of the D-Day reporting.  For example, did the reporting on the West Coast, which was closest geographically to the war in the Pacific and whose state populations even felt directly threatened by the Japanese, elevate the D-Day invasion to top billing?  The answer is behind the paywalls.  Frankly, I did not expect to encounter such extensive online newspaper archives outside of what I have already sampled from ProQuest.  And, as expected, ProQuest’s sources did not “pop” on the Google search.

I targeted my next search to the place where I knew I would find an online trove of digitally archived newspapers and perhaps vintage radio broadcasts – the National Archives Web site.  I was surprised to see that, while www.newspaperarchive.com boasted newspapers dating back to 1607, the National Archives advertised only 1690 to the present, leaving 83 years of colonial and pre-colonial newspapers possibly unavailable online or listed as a holding.  Yet I was more surprised to find that while the Archives had digitized many newspapers (the OCR scans were of varying quality but the close-up feature was fantastic), many were simply listed as holdings accessible only on site via microfilm or in the document’s original form. The search engine was very good, though. The choices available to narrow one’s search — frequency of a topic, ethnicity of the target audience, etc. — seemed extremely useful and very relevant to my research.  I plan to spend much, much more time on this site in the future.

In the end, I found several digital-history sites — mostly archival in nature — that will help me immensely to locate the relevant and important sources I need to serve the cause of my argument.  At first blush, the online sources can seem overwhelming; but, after a bit of digging and classifying the sites, I found that I could screen for the most useful ones rather quickly.  This practicum was most productive!

Steve Rusiecki

How extensive should my online presence be?

Establishing this domain name is perhaps my most deliberate attempt to develop a more permanent online presence.  I agree with Miriam Posner’s assertion that “being visible on the Internet can benefit your scholarship, pedagogy, and even service.” Yet I have found myself advancing cautiously into this realm, quite possibly because my primary online experience for the last 25 years has been through the Army’s information-technology prism. The message the Army has beaten relentlessly into us Soldiers (and former Soldiers still serving the Army as civilians) is that the Web is a dangerous place prone to cyber attacks and the pilfering of personally identifiable information, the loss of which might jeopardize our safety and the safety of our families.

But I have steadily begun to recognize that those of us in the Army are no more and no less vulnerable than the average user of the Web. However, for published historians like me, the risk of making myself available online seems worth it based on the benefits gained: collaboration with other scholars, constructive assessments of my work, and the ability to hear from people deeply affected by the military history I write.

When I Google myself, the primary links that come up are directly related to the two World War II books I published in 1996 and 2010 respectively. The fact that very little comes up about my background as an Army infantry officer testifies to my reluctance during my active-duty days to plumb the Web’s depths and leave a more visible thumbprint. But the online presence I developed through my two books demonstrated clearly to me the necessity and benefit of a solid online presence.

Despite excellent reviews, my first book suffered from a lukewarm marketing campaign, principally because the publisher (an academic one at that) was more interested in selling library copies than individual copies to an interested public. But things changed dramatically with my next book. My publisher, the Naval Institute Press (an academic press like my first publisher), marketed the book widely on the Web.  They even convinced me to engage for the first time in social media by establishing a Facebook page so that readers could reach out to me directly.  Thanks to the Army’s overly cautionary propaganda, I hesitated at first but finally succumbed. My subsequent Facebook page was (and still is) a very “bare bones” sketch of my personal background with about a dozen “friends.” But in spite of this minor presence, some amazing things happened. Numerous people contacted me directly via Facebook about my new book and provided me with incredible feedback, including veterans of the battle whom I had not known earlier.  Most of all, my presence on Facebook, on Amazon.com (I still don’t have an author’s profile there!), and on CSPAN’s Web site (they posted a 15-minute video clip of me from their BookTV feature) all breathed new life into my first book, resulting in another publisher picking it up and re-publishing it in softcover. The sales exploded — and continue to outpace my second work. Amazing.

The most rewarding aspect of my online presence, though, is that I have been able to provide answers for several family members of men who fought and died in the two battles about which I wrote. In short, we were able to find out what happened to their loved ones. The World War II Army’s manual (and laborious) administrative machine often provided very sketchy details about a Soldier’s death to his family, inadvertently leaving questions unresolved and old wounds unhealed. In one case, a Facebook request from a lady whose father died before she was born sent me back into my primary sources to seek out some information for her. Remarkably, I was able to tell her precisely when, where, and under what circumstances her father died. The entire process was highly emotional for everyone involved. But without the ability to reach me directly online, I doubt this woman would have bothered to go through the publisher to seek me out. The knowledge I provided to that lady and to her family helped heal some long-standing wounds and provided me with a great sense of personal satisfaction.

My larger point, like Miriam Posner’s, is that we — as historians — have to overcome our fear of the Web and establish an online presence that will work for us. For me, Facebook opened up new possibilities very quickly, but I think other mediums, such as Twitter and our own personal Web pages, have great promise. In fact, I have longed to develop a Web site for my two books that will allow me to post photographs and maps that never made it into the published versions. Additionally, I want to create a function on the site’s main page where the public may submit questions directly to me about my work. I would also like to scan and post some of the more interesting primary sources to give the reading public a sense of my qualitative approach to the material. The only thing stopping me has been one simple fact: I don’t know how to build a flipping Web site! My experience so far with WordPress and Reclaim Hosting has been very positive, and both tools have proved user-friendly. My suspicion is that other tools out there for Web-site development are just as easy to use, and I look forward to diving in.

Steve Rusiecki