Week Five Discussion Questions
1. Describe text mining in the context of the readings. What are its possibilities for historians? What are its pitfalls?
2. Frederick W. Gibbs and Daniel J. Cohen believe that text mining is more relevant to open-ended questions, in which “the results of queries should be seen as signposts toward further exploration rather than conclusive evidence” (Gibbs and Cohen, 74). Explain what the authors mean by this statement.
3. Ted Underwood contends that historians must overcome two obstacles before engaging in text mining: (1) getting the data you need, and (2) getting the digital skills you need. What digital skills does Underwood feel that historians should develop?
4. According to Cameron Blevins, literary scholar Franco Moretti developed the digital method of “distant reading.” Describe the concept of distant reading. How is distant reading different from text mining? How is distant reading useful for historians?
5. Cameron Blevins argues that the promise of digital history is “to radically expand our ability to access and draw meaning from the historical record” (Blevins, 146). Do you agree? What other possibilities might Blevins be overlooking?
6. What is “topic modeling”? How does it relate to text mining and distant reading? How is it useful for historians?
7. According to Ted Underwood, an Internet search is a form of data mining. But it is only useful if you already know what you are expecting to find. Do you agree? What is Underwood’s remedy for seeking the unknown and the unexpected from the digital record?
Week Five Discussion Assessment
The seven questions I crafted for the Week Five Discussion certainly generated a productive discussion, much of which added clarity to the concepts of Text Mining, Distant Reading, and Topic Modeling. However, we ran down a few “rabbit holes” almost immediately due to some confusion between the concept of an Internet search and Text Mining. The overall flow of the discussion suggested that the sequencing of the questions helped to elicit responses from the class members, but I quickly recognized that I should have re-ordered the questions to define and compare up front the digital concepts of Text Mining, Distant Reading, and Topic Modeling. In the main, the discussion flowed fairly well (albeit a bit sporadically at times), and the different perspectives on each of the three concepts helped the class members to wrestle with and refine their own understandings of these digital approaches and their potential uses for further research.
The one area of confusion seemed to stem from a statement that Ted Underwood made in his article “Theorizing Research Practices We Forgot to Theorize Twenty Years Ago.” Underwood’s discussion of “[s]earch [as] a form of data mining” (Underwood, 3) called into question the real difference between a Full-Text Internet Search and Text Mining. I think we missed part of Underwood’s point and talked past the idea that a Search is radically different from the quantitative concept of Text Mining. My own narrow distinction between the two (that one mines the entire Internet while the other mines a selected corpus of material) certainly contributed to the confusion. But as the discussion progressed, the distinction became clearer: a Search simply finds things, while Text Mining counts things, specifically words. This part of the discussion proved to be the most fruitful in laying the groundwork for exploring and comparing the other concepts. But I was disappointed in my own failure to steer the conversation toward Ted Underwood’s six potential uses for text mining and what he described as the underlying theory behind Text Mining, Bayesian Statistics. Specifically, I wanted to leverage the digital expertise of the class members to explore this statistical theory in greater detail to enhance my own understanding of it.
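The distinction the class arrived at, that a search finds things while text mining counts them, can be illustrated with a minimal Python sketch. The three-document corpus and the function names here are invented for illustration and do not come from the readings; real text-mining work would involve a far larger corpus and more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
corpus = {
    "doc1": "The railroad reached Houston and the railroad changed the town.",
    "doc2": "Cotton prices fell in Galveston.",
    "doc3": "Houston merchants welcomed the railroad.",
}

def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def search(term, docs):
    """Search: return the names of the documents that contain the term."""
    term = term.lower()
    return sorted(name for name, text in docs.items() if term in tokenize(text))

def count_words(docs):
    """Text mining (in its simplest form): count every word across the corpus."""
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    return counts

hits = search("railroad", corpus)
word_counts = count_words(corpus)

print("Documents mentioning 'railroad':", hits)
print("Most frequent words:", word_counts.most_common(3))
```

The search answers a question the researcher already knew to ask (which documents mention railroads?), while the word counts describe the whole corpus and may surface terms the researcher did not think to look for.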
The discussion also helped me to crystallize the distinction between a digital tool and a digital methodology. Several members of the class rightly stated that tool and methodology were often not one and the same, but they could be in some cases. The most enlightening aspect of this discussion centered on Cameron Blevins’s article and how he used what Underwood described as “noun entity extraction” to meticulously extract from selected newspapers the frequency of various city and town names in order to argue for the spatial construction of specific regions. In this case, everyone agreed that Blevins had used a digital methodology to test his hypothesis and that the results of this approach could stand on their own as sufficient evidence to serve his greater argument. But I think everyone agreed, even the one or two digital “resisters” among us, that Text Mining remains primarily a research tool that, according to Gibbs and Cohen, does not exclude more traditional methods such as close readings of the sources.
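A rough sense of the counting behind such an argument can be given in a few lines of Python. This is not Blevins’s actual method or code: the place list and the newspaper snippets are invented, and a real project would rely on a named-entity recognizer or a full gazetteer rather than a hand-made list of four names.

```python
import re
from collections import Counter

# Hypothetical gazetteer and newspaper snippets, invented for illustration.
PLACES = ["Houston", "Galveston", "New Orleans", "St. Louis"]

issues = [
    "Freight from New Orleans arrived in Houston on Tuesday.",
    "Houston merchants report cotton shipments bound for Galveston.",
    "News from New Orleans: the river is rising.",
]

def count_places(texts, places):
    """Count how often each known place name appears across all texts."""
    counts = Counter({place: 0 for place in places})
    for text in texts:
        for place in places:
            # Match the place name only as a whole word or phrase.
            pattern = r"(?<!\w)" + re.escape(place) + r"(?!\w)"
            counts[place] += len(re.findall(pattern, text))
    return counts

place_counts = count_places(issues, PLACES)
for place, n in place_counts.most_common():
    print(place, n)
```

Tallies like these are only the raw material; the historical argument comes from interpreting which places a newspaper mentions often, rarely, or not at all.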
My efforts to explore Franco Moretti’s concept of Distant Reading did not produce the response I had intended. My own sense of this Text-Mining derivative is that it is useful in locating macro- and micro-patterns, but I struggled to envision what some of those patterns might actually resemble in practice. I wanted to develop this part of the discussion further for my own benefit, but I sensed that several of the class members did not understand Distant Reading and did not see it as a useful approach. Many of the class members seemed to have already assessed the potential pitfalls of Distant Reading and had decided that, as an abstract concept and in the absence of practical examples, it was not useful for their specific research needs.
I wanted to close the dialogue with a vigorous discussion of Topic Modeling and explore how, as Underwood asserted, it differed from Text Mining by allowing historians to locate the unknown and the unexpected in the digital record. All of us seemed to struggle with this stated purpose for Topic Modeling. However, before we progressed to a discussion of what Underwood really meant, I got the “hook.” Thus, this part of my discussion went unfulfilled, and the door remains open for further discussions of Topic Modeling.
Overall, I was pleased with the level of engagement by my fellow class members. I think we all struggled at times to come to grips with Text Mining and Topic Modeling, but I feel confident that we collectively advanced our understanding of what these tools are, what they can do for us, what they can’t do for us, and how they may actually work. The discussion at least provided us with a broader understanding of Text Mining before we dove headlong into the various Text-Mining programs as part of the practicum. Lastly, I am grateful for everyone’s participation and for their indulgence in allowing a digital amateur to steer an advanced discussion of digital tools with some extremely bright and gifted historians. My thanks to all.