Week Five Reading Questions: Text Mining and Topic Modeling

1. Describe text mining in the context of the readings. What are its possibilities for historians? What are its pitfalls?

2. Frederick W. Gibbs and Daniel J. Cohen believe that text mining is most useful for open-ended questions, in which “the results of queries should be seen as signposts toward further exploration rather than conclusive evidence” (Gibbs and Cohen, 74). Explain what the authors mean by this statement.

3. Ted Underwood contends that historians must overcome two obstacles before engaging in text mining: (1) getting the data they need and (2) acquiring the digital skills they need. What digital skills does Underwood believe historians should develop?

4. According to Cameron Blevins, the literary scholar Franco Moretti developed the digital method of “distant reading.” Describe the concept of distant reading. How does distant reading differ from text mining? How is distant reading useful for historians?

5. Cameron Blevins argues that the promise of digital history is “to radically expand our ability to access and draw meaning from the historical record” (Blevins, 146). Do you agree? What other possibilities might Blevins be overlooking?

6. What is “topic modeling”? How does it relate to text mining and distant reading? How is it useful for historians?

7. According to Ted Underwood, an Internet search is a form of data mining, but it is useful only if you already know what you expect to find. Do you agree? What is Underwood’s remedy for discovering the unknown and the unexpected in the digital record?

