Learning About Topic Modeling

Ted Underwood and Megan R. Brett raise some interesting points about the pros and cons of text analysis. In his article “Where to start with text mining,” Underwood identifies two obstacles to getting started with text mining: gathering data and cultivating the skills needed to put that data to use. He then gives the reader concise instructions for overcoming both problems.

Brett, in “Topic Modeling: A Basic Introduction,” offers a brief overview of topic modeling. She defines it, explains how it works and how to get started, and outlines the tool's history. Brett seems to agree with Underwood about the challenges of text analysis when she notes that you need the know-how to make use of it. For example, she points out that topic modeling does not always yield clear results. In fact, it is common for the output to be unreadable.

It is for this reason that digital humanists should be able to read algorithms. According to Brett, if the algorithm behind the topic modeling tool you are using is off, your results will be all over the place. This could be avoided by gaining a basic understanding of algorithms, which is exactly what Benjamin M. Schmidt argues for in “Do Digital Humanists Need to Understand Algorithms?”

I still have some questions about text analysis. First, what are its potential use cases outside of academic projects? I would be curious to know why it is not as en vogue as big data in the field of journalism. Has it just not caught on yet? Second, is it a bad thing that businesses are starting to adopt text analysis? I would also like to know who the intended audience for this tool is, since that does not seem well defined at this point. Hopefully I will have something more meaningful to contribute once I become more familiar with the tool myself.

THUMBNAIL OF A 350-DOCUMENT ENVIRONMENTAL HUMANITIES CORPUS, COURTESY OF DIGITAL ENVIRONMENTAL HUMANITIES