AmCAT Version |
---|
This page describes a feature in AmCAT 3.3 |
View other version: 3.3 - 3.4 - 3.5 |
The output option Graph/Table in the Query section enables you to produce frequency lists, cross-tables or various kinds of graphs from aggregated search results, and export these tables or graphs as data files or images. From each table or graph, you can click through to the text of the articles included in particular data points, making this output option another useful way to explore your data.
When you select the output option 'Graph/Table', AmCAT provides you with a list of options that will determine what your graph or table will look like.
You first specify which variable you want to display on the X-axis or in the rows (date, media outlet, search term or articleset) and which on the Y-axis or in the columns (media outlet, search term, articleset or the total) of your graph or table. If you choose 'Date' for the X-axis or rows, you can further specify the date interval you prefer.
Click on the dropdown list next to 'Date interval' and choose to display the data on the X-axis per day, week, month, quarter or year (the default is ‘day’). Next, click on the dropdown list next to 'Output type' and select how you want to display your results: as a table or as one of four types of graph (bar plot, scatter plot, line plot or heatmap). If you want to use the table outside AmCAT (for example in Excel or R), select ‘CSV’ as the output type, which lets you download the table directly from AmCAT as a CSV file. Click on the 'Query' button below the output options to generate the graph or table.
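To make the effect of the date interval concrete, the following minimal Python sketch counts a handful of article dates per interval. It is an illustration only, not AmCAT's own implementation: the function names, the label formats (such as `1962-Q4`), the use of ISO week numbers and the sample dates are all assumptions made for this example.

```python
from collections import Counter
from datetime import date


def bucket(d: date, interval: str) -> str:
    """Return a label for the interval that contains date d (labels are assumed)."""
    if interval == "day":
        return d.isoformat()
    if interval == "week":
        # Assumption: ISO weeks; AmCAT may define weeks differently.
        iso_year, iso_week, _ = d.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if interval == "month":
        return f"{d.year}-{d.month:02d}"
    if interval == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    if interval == "year":
        return str(d.year)
    raise ValueError(f"unknown interval: {interval}")


def count_per_interval(dates, interval):
    """Count how many dates fall into each interval, ordered by label."""
    counts = Counter(bucket(d, interval) for d in dates)
    return dict(sorted(counts.items()))


# Hypothetical publication dates, for illustration only.
article_dates = [
    date(1962, 10, 22),
    date(1962, 10, 28),
    date(1962, 11, 3),
    date(1963, 8, 5),
]

per_year = count_per_interval(article_dates, "year")
per_month = count_per_interval(article_dates, "month")
per_quarter = count_per_interval(article_dates, "quarter")
```

A coarser interval gives fewer, larger data points: the same four dates yield three monthly counts but only two yearly counts.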
As an example, we will discuss a possible use of line graphs and tables for exploring the co-occurrence of two concepts. We will use the article set called 'nyc nuclear* or atom* in lead', which contains all news stories from the New York Times between 1945 and 2013 that have the word nuclear* or atom* in their lead paragraph. Let's say we want to explore the relation between nuclear (power) and danger in these articles. To do so, we create two search strings: one that captures the concept ‘nuclear’ and one that captures the concept of danger or threat. As you can see below, the term ‘nuclear* OR atom*’ is used to capture the concept ‘nuclear’. ‘Danger’ is approximated with the search term ‘danger* crisis* catastroph* threat*’, where AmCAT automatically inserts the search operator OR in every space between keywords. After entering the search terms, you can choose the variables and settings for the graph.
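The implicit OR between keywords can be spelled out with a small Python sketch. This is a simplified illustration, not AmCAT's query parser: the function name is hypothetical, and the sketch assumes the search string contains only bare keywords and, optionally, the operator OR (no quotes, brackets, AND or NOT).

```python
def make_or_explicit(query: str) -> str:
    """Rewrite a space-separated keyword list with explicit OR operators.

    Assumption: the query holds only bare keywords and optional OR tokens.
    """
    # Drop any OR that is already present so it is not duplicated.
    keywords = [token for token in query.split() if token != "OR"]
    return " OR ".join(keywords)


danger_query = make_or_explicit("danger* crisis* catastroph* threat*")
nuclear_query = make_or_explicit("nuclear* OR atom*")
```

Under these assumptions, the ‘danger’ search string is equivalent to ‘danger* OR crisis* OR catastroph* OR threat*’, while a string that already uses OR is left unchanged.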
Which graph type to use depends on the question the graph should help answer. We could hypothesize that news coverage of nuclear energy peaks when something negative related to nuclear energy happens, such as a nuclear accident or the threat of nuclear war. But is this always the case? When there is a lot of news coverage about ‘nuclear’, is there also a lot of news coverage that includes ‘danger’, or do we also see peaks in coverage about ‘nuclear’ in which attention to ‘danger’ stays close to zero? A graph of search results per search term (on the Y-axis) mapped over time (on the X-axis) will help answer this question. The articleset we use covers several decades, which makes ‘year’ a logical date interval to use (in the ‘Date interval’ box). Since time series are typically shown as line graphs, we will create a line graph (under 'Output type'). For now, we are interested in absolute numbers, so we will leave the ‘Relative to’ box empty. After selecting these options and clicking ‘Query’, AmCAT produces the graph below.
Hovering over the graph displays text bubbles that show the number of articles for each of the two search terms. Regarding the question that guides this example (‘When there is a lot of news coverage about ‘nuclear’, is there also a lot of coverage about ‘danger’ in these articles?’), you can see that the period between 1958 and 1963 is rather interesting, since it shows a few big peaks for ‘nuclear’ but not for ‘danger’. What is going on here? Is ‘nuclear’ in the news for reasons other than its dangerous aspects, or is the search string we created for ‘danger’ perhaps too narrow? Inspecting some of the articles from this period can help to clarify this. If you are satisfied with the graph as it is, you can download its data from AmCAT as a table in a CSV file by clicking the Export aggregation button to the right of the graph. Using this file, you can reproduce the graph in Excel or a similar program.
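Once exported, the CSV file can also be processed with a script. The Python sketch below reads such a table into one series per search term, ready to be plotted. The column layout (a `date` column followed by one column per search term) and all numbers are assumptions made for illustration; check the header of your own export before reusing this.

```python
import csv
import io

# Hypothetical export; the real column names and counts will differ.
SAMPLE_CSV = """date,nuclear,danger
1958,120,10
1959,95,8
1960,130,12
"""


def read_series(csv_text, date_column="date"):
    """Return {search term: [(date label, count), ...]} from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    terms = [name for name in reader.fieldnames if name != date_column]
    series = {term: [] for term in terms}
    for row in reader:
        for term in terms:
            series[term].append((row[date_column], int(row[term])))
    return series


series = read_series(SAMPLE_CSV)
```

To read a downloaded file instead of the inline sample, pass its contents, for example `read_series(open("export.csv", encoding="utf-8").read())`, where `export.csv` is a placeholder file name.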
Of course, AmCAT can also display search results as a table. The screenshot below again shows the search results for ‘nuclear’ and ‘danger’ in the 'nyc nuclear* or atom* in lead' article set, but now as a table instead of a line graph. As you can see, the results are ordered chronologically, from the oldest at the top to the most recent at the bottom.
Another interesting way to look at the same data (still about the question ‘When there is a lot of news coverage about ‘nuclear’, is there also a lot of coverage about ‘danger’ in these articles?’) is to show the search results for the ‘danger’ search string as a proportion of the search results for the ‘nuclear’ search string. That is, what percentage of articles about nuclear energy deal with danger, and how does this vary over time? You can do so using the ‘Relative to’ output option (tick the box 'Make variables relative to and exclude the first column'). For this example, we are interested in ‘danger’ as a proportion of ‘nuclear’, so the search term for ‘nuclear’ must come first. We do this by renaming it '1nuclear' and the other search term '2accident'. This brings up the results below.
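The calculation behind the ‘Relative to’ option can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AmCAT's code: it assumes that columns are ordered alphabetically by name (which is why the renaming trick puts '1nuclear' first), that a zero in the first column yields no value, and the counts themselves are invented.

```python
def make_relative(table):
    """Divide every column by the first column and drop the first column.

    table maps a period to {column name: article count}. Columns are
    ordered by name (an assumption), so '1nuclear' precedes '2accident'.
    """
    result = {}
    for period, counts in table.items():
        columns = sorted(counts)
        base_name, other_names = columns[0], columns[1:]
        base = counts[base_name]
        result[period] = {
            # None marks periods where the first column is zero.
            name: (counts[name] / base if base else None)
            for name in other_names
        }
    return result


# Hypothetical counts per year, for illustration only.
counts_per_year = {
    1951: {"1nuclear": 40, "2accident": 0},
    1961: {"1nuclear": 200, "2accident": 50},
}

relative = make_relative(counts_per_year)
```

With these invented counts, the proportion is 0 for 1951 and 0.25 (25%) for 1961.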
Note that the proportion of articles about 'nuclear' also dealing with 'danger' was 0 in 1951, and from then on, slowly increased. Again, the next step is to investigate why this could be the case, by looking at the articles published in these time periods.