== Precision ==
The quality of your automatic content analysis depends on the quality of your search strings, which in turn depends on the reliability of those search strings. The reliability of a study refers to whether the study is replicable: when it is, other researchers obtain the same results using your research method. Besides being reliable, your research method should also be valid, meaning that you actually measure what you intend to measure. Unreliable search terms by definition lead to invalid results. In automatic content analysis reliability is high, as different computers with identical instructions will generate exactly the same results. However, validity is low, since a computer recognizes words but not concepts. In contrast, human coders with the same instructions will not always get the same results because of personal interpretations and cultural backgrounds, so the reliability of manual content analysis is often lower than that of automatic content analysis. However, human coders are capable of recognizing concepts, which improves the validity of the results.
An important distinction with regard to reliability in content analysis is the distinction between precision and recall. Precision refers to the question of whether the results found by AmCAT are truly positive results (i.e. texts that include the concept of your interest). Type 1 errors, search results that were falsely identified as positive, decrease the precision of your search terms. Recall refers to the question of whether all the texts that include your concept of interest have indeed been found by AmCAT. Type 2 errors, texts that were falsely identified as negative, decrease the recall of your search terms. When the precision of your search terms is high, the recall is generally lower, and vice versa. As a researcher, you thus need to find a balance between concessions with regard to precision and concessions with regard to recall. However, before you calculate the precision and recall of your search terms, it is important that you check the face validity of the search terms and the results.
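The trade-off between precision and recall can be illustrated with a small sketch. The example below does not use AmCAT; it assumes a hypothetical five-sentence corpus with invented relevance labels and a simple "match any term" search, and compares a narrow with a broad set of search terms.

```python
# Hypothetical corpus: (text, relevant) pairs. The relevance labels are
# invented for illustration; the concept of interest is "monetary policy".
corpus = [
    ("The central bank raised interest rates", True),
    ("Interest rates and inflation worry the bank", True),
    ("Inflation slows as prices stabilise", True),
    ("Children fished from the river bank", False),
    ("Local team wins the cup final", False),
]

def evaluate(terms):
    """Return (precision, recall) for a search matching any of the terms."""
    found = [any(t in text.lower() for t in terms) for text, _ in corpus]
    actual = [relevant for _, relevant in corpus]
    tp = sum(f and a for f, a in zip(found, actual))      # correct hits
    fp = sum(f and not a for f, a in zip(found, actual))  # Type 1 errors
    fn = sum(a and not f for f, a in zip(found, actual))  # Type 2 errors
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

narrow = evaluate(["interest rates"])     # few hits, all of them correct
broad = evaluate(["bank", "inflation"])   # more hits, one of them wrong
print(narrow, broad)
```

In this toy case the narrow search finds only relevant texts but misses one, while the broad search finds every relevant text at the cost of one false positive.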
You can check the face validity of your search terms by taking a look at the AmCAT search results. Read the articles that are identified as including your search terms and estimate whether they include the concept you intend to measure. AmCAT provides you with several ways to access these articles:
Your search strings result in a collection of found search results. With these search strings, you are looking for the articles in which your concept occurs. The collection of actual search results refers to the collection of true search results, a collection that only contains articles referring to your concept. If you have formulated very good search strings, they measure the presence of the concept of your interest in a text. In that case, the collection of found articles and the collection of intended (= actual) articles are the same. However, incomplete or incorrect search strings can cause a discrepancy between the found and the actual search results. Figure 6.2.1 shows the overlap between a collection of found and a collection of actual search results.
The area where found and actual search results overlap (TP, true positives) includes the correct search results. The TN (true negatives) area refers to the area outside of both the found and the actual search results. These are articles that, indeed, should not have been found. The two remaining areas include incorrect search results. The FP (false positives) area refers to search results that were found, but are not actual search results (i.e. they should not have been found). This is called a Type 1 error. The FN (false negatives) area refers to articles that are actual search results, but were not found. This is called a Type 2 error.
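The four areas can be expressed as set operations. The sketch below assumes hypothetical article IDs; the names all_articles, found and actual are chosen for illustration and are not part of AmCAT.

```python
# Hypothetical article IDs, invented for illustration.
all_articles = set(range(1, 11))  # every article in the collection (1..10)
found = {1, 2, 3, 4, 5}           # articles returned by the search string
actual = {3, 4, 5, 6}             # articles that really contain the concept

tp = found & actual                   # found and correct
fp = found - actual                   # found but incorrect (Type 1 error)
fn = actual - found                   # correct but not found (Type 2 error)
tn = all_articles - (found | actual)  # rightly not found

print(sorted(tp), sorted(fp), sorted(fn), sorted(tn))
```

Every article falls into exactly one of the four sets, so their sizes add up to the size of the whole collection.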
Precision refers to the extent to which your search results actually include the concept you want to measure. The central question is whether your search strings return only texts that include the concept of your interest, or also return irrelevant search results. Irrelevant results that have been identified as including your concept, but actually do not, are false positives. Put differently, precision refers to the number of true positives relative to the total collection of true and false positives (see the entire red area in Figure 6.2.1). A calculation of the precision of a search string tests whether the search string yields correct search results. The precision of your search string is expressed as the percentage of correct search results. It can be calculated by dividing the number of true positive search results by the sum of true and false positive search results. This means that: precision = true positives / (true positives + false positives).
So, how can you determine the number of true and false positives? You can use the Query function in AmCAT. The simplest way to check whether your search results actually measure the intended concepts is by displaying them in the context in which they occur. When you select the Summary function, you get a list of search results and the context within which the search terms occur. You can calculate the precision by drawing a sample of X articles. For each article in this sample you check whether your search string has measured what you intended to measure in this particular article. If so, you label this article a true positive. If not, you label this article a false positive. Let's say, for example, that 13 of a total of 50 articles in your sample are false positives. Then 50 - 13 = 37 articles are true positives, and 37 / 50 = 0.74. Your precision would be 74%.
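The sample-based calculation can be written out as a short sketch. It assumes that you have exported the IDs of the found articles and labelled the sampled articles by hand; the helper functions draw_sample and precision_from_labels are hypothetical and not part of AmCAT.

```python
import random

def draw_sample(article_ids, sample_size, seed=0):
    """Draw a random sample of found articles to check by hand."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(list(article_ids), sample_size)

def precision_from_labels(labels):
    """labels: one boolean per sampled article, True = true positive."""
    if not labels:
        raise ValueError("the sample must contain at least one article")
    return sum(labels) / len(labels)

# Hypothetical IDs of 400 found articles, of which 50 are sampled.
sample = draw_sample(range(1000, 1400), 50)

# Manual labels for the example in the text: 37 true and 13 false positives.
labels = [True] * 37 + [False] * 13
precision = precision_from_labels(labels)
print(f"precision = {precision:.0%}")
```

Because the precision is estimated from a sample, a larger sample gives a more stable estimate.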