3.3: Reliability of search strings

The quality of your automatic content analysis depends on the quality of your search strings, which in turn depends on their reliability. The reliability of a study refers to the question of whether the study is replicable: when it is, other researchers who apply your research method find the same results. Besides being reliable, your research method should also be valid, meaning that you actually measure what you intend to measure. Unreliable search terms by definition lead to invalid results. In automatic content analysis reliability is high, as different computers with identical instructions will generate exactly the same results. However, validity tends to be lower, since a computer recognizes words but not concepts. In contrast, human coders with the same instructions will not always get the same results due to personal interpretations and cultural backgrounds, so the reliability of manual content analysis is often lower than that of automatic content analysis. However, human coders are capable of recognizing concepts, which improves the validity of the results.

An important distinction with regard to reliability in content analysis is the distinction between precision and recall. ''Precision'' refers to the question of whether the results found by AmCAT are truly positive results (i.e. texts that include the concept of your interest). Type 1 errors, search results that were falsely identified as positive, decrease the precision of your search terms. ''Recall'' refers to the question of whether all the texts that include your concept of interest have indeed been found by AmCAT. Type 2 errors, texts that were falsely identified as negative, decrease the recall of your search terms. When the precision of your search terms is high, the recall is generally lower and vice versa. As a researcher, you thus need to find a balance between concessions with regard to precision and concessions with regard to recall. However, before you calculate the precision and recall of your search terms, it is important that you check the ''face validity'' of the search terms and the results.

== Face validity ==

You can check the face validity of your search terms by taking a look at the AmCAT search results. You can do so by reading the articles that are identified as including your search terms and estimating whether they include the concept you intend to measure. AmCAT provides you with various opportunities to get access to these articles:
* Using the [[Summary|Summary function]], you can list all the articles that include your search terms. You can access each of these documents by clicking on its title in the list. Your search terms are highlighted in red.
* Using the 'Graph' option of the [[Graph/Table|Graph/Table function]], you can click on every dot in the line and you will get a list of relevant articles. By clicking on the titles in the list you can access each article. 
* Using the 'ClusterMap' option of the [[Summary|Summary function]], you can make a Venn diagram. By clicking on a dot in the Venn diagram, you get access to that particular article. If you have a large number of articles, the Venn diagram displays a single large dot. By narrowing your search to a certain period or medium, you can reduce the number of articles so that individual dots appear.

== Reliability ==

[[File:6.2.1 - AmCAT Navigator 3 Overlap Found Versus Actual Search Results.jpg|500px|thumb|right|Figure 6.2.1 - AmCAT Navigator 3 Overlap Found Versus Actual Search Results]]

Your search strings result in a collection of found search results. With these search strings, you are looking for the articles in which your concepts occur. The collection of actual search results is the collection of true search results, a collection that contains only articles referring to your concepts. If you have formulated very good search strings, they measure the presence of the concept of your interest in a text. In that case, the collection of found articles and the collection of intended (= actual) articles are the same. However, incomplete or incorrect search strings can cause a discrepancy between the found and the actual search results. Figure 6.2.1 shows the overlap between a collection of found and a collection of actual search results.

The area where found and actual search results overlap (TP, true positives) contains the correct search results. The TN (true negatives) area refers to the area outside of both the found and the actual search results. These are articles that, indeed, should not have been found. The two remaining areas contain incorrect results. The FP (false positives) area refers to articles that were found but are not actual search results (i.e. they should not have been found). This is called a Type 1 error. The FN (false negatives) area refers to articles that are actual search results, but were not found. This is called a Type 2 error.
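The four areas can be illustrated with a small Python sketch based on set operations. The function name and the article IDs below are hypothetical and serve only as an illustration; in practice the "actual" collection is unknown and has to be approximated by manually coding articles.

```python
def confusion_counts(all_ids, found_ids, actual_ids):
    """Count TP, FP, FN and TN for one search string.

    all_ids    -- IDs of every article in the index (hypothetical input)
    found_ids  -- IDs of articles returned by the search string
    actual_ids -- IDs of articles that really contain the concept
    """
    all_ids, found_ids, actual_ids = set(all_ids), set(found_ids), set(actual_ids)
    true_positives = found_ids & actual_ids              # found and correct
    false_positives = found_ids - actual_ids             # found, but should not be (Type 1)
    false_negatives = actual_ids - found_ids             # missed (Type 2)
    true_negatives = all_ids - found_ids - actual_ids    # rightly not found
    return {
        "TP": len(true_positives),
        "FP": len(false_positives),
        "FN": len(false_negatives),
        "TN": len(true_negatives),
    }


# Made-up example: 10 articles in the index, 4 found, 3 actually relevant.
counts = confusion_counts(range(1, 11), {1, 2, 3, 4}, {3, 4, 5})
print(counts)
```

The four counts always add up to the total number of articles in the index, which is a quick sanity check on your own coding.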

== Precision ==

Precision refers to the extent to which your search results actually include the concept you want to measure. The central question is whether your search strings yield only articles that include the concept of your interest, or also yield irrelevant search results. Irrelevant results that have been identified as including your concept, but actually don't, are false positives. Put differently, precision refers to the number of true positives relative to the total collection of true and false positives (see the entire red/found results area in Figure 6.2.1). A calculation of the precision of a search string tests whether the search string yields correct search results. The precision of your search string is expressed as the percentage of correct search results. It can be calculated by dividing the number of true positive search results by the sum of true and false positive search results. This means that:


<div style="text-align: center;"> precision = true positives / (true positives + false positives) </div>


So, how can you determine the number of true and false positives? You can use the Query function in AmCAT. The simplest way to check whether your search results actually measure the intended concepts is to display them in the context in which they occur. When you select the [[Summary|Summary function]], you get a list of search results together with the context within which the search terms occur. You can calculate the precision by drawing a sample of X articles. For each article in this sample you check whether your search string has measured what you intended to measure in this particular article. If so, you label this article a true positive. If not, you label this article a false positive. Let's say, for example, that 13 of a total of 50 articles in your sample are false positives. You then have 50 - 13 = 37 true positives, and 37/50 = .74. Your precision would thus be 74%.
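The sample calculation can be reproduced with a few lines of Python. This is a minimal sketch of the formula given in this section; the function and variable names are chosen for illustration and are not part of AmCAT.

```python
def precision(true_positives, false_positives):
    """precision = true positives / (true positives + false positives)"""
    total_found = true_positives + false_positives
    if total_found == 0:
        raise ValueError("precision is undefined when nothing was found")
    return true_positives / total_found


# Numbers from the example: a sample of 50 found articles, 13 of them false positives.
sample_size = 50
false_positives = 13
true_positives = sample_size - false_positives  # 37

sample_precision = precision(true_positives, false_positives)
print(f"precision = {sample_precision:.2f}")
```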
 
== Recall ==

The recall of your search string refers to the extent to which you succeed in finding all relevant articles with your search string. Relevant articles that were not found with your search string are called false negatives. The central question is whether you have missed references to your concept in the text. This question is more difficult to answer than the question of whether your search results contain irrelevant results, as it concerns what you did not measure: the collection of articles in which your concept occurs is unknown.

There are two strategies to determine the recall of your search string. The first is based on a formal calculation. You can calculate the recall of your search string by dividing the number of true positives by the sum of the number of true positives and the number of false negatives. This means that:


<div style="text-align: center;"> recall = true positives / (true positives + false negatives) </div>


The section above explains how you can find the number of true positives. The question remains how you can measure the number of false negatives. These results are not part of the collection of found articles and therefore remain unknown. The estimate of the number of false negatives is based on an analysis of the articles in the index in which your search string does not occur. You read articles in which your search string does not occur (true and false negatives) and determine how often your concept nevertheless occurs (false negatives). You can use the index function in AmCAT for this. You enter '* NOT (SEARCH STRING)' in the AmCAT Query search screen. Next, you read a sample of the found articles and check whether the concept of your interest occurs or not. If it does, the article is a false negative. If it does not, it is a true negative. Let's say, for example, that you find 1200 articles that do not include your search term, and that 3 of the 50 articles in your sample are false negatives. You now know that approximately 3/50 = .06 = 6% of these articles were falsely identified as negative. Based on that, you can estimate the number of false negatives at .06 * 1200 = 72. If you have previously found 400 true positives (an example), your recall will be 400/(400+72) ≈ .85, or 85%.
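The recall estimate can be sketched in Python as follows. The function name and parameter names are illustrative assumptions; the calculation itself follows the steps described in this section, extrapolating the share of false negatives in the sample to all articles that were not found.

```python
def estimate_recall(true_positives, not_found_total, sample_size, sample_false_negatives):
    """Estimate recall from a manually coded sample of not-found articles.

    true_positives         -- number of true positives among the found articles
    not_found_total        -- number of articles returned by '* NOT (SEARCH STRING)'
    sample_size            -- number of not-found articles that were read
    sample_false_negatives -- how many of those did contain the concept
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # Extrapolate the false negatives in the sample to all not-found articles.
    estimated_false_negatives = sample_false_negatives * not_found_total / sample_size
    recall = true_positives / (true_positives + estimated_false_negatives)
    return recall, estimated_false_negatives


# Numbers from the example: 400 true positives, 1200 not-found articles,
# and 3 false negatives in a sample of 50.
recall, estimated_fn = estimate_recall(400, 1200, 50, 3)
print(f"estimated false negatives = {estimated_fn:.0f}")
print(f"recall = {recall:.2f}")
```

Keep in mind that the result is an estimate: with only a few false negatives in the sample, a single additional false negative changes the outcome noticeably.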

This method has two limitations. First, the method works less well the less often your search string occurs in the articles to be studied. In addition, the articles you are looking for are unknown, which means that you have no idea how often your concept occurs in these articles. As a result, you don't know how many articles you have to read before you can make reliable statements about the recall of your search strings. Second, the method described above does not work for search strings that include the Boolean operator 'NOT' (as it will then occur twice in one search string).

There are two solutions for this problem. For concepts you don't expect to occur often, you can adjust your recall check by replacing the * with a word you assume to be present in (almost) all the articles that include your concept. However, this solution does not solve the problem of the double Boolean operator 'NOT'. The second method is mostly theoretical: it consists of reflecting on the question of whether your search string includes all possible references to the concept of your interest.
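The recall-check queries described in this section can be assembled with a small helper. This is only a sketch: the function name, the example search terms and the check for 'NOT' are illustrative assumptions, and the resulting string still has to be entered in the AmCAT Query search screen by hand.

```python
import re


def recall_check_query(search_string, base="*"):
    """Build the query that returns articles NOT matched by search_string.

    base -- '*' for all articles, or a word assumed to be present in
            (almost) all articles that include your concept
    """
    # The text notes that the check does not work for search strings that
    # already contain the Boolean operator NOT, so refuse to build such a query.
    if re.search(r"\bNOT\b", search_string):
        raise ValueError("recall check does not work for search strings containing NOT")
    return f"{base} NOT ({search_string})"


# Hypothetical search string and base word, for illustration only.
print(recall_check_query("economy OR recession"))
print(recall_check_query("economy OR recession", base="crisis"))
```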