The quality of your automatic content analysis depends on the quality of your search strings, which in turn depends on their reliability. The reliability of a study refers to the question of whether the study is replicable: when it is, other researchers find the same results using your research method. Besides being reliable, your research method should also be valid, meaning that you actually measure what you intend to measure. Unreliable search terms by definition lead to invalid results. In automatic content analysis reliability is high, as different computers with identical instructions will generate exactly the same results. However, validity is low, since a computer recognizes words but not concepts.
In contrast, human coders with the same instructions will not always get the same results because of personal interpretations and cultural backgrounds, so the reliability of manual content analysis is often lower than that of automatic content analysis. However, human coders are capable of recognizing concepts, which improves the validity of the results.
You can check the face validity of your search terms by looking at the AmCAT search results: read the articles that are identified as including your search terms and estimate whether they include the concept you intend to measure. AmCAT offers several ways to access these articles:
* Using the [[Summary|Summary function]], you can list all the articles that include your search terms. You can access each of these documents by clicking on its title in the list. Your search terms are highlighted in red.
* Using the 'Graph' option of the [[Graph/Table|Graph/Table function]], you can click on any dot in the line to get a list of the relevant articles. By clicking on a title in the list you can access that article.
* Using the 'ClusterMap' option of the [[Summary|Summary function]], you can make a Venn diagram. By clicking on a dot in the Venn diagram, you get access to that particular article. If you have a large number of articles, the Venn diagram displays a single large dot. By narrowing your search to a certain period or medium, you can reduce the number of articles so that individual dots appear.
== Reliability ==
The area where found and actual search results overlap (TP) contains the correct search results. The TN area refers to the area outside of both the found and the actual search results. These are articles that, indeed, should not have been found. The two remaining areas contain incorrect search results. The FP area refers to search results that were found but are not actual search results (i.e. they should not have been found). This is called a Type 1 error. The FN area refers to articles that are actual search results but were not found. This is called a Type 2 error.
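The four areas can be illustrated with a minimal sketch using sets of article IDs. The IDs and the names <code>all_articles</code>, <code>found</code> and <code>actual</code> are hypothetical and not part of AmCAT. In practice the set of actual results is usually only known for a manually coded sample.

```python
# Hypothetical article IDs for illustration; not taken from AmCAT.
all_articles = set(range(1, 11))   # every article in the collection
found = {1, 2, 3, 4, 5}            # articles returned by the search string
actual = {3, 4, 5, 6}              # articles that really contain the concept

true_positives = found & actual                   # found and correct
false_positives = found - actual                  # found, but should not have been (Type 1 error)
false_negatives = actual - found                  # should have been found, but were not (Type 2 error)
true_negatives = all_articles - (found | actual)  # correctly not found

print(sorted(true_positives), sorted(false_positives),
      sorted(false_negatives), sorted(true_negatives))
```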
== Precision ==
Precision refers to the extent to which your search results actually include the concept you want to measure. The central question is whether your search strings return only the concept of interest, or also return irrelevant results. Irrelevant results that have been identified as including your concept, but actually do not, are false positives. Put differently, precision refers to the number of true positives relative to the total collection of true and false positives (see the entire red area in Figure 6.2.1). Calculating the precision of a search string tests whether the search string yields correct search results. Precision is expressed as the percentage of correct search results and is calculated by dividing the number of true positive search results by the sum of true and false positive search results. This means that: precision = true positives / (true positives + false positives).
So, how can you determine the number of true and false positives? You can use the Query function in AmCAT. The simplest way to check whether your search results actually measure the intended concepts is to display them in the context in which they occur. When you select the [[Summary|Summary function]], you get a list of search results together with the context within which the search terms occur. You can calculate the precision by drawing a sample of X articles. For each article in this sample you check whether your search string has measured what you intended to measure in that particular article. If so, you label the article a true positive. If not, you label it a false positive. Let's say, for example, that 13 of a total of 50 articles in your sample are false positives. Then 50 - 13 = 37 articles are true positives, and 37 / 50 = 0.74. Your precision would be 74%.
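The sample-based calculation can be sketched as follows. This is only an illustration: the function name <code>precision</code> and the list of manual labels are assumptions, not an AmCAT feature, and the labels still have to come from reading the sampled articles yourself.

```python
def precision(labels):
    """Return true positives / (true positives + false positives).

    `labels` holds one boolean per sampled article:
    True for a true positive, False for a false positive.
    """
    if not labels:
        raise ValueError("the sample must contain at least one article")
    true_positives = sum(labels)  # True counts as 1, False as 0
    return true_positives / len(labels)


# The example from the text: 50 sampled articles, 13 of them false positives.
sample = [True] * 37 + [False] * 13
print(f"precision = {precision(sample):.2f}")
```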
== Recall ==