19.04.2015

EXERCISES STR-VALIDATOR

GETTING STARTED

It is assumed that you have installed R (used e.g. through RGui or RStudio). To install STR-validator (the package name has no dash and is all lower case: strvalidator), follow the instructions in the document strvalidator_installation.pdf, which can be downloaded from the STR-validator website (https://sites.google.com/site/forensicapps/strvalidator).

To start the graphical user interface, type the following two lines in the R console (press Enter after each command to execute it):

    library(strvalidator)
    strvalidator()

Although most STR-validator functions can also be run as R commands, these exercises focus on the graphical user interface.

EXERCISE 1 – REPEATABILITY

In this exercise you will load an in-built example dataset with 8 replicates of a positive control sample and calculate peak height metrics to evaluate the repeatability of the STR analysis.

a) First load the in-built example data into the R environment by typing data(set1) in the R console. The name of the dataset is ‘set1’.

b) Select the Workspace tab in STR-validator and click the drop-down list in the Load objects from R workspace area. Select set1 and click the Load object button. set1 appears in the object list of the STR-validator workspace for the current project. To inspect the data, select set1 and click the View button. The dataset contains 8 positive controls, 1 negative control and 1 allelic ladder.

c) Before analysis, the GeneMapper-formatted data (Figure 1) must be converted to the STR-validator format (Figure 2). In the Tools tab click the Slim button. A new window opens. Select dataset set1 in the drop-down list. Accept the defaults and click the Slim dataset button. The new dataset is saved in the STR-validator workspace.

d) Select the Tools tab and click the Height button. Select the set1_slim dataset and uncheck the option Add result to dataset (this creates a new dataset instead). Leave the other options checked.
Click the Calculate button. The result is saved and can be viewed with the View or Edit button in any tab. It should look like Figure 3 (left).

e) We now have the total peak height (TPH) and the number of peaks for each sample in set1, including the negative control and the ladder. But what if we want to exclude artefact peaks, such as stutters, and only sum the heights of true alleles? We probably also want to remove the negative control and the ladder. To do this we need to import the known profile of the positive control sample. Type data(ref1) in the R console and load the dataset into STR-validator as explained in point b). Convert the dataset as explained in point c), but this time rename the result to ref1 so as not to clutter the workspace (overwrite when asked).

f) Select the Tools tab and click the Filter button. Select the set1_slim dataset and the ref1 reference dataset. The Check subsetting button shows which samples match the reference samples. Leave the default settings and click the Filter profile button. The known alleles and their peak heights have now been extracted from each replicate control sample and saved as a new dataset, set1_slim_filter.

g) Repeat point d) with the set1_slim_filter dataset. The result (set1_slim_filter_height) now looks like Figure 3 (middle). The negative control and the ladder are now absent: they did not match the reference sample “PC” and were therefore never included in the filtered dataset.

h) A better metric than total peak height is the average peak height (H), since it is easy to relate to the DNA profile. To calculate H we need to know whether a marker is heterozygous or homozygous. Normally this information is calculated for reference datasets (where you know that all alleles are present) and then added to other datasets (where you cannot be sure that all alleles are observed). See sections 9.3 and 9.4 in the STR-validator tutorial (version 1.4).
In this case, however, we can see from Figure 3 (middle) that all alleles are present in all samples (33 alleles per sample), so we can take a shortcut. In the Tools tab click the Heterozygous button. Select the set1_slim_filter dataset in the drop-down list and click Calculate. This adds a column Heterozygous to the dataset, with 1 indicating heterozygous alleles and 0 indicating homozygous alleles. The new dataset is saved as set1_slim_filter_het.

i) Repeat point d) with the set1_slim_filter_het dataset. The result (set1_slim_filter_het_height) now looks like Figure 3 (right), with a new column H. The current version (1.4.0) has no summary function for peak heights. If you would like to continue the analysis in spreadsheet software, click the Edit button, select the dataset set1_slim_filter_het_height and click the Copy to clipboard button. Paste the data into the spreadsheet program and apply, for example, the average and standard deviation functions. The result can look like Figure 4.

Sample.Name  Marker    Dye  Allele.1  Allele.2  Allele.3  Allele.4  Allele.5  Height.1  Height.2  Height.3  Height.4  Height.5
PC1          AMEL      B    X         OL        Y         NA        NA        2486      81        2850      NA        NA
PC1          D3S1358   B    16        17        18        NA        NA        260       3251      2985      NA        NA
PC1          TH01      B    6         9.3       NA        NA        NA        3357      2687      NA        NA        NA
PC1          D21S11    B    28        29        30.2      31.2      NA        183       2036      180       1942      NA
PC1          D18S51    B    15        16        17        18        NA        161       2051      203       1617      NA
PC1          D10S1248  G    12        13        14        15        NA        168       2142      243       2230      NA
PC1          D1S1656   G    11        12        13        NA        NA        249       3149      3965      NA        NA
FIGURE 1. GENEMAPPER FORMATTED DATA.

Sample.Name  Marker   Dye  Allele  Height
PC1          AMEL     B    X       2486
PC1          AMEL     B    OL      81
PC1          AMEL     B    Y       2850
PC1          D3S1358  B    16      260
PC1          D3S1358  B    17      3251
PC1          D3S1358  B    18      2985
PC1          TH01     B    6       3357
FIGURE 2. STR-VALIDATOR FORMATTED DATA.
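As a cross-check of the H column in Figure 3 (right) and the summary in Figure 4, the Python sketch below recomputes both outside STR-validator. It is not STR-validator code. It assumes, from the numbers in Figures 3 and 4, that H equals TPH divided by twice the number of markers: the 16 heterozygous markers contribute two alleles each and the single peak of the homozygous marker D22S1045 is counted twice (33 peaks, 34 allele copies). The replicates are then summarised with the mean and the sample standard deviation.

```python
from statistics import mean, stdev

# Total peak height (TPH) of the known alleles per positive control,
# copied from Figure 3 (middle); every sample has 33 allele peaks.
TPH = {
    "PC1": 103748, "PC2": 127760, "PC3": 131594, "PC4": 108599,
    "PC5": 130122, "PC6": 99703, "PC7": 111548, "PC8": 86923,
}

# Assumption: 16 heterozygous markers and 1 homozygous marker (D22S1045).
N_HET_MARKERS = 16
N_HOM_MARKERS = 1


def average_peak_height(tph, n_het=N_HET_MARKERS, n_hom=N_HOM_MARKERS):
    """H = TPH / (2 * n_het + 2 * n_hom); a homozygous peak counts as two alleles."""
    return tph / (2 * n_het + 2 * n_hom)


H = {name: average_peak_height(t) for name, t in TPH.items()}

for name, h in H.items():
    print(f"{name}: H = {h:.2f}")

# Repeatability summary as in Figure 4 (sample standard deviation).
print(f"Mean H = {mean(H.values()):.1f}, SD H = {stdev(H.values()):.1f}")
print(f"Mean TPH = {mean(TPH.values()):.1f}, SD TPH = {stdev(TPH.values()):.1f}")
```

The values printed for PC1 to PC8 should agree with the H column in Figure 4; if your own control sample has a different number of homozygous markers, the denominator changes accordingly.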
Left (after point d):
Sample.Name  TPH     Peaks
PC1          109875  59
PC2          135465  62
PC3          139962  60
PC4          115650  60
PC5          138392  60
PC6          106037  60
PC7          118332  61
PC8          92227   60
NC           0       0
Ladder       111651  82

Middle (after point g):
Sample.Name  TPH     Peaks
PC1          103748  33
PC2          127760  33
PC3          131594  33
PC4          108599  33
PC5          130122  33
PC6          99703   33
PC7          111548  33
PC8          86923   33

Right (after point i):
Sample.Name  H       TPH     Peaks
PC1          3051.4  103748  33
PC2          3757.6  127760  33
PC3          3870.4  131594  33
PC4          3194.1  108599  33
PC5          3827.1  130122  33
PC6          2932.4  99703   33
PC7          3280.8  111548  33
PC8          2556.6  86923   33
FIGURE 3. RESULT AFTER COMPLETION OF POINT D) (LEFT), G) (MIDDLE) AND I) (RIGHT) RESPECTIVELY.

Sample.Name  H        TPH       Peaks
PC1          3051.41  103748    33
PC2          3757.65  127760    33
PC3          3870.41  131594    33
PC4          3194.09  108599    33
PC5          3827.12  130122    33
PC6          2932.44  99703     33
PC7          3280.82  111548    33
PC8          2556.56  86923     33
Mean:        3308.8   112499.6  33
StdDev:      474.1    16118.5   0
FIGURE 4. EXAMPLE OF EXTERNAL ANALYSIS OF STR-VALIDATOR RESULTS.

EXERCISE 2 – HETEROZYGOTE- AND INTER-LOCUS BALANCE

You will use the same data as in Exercise 1 to calculate the heterozygote balance and the inter-locus balance. However, we will start by saving the project before continuing the analysis.

a) Select the Workspace tab and click the Save As button. Locate the folder where you want to save the project and click Ok, then type a project name in the dialog box and click Ok to save the project. When the project has been saved, a dialog box shows the complete path to the project. The next time, it is enough to click the Save button.

b) Select the Balance tab and click the Calculate button in the Intralocus and interlocus balance group. Select the set1_slim_filter dataset and the ref1 reference dataset in the respective drop-down lists. Choose to calculate the heterozygote balance with the High peak / low peak option, and calculate the proportional inter-locus balance within each dye. Click the Calculate button. The result is saved as set1_slim_filter_balance.
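To make the two ratios in point b) concrete, here is a minimal Python sketch for a single sample, PC1, in the blue dye. It is not STR-validator code and rests on two assumptions. First, Hb is expressed as the smaller peak divided by the larger peak, so it lies between 0 and 1; other definitions (e.g. based on molecular weight) exist, and this is not necessarily what the High peak / low peak option computes. Second, the true alleles of PC1 are inferred from the peak heights in Figure 1 rather than read from ref1. Proportional Lb follows the description in point b): each marker's share of the total peak height within its dye.

```python
# Peak heights of the assumed true alleles of PC1 in the blue dye, from Figure 1.
# Minor peaks (e.g. D21S11 28 and 30.2) are assumed to be stutters that the
# Filter step removes; this genotype is an assumption for illustration.
PC1_BLUE = {
    "AMEL": [2486, 2850],     # X, Y
    "D3S1358": [3251, 2985],  # 17, 18
    "TH01": [3357, 2687],     # 6, 9.3
    "D21S11": [2036, 1942],   # 29, 31.2
    "D18S51": [2051, 1617],   # 16, 18
}


def heterozygote_balance(heights):
    """Hb = smaller peak / larger peak for a heterozygous marker (two peaks)."""
    low, high = sorted(heights)
    return low / high


def proportional_locus_balance(dye_markers):
    """Lb per marker = marker peak-height sum / peak-height sum of the whole dye."""
    marker_sums = {marker: sum(h) for marker, h in dye_markers.items()}
    dye_total = sum(marker_sums.values())
    return {marker: s / dye_total for marker, s in marker_sums.items()}


hb = {marker: heterozygote_balance(h) for marker, h in PC1_BLUE.items()}
lb = proportional_locus_balance(PC1_BLUE)

for marker in PC1_BLUE:
    print(f"{marker:9s} Hb = {hb[marker]:.2f}  Lb = {lb[marker]:.2f}")
```

Because proportional Lb divides by the dye total, the Lb values of the five blue markers always sum to 1; a perfectly balanced dye would give 0.2 for each of them.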
c) To plot the balance data, click the Plot button in the Intralocus and interlocus balance group. Select the set1_slim_filter_balance dataset in the drop-down list. Select the option fixed scales in the expandable Axes option group. Click the Hb_vs_Height button to plot the heterozygote balance and the Lb_vs_Height button to plot the inter-locus balance. The plots can be saved as modifiable plot objects or as images.

d) Click the Summarize button to calculate summary statistics. Select the set1_slim_filter_balance dataset in the drop-down list. Calculate the 5th percentile per locus. The now rather long name of the result dataset is set1_slim_filter_balance_table_locus. The name of the result can be changed before performing a calculation, and it can also be changed at any time in the Workspace tab. However, for the sake of clarity we will usually accept the default name during the exercises, since it summarises the actions that have been applied to the dataset.

e) To complete the summary statistics we will add dye information to the dataset. Click the Dye button in the Tools tab. Select the set1_slim_filter_balance_table_locus dataset in the drop-down list. Click Add dye. The result, rounded to two decimals, is shown in Figure 5.
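The caption of Figure 5 defines the optimal locus balance as 1 divided by the number of markers in each colour. The minimal Python sketch below (not STR-validator code) takes the dye assignment and the Lb.Mean values from Figure 5, derives that optimum per dye, and reports how far each marker's mean Lb lies from it. Using the mean rather than individual samples is a simplification chosen for illustration.

```python
from collections import Counter

# Dye and mean proportional Lb per marker, copied from Figure 5.
LB_MEAN = {
    "AMEL": ("B", 0.21), "D3S1358": ("B", 0.24), "TH01": ("B", 0.22),
    "D21S11": ("B", 0.17), "D18S51": ("B", 0.16),
    "D10S1248": ("G", 0.23), "D1S1656": ("G", 0.29), "D2S1338": ("G", 0.26),
    "D16S539": ("G", 0.22),
    "D22S1045": ("Y", 0.23), "vWA": ("Y", 0.21), "D8S1179": ("Y", 0.28),
    "FGA": ("Y", 0.28),
    "D2S441": ("R", 0.37), "D12S391": ("R", 0.19), "D19S433": ("R", 0.25),
    "SE33": ("R", 0.19),
}

# Optimal proportional Lb: 1 divided by the number of markers in the dye.
markers_per_dye = Counter(dye for dye, _ in LB_MEAN.values())
optimal = {dye: 1 / n for dye, n in markers_per_dye.items()}

# Deviation of each marker's mean Lb from the optimum of its own dye.
deviation = {m: lb - optimal[dye] for m, (dye, lb) in LB_MEAN.items()}

for dye, value in optimal.items():
    print(f"Dye {dye}: {markers_per_dye[dye]} markers, optimal Lb = {value:.2f}")

worst = max(deviation, key=lambda m: abs(deviation[m]))
print(f"Largest deviation: {worst} ({deviation[worst]:+.2f})")
```

With these values the blue dye has an optimum of 0.20 and the other dyes 0.25, and D2S441 stands out as the most over-represented marker in its dye.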
Marker    Hb.n  Hb.Min  Hb.Mean  Hb.Stdv  Hb.Perc.5  Lb.n  Lb.Min  Lb.Mean  Lb.Stdv  Lb.Perc.5  Color   Dye
AMEL      8     0.66    0.85     0.09     0.71       8     0.20    0.21     0.01     0.20       blue    B
D3S1358   8     0.65    0.79     0.11     0.67       8     0.20    0.24     0.03     0.20       blue    B
TH01      8     0.60    0.77     0.12     0.64       8     0.20    0.22     0.02     0.20       blue    B
D21S11    8     0.56    0.84     0.15     0.61       8     0.16    0.17     0.02     0.16       blue    B
D18S51    8     0.61    0.82     0.13     0.66       8     0.13    0.16     0.02     0.13       blue    B
D10S1248  8     0.57    0.86     0.13     0.66       8     0.20    0.23     0.03     0.20       green   G
D1S1656   8     0.79    0.90     0.08     0.80       8     0.24    0.29     0.03     0.25       green   G
D2S1338   8     0.68    0.80     0.10     0.68       8     0.23    0.26     0.02     0.23       green   G
D16S539   8     0.66    0.85     0.10     0.71       8     0.19    0.22     0.02     0.20       green   G
D22S1045  0     NA      NA       NA       NA         8     0.19    0.23     0.03     0.20       yellow  Y
vWA       8     0.68    0.82     0.10     0.69       8     0.18    0.21     0.02     0.18       yellow  Y
D8S1179   8     0.71    0.90     0.09     0.75       8     0.24    0.28     0.02     0.25       yellow  Y
FGA       8     0.68    0.89     0.12     0.70       8     0.24    0.28     0.03     0.24       yellow  Y
D2S441    8     0.72    0.88     0.09     0.74       8     0.32    0.37     0.04     0.33       red     R
D12S391   8     0.58    0.78     0.14     0.61       8     0.15    0.19     0.03     0.15       red     R
D19S433   8     0.68    0.85     0.10     0.71       8     0.22    0.25     0.02     0.22       red     R
SE33      8     0.59    0.75     0.13     0.60       8     0.17    0.19     0.02     0.17       red     R
FIGURE 5. BALANCE SUMMARY STATISTICS FOR THE POSITIVE CONTROL SAMPLES. D22S1045 LACKS HB VALUES BECAUSE THE POSITIVE CONTROL SAMPLE IS HOMOZYGOUS AT THIS MARKER. THE OPTIMAL LOCUS BALANCE IS CALCULATED BY DIVIDING 1 BY THE NUMBER OF MARKERS IN EACH COLOUR (OR BY THE TOTAL NUMBER OF MARKERS IF CALCULATING THE OVERALL BALANCE).

EXERCISE 3 – CREATE AN EPG

DNA profiles can be visualised as EPGs from within STR-validator. Select the Tools tab and click the EPG button. Select the unfiltered dataset set1_slim in the drop-down list, then select one of the positive control samples in the sample drop-down list. Click Generate EPG.

EXERCISE 4 – STUTTER RATIOS

We will continue with dataset set1 and calculate stutter ratios. The stutter ratios will be plotted and the result will be summarized.

a) Select the Stutter tab and click the Calculate button. Select the unfiltered dataset set1_slim and the reference dataset ref1.
It is important to note that no stutter filter was used when these data were analysed in GeneMapperID-X. Calculate stutters within the range ±1 repeat unit. Select the option no overlap between stutters and alleles. Leave the replacement table as is and click the Calculate button.

b) Click the Plot button. Select the set1_slim_stutter dataset in the drop-down list and plot Ratio vs. Allele. The plot quickly shows that all observed stutter ratios are below 15%. Both -1 and +1 repeat stutters are observed. Do you observe any notable differences between the loci?

c) Now plot Ratio vs. Height. What is your conclusion? Close the plotting window.

d) In the Stutter tab click the Summarize button. Select the set1_slim_stutter dataset in the drop-down list. Calculate the 95th percentile on a per stutter basis. Click the Summarize button. The result is saved as set1_slim_stutter_table_stutter and should be similar to Figure 6.

Marker    Type  n.alleles  n.stutters  Mean   Stdv   Perc.95  Max
D21S11    -1    2          16          0.080  0.007  0.091    0.093
D18S51    -1    1          8           0.080  0.004  0.086    0.087
D10S1248  -1    1          8           0.078  0.004  0.084    0.085
D2S1338   -1    2          16          0.086  0.018  0.112    0.112
D16S539   -1    2          15          0.064  0.023  0.089    0.094
D22S1045  -1    1          8           0.119  0.005  0.126    0.126
D22S1045  1     1          8           0.058  0.005  0.065    0.069
vWA       -1    2          16          0.091  0.012  0.112    0.125
FGA       -1    2          16          0.063  0.012  0.076    0.077
D2S441    -1    2          16          0.052  0.010  0.065    0.069
D2S441    1     1          1           0.009  NA     0.009    0.009
D12S391   -1    2          16          0.106  0.027  0.140    0.146
D12S391   1     1          1           0.025  NA     0.025    0.025
FIGURE 6. RESULT FROM STUTTER ANALYSIS OF 8 POSITIVE CONTROL SAMPLES.

EXERCISE 5 – CONCORDANCE

In this exercise you will perform a concordance analysis of samples analysed with SGM Plus and ESX 17. Any discordances are listed and the overall concordance is calculated. The data for this exercise are in the Concordance folder.

a) Select the Workspace tab and click the Import button. Locate the file concordance_esx17.txt, give the dataset a name, e.g.
conc_esx, and click the Import button. Repeat the procedure to import the file concordance_sgm_plus.txt, for example as conc_sgm. The data have been analysed according to standard procedures in GeneMapperID-X, and artefacts such as stutters and pull-up peaks have been removed. The data are already in STR-validator format, so there is no need to Slim the data.

b) Select the Concordance tab and click the Calculate button. Select the conc_esx dataset and check that the kit has been correctly detected as ESX17. Click the Add button. Select the conc_sgm dataset and check that the kit has been correctly detected as SGMPlus. Click the Add button. The kit names in the Names for analysis kit text field can be changed, as they are only labels for the result tables. Click the Calculate button. Two result tables are created: one for the overall concordance (table_concordance) and one listing all discordances (table_discordance).

c) Click the Edit button to view the result. Select the table_discordance dataset. It should look like Figure 7. Select table_concordance to check the overall result (Figure 8).

Sample.Name  Marker   ESX17  SGMPlus
Sample_02    D3S1358  15,20  15,OL
Sample_04    D16S539  9,13   9
Sample_07    D3S1358  16,21  16,OL
FIGURE 7. A LIST OF DISCORDANCES (TWO OFF-LADDER ALLELES AND ONE FALSE HOMOZYGOTE) FOUND WHEN COMPARING THE ESX17 AND THE SGMPLUS TYPING KITS.

Kits               Samples  Loci  Alleles  Discordances  Concordance
ESX17 vs. SGMPlus  10       10    200      3             98.5
FIGURE 8. THE OVERALL CONCORDANCE BETWEEN THE ESX17 AND SGMPLUS TYPING KITS.

EXERCISE 6 – MIXTURES

In this exercise you will perform a validation analysis of mixtures. For each mixture, the mixture proportion (Mx) per marker, the average Mx of the sample, and the difference of each marker from the sample average are calculated. The numbers of observed and expected unshared alleles are listed and the profile percentage is calculated. In addition, any drop-in peaks are counted. The data for this exercise are in the Mixture folder.
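The profile percentage and the difference from the sample average described above are simple arithmetic, sketched here in Python (not STR-validator code). The observed and expected counts are the ones reported in Figure 9, assuming the profile percentage is observed divided by expected unshared alleles. The per-marker Mx values are hypothetical numbers invented only to show the averaging and difference steps; how STR-validator derives Mx from peak heights is not reproduced here.

```python
from statistics import mean

# Observed and expected unshared minor-contributor alleles, from Figure 9.
COUNTS = {"major_minor_1": (19, 19), "major_minor_2": (13, 19)}


def profile_percentage(observed, expected):
    """Percentage of the expected unshared alleles that were detected."""
    return 100 * observed / expected


for sample, (obs, exp) in COUNTS.items():
    print(f"{sample}: profile = {profile_percentage(obs, exp):.1f}%")

# Hypothetical per-marker Mx values for one sample (NOT from the exercise
# data), used only to illustrate the "difference from the sample average".
mx_per_marker = {"D3S1358": 0.27, "vWA": 0.24, "FGA": 0.25}
mx_average = mean(mx_per_marker.values())
difference = {m: mx - mx_average for m, mx in mx_per_marker.items()}

print(f"Average Mx = {mx_average:.3f}")
for marker, diff in difference.items():
    print(f"{marker}: {diff:+.3f}")
```

For major_minor_2 the sketch gives 13 / 19, about 68.4%, matching the Profile column in Figure 9. The differences from the average always sum to zero, so markers with a positive difference are balanced by markers with a negative one.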
a) Select the Workspace tab and click the Import button. Locate the file mixtures.txt, give the dataset a name, e.g. mix_data, and click the Import button. Repeat the procedure to import the files ref_major.txt and ref_minor.txt, e.g. as mix_ref_major and mix_ref_minor respectively. The data have been analysed according to standard procedures in GeneMapperID-X, and artefacts such as stutters and pull-up peaks have been removed. The data are already in STR-validator format, so there is no need to Slim the data.

b) Select the Mixture tab and click the Calculate button. Select mix_data in the drop-down list for datasets. Select the mix_ref_major reference dataset for the major mixture component and the mix_ref_minor reference dataset for the minor mixture component. Check the options Remove offladder alleles and Ignore drop-out. Click the Calculate button. The result is saved in mix_data_mixture and is summarized in Figure 9.

Sample.Name    Average Mx  Observed  Expected  Profile  Dropin
major_minor_1  0.255       19        19        100.0    0
major_minor_2  0.160       13        19        68.4     3
FIGURE 9. EXAMPLE OF EXTERNAL ANALYSIS OF STR-VALIDATOR MIXTURE RESULTS.

EXERCISE 7 – KIT MARKER RANGE COMPARISON

Kit marker range plots can easily be created in STR-validator. Select the DryLab tab and click the Plot Kit button. Check one or more typing kits that you want to plot and click the plot button. Many plot options can be customised; click the plot button again to update the plot. Plots can be saved by clicking the Save as object or Save as image button.

EXERCISE 8* – DROP-OUT

This is a slightly updated guided exercise from a previous course. In this exercise you will perform a drop-out analysis of serially diluted samples to estimate the stochastic threshold. This is a more challenging exercise, as it does not include a reference dataset and the data contain contamination. The data for this exercise are in the Dropout folder. The step-by-step guide guided_exercise_dropout.pdf is available in the same folder.
EXERCISE 9* – DROP-IN

This is a slightly updated guided exercise from a previous course. In this exercise you will perform process control of extraction negative controls and PCR non-template controls to estimate the probability of drop-in contamination. This is a more challenging exercise, as it involves multiple steps and several STR-validator tools. The data for this exercise are in the Contamination folder. The step-by-step guide guided_exercise_dropin.pdf is available in the same folder.

EXERCISE 10 – RESULT TYPE AND PEAKS

Chapters 6 and 7 of the tutorial show how to perform a result type and peak analysis. This can be an easy way to compare, e.g., two extraction methods using real crime scene samples by categorising the results as, e.g., full, partial or negative.

EXERCISE 11 – PRECISION

Chapter 8 of the tutorial shows how to perform a precision analysis of allelic ladders.
© Copyright 2024