Remove Mismatched DNA Sequences
rmBadStrings_2.Rd
Identify and remove bad DNA sequences from a string set. Mismatched sequences are removed sequentially until all remaining sequences align.
Arguments
- DNAStringSet
A DNA string set object.
- specimen_dataframe
A data frame of specimen data created using querySpecData().
- rmOutliers
A logical value indicating whether to remove strings that are DNA distance outliers.
- max_Z_score
A numeric value setting the maximum Z score allowed when removing outliers. Only relevant when rmOutliers is TRUE.
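As a rough illustration of what a Z-score cutoff on DNA distances means, the sketch below uses base R only. The toy distance matrix, the helper name flag_distance_outliers, and the choice of mean pairwise distance as the per-sequence score are assumptions made for illustration; they are not taken from the PhyInsight source, which may compute outliers differently.

```r
# Hypothetical helper: flag sequences whose mean distance to the others
# has a Z score above a cutoff. Not the PhyInsight implementation.
flag_distance_outliers <- function(dist_matrix, max_Z_score = 3) {
  n <- nrow(dist_matrix)
  # Mean distance from each sequence to every other sequence (diagonal excluded)
  mean_dist <- (rowSums(dist_matrix) - diag(dist_matrix)) / (n - 1)
  # Standardise: (value - mean) / standard deviation
  z <- (mean_dist - mean(mean_dist)) / sd(mean_dist)
  z > max_Z_score
}

# Toy symmetric distance matrix: nine close sequences and one distant one
d <- matrix(0.01, nrow = 10, ncol = 10)
d[10, ] <- 0.30
d[, 10] <- 0.30
diag(d) <- 0
rownames(d) <- colnames(d) <- paste0("seq", 1:10)

is_outlier <- flag_distance_outliers(d, max_Z_score = 2)
names(which(is_outlier))
```

Lowering the cutoff flags more sequences, and raising it flags fewer. With very few sequences a single Z score cannot get large, so a high cutoff may never flag anything.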
Value
A list with two elements: the DNA string set with the mismatched sequences removed (1st element) and the specimen dataframe with data for the mismatched sequences removed (2nd element).
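The point of returning both elements together is that the sequences and the specimen records stay in step. The sketch below shows that idea with base R only. Plain character strings stand in for a DNA string set, the helper name drop_records is hypothetical, and the assumption that sequence names match the processid column is made for illustration rather than confirmed by the package.

```r
# Hypothetical helper: drop the same records from a named sequence vector
# and from the matching specimen data frame. Not the PhyInsight implementation.
drop_records <- function(sequences, specimen_dataframe, bad_ids) {
  keep_seq <- !(names(sequences) %in% bad_ids)
  keep_row <- !(specimen_dataframe$processid %in% bad_ids)
  list(
    sequences[keep_seq],
    specimen_dataframe[keep_row, , drop = FALSE]
  )
}

# Toy data: sequence names are assumed to match the processid column
seqs <- c(ID1 = "ACGTACGT", ID2 = "ACGTACGA", ID3 = "TTTTGGGG")
spec <- data.frame(
  processid = c("ID1", "ID2", "ID3"),
  markercode = "COI-5P",
  stringsAsFactors = FALSE
)

result <- drop_records(seqs, spec, bad_ids = "ID3")
sequences_new <- result[[1]]  # 1st element: filtered sequences
specimen_new <- result[[2]]   # 2nd element: filtered specimen data

# Both elements should describe the same specimens, in the same order
identical(names(sequences_new), specimen_new$processid)
```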
Examples
specdata <- PhyInsight::querySpecData("Panthera leo")
specdata <- subset(specdata, markercode == "COI-5P")
DNABin_Leo <- genDNABin(specdata)
DNAStringset_Leo <- genDNAStringSet(DNABin_Leo)
DNAStringSet_Leo_manipulated <- ManipStringSet(DNAStringset_Leo)
#> ========================================================================================================================================================================================================
#>
#> Time difference of 0.03 secs
#> Determining distance matrix based on shared 9-mers:
#> ================================================================================
#>
#> Time difference of 0.01 secs
#>
#> Clustering into groups by similarity:
#> ================================================================================
#>
#> Time difference of 0.01 secs
#>
#> Aligning Sequences:
#> ================================================================================
#>
#> Time difference of 0.49 secs
#>
#> Iteration 1 of 2:
#>
#> Determining distance matrix based on alignment:
#> ================================================================================
#>
#> Time difference of 0 secs
#>
#> Reclustering into groups by similarity:
#> ================================================================================
#>
#> Time difference of 0.01 secs
#>
#> Realigning Sequences:
#> ================================================================================
#>
#> Time difference of 0.25 secs
#>
#> Iteration 2 of 2:
#>
#> Determining distance matrix based on alignment:
#> ================================================================================
#>
#> Time difference of 0 secs
#>
#> Reclustering into groups by similarity:
#> ================================================================================
#>
#> Time difference of 0.01 secs
#>
#> Realigning Sequences:
#> ================================================================================
#>
#> Time difference of 0.01 secs
#>
#> Refining the alignment:
#> ================================================================================
#>
#> Time difference of 0.11 secs
#>
# Remove mismatched sequences; returns the string set and the specimen data
StringsAndSpecdataframe <- rmBadStrings_2(
  DNAStringSet = DNAStringSet_Leo_manipulated,
  specimen_dataframe = specdata
)
# 1st element: DNA string set with mismatched sequences removed
DNAStringSet_NEW <- StringsAndSpecdataframe[[1]]
tail(DNAStringSet_NEW)
# 2nd element: specimen data frame with the matching rows removed
specimen_dataframe_NEW <- StringsAndSpecdataframe[[2]]
tail(specimen_dataframe_NEW$processid)