
Identifies and removes mismatched DNA sequences from a string set. Mismatched sequences are removed one after another until all remaining sequences align.


rmMismatchStrings_2(
  DNAStringSet,
  specimen_dataframe,
  rmOutliers = F,
  max_Z_score = 3
)



DNAStringSet
A DNA string set object.

specimen_dataframe
A data frame of specimen data created using querySpecData().

rmOutliers
A logical value stating whether to remove strings that are DNA distance outliers.

max_Z_score
A numeric value setting the maximum Z score allowed when removing outliers.
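
To illustrate what a Z-score cutoff such as max_Z_score does, here is a minimal base R sketch. It is not the package's implementation: the per-sequence mean distances, the sequence names, and the filtering rule (keep when the absolute Z score is at most the cutoff) are all assumptions made for illustration.

```r
# Hypothetical mean pairwise distances for 21 sequences (made-up values,
# not output from PhyInsight). The last sequence is far from the rest.
mean_dist <- c(rep(c(0.010, 0.012), 10), 0.250)
names(mean_dist) <- paste0("seq", seq_along(mean_dist))

# Cutoff, mirroring the default shown in the usage above.
max_Z_score <- 3

# Standardise each distance: how many standard deviations from the mean?
z <- (mean_dist - mean(mean_dist)) / sd(mean_dist)

# Assumed rule: keep sequences whose absolute Z score is within the cutoff.
keep <- abs(z) <= max_Z_score
kept_names <- names(mean_dist)[keep]
removed_names <- names(mean_dist)[!keep]
```

Under these assumptions, lowering the cutoff makes the filter stricter and raising it makes the filter more permissive. Note that with very few sequences a single outlier cannot reach a Z score of 3, so a fixed cutoff may remove nothing on small string sets.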


A list with two elements: the DNA string set with the mismatched sequences removed (1st element) and the specimen dataframe with data for the mismatched sequences removed (2nd element).
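
The returned pair only stays useful if both elements describe the same specimens. The base R sketch below shows one way such a paired result could be built and unpacked. A named character vector stands in for the DNA string set, and the processid column and the dropped identifier are hypothetical, not taken from the package.

```r
# Hypothetical stand-ins: a named character vector plays the role of the
# DNA string set, and a small data frame plays the role of the specimen data.
seqs <- c(id1 = "ACGT", id2 = "ACGA", id3 = "TTTT")
spec <- data.frame(
  processid = c("id1", "id2", "id3"),  # assumed identifier column
  markercode = "COI-5P",
  stringsAsFactors = FALSE
)

# Assume id3 was flagged as a mismatched sequence.
drop_ids <- "id3"
keep <- !(names(seqs) %in% drop_ids)

# Filter both objects with the same set of identifiers so they stay in sync.
kept_seqs <- seqs[keep]
kept_spec <- spec[spec$processid %in% names(kept_seqs), , drop = FALSE]

# Two-element list: sequences first, specimen data second.
result <- list(kept_seqs, kept_spec)

# Unpack by position, as in the examples on this page.
seqs_new <- result[[1]]
spec_new <- result[[2]]
```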


specdata <- PhyInsight::querySpecData("Panthera leo")

specdata <- subset(specdata, markercode == "COI-5P")

DNABin_Leo <- genDNABin(specdata)

DNAStringset_Leo <- genDNAStringSet(DNABin_Leo)

DNAStringSet_Leo_manipulated <- ManipStringSet(DNAStringset_Leo)
#> ========================================================================================================================================================================================================
#> Time difference of 0.03 secs
#> Determining distance matrix based on shared 9-mers:
#> ================================================================================
#> Time difference of 0.01 secs
#> Clustering into groups by similarity:
#> ================================================================================
#> Time difference of 0.01 secs
#> Aligning Sequences:
#> ================================================================================
#> Time difference of 0.49 secs
#> Iteration 1 of 2:
#> Determining distance matrix based on alignment:
#> ================================================================================
#> Time difference of 0 secs
#> Reclustering into groups by similarity:
#> ================================================================================
#> Time difference of 0.01 secs
#> Realigning Sequences:
#> ================================================================================
#> Time difference of 0.25 secs
#> Iteration 2 of 2:
#> Determining distance matrix based on alignment:
#> ================================================================================
#> Time difference of 0 secs
#> Reclustering into groups by similarity:
#> ================================================================================
#> Time difference of 0.01 secs
#> Realigning Sequences:
#> ================================================================================
#> Time difference of 0.01 secs
#> Refining the alignment:
#> ================================================================================
#> Time difference of 0.11 secs

StringsAndSpecdataframe <- rmMismatchStrings_2(
  DNAStringSet = DNAStringSet_Leo_manipulated,
  specimen_dataframe = specdata
)
#> Error in rmMismatchStrings_2(DNAStringSet = DNAStringSet_Leo_manipulated,     specimen_dataframe = specdata): could not find function "rmMismatchStrings_2"

DNAStringSet_NEW <- StringsAndSpecdataframe[[1]]
#> Error in eval(expr, envir, enclos): object 'StringsAndSpecdataframe' not found

#> Error in eval(expr, envir, enclos): object 'DNAStringSet_NEW' not found

specimen_dataframe_NEW <- StringsAndSpecdataframe[[2]]
#> Error in eval(expr, envir, enclos): object 'StringsAndSpecdataframe' not found

#> Error in eval(expr, envir, enclos): object 'specimen_dataframe_NEW' not found