Remove Mismatched DNA Sequences — rmBadStrings

Identify and remove mismatched DNA sequences in a DNA string set. Mismatched sequences are removed one at a time until all remaining sequences align.
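This page does not say how PhyInsight decides that the remaining sequences "align". As a rough illustration only, the sketch below assumes a precomputed pairwise distance matrix and treats "aligned" as "every pairwise distance is at or below a cutoff". The function `drop_until_aligned()` and its `max_dist` cutoff are hypothetical and are not part of PhyInsight.

```r
# Hypothetical sketch, NOT PhyInsight's implementation.
# Assumption: "aligned" means every pairwise distance is <= max_dist.
drop_until_aligned <- function(d, max_dist) {
  d <- as.matrix(d)  # accepts a matrix or a `dist` object
  # Stop at two sequences: with only two left there is no way to tell which is bad.
  while (nrow(d) > 2 && max(d) > max_dist) {
    # Remove the sequence with the largest average distance to the rest
    # (the zero diagonal shifts every row equally, so the ranking is unaffected).
    worst <- which.max(rowMeans(d))
    d <- d[-worst, -worst, drop = FALSE]
  }
  rownames(d)  # names of the sequences that are kept
}

# Toy distance matrix: A, B and C are close; D is far from all of them.
ids <- c("A", "B", "C", "D")
toy <- matrix(c(
  0,    0.01, 0.02, 0.30,
  0.01, 0,    0.01, 0.32,
  0.02, 0.01, 0,    0.31,
  0.30, 0.32, 0.31, 0
), nrow = 4, dimnames = list(ids, ids))

drop_until_aligned(toy, max_dist = 0.05)
# [1] "A" "B" "C"
```

Removing one sequence per pass and then re-checking mirrors the "one at a time" behaviour described above: a single divergent sequence can inflate many pairwise distances, so dropping it first often leaves the rest within the cutoff.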

Usage

rmBadStrings_2(
  DNAStringSet,
  specimen_dataframe,
  rmOutliers = F,
  max_Z_score = 3
)

Arguments

DNAStringSet: A DNAStringSet object, for example one produced by ManipStringSet().
specimen_dataframe: A dataframe with specimen data created using querySpecData().
rmOutliers: A logical value stating whether to remove sequences that are outliers in terms of DNA distance.
max_Z_score: A numeric value giving the maximum Z score a sequence may have before it is removed as an outlier.
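The Z score calculation behind rmOutliers and max_Z_score is not documented on this page. One plausible reading, sketched below purely as an assumption, scores each sequence by its mean distance to all other sequences and flags those whose score lies more than max_Z_score standard deviations above the average. `z_outliers()` is a hypothetical helper, not PhyInsight's code.

```r
# Hypothetical sketch, NOT PhyInsight's implementation.
# Assumption: outliers are sequences whose mean distance to the others is
# unusually HIGH (one-sided test), measured in sample standard deviations.
z_outliers <- function(d, max_Z_score = 3) {
  d <- as.matrix(d)
  n <- nrow(d)
  mean_dist <- rowSums(d) / (n - 1)  # average distance, excluding the zero self-distance
  s <- sd(mean_dist)
  if (is.na(s) || s == 0) return(character(0))  # no spread: nothing to flag
  z <- (mean_dist - mean(mean_dist)) / s
  rownames(d)[z > max_Z_score]  # names of the flagged sequences
}

# Toy distance matrix: D is far from A, B and C.
ids <- c("A", "B", "C", "D")
toy <- matrix(c(
  0,    0.01, 0.02, 0.30,
  0.01, 0,    0.01, 0.32,
  0.02, 0.01, 0,    0.31,
  0.30, 0.32, 0.31, 0
), nrow = 4, dimnames = list(ids, ids))

z_outliers(toy, max_Z_score = 1)  # flags "D"
z_outliers(toy, max_Z_score = 3)  # flags nothing
```

Under this definition, a Z score computed with the sample standard deviation can never exceed (n - 1)/sqrt(n) for n sequences, so a cutoff of 3 cannot flag anything until there are at least 11 sequences. That is why the toy example needs a much lower cutoff to flag D.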

Value

A list with two elements: the DNAStringSet with the mismatched sequences removed (first element) and the specimen dataframe with the rows for those sequences removed (second element).

Examples

specdata <- PhyInsight::querySpecData("Panthera leo")

specdata <- subset(specdata, markercode == "COI-5P")

DNABin_Leo <- genDNABin(specdata)

DNAStringset_Leo <- genDNAStringSet(DNABin_Leo)

DNAStringSet_Leo_manipulated <- ManipStringSet(DNAStringset_Leo)
#> ========================================================================================================================================================================================================
#> 
#> Time difference of 0.03 secs
#> Determining distance matrix based on shared 9-mers:
#> ================================================================================
#> 
#> Time difference of 0.01 secs
#> 
#> Clustering into groups by similarity:
#> ================================================================================
#> 
#> Time difference of 0.01 secs
#> 
#> Aligning Sequences:
#> ================================================================================
#> 
#> Time difference of 0.49 secs
#> 
#> Iteration 1 of 2:
#> 
#> Determining distance matrix based on alignment:
#> ================================================================================
#> 
#> Time difference of 0 secs
#> 
#> Reclustering into groups by similarity:
#> ================================================================================
#> 
#> Time difference of 0.01 secs
#> 
#> Realigning Sequences:
#> ================================================================================
#> 
#> Time difference of 0.25 secs
#> 
#> Iteration 2 of 2:
#> 
#> Determining distance matrix based on alignment:
#> ================================================================================
#> 
#> Time difference of 0 secs
#> 
#> Reclustering into groups by similarity:
#> ================================================================================
#> 
#> Time difference of 0.01 secs
#> 
#> Realigning Sequences:
#> ================================================================================
#> 
#> Time difference of 0.01 secs
#> 
#> Refining the alignment:
#> ================================================================================
#> 
#> Time difference of 0.11 secs
#> 

StringsAndSpecdataframe <- rmBadStrings_2(
  DNAStringSet = DNAStringSet_Leo_manipulated,
  specimen_dataframe = specdata
)

DNAStringSet_NEW <- StringsAndSpecdataframe[[1]]

tail(DNAStringSet_NEW)

specimen_dataframe_NEW <- StringsAndSpecdataframe[[2]]

tail(specimen_dataframe_NEW$processid)