5th Feb, 2014

Industry

Question

Asked 2nd Feb, 2014

Package ade4, command dist.binary

I would like to estimate genetic distance using the simple matching approach ("simple matching coefficient of Sokal & Michener (1958)").
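For reference, the coefficient can be written out explicitly. For two individuals scored at p binary loci, let a be the number of loci where both carry 1, b and c the numbers where only the first or only the second carries 1, and d the number where both carry 0. The simple matching coefficient S, and the distance D that dist.binary(..., method = 2) appears to report (inferred from the outputs shown in the answer below, not from the package source), are:

```latex
S = \frac{a + d}{a + b + c + d} = \frac{a + d}{p},
\qquad
D = \sqrt{1 - S}
```

For example, two individuals that agree at 2 of 4 loci give D = sqrt(1 - 2/4) ≈ 0.7071068, and identical individuals give D = 0.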

My data set is called snp, and I ran the following commands:

snp
for (i in 1:10) {
d <- dist.binary(snp, method = 2)
cat(attr(d, "2"), is.euclid(d), "\n")}

and the results were:

> for (i in 1:10) {
+ d <- dist.binary(snp, method = 2)
+ cat(attr(d, "2"), is.euclid(d), "\n")}
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
Warning messages:
1: In is.euclid(d) : Zero distance(s)
2: In is.euclid(d) : Zero distance(s)
3: In is.euclid(d) : Zero distance(s)
4: In is.euclid(d) : Zero distance(s)
5: In is.euclid(d) : Zero distance(s)
6: In is.euclid(d) : Zero distance(s)
7: In is.euclid(d) : Zero distance(s)
8: In is.euclid(d) : Zero distance(s)
9: In is.euclid(d) : Zero distance(s)
10: In is.euclid(d) : Zero distance(s)

I would like to ask:

1- What does the warning message mean?

2- How should I deal with missing values? I would like to be sure that I am handling them correctly.

The warning message means that there is at least one zero distance in your distance matrix d. This is caused by at least two identical rows in your snp matrix (the distance between two identical rows is 0). The warning is issued by the is.euclid() function and is only there to inform you that you have identical rows (you may want to keep just one of each).

This is an example:

# ALL ROWS DIFFERENT - No warning

> snp <- matrix(c(1,0,0,0,0,0,1,1,1,1,0,1), nrow=3)
> snp
     [,1] [,2] [,3] [,4]
[1,]    1    0    1    1
[2,]    0    0    1    0
[3,]    0    0    1    1
> dist.binary(snp, method=2)
          1         2
2 0.7071068
3 0.5000000 0.5000000
> is.euclid(dist.binary(snp, method=2))
[1] TRUE
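To see where these numbers come from, the simple matching distance can be recomputed by hand in base R. smc_dist() below is a hypothetical helper, not an ade4 function; it assumes, consistent with the output above, that method = 2 returns sqrt(1 - s), where s is the proportion of loci at which two rows agree.

```r
# Hypothetical helper (not part of ade4): simple matching distance
# D = sqrt(1 - (a + d) / p), where a + d is the number of loci at which
# two rows agree (both 1 or both 0) and p is the number of loci.
smc_dist <- function(x) {
  x <- 1 * (as.matrix(x) > 0)                      # code any positive value as 1
  p <- ncol(x)
  matches <- x %*% t(x) + (1 - x) %*% t(1 - x)     # a + d for every pair of rows
  as.dist(sqrt(1 - matches / p))
}

snp <- matrix(c(1,0,0,0,0,0,1,1,1,1,0,1), nrow = 3)
smc_dist(snp)   # values should agree with 0.7071068, 0.5 and 0.5 above
```

Using matrix products counts the agreements for all pairs at once, which keeps the sketch short; for very large SNP panels a dedicated package function is likely preferable.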

# ROWS 1 AND 3 IDENTICAL - Warning

> snp <- matrix(c(0,0,0,0,0,0,1,1,1,1,0,1), nrow=3)
> snp
     [,1] [,2] [,3] [,4]
[1,]    0    0    1    1
[2,]    0    0    1    0
[3,]    0    0    1    1
> dist.binary(snp, method=2)
    1   2
2 0.5
3 0.0 0.5
> is.euclid(dist.binary(snp, method=2))
[1] TRUE
Warning message:
In is.euclid(dist.binary(snp, method = 2)) : Zero distance(s)
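To find out which individuals cause the "Zero distance(s)" warning, base R's duplicated() can flag identical rows before any distance is computed. This is a sketch on the toy matrix above; whether identical individuals should actually be collapsed depends on the analysis, so unique() is shown only as one option.

```r
# Rows 1 and 3 of this matrix are identical, which produces a zero distance
snp <- matrix(c(0,0,0,0,0,0,1,1,1,1,0,1), nrow = 3)

# Flag every row that has an exact copy somewhere else in the matrix
is_dup <- duplicated(snp) | duplicated(snp, fromLast = TRUE)
which(is_dup)            # indices of the rows involved in a zero distance

# One option: keep only the first copy of each distinct genotype profile
snp_unique <- unique(snp)
```

Note that duplicated() compares the raw values, while dist.binary treats every positive value as 1, so rows coded with different positive numbers could still be at distance zero.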

On the other hand, dist.binary does not allow missing values, as it only accepts binary values: FALSE (0) or TRUE (any positive value). If your data contain any NA (missing value), it stops with an error:

> snp <- matrix(c(NA,0,0,0,0,0,1,1,1,1,0,1), nrow=3)
> snp
     [,1] [,2] [,3] [,4]
[1,]   NA    0    1    1
[2,]    0    0    1    0
[3,]    0    0    1    1
> dist.binary(snp, method=2)
Error in if (any(df < 0)) stop("non negative value expected in df") :
  missing value where TRUE/FALSE needed
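Since dist.binary stops on NA, the missing values have to be handled before (or instead of) calling it. The sketch below shows two common options under stated assumptions; neither is an ade4 feature, and smc_dist_pairwise() is a hypothetical helper. Option 1 drops every locus with a missing genotype, which loses information. Option 2 computes simple matching for each pair over the loci scored in both individuals; the resulting matrix is not guaranteed to be Euclidean, so is.euclid() may return FALSE.

```r
snp <- matrix(c(NA,0,0,0,0,0,1,1,1,1,0,1), nrow = 3)

# Option 1 (assumes losing whole loci is acceptable): keep only the loci
# (columns) with no missing genotype, then call dist.binary() as usual.
complete_loci <- colSums(is.na(snp)) == 0
snp_complete  <- snp[, complete_loci, drop = FALSE]

# Option 2 (hypothetical helper): simple matching distance per pair of rows,
# using only the loci observed in both individuals.
smc_dist_pairwise <- function(x) {
  x <- 1 * (as.matrix(x) > 0)   # NA stays NA, positive values become 1
  n <- nrow(x)
  out <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      ok <- !is.na(x[i, ]) & !is.na(x[j, ])
      s  <- mean(x[i, ok] == x[j, ok])  # proportion of matching loci (NaN if none shared)
      out[i, j] <- out[j, i] <- sqrt(1 - s)
    }
  }
  as.dist(out)
}

smc_dist_pairwise(snp)
```

In this toy example rows 1 and 3 differ only at the missing locus, so Option 2 places them at distance 0; with many missing values, pairs compared over very few loci can get misleading distances.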

Please note that in your example code you are doing the same thing in all 10 iterations of the for loop. I think the code is adapted from the example given in the dist.binary documentation, where the loop is intended to show the outcome of each of the 10 methods. So you should use method = i instead of method = 2:

d <- dist.binary(snp, method = i)
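A complete version of the loop, assuming the intent is to compare all ten methods, might look like the sketch below (it requires the ade4 package and uses the toy matrix from the first example as a stand-in for the real data). It prints attr(d, "method"), the method label that dist.binary attaches to its result; the attr(d, "2") in the question returns NULL, which is why only TRUE was printed on each line.

```r
library(ade4)

# Toy data from the example above; replace with your own snp matrix
snp <- matrix(c(1,0,0,0,0,0,1,1,1,1,0,1), nrow = 3)

for (i in 1:10) {
  d <- dist.binary(snp, method = i)            # a different coefficient on each pass
  cat(attr(d, "method"), is.euclid(d), "\n")   # method label, then Euclidean check
}
```

If only the simple matching coefficient is needed, a single call with method = 2 is enough and no loop is required.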

I hope this answers your question.


I am not sure why you are running a "for" loop since it looks like you are simply performing the same calculation ten times. I am guessing the error meant that something went wrong at the dist.binary stage and produced a dist object d that is empty. Have you looked at the contents of the R object "snp" to make sure your data was imported correctly? Have you looked at the contents of the object "d" to make sure that dist.binary did what was expected?
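The checks suggested here can be run in a few lines. This is a sketch that assumes ade4 is installed and uses a small toy matrix as a stand-in for the imported snp data.

```r
library(ade4)

# Stand-in for the imported data; replace with your own snp object
snp <- matrix(c(0,0,0,0,0,0,1,1,1,1,0,1), nrow = 3)

# Did the data import as expected?
str(snp)        # dimensions and storage mode (should be numeric)
anyNA(snp)      # TRUE means dist.binary() will stop with an error
table(snp)      # values present; 0/1 data should show only 0 and 1

# Did dist.binary() return what was expected?
d <- dist.binary(snp, method = 2)
attr(d, "Size")              # number of individuals compared
round(as.matrix(d), 4)       # full symmetric distance matrix
sum(as.vector(d) == 0)       # number of pairs at distance zero
```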
