About
15
Publications
8,863
Reads
27
Citations
Publications (15)
Cancer arises from the complex interplay of various factors. Traditionally, the identification of driver genes focuses primarily on the analysis of somatic mutations. We describe a new method for the detection of driver gene pairs based on an epistasis analysis that considers both germline and somatic variations. Specifically, the identification of...
Objective
In 2006, an age estimation method was proposed utilizing Bayesian inference to interpret age‐progressive changes in the acetabulum. This was accompanied by the IDADE2 software to facilitate calculations. However, the MS‐DOS operating system on which the software was based became obsolete. The main goal of this article is to present the ne...
The earliest developmental origins of dysmorphologies are poorly understood in many congenital diseases. They often remain elusive because the first signs of genetic misregulation may initiate as subtle changes in gene expression, which are hard to detect and can be obscured later in development by secondary effects. Here, we develop a method to tr...
The earliest developmental origins of dysmorphologies are poorly understood in many congenital diseases. They often remain elusive because the first signs of genetic misregulation may initiate as subtle changes in gene expression, which can be obscured later in development due to secondary phenotypic effects. We here develop a method to trace back...
Ranking Putative Cancer Driver Gene Subsets
Altered Fgf/Fgfr gene expression patterns affect early limb development in Apert syndrome
Recent advances in next-generation sequencing (NGS) have produced so much data that it exceeds the analysis capacity of the scientific community. The development of new bioinformatic tools for detecting genetic variants associated with genetic disorders has therefore become a necessity. An important factor in this...
Objectives. Advances in massive genome sequencing are producing an enormous amount of data that the scientific community cannot process. New tools are needed for the analysis and discovery of new variants involved in the development of genetic diseases. A fundamental factor in this analysis is the di...
Apert syndrome is a rare congenital disorder characterized by cranial, neural, limb and visceral malformations. Over 98% of Apert cases are caused by two FGFR2 mutations, Ser252Trp and Pro253Arg, which alter the ligand-binding specificity of the receptors. Patients carrying the P253R mutation show more severe limb malformations, such as syndactyly...
Questions (4)
Hi again.
I have the following problem:
Let's say I have two discrete random variables, X and Y, each with three possible values: a, b and c.
I would like to test whether, in a sample of 500 objects, X and Y are independent in a special, one-sided sense:
H_0: Pr(X=x ^ Y=y) <= Pr(X=x) Pr(Y=y) for each x,y = a,b,c
As the statistic s(T') I use the chi-squared statistic computed on the 9-cell contingency table T', except that a cell's discrepancy is added only if Observed_ij > Expected_ij.
To calculate p-values, I estimate the distribution of s(T') over a million random tables, each with the same marginal frequencies as the observed table.
To generate each table, I take a random permutation of the values of X over the 500 objects and another random vector with the same value frequencies as Y,
then calculate the contingency table T' and s(T'), repeating this for a million tables T'.
The resulting distribution of s(T') is then used to calculate p-values.
But I realize that the null hypothesis H'_0 of the Monte Carlo procedure is that X and Y are independent, which is not exactly what I need.
Fortunately, for a fixed r, the number of tables T' with s(T') >= r under H'_0: Pr(X=x ^ Y=y) = Pr(X=x) Pr(Y=y) is at least as large as
the number of tables T' with s(T') >= r under H_0: Pr(X=x ^ Y=y) <= Pr(X=x) Pr(Y=y), because, intuitively, under H_0 the table is more likely to have cells with Observed <= Expected, which are not counted in the sum for s(T').
Therefore, Pr(s(T')>=r | H'_0) >= Pr(s(T') >= r | H_0).
So, if I have a table T and I calculate s(T) = r, I know that if Pr(s(T')>=r | H'_0) <= \alpha then Pr(s(T') >= r | H_0) <= \alpha.
Therefore, the Monte Carlo distribution of s(T') under H'_0 can be used to reject H_0 for T, although the resulting test is conservative.
Is this procedure correct? How can it be done better?
Thanks in advance, again.
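For reference, the Monte Carlo procedure in the question above could be sketched roughly as follows. This is only a minimal illustration under a few assumptions of mine: the function names `one_sided_stat` and `monte_carlo_p_value` are made up, only X is permuted (permuting one variable already keeps both marginals fixed), the default number of permutations is far below the million mentioned in the question, and the add-one correction in the p-value is a common convention rather than something stated in the post.

```python
import random
from collections import Counter

def one_sided_stat(xs, ys, levels_x, levels_y):
    """Chi-squared-style sum over cells where observed > expected only."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    margin_x, margin_y = Counter(xs), Counter(ys)
    total = 0.0
    for a in levels_x:
        for b in levels_y:
            expected = margin_x[a] * margin_y[b] / n
            observed = joint[(a, b)]
            if expected > 0 and observed > expected:
                total += (observed - expected) ** 2 / expected
    return total

def monte_carlo_p_value(xs, ys, n_perm=2000, seed=0):
    """Estimate Pr(s(T') >= s(T)) under independence with fixed marginals."""
    rng = random.Random(seed)
    levels_x, levels_y = sorted(set(xs)), sorted(set(ys))
    r = one_sided_stat(xs, ys, levels_x, levels_y)
    shuffled = list(xs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permuting X leaves both marginals unchanged
        if one_sided_stat(shuffled, ys, levels_x, levels_y) >= r:
            hits += 1
    # Add-one correction (an assumption here) avoids a p-value of exactly 0.
    return r, (hits + 1) / (n_perm + 1)
```

Calling `monte_carlo_p_value(xs, ys)` with two equal-length lists of labels returns the observed statistic and the estimated p-value under H'_0.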
I have a list of genes (and their interactions) and I would like to see how these genes are located in a specific network.
I would appreciate any help!
Thanks.
I have two discrete variables with three values each: A, B and C for one, and D, E and F for the other (for around 500 objects).
I am only interested in rejecting independence in three of the nine cells of the table: (A,D), (A,E) and (B,D).
So I have three null hypotheses:
H_AD : Pr(AD) <= Pr(A) Pr(D)
H_AE : Pr(AE) <= Pr(A) Pr(E)
H_BD : Pr(BD) <= Pr(B) Pr(D)
which also say that, for those cells, the observed values are lower than or equal to the values expected from the marginals. If the null hypotheses can be rejected, then in at least one of those cells the observed count is greater than the expected one, with some confidence.
So what I do is use the usual chi-squared test, but instead of adding over all 9 cells, I sum over only those 3 cells, and only when the observed counts are above the expected ones. Then, I calculate the p-value using the chi-squared distribution (one tail) with 4 degrees of freedom, since all table cells still affect the answer.
Is it correct?
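To make the restricted statistic in this question concrete, here is a small sketch. Note that it does not use the chi-squared reference with 4 degrees of freedom described in the post; as an assumption on my part, it swaps in a permutation reference distribution with fixed marginals instead. The names `restricted_stat` and `restricted_p_value`, the default `n_perm`, and the add-one correction are all illustrative choices.

```python
import random
from collections import Counter

def restricted_stat(xs, ys, cells):
    """Sum (O - E)^2 / E over the chosen cells, only where O > E."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    margin_x, margin_y = Counter(xs), Counter(ys)
    total = 0.0
    for a, b in cells:
        expected = margin_x[a] * margin_y[b] / n
        observed = joint[(a, b)]
        if expected > 0 and observed > expected:
            total += (observed - expected) ** 2 / expected
    return total

def restricted_p_value(xs, ys, cells, n_perm=2000, seed=0):
    """Permutation p-value for the restricted statistic (marginals fixed)."""
    rng = random.Random(seed)
    r = restricted_stat(xs, ys, cells)
    shuffled = list(xs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any association, keeps the marginals
        if restricted_stat(shuffled, ys, cells) >= r:
            hits += 1
    # Add-one correction is an assumption, not part of the original question.
    return r, (hits + 1) / (n_perm + 1)
```

For the three cells of interest, the call would look like `restricted_p_value(xs, ys, [('A', 'D'), ('A', 'E'), ('B', 'D')])`.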
I have a small dataset (N=400 samples), and I want to apply Fisher's exact test because my contingency table has a lot of 0s. However, I have read that Fisher's exact test is only for N≤90, so I went back to the chi-squared test. But I have also read that the chi-squared test should be performed only if at least 80% of the cells have an expected frequency of 5 or greater and no cell has an expected frequency smaller than 1.0, which is not my case.
Which test should I use?
Any response would be welcome!
Thanks in advance.
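The rule of thumb quoted in this question (at least 80% of cells with expected frequency of 5 or more, and no expected frequency below 1.0) can be checked directly from the table of counts. The sketch below is only illustrative; the function names `expected_counts` and `meets_rule_of_thumb` and the parameter names are assumptions, and the table is taken to be a list of rows of non-negative counts with no all-zero table.

```python
def expected_counts(table):
    """Expected cell counts under independence, from row and column totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def meets_rule_of_thumb(table, min_share=0.8, threshold=5.0, floor=1.0):
    """True if enough cells have expected >= threshold and none is < floor."""
    flat = [e for row in expected_counts(table) for e in row]
    share = sum(1 for e in flat if e >= threshold) / len(flat)
    return share >= min_share and min(flat) >= floor
```

For example, `meets_rule_of_thumb([[30, 0], [0, 370]])` evaluates the rule on a 2x2 table with N=400, where one expected count is 2.25 and so only 3 of 4 cells reach 5.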