Recent Posts

1
General Discussion / What is the proper input and reporting score for Transfactivity?
« Last post by jason on March 11, 2020, 09:55:13 pm »
Hello,

I have been using Transfactivity on my data and am very pleased with the results! They look very coherent and seem to give the "right" answer. I am close to publishing, but wanted to run my current process by you and get your opinion on whether it is valid given the statistical inferences used in Transfactivity.

I calculated effect sizes for my RNA-seq dataset using Sleuth (same idea as DESeq2 or edgeR) and fed the raw effect sizes into Transfactivity to make TF motif activity inferences. I subset the results to motifs that passed an arbitrary significance threshold in at least one sample, then reported the actual coefficients (f values) of those significant motifs. Since the raw coefficients vary so much in value, I rescaled each row by dividing by its largest absolute coefficient. Therefore, at least one sample in each row has a value of either -1 or +1, and everything else is relative to that.
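The rescaling described above amounts to dividing each row by its largest absolute value; dividing by the largest signed coefficient instead would flip the signs of rows whose most extreme value is negative. A minimal Python sketch, assuming the coefficients are held as a list of rows (motifs) by columns (samples); the function name and the example values are hypothetical, not Transfactivity output:

```python
def scale_rows_by_max_abs(rows):
    """Divide each row by its largest absolute value so that every row has
    at least one entry equal to -1 or +1. All-zero rows are left unchanged."""
    scaled = []
    for row in rows:
        peak = max(abs(x) for x in row)
        scaled.append([x / peak for x in row] if peak > 0 else list(row))
    return scaled

# Hypothetical motif-by-sample coefficient matrix (not real data).
coefs = [
    [0.02, -0.08, 0.04],
    [1.5, 0.3, -0.6],
]
scaled = scale_rows_by_max_abs(coefs)
print(scaled)
```

Note that this puts every motif on the same -1 to +1 scale, so the absolute size of each motif's coefficients is no longer visible after scaling.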

You can find an illustrated example of this process at this link: https://drive.google.com/open?id=1T6wy3ho5nml7f5tq83u0UDvmsVPw4DF3

However, there are many alternative ways to input the data and to report the results, and I was hoping to get your opinions on the options.

Input:

1. Input the effect sizes for each gene, as I did above. The problem here is that many of the largest effect sizes are not actually significant (usually those of lowly expressed genes), and this will throw off the TF activity inferences.

2. Input the effect sizes only for genes that passed significance. I presume, though, that this will throw off the predictions if I only feed it data for ~500 genes.

3. Input the effect sizes, but use some arbitrary process to remove the signal from non-significant genes, such as discarding only the ~200 genes with large effect sizes but no significant p-value, or simply setting the effect size to 0 if it did not pass significance.

4. Input the p-values directly (signed -log10). I tried this and it works reasonably well, but Transfactivity expects to predict the magnitude of gene expression change, not its significance, which can vary wildly depending on things like the number of replicates I used. So this seems wrong.

5. Input the row-normalized TPM matrix (or normalized count matrix) directly, and then, to find out which motifs are associated with my statistical covariates, feed the Transfactivity coefficients into a second regression to identify those that track with my covariates.
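Options 3 and 4 are simple per-gene transformations of the differential-expression output before it is passed to Transfactivity. A minimal Python sketch, assuming the effect sizes, p-values and adjusted p-values (q-values) are available as dictionaries keyed by gene ID; the function names, the 0.05 threshold and the example numbers are hypothetical, and none of this is part of Transfactivity or Sleuth:

```python
import math

def zero_nonsignificant(effects, qvalues, alpha=0.05):
    """Option 3: keep a gene's effect size only if its q-value passed the
    threshold; otherwise set the effect size to 0."""
    return {g: (b if qvalues[g] < alpha else 0.0) for g, b in effects.items()}

def signed_neg_log10_p(effects, pvalues, floor=1e-300):
    """Option 4: -log10(p), carrying the sign of the effect size.
    p-values are floored so that p == 0 does not raise a math error."""
    return {
        g: math.copysign(-math.log10(max(pvalues[g], floor)), b)
        for g, b in effects.items()
    }

# Hypothetical output for three genes (not real data).
effects = {"geneA": 2.1, "geneB": -3.4, "geneC": -0.2}
pvalues = {"geneA": 0.001, "geneB": 0.2, "geneC": 0.01}
qvalues = {"geneA": 0.01, "geneB": 0.30, "geneC": 0.04}

option3 = zero_nonsignificant(effects, qvalues)
option4 = signed_neg_log10_p(effects, pvalues)
print(option3)
print(option4)
```

In this toy example geneB has the largest effect size but is not significant, so option 3 zeroes it, while option 4 keeps it with a small negative score.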

Which one would you recommend?


Reporting the output:

1. Report row-normalized coefficients as I did.

2. Report the signed -log p-values. As you can see in the PDF example, some motifs are MUCH more significant than others, and this distinction is lost when reporting the coefficients.


Thanks again for writing and maintaining such a great tool; I hope to hear your opinions.


2
General Discussion / Re: TB binding site binding affinity compared to mutant
« Last post by xiangjun on October 01, 2019, 09:16:05 pm »
Did you read the Documentation Section?
3
General Discussion / Re: TB binding site binding affinity compared to mutant
« Last post by amanzour on October 01, 2019, 05:16:41 pm »
Thank you, Lu.

How can I install the REDUCE Suite?
Is there a series of command lines for Linux?
Thank you
Amir
4
General Discussion / Re: TB binding site binding affinity compared to mutant
« Last post by xiangjun on September 18, 2019, 09:36:42 pm »
Thanks for using the REDUCE Suite and for posting your questions on the Forum.

Unfortunately, I do not know how to calculate the change in binding affinity between the wild-type and mutated versions of a TF binding site. Currently, the REDUCE Suite is command-line based only; there is no web interface to it.

Best regards,

Xiang-Jun
5
General Discussion / TB binding site binding affinity compared to mutant
« Last post by amanzour on September 17, 2019, 10:48:43 pm »
Hi,
How can I calculate the change in binding affinity between a transcription factor binding site and its mutated version? The TF protein is known.
Would it be possible to do this with a web-based tool?
Basically: input the sequence, the mutant, and the TF; output the change in TF binding affinity.

Thank you so much in advance.
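One way to estimate this, assuming a PSAM for the TF is already available and that positions contribute independently (as in the PSAM model, where -ln of each PSAM entry is a ddG/RT value), is to sum -ln(PSAM entry) over the site for the wild-type and mutant sequences and take the difference. A minimal Python sketch; the PSAM values, the sequences and the function name are hypothetical, and this is not a REDUCE Suite command:

```python
import math

BASES = "ACGT"

def ddg_over_rt(psam, site):
    """ddG/RT of `site` relative to the optimal sequence, assuming additive
    (independent) positions: sum of -ln(relative affinity) per position."""
    if len(site) != len(psam):
        raise ValueError("site length must match the number of PSAM positions")
    return sum(math.log(1.0 / psam[i][BASES.index(b)]) for i, b in enumerate(site))

# Hypothetical 3-position PSAM (rows = positions, columns = A, C, G, T);
# the largest entry in each row is 1, as in any PSAM.
psam = [
    [1.0, 0.5, 0.1, 0.9],
    [0.2, 1.0, 0.3, 0.1],
    [0.4, 0.4, 1.0, 0.6],
]
wild, mutant = "ACG", "AGG"
change = ddg_over_rt(psam, mutant) - ddg_over_rt(psam, wild)
print(f"ddG/RT(mutant) - ddG/RT(wild type) = {change:.2f}")
```

A positive difference means the mutant is predicted to bind more weakly; exp(-change) gives the corresponding ratio of relative affinities (mutant over wild type).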
6
General Discussion / Re: ddG LogoGenerator
« Last post by xiangjun on July 12, 2019, 06:17:36 pm »
Quote
I'm curious about the math behind PWM logos with ddG on the y axis.

The ddG is calculated from the PSAM via a simple log transformation. See the source code in the file LogoGenerator.c. You may find the thread "IC from PSAM" informative, especially the summary post at the end from Dr. Bussemaker.

Quote
The LogoGenerator docs suggest I need a PSAM file or multiple sequence alignment to generate this kind of plot, but all I have is a PWM for each transcription factor.

Is it possible to generate a figure like the one above directly from a PWM? Could you point me towards a reference describing how to do this (mathematically)?


With a PWM, you could run Convert2PSAM to transform it into a pseudo-PSAM, and then call LogoGenerator.
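For intuition only, here is one crude way to get a PSAM-like matrix from PWM base frequencies. This is an assumption, not necessarily what Convert2PSAM does internally: since the largest entry at each PSAM position equals one, each PWM position is divided by its largest frequency, and -ln of the result gives ddG/RT-like values. The function names, the pseudocount and the example PWM are hypothetical:

```python
import math

def pwm_to_pseudo_psam(pwm, pseudocount=1e-3):
    """Scale each position (row) so its largest entry is 1.
    A small pseudocount keeps zero frequencies from giving infinite ddG."""
    psam = []
    for row in pwm:
        padded = [f + pseudocount for f in row]
        top = max(padded)
        psam.append([f / top for f in padded])
    return psam

def ddg_over_rt_matrix(psam):
    """ddG/RT = -ln(relative affinity) for every entry."""
    return [[math.log(1.0 / w) for w in row] for row in psam]

# Hypothetical 2-position PWM (rows = positions, columns = A, C, G, T).
pwm = [
    [0.40, 0.20, 0.04, 0.36],
    [0.00, 0.90, 0.05, 0.05],
]
psam = pwm_to_pseudo_psam(pwm)
energies = ddg_over_rt_matrix(psam)
print(energies)
```

As Dr. Bussemaker points out in the "IC from PSAM" thread, converting between frequencies and affinities ignores the background frequencies and is ad hoc, so a logo built this way should be read as approximate.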

HTH,

Xiang-Jun
7
General Discussion / ddG LogoGenerator
« Last post by lenail on July 12, 2019, 11:13:01 am »
I'm curious about the math behind PWM logos with ddG on the y axis.

Specifically, the PWM logos from The Human Transcription Factors (http://humantfs.ccbr.utoronto.ca/) were generated with the REDUCE LogoGenerator, or so they tell me. They have figures like this (image attached).



The LogoGenerator docs suggest I need a PSAM file or multiple sequence alignment to generate this kind of plot, but all I have is a PWM for each transcription factor.

Is it possible to generate a figure like the one above directly from a PWM? Could you point me towards a reference describing how to do this (mathematically)?

Many thanks,
8
General Discussion / Re: IC from PSAM
« Last post by hjb2004 on June 05, 2019, 08:52:57 am »
The most pedagogical and detailed explanation of the relationship between the various logo and matrix representations can be found in a review that we wrote in 2007 (Bussemaker et al., Annu Rev Biophys Biomol Struct. 2007;36:329-47; https://www.ncbi.nlm.nih.gov/pubmed/17311525). I recommend that you read this paper, and some of the other relevant papers that it refers to.

A brief summary:

1. The philosophy of MatrixREDUCE is fundamentally different from that of traditional weight matrix discovery methods such as MEME. We use a matrix representation of DNA binding specificity called the position-specific affinity matrix (PSAM). The entries in the PSAM quantify the relative affinity of sequences that differ from the optimal DNA sequence by a single point mutation. Therefore, the largest value in each column equals one, and all other values are between zero and one. The entries in a PSAM do not have to add up to one at each position.

2. The negative natural logarithms of the entries in the PSAM correspond to delta-delta-G values in units of RT, which are commonly used in the biophysical literature. The energy logo that we introduced in Foat et al. 2006 (https://www.ncbi.nlm.nih.gov/pubmed/16873464) is a graphical representation in which the letter heights are directly related to these ddG/RT values. This is the logo representation that we recommend for visualizing MatrixREDUCE models.

3. It is not possible to construct a traditional information-content logo from a PSAM without making further ad hoc assumptions about the "background frequency" of each base, and therefore we do not recommend it. Traditional PWMs are statistical models of sets of aligned binding sites, in which the frequency of each base is specified at each position. These frequencies do add up to one by definition, unlike in a PSAM. While we do not recommend this, you could divide each column of the PSAM by its sum to convert the four relative affinities at each position into a set of four base frequencies. You could then convert these "foreground frequencies" to relative entropies as is done to construct traditional sequence logos (for each base, divide the foreground frequency by the background frequency, take the base-two logarithm of this ratio, and multiply this logarithm by the foreground frequency; finally, sum over all four bases to get the relative entropy, also known as the information gain, measured in bits). The total height of the letter stack in the logo will correspond to this relative entropy, and the height of the individual letters will be proportional to the corresponding base frequency.

Finally, here is a numerical example for how a single position within the binding site could be represented:

                                     A     C     G     T
relative affinity                  1.0   0.5   0.1   0.9
ddG/RT                            0.00  0.69  2.30  0.11
base frequency (not recommended)  0.40  0.20  0.04  0.36
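The numbers in this example can be reproduced directly from the steps described above. A minimal Python sketch for this single position; the uniform background frequency of 0.25 per base is an assumption needed only for the (not recommended) information-content route, and the variable names are illustrative:

```python
import math

# One PSAM column (relative affinities), taken from the example above.
affinity = {"A": 1.0, "C": 0.5, "G": 0.1, "T": 0.9}
background = 0.25  # assumed uniform background frequency for every base

# Energy logo: ddG/RT = -ln(relative affinity).
ddg = {b: math.log(1.0 / w) for b, w in affinity.items()}

# Not recommended: divide the column by its sum to get "base frequencies".
total = sum(affinity.values())
freq = {b: w / total for b, w in affinity.items()}

# Relative entropy (information gain) in bits, and letter heights
# proportional to the base frequencies.
rel_entropy = sum(f * math.log2(f / background) for f in freq.values())
heights = {b: f * rel_entropy for b, f in freq.items()}

print({b: round(v, 2) for b, v in ddg.items()})
print({b: round(v, 2) for b, v in freq.items()})
print(round(rel_entropy, 2))
```

Under these assumptions the letter stack for this position is only about 0.29 bits tall, even though the G-to-A affinity difference corresponds to a ddG/RT of 2.30.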

Hope this helps!
9
General Discussion / Re: IC from PSAM
« Last post by mora on May 28, 2019, 04:45:27 pm »
Thanks a lot, but I do not know how to read C code.

I am posting an example of a matrix I created using Optimize PSAM, along with two logos from LogoGenerator. Basically, what I want to know is how exactly LogoGenerator translates the PSAM values into nucleotide heights in bits and in ddG.


For example, at position 3 the G value is 1. How does LogoGenerator turn that 1 into a ddG of ~3 and ~0.6 bits?
10
General Discussion / Re: IC from PSAM
« Last post by xiangjun on May 27, 2019, 10:18:22 pm »
Quote
How exactly does LogoGenerator calculate the height of each nucleotide at each position?

Check the source code, more specifically the logo_psam() function in LogoGenerator.c. If you can walk through a specific example step by step, I'll be able to help.

As for the specifics in the two articles, Harmen may chime in with a comment.

Best regards,

Xiang-Jun
Created and maintained by Dr. Xiang-Jun Lu [律祥俊]. See also http://forum.x3dna.org and http://x3dna.org