1

##### General Discussion / Re: TB binding site binding affinity compared to mutant

« Last post by **xiangjun** on *October 01, 2019, 09:16:05 pm* »

Did you read the Documentation Section?

2

Thank you Lu.

How can I install the REDUCE Suite?

Is there a series of command-line steps for Linux?

Thank you

Amir

3

Thanks for using the REDUCE Suite and for posting your questions on the Forum.

Unfortunately, I do **not** know how to calculate the change in binding affinity between the wild-type and mutated versions of a TF binding site. Currently, the REDUCE Suite is command-line based only; there is no web interface to it.

Best regards,

Xiang-Jun

4

Hi

How can I calculate the change in binding affinity between a transcription factor binding site and its mutated version? The TF protein is known.

Would it be possible to do this through a web interface?

Basically, the input would be the sequence, the mutant, and the TF, and the output would be the change in TF binding affinity.

Thank you so much in advance.

5

Quote

I'm curious about the math behind PWM logos with ddG on the y axis.

The ddG is calculated from PSAM via a simple log-transformation. See the source code in file

Quote

The LogoGenerator docs suggest I need a PSAM file or multiple sequence alignment to generate this kind of plot, but all I have is a PWM for each transcription factor.

Is it possible to generate a figure like the one above directly from a PWM? Could you point me towards a reference describing how to do this (mathematically)?

With a PWM, you could run

HTH,

Xiang-Jun

6

I'm curious about the math behind PWM logos with ddG on the y axis.

Specifically, the PWM logos on [The Human Transcription Factors](http://humantfs.ccbr.utoronto.ca/) were generated via the REDUCE LogoGenerator, so they tell me. They have figures like this (image attached).

The LogoGenerator docs suggest I need a PSAM file or multiple sequence alignment to generate this kind of plot, but all I have is a PWM for each transcription factor.

Is it possible to generate a figure like the one above directly from a PWM? Could you point me towards a reference describing how to do this (mathematically)?

Many thanks,

7

The most pedagogical and detailed explanation of the relationship between the various logo and matrix representations can be found in a review that we wrote in 2007 (Bussemaker et al., Annu Rev Biophys Biomol Struct. 2007;36:329-47; https://www.ncbi.nlm.nih.gov/pubmed/17311525). I recommend that you read this paper, and some of the other relevant papers that it refers to.

A brief summary:

1. The philosophy of MatrixREDUCE is fundamentally different from that of traditional weight matrix discovery methods such as MEME. We use a matrix representation of DNA binding specificity called the position-specific affinity matrix (PSAM). The entries in the PSAM quantify the relative affinity of sequences that differ from the optimal DNA sequence by a single point mutation. Therefore, the largest value in each column equals one, and all other values are between zero and one. The entries in a PSAM do not have to add up to one at each position.

2. The negative of the natural logarithm of each entry in the PSAM corresponds to a delta-delta-G value in units of RT, which are commonly used in the biophysical literature. The energy logo that we introduced in Foat et al. 2006 (https://www.ncbi.nlm.nih.gov/pubmed/16873464) is a graphical representation in which the letter heights are directly related to these ddG/RT values. This is the logo representation that we recommend for visualizing MatrixREDUCE models.

3. It is not possible to construct a traditional information-content logo from a PSAM without making further ad hoc assumptions about the "background frequency" of each base, and therefore we do not recommend it. Traditional PWMs are statistical models of sets of aligned binding sites, in which the frequency of each base is specified at each position. These frequencies add up to one by definition, unlike the entries in a PSAM. While we do not recommend this, you could divide each column of the PSAM by its sum to convert the four relative affinities at each position into a set of four base frequencies. You could then convert these "foreground frequencies" to relative entropies as is done to construct traditional sequence logos (for each base, divide the foreground frequency by the background frequency, take the base-two logarithm of this ratio, multiply this logarithm by the foreground frequency, and finally sum over all four bases to get the relative entropy, which is also known as the information gain, measured in bits). The total height of the letter stack in the logo will correspond to this relative entropy, and the height of each individual letter will be proportional to the corresponding base frequency.
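In symbols, points 2 and 3 can be summarized as follows. The notation is introduced here only for illustration and does not appear in the post: $w_{jb}$ is the PSAM entry for base $b$ at position $j$, $f_{jb}$ is the (not recommended) derived base frequency, $q_b$ is an assumed background frequency, and $I_j$ is the relative entropy in bits.

```latex
\frac{\Delta\Delta G_{jb}}{RT} = -\ln w_{jb},
\qquad
f_{jb} = \frac{w_{jb}}{\sum_{b'} w_{jb'}},
\qquad
I_j = \sum_{b} f_{jb}\,\log_2\frac{f_{jb}}{q_b}
```

Under this reading, the letter for base $b$ in an information-content logo would have height $f_{jb}\,I_j$, so that the stack at position $j$ has total height $I_j$.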

Finally, here is a numerical example for how a single position within the binding site could be represented:

| | A | C | G | T |
|---|---|---|---|---|
| relative affinity | 1.0 | 0.5 | 0.1 | 0.9 |
| ddG/RT | 0.00 | 0.69 | 2.30 | 0.11 |
| base frequency (not recommended) | 0.40 | 0.20 | 0.04 | 0.36 |
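The rows of the table can be recomputed with a few lines of Python. This is only a sketch of the arithmetic described in points 2 and 3, not LogoGenerator's own code; the function names are made up for illustration, and the uniform background of 0.25 per base is exactly the kind of ad hoc assumption mentioned in point 3.

```python
import math

# Relative affinities for one PSAM position, taken from the table above.
psam_column = {"A": 1.0, "C": 0.5, "G": 0.1, "T": 0.9}

def ddg_over_rt(column):
    """ddG/RT per base: negative natural log of the relative affinity.

    Written as log(1/w) so the optimal base gives 0.0 rather than -0.0.
    Assumes every entry is strictly positive.
    """
    return {base: math.log(1.0 / w) for base, w in column.items()}

def base_frequencies(column):
    """Divide each entry by the column sum (the 'not recommended' conversion)."""
    total = sum(column.values())
    return {base: w / total for base, w in column.items()}

def relative_entropy_bits(freqs, background=None):
    """Relative entropy of foreground vs. background frequencies, in bits."""
    if background is None:
        # Assumption: uniform background; the thread does not specify one.
        background = {base: 0.25 for base in freqs}
    return sum(f * math.log2(f / background[base])
               for base, f in freqs.items() if f > 0)

ddg = ddg_over_rt(psam_column)
freqs = base_frequencies(psam_column)
info = relative_entropy_bits(freqs)

# Letter heights for an information-content logo: frequency times stack height.
letter_heights = {base: f * info for base, f in freqs.items()}

for base in "ACGT":
    print(f"{base}: ddG/RT={ddg[base]:.2f}  freq={freqs[base]:.2f}  "
          f"height={letter_heights[base]:.3f} bits")
```

Rounded to two decimals, `ddg` and `freqs` reproduce the ddG/RT and base frequency rows of the table. With the uniform background assumed here, the stack height for this position comes out at about 0.29 bits.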

Hope this helps!

8

Thanks a lot but I do not know how to read C code.

I am posting an example of a matrix I created using Optimize PSAM and two logos from LogoGenerator. Basically, what I want to know is how exactly LogoGenerator translates the PSAM values into nucleotide heights in bits and in ddG.

For example, at position 3 the G value is 1. How does LogoGenerator turn that 1 into a ddG of ~3 and ~0.6 bits?

9

Quote

How exactly does LogoGenerator calculate the height of each nucleotide at each position?

Check the source code. More specifically, the

As for the specifics in the two articles, Harmen may chime in to make a comment.

Best regards,

Xiang-Jun

10

Thanks a lot!! One more question:

How exactly does LogoGenerator calculate the height of each nucleotide at each position?

Your Foat et al. 2005 paper says that "the height of each nucleotide is determined by subtracting the smallest weight for any nucleotide at that position and then dividing by the sum of all four weights".

I replicated this on a PSAM I made, and the results were similar, but not identical, to the nucleotide heights (y-axis) in the logo.

Does LogoGenerator perform any step other than subtracting the smallest weight? Maybe some type of normalization?

Oh, I see that your 2006 paper says that for the affinity logo you used the average, right? What about the option -style=bits_info? How is the nucleotide height calculated then to approximate bits?
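For reference, one literal reading of the sentence quoted from the 2005 paper can be written out as below. This is a sketch of the quoted rule only, not of what LogoGenerator actually does: the function name and example weights are made up, and taking "the sum of all four weights" to mean the sum before the minimum is subtracted is an assumption, since the quote does not say which sum is meant.

```python
def heights_min_subtracted(weights):
    """Literal reading of the quoted rule: (w - min) / sum of the four weights."""
    smallest = min(weights.values())
    # Assumption: the sum is taken over the original weights, before the
    # minimum is subtracted. The quoted sentence leaves this open.
    total = sum(weights.values())
    return {base: (w - smallest) / total for base, w in weights.items()}

# Illustrative weights for a single position (not from a real PSAM).
example = {"A": 1.0, "C": 0.5, "G": 0.1, "T": 0.9}
heights = heights_min_subtracted(example)

for base in "ACGT":
    print(f"{base}: {heights[base]:.2f}")
```

If the sum were instead taken after subtracting the minimum, the four heights would add up to one, which is one possible source of a "similar but not identical" result.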
