Recent Posts

Pages: [1] 2 3 ... 8
1
General Discussion / Re: IC from PSAM
« Last post by hjb2004 on June 05, 2019, 08:52:57 am »
The most pedagogical and detailed explanation of the relationship between the various logo and matrix representations can be found in a review that we wrote in 2007 (Bussemaker et al., Annu Rev Biophys Biomol Struct. 2007;36:329-47; https://www.ncbi.nlm.nih.gov/pubmed/17311525). I recommend that you read this paper, and some of the other relevant papers that it refers to.

A brief summary:

1. The philosophy of MatrixREDUCE is fundamentally different from that of traditional weight matrix discovery methods such as MEME. We use a matrix representation of DNA binding specificity called the position-specific affinity matrix (PSAM). The entries in the PSAM quantify the relative affinity of sequences that differ from the optimal DNA sequence by a single point mutation. Therefore, the largest value in each column equals one, and all other values are between zero and one. The entries in a PSAM do not have to add up to one at each position.

2. The negative of the natural logarithm of each entry in the PSAM corresponds to a delta-delta-G value in units of RT, as commonly used in the biophysical literature. The energy logo that we introduced in Foat et al. 2006 (https://www.ncbi.nlm.nih.gov/pubmed/16873464) is a graphical representation in which the letter heights are directly related to these ddG/RT values. This is the logo representation that we recommend for visualizing MatrixREDUCE models.

3. It is not possible to construct a traditional information-content logo from a PSAM without making further ad hoc assumptions about the "background frequency" of each base, and therefore we do not recommend it. Traditional PWMs are statistical models of sets of aligned binding sites, in which the frequency of each base is specified at each position. These frequencies add up to one by definition, unlike the entries in a PSAM. While we do not recommend this, you could divide each column of the PSAM by its sum to convert the four relative affinities at each position into a set of four base frequencies. You could then convert these "foreground frequencies" to relative entropies, as is done to construct traditional sequence logos: for each base, divide the foreground frequency by the background frequency, take the base-two logarithm of this ratio, and multiply the logarithm by the foreground frequency; then sum over all four bases to get the relative entropy, which is also known as the information gain and is measured in bits. The total height of the letter stack in the logo will correspond to this relative entropy, and the height of each individual letter will be proportional to the corresponding base frequency.

Finally, here is a numerical example for how a single position within the binding site could be represented:

                                  A      C      G      T
relative affinity                 1.0    0.5    0.1    0.9
ddG/RT                            0.00   0.69   2.30   0.11
base frequency (not recommended)  0.40   0.20   0.04   0.36
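
The numbers in this example can be reproduced with a few lines of Python. This is only a minimal sketch of the arithmetic described in points 2 and 3, not the LogoGenerator code; the variable names are made up for illustration, and the uniform background of 0.25 per base is exactly the kind of ad hoc assumption mentioned above.

```python
import math

BASES = "ACGT"
psam_column = [1.0, 0.5, 0.1, 0.9]  # relative affinities from the example

# ddG/RT is the negative natural log of the relative affinity,
# written as log(1/w) so that w = 1 prints as 0.00 rather than -0.00.
ddg_over_rt = [math.log(1.0 / w) for w in psam_column]

# Not recommended: divide the column by its sum to get "foreground frequencies".
total = sum(psam_column)
freqs = [w / total for w in psam_column]

# Relative entropy in bits against an ASSUMED uniform background.
background = [0.25] * 4
info_bits = sum(f * math.log2(f / q) for f, q in zip(freqs, background) if f > 0)

for base, w, g, f in zip(BASES, psam_column, ddg_over_rt, freqs):
    # Each letter's height is its frequency times the total stack height.
    print(f"{base}  affinity={w:.1f}  ddG/RT={g:.2f}  freq={f:.2f}  height={f * info_bits:.3f} bits")
print(f"stack height = {info_bits:.3f} bits")
```

Changing the `background` list changes the stack height, which is why the information-content view of a PSAM depends on an extra assumption while the ddG/RT view does not.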

Hope this helps!
2
General Discussion / Re: IC from PSAM
« Last post by mora on May 28, 2019, 04:45:27 pm »
Thanks a lot but I do not know how to read C code.

I am posting an example of a matrix I created using OptimizePSAM and two logos from LogoGenerator. Basically, what I want to know is how exactly LogoGenerator translates the PSAM values into nucleotide heights in bits and in ddG.


For example, at position 3 the G value is 1. How does LogoGenerator turn that 1 into a ddG of ~3 and ~0.6 bits?
3
General Discussion / Re: IC from PSAM
« Last post by xiangjun on May 27, 2019, 10:18:22 pm »
Quote
How exactly does LogoGenerator calculate the height of each nucleotide at each position?

Check the source code. More specifically, the logo_psam() function in LogoGenerator.c. If you can go over a specific example, step-by-step, I'll be able to help.

As for the specifics in the two articles, Harmen may chime in to make a comment.

Best regards,

Xiang-Jun
4
General Discussion / Re: IC from PSAM
« Last post by mora on May 26, 2019, 04:56:22 pm »
Thanks a lot!! One more question:

How exactly does LogoGenerator calculate the height of each nucleotide at each position?

Your Foat et al. 2005 paper says that "the height of each nucleotide is determined by subtracting the smallest weight for any nucleotide at that position and then dividing by the sum of all four weights".

I replicated this on a PSAM I made, and the results were similar, but not identical, to the nucleotide heights (y-axis) in the logo.

Does LogoGenerator perform any step other than subtracting the smallest weight? Maybe some type of normalization?

Oh, I see that your 2006 paper says that for the affinity logo you used the average, right? What about the -style=bits_info option? How is the nucleotide height calculated then to approximate bits?
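
For reference, the rule quoted from the paper above can be written out as follows. This is a sketch of one literal reading of that sentence, not a description of what LogoGenerator actually does: it assumes that "the sum of all four weights" means the sum of the original weights, taken before the minimum is subtracted. The function name and the column values are illustrative only.

```python
def affinity_logo_heights(weights):
    """Literal reading of the quoted rule: (w - min(w)) / sum(w).

    ASSUMPTION: the divisor is the sum of the four original weights,
    not the sum after the smallest weight has been subtracted.
    """
    smallest = min(weights.values())
    total = sum(weights.values())
    return {base: (w - smallest) / total for base, w in weights.items()}


# Illustrative PSAM column (relative affinities, largest value equals one).
column = {"A": 1.0, "C": 0.5, "G": 0.1, "T": 0.9}
heights = affinity_logo_heights(column)

for base, h in heights.items():
    print(f"{base}: {h:.3f}")
```

Under this reading, the base with the smallest weight always gets height zero, and the four heights no longer add up to one.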


5
General Discussion / Re: IC from PSAM
« Last post by xiangjun on May 25, 2019, 10:09:32 am »
Hi Mora,

Thanks for your questions.

Regarding the notation PSAM, please refer to the two MatrixREDUCE-related publications:

You may simply take the PSAM logo as an alternative to the classic information content (IC) type. If you want to dig deeper into the technical details, please refer to the source code. The LogoGenerator program in the REDUCE Suite is just a tool for users to employ as they see fit. Notably, it has more options than just the IC or PSAM logos.

Hope this helps a bit.

Xiang-Jun

6
General Discussion / IC from PSAM
« Last post by mora on May 24, 2019, 04:36:26 pm »
What does the information content (IC) of logos built from a PSAM mean? How does it differ from the IC inferred from a classic PSSM/PWM?

Also, is there any information about how LogoGenerator calculates the IC in bits from the values in the PSAM? I assume it has to be very different from how it is calculated for a classic PWM, right? (see link)
https://en.wikipedia.org/wiki/Position_weight_matrix

 
7
General Discussion / Re: LogoGenerator background
« Last post by xiangjun on May 15, 2019, 09:18:47 pm »
Hi Rocky,

Thanks for using the REDUCE Suite and for posting your questions on the Forum.

The LogoGenerator program simply uses the N-by-4 'PSAM' input data, with possible transformations (ddG, bits_info etc), to create a sequence logo. In essence, the program does not care about nucleotide frequency at each position, other than a log transformation for the ddG style.

To have full control of the logo-generation process, try the '-style=raw' option. You can then generate/edit the input data to LogoGenerator in whatever way that fits your purpose. Note also the Convert2PSAM utility program that may be helpful/convenient for the correct input format.

Hope this helps.

Xiang-Jun
8
General Discussion / LogoGenerator background
« Last post by mparida on May 15, 2019, 12:04:47 pm »
Hi Xiang-Jun
The algorithm works great. I was able to generate ddG logos using this software. However, I was wondering whether LogoGenerator uses the deviation from a uniform background frequency (0.25 each for A, T, G, C) to show over-represented and under-represented nucleotides at each position. If so, is there an easy way to use the actual background frequencies observed in our data instead? For example, instead of assuming a background frequency of 0.25 each for A, T, G, C, can we change them to 0.20, 0.20, 0.30, 0.30? Any help in this regard is appreciated.
 
Rocky
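
To illustrate what the question is asking, here is a generic sketch of how a background distribution enters the traditional relative-entropy (information content) calculation. It is not LogoGenerator code and makes no claim about LogoGenerator's options or input format; the foreground frequencies are hypothetical, and only the two background distributions come from the question.

```python
import math


def information_bits(freqs, background):
    """Relative entropy (in bits) of foreground frequencies vs. a background."""
    return sum(f * math.log2(f / background[b]) for b, f in freqs.items() if f > 0)


# Hypothetical foreground frequencies at one position (they sum to one).
freqs = {"A": 0.40, "C": 0.20, "G": 0.04, "T": 0.36}

uniform = {b: 0.25 for b in "ACGT"}
observed = {"A": 0.20, "T": 0.20, "G": 0.30, "C": 0.30}  # values from the question

bits_uniform = information_bits(freqs, uniform)
bits_observed = information_bits(freqs, observed)

print(f"uniform background:  {bits_uniform:.3f} bits")
print(f"observed background: {bits_observed:.3f} bits")
```

With an AT-poor background like the one above, a position that prefers A and T carries more information than it would against a uniform background.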
9
General Discussion / Re: "Affinity Profile"
« Last post by xiangjun on February 22, 2019, 11:05:11 am »
Hi Azadeh,

I'm glad to hear that you have got the REDUCE Suite up and running.

Quote
Could you please help me to understand these results ?
What is this matrix  "aff_psam_001.xml" ?
How could I obtain more details on the affinity change and dynamics of binding site ?

Interpretation of the results, however, is beyond my scope of support. Reading publications from the Bussemaker lab may help.

Best regards,

Xiang-Jun

10
General Discussion / Re: "Affinity Profile"
« Last post by azadeh.s on February 22, 2019, 10:15:54 am »
Hello

I am also a novice at using your great package, the REDUCE Suite.
I'm interested in studying the affinity profile for a very famous gene, COL1A1 (I downloaded COL1A1 from the Ensembl database). I am trying to understand the dynamics of the affinity profile of the transcription factor Sp1 across its binding sites in both the wild-type and mutant COL1A1 genes.

To start my study, I ran the command:
AffinityProfile \
   -prefix=aff_ \
   -threshold=0 \
   -psam=../../Desktop/source/REDUCE-Suite-v2.2/examples/AffinityProfile/psam_001.xml \
   -sequence=sequence_COL1A1.fasta

As the PSAM, I used the matrix given in your example.
But I don't understand the result.

cat seq_psam.dat
   psam_001.xml
NM_000088.3   0.0301094
####################
cat AffinityProfile.log
sequence file: sequence_COL1A1.fasta   cut-off: 0
reading in sequences from file [sequence_COL1A1.fasta]
reading in PSAM ../../Desktop/source/REDUCE-Suite-v2.2/examples/AffinityProfile/psam_001.xml [1/1]
[  1]: ../../Desktop/source/REDUCE-Suite-v2.2/examples/AffinityProfile/psam_001.xml (forward); pmax=0.025626

Time used: 00:00:00:00

#################
cat aff_psam_001.xml

NM_000088.3   0.0301094   1:2.78962e-104   2:2.8034e-101   3:2.508e-07   4:2.97891e-94   5:3.73255e-103   6:4.75891e-116   7:6.73541e-70   8:6.52039e-71   9:5.81238e-65   10:4.04049e-53   11:9.0064e-52   12:2.29734e-51   13:1.30952e-52   14:1.14509e-48   15:2.33791e-51   16:6.36074e-41   17:4.78319e-46   18:9.88362e-103   19:1.18759e-104   20:7.56195e-118   21:3.05295e-99   22:1.19914e-103   23:2.68201e-21   24:6.17375e-91   25:6.30612e-151   26:6.59257e-50   27:1.50964e-89   28:1.19137e-86   29:9.42612e-47   30:5.0609e-66   31:3.89768e-149   32:1.41179e-49   33:1.62746e-79   34:2.80803e-108   35:5.26342e-77   36:1.28883e-55   37:4.3514e-24   38:5.40903e-86   39:5.26603e-61   40:2.5533e-83   41:2.91799e-121   42:4.92685e-68   43:1.42043e-72   44:3.66844e-54   45:1.07013e-92   46:2.2805e-56   47:4.31802e-112   48:2.12206e-67   49:4.32273e-65   50:5.36831e-108   51:3.10334e-59   52:5.76347e-71   53:3.189e-74   54:1.16991e-15   55:2.31613e-120   56:4.17858e-69   57:3.11132e-70   58:6.68288e-27   59:1.72033e-105   60:7.81592e-63   61:7.91623e-64   62:9.43495e-63   63:1.05312e-74   64:9.53001e-123   65:2.04295e-66   66:5.71577e-30   67:5.90979e-108   68:8.76473e-54   69:2.51005e-73   70:2.08671e-81   71:9.14769e-38   72:5.51473e-31   73:2.33375e-44   74:1.11588e-46   75:1.80495e-39   76:3.66284e-49   77:1.52272e-44   78:4.28484e-71   79:8.19242e-23   80:7.09162e-57   81:1.31163e-58   82:7.40765e-44   83:1.63155e-65   84:6.85533e-61   85:1.67216e-60   86:3.10819e-77   87:7.99735e-26   88:2.20357e-81   89:2.67345e-53   90:7.94344e-34   91:2.97164e-83   92:5.35325e-70   93:4.00621e-67   94:9.56203e-61   95:8.90908e-50   96:2.06665e-60   97:4.78235e-53   98:2.39696e-50   99:1.73158e-52   100:4.63302e-32   101:1.96568e-84   102:9.10009e-69   103:1.6031e-73   104:8.72453e-44   105:3.37299e-88   106:5.02218e-44   107:1.70837e-107   108:6.30181e-110   109:6.10561e-89   110:6.96046e-35   111:2.86401e-101   112:2.9393e-31   113:4.69209e-129   114:2.34756e-56   115:9.01049e-68   
116:3.00809e-54   ...
...
..
...
5906:9.52601e-72   5907:9.93085e-28   5908:1.0446e-76   5909:1.53848e-65   5910:6.02333e-52   5911:2.12289e-43   5912:7.00049e-71   5913:7.89907e-37   5914:8.86224e-60   5915:1.47167e-70


#########################

Could you please help me to understand these results ?
What is this matrix  "aff_psam_001.xml" ?
How could I obtain more details on the affinity change and dynamics of binding site ?

Thank you in advance
Best Regards

Azadeh
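
For readers who want to explore an output line like the one posted above, here is a minimal parsing sketch. It assumes, based only on the appearance of the posted output, that each line holds a sequence ID, a single sequence-level value, and then a series of `index:value` tokens; the interpretation of those fields is not confirmed anywhere in this thread, and the function name is hypothetical.

```python
def parse_profile_line(line):
    """Split one output line into (sequence ID, sequence-level value, profile).

    ASSUMPTION: whitespace-separated fields, with every field after the
    second one written as "index:value".
    """
    fields = line.split()
    seq_id, seq_value = fields[0], float(fields[1])
    profile = {}
    for token in fields[2:]:
        index, value = token.split(":")
        profile[int(index)] = float(value)
    return seq_id, seq_value, profile


# First few tokens copied from the posted output.
line = "NM_000088.3   0.0301094   1:2.78962e-104   2:2.8034e-101   3:2.508e-07   4:2.97891e-94"
seq_id, seq_value, profile = parse_profile_line(line)

# Rank the indices by their value, largest first.
top = sorted(profile.items(), key=lambda item: item[1], reverse=True)[:3]
for index, value in top:
    print(f"{seq_id}  index {index}: {value:.3e}")
```

Sorting the full profile this way is one quick way to see which indices dominate the output before comparing the wild-type and mutant sequences.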
Created and maintained by Dr. Xiang-Jun Lu [律祥俊]. See also http://forum.x3dna.org and http://x3dna.org