Error converting PWM to PSAM
jason:
Hi, I've downloaded a set of PWMs from YeTFaSCo: http://yetfasco.ccbr.utoronto.ca/1.02/Downloads/Expert_PWMs.tar.gz
I would like to use these PWMs with the Transfactivity program.
However, I can't seem to get the Convert2PSAM utility to work on these. I assume that it expects a slightly different PWM format from the one provided by the download, but I can't figure out exactly what format it expects. Could you let me know if I'm doing something wrong?
Here's the command I ran and the error:
bin/Convert2PSAM -source=pw -inp=data/yetfasco/ALIGNED_ENOLOGO_FORMAT_PWMS/YDR146C_569.pwm -pwmfile=test.xml
<data/yetfasco/ALIGNED_ENOLOGO_FORMAT_PWMS/YDR146C_569.pwm> not in PWM format: [A 0.381537584575116 1.07668300283655 -800 -800 1.68965987938785 -800 -800 0.989220160345073] contains invalid W a
xiangjun:
Hi Jason,
Thanks for using the REDUCE Suite and for posting on the Forum.
The error message seems to hint at a PWM format variant that Convert2PSAM cannot handle. I'll look into the details and revise Convert2PSAM as necessary. I'll post back on the Forum, probably by tomorrow.
Best regards,
Xiang-Jun
xiangjun:
Hi Jason,
I've looked into the issue. As expected, it is indeed yet another PWM variant that needs special attention to be converted to a PSAM.
One example (YDR146C_569.pwm) from Expert_PWMs.tar.gz is as below:
--- Code: ---A 0.381537584575116 1.07668300283655 -800 -800 1.68965987938785 -800 -800 0.989220160345073
T -0.31034012061215 -3.01077985606558 -800 -800 -800 -800 -800 -2.01077983731055
G 0.395928676331139 -2.30451105912229 -800 -800 -800 2.39592867633114 -800 -1.30451104036726
C -0.982582949230903 0.502843879011055 2.39592867633114 2.39592867633114 -800 -800 2.39592867633114 0.280451460353898
--- End code ---
It has negative values, including what is presumably a cutoff value of -800. On the other hand, entries in a PSAM should all be positive, so we need a way to convert the negative values to positive ones.
In a similar case, the TAMO (as in the MacIsaac dataset) format distributed with the REDUCE Suite looks like the following:
--- Code: ---Log-odds matrix for Motif 0 rGAA..TtctrGAA (0)
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13
#A 0.743 -1.052 1.647 1.443 -1.558 -0.374 -3.255 -5.001 -0.793 -2.480 1.175 -3.678 1.635 1.629
#C -1.105 -10.336 -8.324 -3.641 0.691 0.311 -1.463 -0.208 1.931 -1.053 -2.000 -10.641 -2.819 -4.350
#T -3.868 -3.114 -4.032 -2.297 0.288 -0.426 1.428 1.320 -2.576 1.393 -5.066 -3.566 -5.030 -3.764
#G 0.967 2.100 -4.305 -1.267 0.140 0.632 -1.563 -1.879 -2.088 -2.285 0.357 2.321 -8.368 -4.399
--- End code ---
Here, Convert2PSAM performs a 2**score transformation so that the scores become positive.
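To make the idea concrete, here is a minimal Python sketch of a 2**score transformation, applied to the first three columns of the TAMO example above. It is only an illustration, not the Convert2PSAM code: the function name log_odds_to_psam is hypothetical, and rescaling each position so that its best base gets a weight of 1 is an assumption on my part about how the result might be normalized.

```python
def log_odds_to_psam(rows):
    """Turn log-odds scores into positive weights via 2**score.

    rows maps each base to its list of scores, one per position.
    Rescaling each position so the best base equals 1 is an
    assumption for illustration, not documented tool behavior.
    """
    bases = list(rows)
    width = len(rows[bases[0]])
    # 2**score is strictly positive for any finite score.
    weights = {b: [2.0 ** s for s in rows[b]] for b in bases}
    for j in range(width):
        col_max = max(weights[b][j] for b in bases)
        for b in bases:
            weights[b][j] /= col_max
    return weights


# First three positions of the TAMO log-odds example.
tamo = {
    "A": [0.743, -1.052, 1.647],
    "C": [-1.105, -10.336, -8.324],
    "T": [-3.868, -3.114, -4.032],
    "G": [0.967, 2.100, -4.305],
}
psam = log_odds_to_psam(tamo)
for base in "ACGT":
    print(base, " ".join(f"{w:.4f}" for w in psam[base]))
```

With this kind of transformation, a very negative score such as -800 simply becomes a weight that is practically zero while still being positive.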
Should we apply a similar transformation to the Expert_PWMs.tar.gz data? Harmen, what's your take?
Please let me know your opinions.
Xiang-Jun
PS. For the record, it is worth noting that the $REDUCE_SUITE/data/formats/ folder contains several other commonly used PWM-like files that can be handled by Convert2PSAM.
--- Code: ---#transfac.dat
ID any_old_name_for_motif_1
BF species_name_for_motif_1
P0 A C G T
01 1 2 2 0 S
02 2 1 2 0 R
03 3 0 1 1 A
04 0 5 0 0 C
05 5 0 0 0 A
06 0 0 4 1 G
07 0 1 4 0 G
08 0 0 0 5 T
09 0 0 5 0 G
10 0 1 2 2 K
11 0 2 0 3 Y
12 1 0 3 1 G
XX
//
--- End code ---
--- Code: ---#jaspar_ex1.dat
1 6 1 0 13 0 6 0 13 15 2 5
4 0 0 0 1 15 0 9 4 0 3 5
8 12 0 3 2 1 12 0 1 1 1 3
5 0 17 15 2 2 0 9 0 2 12 5
--- End code ---
--- Code: ---#jaspar_ex2.dat
A [ 1 6 1 0 13 0 6 0 13 15 2 5 ]
C [ 4 0 0 0 1 15 0 9 4 0 3 5 ]
G [ 8 12 0 3 2 1 12 0 1 1 1 3 ]
T [ 5 0 17 15 2 2 0 9 0 2 12 5 ]
--- End code ---
Convert2PSAM was created explicitly to handle such varied real-world formats.
jason:
I see! I don't have much experience with the PWM format, so I thank you for looking into this.
Is the expected PWM format then only integer values, like a count matrix?
YeTFaSCo also provides the data as a "PFM", or Position Frequency Matrix, in which all the values are stored as fractions between 0 and 1: http://yetfasco.ccbr.utoronto.ca/1.02/Downloads/Expert_PFMs.tar.gz
The equivalent PFM to YDR146C that you pasted is this:
A 0.403846154 0.653846154 0.0 0.0 1.0 0.0 0.0 0.615384615
T 0.25 0.038461538 0.0 0.0 0.0 0.0 0.0 0.076923077
G 0.25 0.038461538 0.0 0.0 0.0 1.0 0.0 0.076923077
C 0.096153846 0.269230769 1.0 1.0 0.0 0.0 1.0 0.230769231
I also tried converting the PFM to PSAM and got an error, but perhaps I could, for example, scale each PFM to integer values from 0 to 100 and convert that?
xiangjun:
The PFM is row-wise (one row per base), while the PWM format accepted by Convert2PSAM is column-wise, with the bases in the order A, C, G, T. See $REDUCE_SUITE/data/formats/pwm_ex.dat for an example.
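As a rough illustration of the difference, the following Python sketch reads the row-wise PFM posted above (rows in the order A, T, G, C) and produces one list per position with the values reordered as A, C, G, T. The function name rows_to_columns is hypothetical, and the exact layout that Convert2PSAM expects should be checked against pwm_ex.dat; this only shows the transposition and reordering.

```python
def rows_to_columns(text, order="ACGT"):
    """Transpose a row-wise matrix (one line per base) into a
    position-wise one (one list per position, bases in `order`)."""
    rows = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # First field is the base label, the rest are the values.
        rows[fields[0].upper()] = [float(x) for x in fields[1:]]
    width = len(rows[order[0]])
    return [[rows[b][j] for b in order] for j in range(width)]


# The YDR146C PFM from the post above, rows in A, T, G, C order.
pfm_text = """
A 0.403846154 0.653846154 0.0 0.0 1.0 0.0 0.0 0.615384615
T 0.25 0.038461538 0.0 0.0 0.0 0.0 0.0 0.076923077
G 0.25 0.038461538 0.0 0.0 0.0 1.0 0.0 0.076923077
C 0.096153846 0.269230769 1.0 1.0 0.0 0.0 1.0 0.230769231
"""
columns = rows_to_columns(pfm_text)
for position in columns:
    # One line per position, values in A C G T order.
    print(" ".join(str(v) for v in position))
```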
I'll revise Convert2PSAM to accept the PFM format, so you do not need to do extra work.
Xiang-Jun