Hi Jason,
I've looked into the issue. As expected, it is indeed yet another PWM variant that needs special attention before it can be converted to a PSAM.
One example (YDR146C_569.pwm) from Expert_PWMs.tar.gz is shown below:
A 0.381537584575116 1.07668300283655 -800 -800 1.68965987938785 -800 -800 0.989220160345073
T -0.31034012061215 -3.01077985606558 -800 -800 -800 -800 -800 -2.01077983731055
G 0.395928676331139 -2.30451105912229 -800 -800 -800 2.39592867633114 -800 -1.30451104036726
C -0.982582949230903 0.502843879011055 2.39592867633114 2.39592867633114 -800 -800 2.39592867633114 0.280451460353898
It has negative values, including -800, which is presumably a cutoff value. Entries in a PSAM, on the other hand, should all be positive, so we need a way to convert the negative values to positive ones.
A similar case is the TAMO format (as used in the MacIsaac dataset) distributed with the REDUCE Suite, which looks like the following:
Log-odds matrix for Motif 0 rGAA..TtctrGAA (0)
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13
#A 0.743 -1.052 1.647 1.443 -1.558 -0.374 -3.255 -5.001 -0.793 -2.480 1.175 -3.678 1.635 1.629
#C -1.105 -10.336 -8.324 -3.641 0.691 0.311 -1.463 -0.208 1.931 -1.053 -2.000 -10.641 -2.819 -4.350
#T -3.868 -3.114 -4.032 -2.297 0.288 -0.426 1.428 1.320 -2.576 1.393 -5.066 -3.566 -5.030 -3.764
#G 0.967 2.100 -4.305 -1.267 0.140 0.632 -1.563 -1.879 -2.088 -2.285 0.357 2.321 -8.368 -4.399
Here, Convert2PSAM performs a 2**score transformation so that the scores become positive.
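To make the idea concrete, here is a minimal Python sketch of such a transformation. It is not the Convert2PSAM code: the function name log_odds_to_psam is hypothetical, it assumes the scores are base-2 log-odds, and the rescaling of each column so that its best base equals 1 is my assumption about what a PSAM should look like. The input is the first three columns of YDR146C_569.pwm, rounded to four decimals.

```python
# Hypothetical helper, NOT part of Convert2PSAM: illustrates a 2**score
# transformation on a log-odds matrix stored as {base: [score, ...]}.

def log_odds_to_psam(log_odds):
    """Map log-odds scores to positive values via 2**score, then scale
    each column so its largest entry is 1 (the scaling is an assumption)."""
    bases = list(log_odds)
    width = len(log_odds[bases[0]])
    # 2**score is strictly positive for any finite score, including -800
    raw = {b: [2.0 ** s for s in log_odds[b]] for b in bases}
    psam = {b: [] for b in bases}
    for j in range(width):
        col_max = max(raw[b][j] for b in bases)
        for b in bases:
            psam[b].append(raw[b][j] / col_max)
    return psam

# First three columns of YDR146C_569.pwm, rounded to four decimals
example = {
    "A": [0.3815, 1.0767, -800.0],
    "T": [-0.3103, -3.0108, -800.0],
    "G": [0.3959, -2.3045, -800.0],
    "C": [-0.9826, 0.5028, 2.3959],
}

psam = log_odds_to_psam(example)
for base, row in psam.items():
    print(base, " ".join(f"{v:.4g}" for v in row))
```

With this approach the -800 entries become vanishingly small but still positive, so no special handling of the cutoff is needed.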
Should we apply a similar transformation to the Expert_PWMs.tar.gz data? Harmen, what's your take?
Please let me know your opinions.
Xiang-Jun
PS. For the record, it is worth noting that the $REDUCE_SUITE/data/formats/ folder contains several other commonly used PWM-like files that can be handled by Convert2PSAM.
#transfac.dat
ID any_old_name_for_motif_1
BF species_name_for_motif_1
P0 A C G T
01 1 2 2 0 S
02 2 1 2 0 R
03 3 0 1 1 A
04 0 5 0 0 C
05 5 0 0 0 A
06 0 0 4 1 G
07 0 1 4 0 G
08 0 0 0 5 T
09 0 0 5 0 G
10 0 1 2 2 K
11 0 2 0 3 Y
12 1 0 3 1 G
XX
//
#jaspar_ex1.dat
1 6 1 0 13 0 6 0 13 15 2 5
4 0 0 0 1 15 0 9 4 0 3 5
8 12 0 3 2 1 12 0 1 1 1 3
5 0 17 15 2 2 0 9 0 2 12 5
#jaspar_ex2.dat
A [ 1 6 1 0 13 0 6 0 13 15 2 5 ]
C [ 4 0 0 0 1 15 0 9 4 0 3 5 ]
G [ 8 12 0 3 2 1 12 0 1 1 1 3 ]
T [ 5 0 17 15 2 2 0 9 0 2 12 5 ]
Convert2PSAM was created explicitly for such wild real-world cases.
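For illustration only, the bracketed layout of jaspar_ex2.dat can be read into per-base count lists with a few lines of Python. The function name parse_jaspar_bracketed is hypothetical and this sketch says nothing about how Convert2PSAM itself parses the file; it only assumes one "BASE [ counts ]" row per line, as in the example above.

```python
# Hypothetical parser, NOT taken from Convert2PSAM: reads rows of the form
# "A [ 1 6 1 ... ]" into a dict mapping each base to a list of counts.

def parse_jaspar_bracketed(text):
    counts = {}
    for line in text.strip().splitlines():
        base, rest = line.split("[", 1)          # "A " and "1 6 1 ... ]"
        values = rest.split("]", 1)[0].split()   # drop the closing bracket
        counts[base.strip()] = [int(v) for v in values]
    return counts

sample = """\
A [ 1 6 1 0 13 0 6 0 13 15 2 5 ]
C [ 4 0 0 0 1 15 0 9 4 0 3 5 ]
G [ 8 12 0 3 2 1 12 0 1 1 1 3 ]
T [ 5 0 17 15 2 2 0 9 0 2 12 5 ]
"""

counts = parse_jaspar_bracketed(sample)
width = len(counts["A"])
# Total number of sites observed at each position
col_totals = [sum(counts[b][j] for b in "ACGT") for j in range(width)]
print(width, col_totals)
```

The unbracketed jaspar_ex1.dat holds the same numbers without row labels; judging from the two examples, its rows are in A, C, G, T order.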