Topic: Error converting PWM to PSAM

jason

Error converting PWM to PSAM
« on: December 18, 2017, 02:01:28 pm »
Hi, I've downloaded a set of PWMs from YeTFaSCo: http://yetfasco.ccbr.utoronto.ca/1.02/Downloads/Expert_PWMs.tar.gz

I would like to use these PWMs with the Transfactivity program.

However, I can't seem to get the Convert2PSAM utility to work on these... I assume that it expects a slightly different PWM format from the one provided in the download, but I can't figure out exactly which format it expects. Could you let me know if I'm doing something wrong?

Here's the command I ran and the error:

 bin/Convert2PSAM -source=pw -inp=data/yetfasco/ALIGNED_ENOLOGO_FORMAT_PWMS/YDR146C_569.pwm -pwmfile=test.xml

<data/yetfasco/ALIGNED_ENOLOGO_FORMAT_PWMS/YDR146C_569.pwm> not in PWM format: [A   0.381537584575116   1.07668300283655   -800   -800   1.68965987938785   -800   -800   0.989220160345073] contains invalid W a


xiangjun

  • Administrator
Re: Error converting PWM to PSAM
« Reply #1 on: December 18, 2017, 02:14:05 pm »
Hi Jason,

Thanks for using the REDUCE Suite and for posting on the Forum.

The error message seems to hint at a PWM format variant that Convert2PSAM cannot handle. I'll look into the details and revise Convert2PSAM as necessary. I'll post back on the Forum, probably by tomorrow.

Best regards,

Xiang-Jun

xiangjun

  • Administrator
Re: Error converting PWM to PSAM
« Reply #2 on: December 19, 2017, 01:12:49 pm »
Hi Jason,

I've looked into the issue. As expected, it is indeed yet another PWM variant that needs special handling before it can be converted to PSAM.

One example (YDR146C_569.pwm) from Expert_PWMs.tar.gz is as below:

Code:
A 0.381537584575116 1.07668300283655 -800 -800 1.68965987938785 -800 -800 0.989220160345073
T -0.31034012061215 -3.01077985606558 -800 -800 -800 -800 -800 -2.01077983731055
G 0.395928676331139 -2.30451105912229 -800 -800 -800 2.39592867633114 -800 -1.30451104036726
C -0.982582949230903 0.502843879011055 2.39592867633114 2.39592867633114 -800 -800 2.39592867633114 0.280451460353898

It has negative values, including what is presumably a cutoff value of -800. Entries in a PSAM, on the other hand, should all be positive, so we need a way to map the negative values to positive ones.
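To make the problem concrete, here is a minimal sketch that reads a row-wise PWM like YDR146C_569.pwm (a base letter followed by one score per position) and counts the negative entries and the -800 placeholders. The helper name `parse_rowwise` and the treatment of -800 as a cutoff are assumptions for illustration, not part of Convert2PSAM.

```python
# Sketch: parse a row-wise PWM ("BASE v1 v2 ...") and count the entries that
# would need mapping to positive values before building a PSAM.
from typing import Dict, List

CUTOFF = -800.0  # assumed placeholder for "effectively impossible" bases

PWM_TEXT = """\
A 0.381537584575116 1.07668300283655 -800 -800 1.68965987938785 -800 -800 0.989220160345073
T -0.31034012061215 -3.01077985606558 -800 -800 -800 -800 -800 -2.01077983731055
G 0.395928676331139 -2.30451105912229 -800 -800 -800 2.39592867633114 -800 -1.30451104036726
C -0.982582949230903 0.502843879011055 2.39592867633114 2.39592867633114 -800 -800 2.39592867633114 0.280451460353898
"""


def parse_rowwise(text: str) -> Dict[str, List[float]]:
    """Return {base: [score at each position]} from 'BASE v1 v2 ...' lines."""
    matrix: Dict[str, List[float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        matrix[fields[0].upper()] = [float(v) for v in fields[1:]]
    widths = {len(row) for row in matrix.values()}
    if len(widths) != 1:
        raise ValueError(f"rows differ in length: {sorted(widths)}")
    return matrix


pwm = parse_rowwise(PWM_TEXT)
all_scores = [v for row in pwm.values() for v in row]
n_negative = sum(v < 0 for v in all_scores)
n_cutoff = sum(v == CUTOFF for v in all_scores)
print(len(pwm["A"]), n_negative, n_cutoff)  # 8 21 15
```

So 21 of the 32 entries are negative, and 15 of those are the -800 placeholder.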

In a similar case, the TAMO format (as used in the MacIsaac dataset) distributed with the REDUCE Suite looks like the following:

Code:
Log-odds matrix for Motif   0 rGAA..TtctrGAA (0)
#        0         1         2         3         4         5         6         7         8         9        10        11        12        13
#A     0.743    -1.052     1.647     1.443    -1.558    -0.374    -3.255    -5.001    -0.793    -2.480     1.175    -3.678     1.635     1.629
#C    -1.105   -10.336    -8.324    -3.641     0.691     0.311    -1.463    -0.208     1.931    -1.053    -2.000   -10.641    -2.819    -4.350
#T    -3.868    -3.114    -4.032    -2.297     0.288    -0.426     1.428     1.320    -2.576     1.393    -5.066    -3.566    -5.030    -3.764
#G     0.967     2.100    -4.305    -1.267     0.140     0.632    -1.563    -1.879    -2.088    -2.285     0.357     2.321    -8.368    -4.399

Here, Convert2PSAM performs a 2**score transformation so that the scores become positive.
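A sketch of what a 2**score step could look like, assuming the scores are base-2 log-odds. After exponentiation, each position is divided by its largest value so the best base scores 1.0, as in a relative-affinity matrix. Whether Convert2PSAM rescales exactly this way is an assumption; the function name `logodds_to_psam` and the two-position input are hypothetical.

```python
# Sketch, assuming base-2 log-odds input: 2**score makes every entry positive,
# then each position is scaled so its best base has value 1.0.
from typing import Dict, List

BASES = "ACGT"


def logodds_to_psam(scores: Dict[str, List[float]]) -> Dict[str, List[float]]:
    width = len(scores["A"])
    positive = {b: [2.0 ** s for s in scores[b]] for b in BASES}  # all > 0
    psam: Dict[str, List[float]] = {b: [] for b in BASES}
    for j in range(width):
        best = max(positive[b][j] for b in BASES)
        for b in BASES:
            psam[b].append(positive[b][j] / best)
    return psam


# Hypothetical two-position example; 2**-800 is about 1.5e-241, i.e. ~0.
example = {"A": [1.0, -1.0], "C": [0.0, 1.0], "G": [-1.0, 0.0], "T": [-800.0, 1.0]}
psam = logodds_to_psam(example)
print(psam["A"], psam["C"])  # [1.0, 0.25] [0.5, 1.0]
```

With this mapping, a -800 cutoff simply becomes a vanishingly small (but still positive) relative affinity rather than an invalid entry.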

Should we apply a similar transformation to the Expert_PWMs.tar.gz data? Harmen, what's your take?

Please let me know your opinions.

Xiang-Jun


PS. For the record, it is worth noting that the $REDUCE_SUITE/data/formats/ folder contains several other commonly used PWM-like files that can be handled by Convert2PSAM.

Code:
#transfac.dat
ID any_old_name_for_motif_1
BF species_name_for_motif_1
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
03      3      0      1      1      A
04      0      5      0      0      C
05      5      0      0      0      A
06      0      0      4      1      G
07      0      1      4      0      G
08      0      0      0      5      T
09      0      0      5      0      G
10      0      1      2      2      K
11      0      2      0      3      Y
12      1      0      3      1      G
XX
//
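For readers who want to pre-check a file in the transfac.dat layout, here is a minimal reading sketch: the P0 line names the base columns, each numbered row is one motif position (the trailing consensus letter is ignored), and parsing stops at XX or //. This is an illustrative reader with a hypothetical name (`parse_transfac`), not how Convert2PSAM itself parses the file.

```python
# Sketch: collect per-base count columns from a TRANSFAC-style matrix block.
from typing import Dict, List

TRANSFAC_TEXT = """\
ID any_old_name_for_motif_1
BF species_name_for_motif_1
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
03      3      0      1      1      A
04      0      5      0      0      C
05      5      0      0      0      A
06      0      0      4      1      G
07      0      1      4      0      G
08      0      0      0      5      T
09      0      0      5      0      G
10      0      1      2      2      K
11      0      2      0      3      Y
12      1      0      3      1      G
XX
//
"""


def parse_transfac(text: str) -> Dict[str, List[float]]:
    bases: List[str] = []
    counts: Dict[str, List[float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        tag = fields[0]
        if tag == "P0":  # header naming the base columns
            bases = fields[1:]
            counts = {b: [] for b in bases}
        elif bases and tag.isdigit():  # one motif position per row
            for base, value in zip(bases, fields[1:1 + len(bases)]):
                counts[base].append(float(value))
        elif bases and tag in ("XX", "//"):  # end of the matrix block
            break
    return counts


counts = parse_transfac(TRANSFAC_TEXT)
print(len(counts["A"]), counts["C"][:4])  # 12 [2.0, 1.0, 0.0, 5.0]
```

In this example every position sums to 5, i.e. five aligned sites.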

Code:
#jaspar_ex1.dat
 1  6  1  0 13  0  6  0 13 15  2  5
 4  0  0  0  1 15  0  9  4  0  3  5
 8 12  0  3  2  1 12  0  1  1  1  3
 5  0 17 15  2  2  0  9  0  2 12  5

Code:
#jaspar_ex2.dat
A  [ 1  6  1  0 13  0  6  0 13 15  2  5 ]
C  [ 4  0  0  0  1 15  0  9  4  0  3  5 ]
G  [ 8 12  0  3  2  1 12  0  1  1  1  3 ]
T  [ 5  0 17 15  2  2  0  9  0  2 12  5 ]
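The two JASPAR-style variants carry the same counts. A sketch of a reader that accepts both: unlabeled rows are assumed to be in A, C, G, T order (the labeled jaspar_ex2.dat variant suggests this), labeled rows may wrap their values in brackets, and lines starting with '#' (the file-name labels above) are skipped. The name `parse_jaspar` is hypothetical.

```python
# Sketch: read either JASPAR-style count matrix into {base: [counts]}.
import re
from typing import Dict, List

BASES = "ACGT"
LABELED_ROW = re.compile(r"^([ACGT])\s*\[?(.*?)\]?$")  # e.g. "A  [ 1 6 ... ]"


def parse_jaspar(text: str) -> Dict[str, List[int]]:
    rows: Dict[str, List[int]] = {}
    unlabeled_order = iter(BASES)  # assumed row order when no label is given
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = LABELED_ROW.match(line)
        if match:
            base, body = match.group(1), match.group(2)
        else:
            base, body = next(unlabeled_order), line
        rows[base] = [int(v) for v in body.split()]
    return rows


EX1 = """#jaspar_ex1.dat
 1  6  1  0 13  0  6  0 13 15  2  5
 4  0  0  0  1 15  0  9  4  0  3  5
 8 12  0  3  2  1 12  0  1  1  1  3
 5  0 17 15  2  2  0  9  0  2 12  5
"""
EX2 = """#jaspar_ex2.dat
A  [ 1  6  1  0 13  0  6  0 13 15  2  5 ]
C  [ 4  0  0  0  1 15  0  9  4  0  3  5 ]
G  [ 8 12  0  3  2  1 12  0  1  1  1  3 ]
T  [ 5  0 17 15  2  2  0  9  0  2 12  5 ]
"""

print(parse_jaspar(EX1) == parse_jaspar(EX2))  # True
```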

Convert2PSAM was created explicitly to handle such real-world, wildly varying cases.

jason

Re: Error converting PWM to PSAM
« Reply #3 on: December 19, 2017, 01:27:00 pm »
I see! I don't have much experience with the PWM format, so I thank you for looking into this.

Is the expected PWM format then only integer values, like a count matrix?

YeTFaSCo also provides the data as a "PFM", or Position Frequency Matrix, in which all values are stored as fractions between 0 and 1: http://yetfasco.ccbr.utoronto.ca/1.02/Downloads/Expert_PFMs.tar.gz

The PFM equivalent of the YDR146C matrix you pasted is this:

A       0.403846154     0.653846154     0.0     0.0     1.0     0.0     0.0     0.615384615
T       0.25    0.038461538     0.0     0.0     0.0     0.0     0.0     0.076923077
G       0.25    0.038461538     0.0     0.0     0.0     1.0     0.0     0.076923077
C       0.096153846     0.269230769     1.0     1.0     0.0     0.0     1.0     0.230769231


I also tried converting the PFM to PSAM and got an error, but perhaps I could, for example, scale each PFM to integer values from 0 to 100 and convert that?
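The scaling idea can be sketched as follows: multiply each frequency by 100 and round to an integer pseudo-count. This is a hypothetical workaround, not something Convert2PSAM requires; note that plain rounding can leave a column summing to 99 or 101 rather than exactly 100, as the last position of YDR146C shows.

```python
# Sketch: scale a row-wise PFM to integer pseudo-counts out of `total`.
from typing import Dict, List

PFM_TEXT = """\
A       0.403846154     0.653846154     0.0     0.0     1.0     0.0     0.0     0.615384615
T       0.25    0.038461538     0.0     0.0     0.0     0.0     0.0     0.076923077
G       0.25    0.038461538     0.0     0.0     0.0     1.0     0.0     0.076923077
C       0.096153846     0.269230769     1.0     1.0     0.0     0.0     1.0     0.230769231
"""


def parse_rowwise(text: str) -> Dict[str, List[float]]:
    """Return {base: [frequency at each position]}."""
    return {f[0]: [float(v) for v in f[1:]] for f in map(str.split, text.splitlines()) if f}


def scale_to_counts(pfm: Dict[str, List[float]], total: int = 100) -> Dict[str, List[int]]:
    """Round each frequency times `total` to the nearest integer."""
    return {b: [round(f * total) for f in row] for b, row in pfm.items()}


counts = scale_to_counts(parse_rowwise(PFM_TEXT))
column_sums = [sum(counts[b][j] for b in counts) for j in range(len(counts["A"]))]
print(counts["A"])   # [40, 65, 0, 0, 100, 0, 0, 62]
print(column_sums)   # [100, 100, 100, 100, 100, 100, 100, 101]
```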

xiangjun

  • Administrator
Re: Error converting PWM to PSAM
« Reply #4 on: December 19, 2017, 01:42:50 pm »
The PFM is row-wise, while the PWM format accepted by Convert2PSAM is column-wise, in order of A, C, G, and T. See $REDUCE_SUITE/data/formats/pwm_ex.dat for an example.
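The row-to-column change amounts to a transpose plus a reordering, since the YeTFaSCo rows come in A, T, G, C order while the columns must be A, C, G, T. A minimal sketch follows; the exact layout of pwm_ex.dat (headers, separators) is not shown here, so the whitespace-separated, one-line-per-position output is an assumption, and `to_columnwise` is a hypothetical name.

```python
# Sketch: transpose a row-wise PFM into one line per position, columns A C G T.
from typing import Dict, List

COLUMN_ORDER = "ACGT"  # column order stated for Convert2PSAM's PWM input

PFM_TEXT = """\
A       0.403846154     0.653846154     0.0     0.0     1.0     0.0     0.0     0.615384615
T       0.25    0.038461538     0.0     0.0     0.0     0.0     0.0     0.076923077
G       0.25    0.038461538     0.0     0.0     0.0     1.0     0.0     0.076923077
C       0.096153846     0.269230769     1.0     1.0     0.0     0.0     1.0     0.230769231
"""


def parse_rowwise(text: str) -> Dict[str, List[float]]:
    """Return {base: [value at each position]} from 'BASE v1 v2 ...' lines."""
    return {f[0]: [float(v) for v in f[1:]] for f in map(str.split, text.splitlines()) if f}


def to_columnwise(rows: Dict[str, List[float]]) -> str:
    """One output line per motif position, values in A, C, G, T order."""
    width = len(rows["A"])
    return "\n".join(
        " ".join(str(rows[b][j]) for b in COLUMN_ORDER) for j in range(width)
    )


lines = to_columnwise(parse_rowwise(PFM_TEXT)).splitlines()
print(lines[0])  # 0.403846154 0.096153846 0.25 0.25
print(lines[2])  # 0.0 1.0 0.0 0.0
```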

I'll revise Convert2PSAM to accept the PFM format, so you do not need to do extra work.

Xiang-Jun

jason

Re: Error converting PWM to PSAM
« Reply #5 on: December 19, 2017, 02:08:41 pm »
Great! If all it takes is a simple transpose of the PFM matrix, I could handle the rest, but thank you for offering to upgrade the script! Hopefully other users in the future find it helpful.

-Jason

xiangjun

  • Administrator
Re: Error converting PWM to PSAM
« Reply #6 on: December 19, 2017, 10:42:12 pm »
Hi Jason,

I've updated the REDUCE Suite to version 2.2.5-2017dec19 in which Convert2PSAM has an additional source option of PFM. An example run on YDR146C (you posted) is shown below:

Code:
Convert2PSAM -source=pf -inp=$REDUCE_SUITE/data/formats/pfm_YDR146C.dat -psam=stdout

Please give it a try, and let me know if you have any further problems.

Xiang-Jun

jason

Re: Error converting PWM to PSAM
« Reply #7 on: December 21, 2017, 08:40:13 pm »
It works great! Thanks a ton.

 

Created and maintained by Dr. Xiang-Jun Lu [律祥俊]. See also http://forum.x3dna.org and http://x3dna.org