Messages - xiangjun

1
In the REDUCE Suite, a topology is a shorthand form for exploring a sequence motif, possibly including gaps. It is a flexible and convenient way to test different motif patterns as users see fit in a particular application.

For example, X8 means an 8-mer, with 4^8 = 65,536 possible combinations of the four canonical DNA or RNA bases. As another example, the topology X3--X4 represents a 7-mer with a 2-nt gap in between. The k-mer size (the number of X positions) can be greater than 8, up to a maximum of 15. Check the source code for details.

Try the utility program Topo2Dictfile to see a list of sequences corresponding to a given topology. These are the base sequences tested by MotifREDUCE/MatrixREDUCE.
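To make the notation concrete, here is a minimal Python sketch that expands a topology string into a per-position pattern and enumerates the base sequences it covers. It assumes that `X` optionally followed by a count denotes that many variable positions, that each `-` is a one-nucleotide gap, and it prints gaps as `-`. These conventions and the helper names `parse_topology` and `enumerate_kmers` are illustrative assumptions, not how Topo2Dictfile is implemented or formats its output.

```python
from itertools import product
from typing import Iterator

BASES = "ACGT"
MAX_K = 15  # maximum number of X positions, as stated above


def parse_topology(topo: str) -> str:
    """Expand shorthand such as 'X3--X4' into a per-position pattern 'XXX--XXXX'.

    Assumed grammar (illustrative only): 'X' optionally followed by a repeat
    count marks variable positions; each '-' marks a one-nucleotide gap.
    """
    pattern = []
    i = 0
    while i < len(topo):
        ch = topo[i]
        if ch == "X":
            j = i + 1
            while j < len(topo) and topo[j].isdigit():
                j += 1
            count = int(topo[i + 1:j]) if j > i + 1 else 1
            pattern.append("X" * count)
            i = j
        elif ch == "-":
            pattern.append("-")
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} in topology {topo!r}")
    return "".join(pattern)


def enumerate_kmers(topo: str) -> Iterator[str]:
    """Yield every base sequence matching the topology; gaps are shown as '-'."""
    pattern = parse_topology(topo)
    k = pattern.count("X")
    if k > MAX_K:
        raise ValueError(f"{k} variable positions exceeds the maximum of {MAX_K}")
    for combo in product(BASES, repeat=k):
        bases = iter(combo)
        yield "".join(next(bases) if c == "X" else "-" for c in pattern)


if __name__ == "__main__":
    print(parse_topology("X3--X4"))               # XXX--XXXX
    print(sum(1 for _ in enumerate_kmers("X8")))  # 65536
    print(next(enumerate_kmers("X3--X4")))        # AAA--AAAA
```

The count grows as 4^k, which is why the number of X positions is capped: X15 already means over a billion sequences.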

Hope this helps,

Xiang-Jun

2
General Discussion / Re: Multicollinearity
« on: September 07, 2018, 09:32:52 pm »
Hi,

Thanks for using the REDUCE Suite and for posting your questions on the Forum.

As can be seen from the source code, Transfactivity checks for degeneracy in input data using SVD. However, multicollinearity is not checked by the program, as you already noticed. Users need to deal with the multicollinearity issue using other tools.
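An SVD-based check catches exact degeneracy, but near-linear dependence among predictors slips through. One commonly used diagnostic is the variance inflation factor (VIF), the j-th diagonal entry of the inverse of the predictors' correlation matrix; values above roughly 10 are often treated as a warning sign. The pure-Python sketch below is one way to compute it before running Transfactivity. It is not part of the REDUCE Suite, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def correlation_matrix(columns: Sequence[Sequence[float]]) -> Matrix:
    """Pearson correlation matrix of the predictors (one list per predictor)."""
    n = len(columns[0])
    centered = []
    for col in columns:
        m = sum(col) / n
        centered.append([x - m for x in col])
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    if any(nv == 0.0 for nv in norms):
        raise ValueError("a constant predictor has no defined correlation")
    p = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def invert(matrix: Matrix, tol: float = 1e-12) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting; raises if (near-)singular."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            raise ValueError("singular matrix: predictors are exactly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns: Sequence[Sequence[float]]) -> List[float]:
    """VIF_j is the j-th diagonal entry of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]
```

Uncorrelated predictors give VIFs of 1. Exactly collinear ones raise an error, mirroring the degeneracy Transfactivity already rejects. Nearly collinear ones produce very large VIFs, which is the case that needs separate handling.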

Best regards,

Xiang-Jun

3
General Discussion / Re: download previous versions of REDUCE_Suite
« on: August 16, 2018, 08:44:17 pm »
Hi Kate,

Sorry, no previous versions of the REDUCE_Suite are available for download.

Could you tell me what you want to achieve with the original Convert2PSAM -source=v1 option, ideally with a concrete example?

Xiang-Jun

4
Documentation / Re: Other utility programs
« on: July 18, 2018, 07:53:14 am »
Hi,

Type ProcessFASTA -h for more info. Check the source code for technical details.

Best regards,

Xiang-Jun

5
General Discussion / Re: availability of FeatureREDUCE?
« on: July 17, 2018, 06:58:47 pm »
Thanks for posting your FeatureREDUCE question(s) on the REDUCE Suite Forum. Unfortunately (as you noticed), FeatureREDUCE is not available in the basic REDUCE Suite, which includes MatrixREDUCE/MotifREDUCE and some accessory programs. I was not involved in the development of FeatureREDUCE, and its support (if any) is not covered by the Forum (I've made this point clear on the announcement page). Sorry for not being able to give you a more positive answer.

Xiang-Jun

6
General Discussion / Re: Error when generating logos in PDF format
« on: February 11, 2018, 09:53:30 am »
Quote
I can't directly read LogoGenerator-created EPS file on Linux. Am I missing something?

Could you be more specific? Please provide a concrete example so I (and others) can reproduce what you failed to achieve.

Best regards,

Xiang-Jun

7
General Discussion / Re: Error converting PWM to PSAM
« on: December 19, 2017, 10:42:12 pm »
Hi Jason,

I've updated the REDUCE Suite to version 2.2.5-2017dec19, in which Convert2PSAM has an additional source option for PFM (-source=pf). An example run on the YDR146C matrix you posted is shown below:

Code:
Convert2PSAM -source=pf -inp=$REDUCE_SUITE/data/formats/pfm_YDR146C.dat -psam=stdout
Please give it a try, and let me know if you have further problems.

Xiang-Jun

8
General Discussion / Re: Error converting PWM to PSAM
« on: December 19, 2017, 01:42:50 pm »
The PFM is row-wise (one row per base), while the PWM format accepted by Convert2PSAM is column-wise (one column per base, in the order A, C, G, T). See $REDUCE_SUITE/data/formats/pwm_ex.dat for an example.
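Until Convert2PSAM accepts PFMs directly, a row-wise matrix can be transposed by hand. The sketch below is a hypothetical helper, not part of the REDUCE Suite: it reads one labeled row per base (plain `A 0.38 1.07 ...` lines, or JASPAR-style `A [ 1 6 ... ]` lines as in jaspar_ex2.dat), reorders rows that arrive as A, T, G, C (as in YDR146C_569.pwm) and transposes them into one position per line in A, C, G, T order. Whether the printed layout matches pwm_ex.dat exactly is an assumption to check against that file.

```python
from typing import Dict, List

PSAM_ORDER = "ACGT"  # column order for the column-wise layout


def rowwise_to_columnwise(text: str, order: str = PSAM_ORDER) -> List[List[float]]:
    """Transpose a base-per-row matrix into position-per-row lists in A, C, G, T order.

    Accepts lines such as 'A 0.38 1.07 ...' or JASPAR-style 'A [ 1 6 1 ... ]';
    blank lines and lines starting with '#' are skipped.
    """
    rows: Dict[str, List[float]] = {}
    for line in text.splitlines():
        fields = line.replace("[", " ").replace("]", " ").split()
        if not fields or fields[0].startswith("#"):
            continue
        rows[fields[0].upper()] = [float(x) for x in fields[1:]]
    if set(rows) != set(order):
        raise ValueError(f"expected rows for {sorted(order)}, got {sorted(rows)}")
    lengths = {len(values) for values in rows.values()}
    if len(lengths) != 1:
        raise ValueError("rows have different lengths")
    width = lengths.pop()
    return [[rows[base][i] for base in order] for i in range(width)]


def format_columnwise(matrix: List[List[float]]) -> str:
    """One position per line, tab-separated values in A, C, G, T order."""
    return "\n".join("\t".join(str(v) for v in row) for row in matrix)
```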

I'll revise Convert2PSAM to accept the PFM format, so you do not need to do extra work.

Xiang-Jun

9
General Discussion / Re: Error converting PWM to PSAM
« on: December 19, 2017, 01:12:49 pm »
Hi Jason,

I've looked into the issue. As expected, it is yet another PWM variant that needs special attention to be converted to PSAM.

One example (YDR146C_569.pwm) from Expert_PWMs.tar.gz is shown below:

Code:
A 0.381537584575116 1.07668300283655 -800 -800 1.68965987938785 -800 -800 0.989220160345073
T -0.31034012061215 -3.01077985606558 -800 -800 -800 -800 -800 -2.01077983731055
G 0.395928676331139 -2.30451105912229 -800 -800 -800 2.39592867633114 -800 -1.30451104036726
C -0.982582949230903 0.502843879011055 2.39592867633114 2.39592867633114 -800 -800 2.39592867633114 0.280451460353898

It has negative values, including -800, which is presumably a cutoff value. Entries in a PSAM, on the other hand, should all be positive, so we need a way to convert the negative values to positive ones.

In a similar case, the TAMO format (as in the MacIsaac dataset) distributed with the REDUCE Suite looks like the following:

Code:
Log-odds matrix for Motif   0 rGAA..TtctrGAA (0)
#        0         1         2         3         4         5         6         7         8         9        10        11        12        13
#A     0.743    -1.052     1.647     1.443    -1.558    -0.374    -3.255    -5.001    -0.793    -2.480     1.175    -3.678     1.635     1.629
#C    -1.105   -10.336    -8.324    -3.641     0.691     0.311    -1.463    -0.208     1.931    -1.053    -2.000   -10.641    -2.819    -4.350
#T    -3.868    -3.114    -4.032    -2.297     0.288    -0.426     1.428     1.320    -2.576     1.393    -5.066    -3.566    -5.030    -3.764
#G     0.967     2.100    -4.305    -1.267     0.140     0.632    -1.563    -1.879    -2.088    -2.285     0.357     2.321    -8.368    -4.399

Here, Convert2PSAM performs a 2**score transformation so that the scores become positive.
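As an illustration of what such a transformation does to the Expert_PWMs data, here is a minimal sketch applying base**score to each entry. It assumes base 2 (as for TAMO) and then divides each position by its largest value so the preferred base gets 1.0. Both the log base actually appropriate for Expert_PWMs.tar.gz and the normalization step are assumptions, not Convert2PSAM's documented behavior; the function name is hypothetical.

```python
from typing import Dict, List

ORDER = "ACGT"


def log_odds_to_psam(scores: Dict[str, List[float]], log_base: float = 2.0) -> List[List[float]]:
    """Turn per-base log-odds rows into a positive, max-normalized PSAM.

    scores maps each base to its scores along the motif (row-wise layout).
    Each score s becomes log_base ** s, which is always > 0 (so -800 maps to a
    tiny positive number). Assumption: each position is then divided by its
    largest value so the preferred base gets exactly 1.0.
    """
    width = len(scores["A"])
    psam = []
    for i in range(width):
        weights = [log_base ** scores[base][i] for base in ORDER]
        top = max(weights)
        psam.append([w / top for w in weights])
    return psam
```

If the Expert PWMs turn out to use natural logarithms, passing `log_base=math.e` would be the corresponding choice; which base is right is exactly the open question here.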

Should we apply a similar transformation to the Expert_PWMs.tar.gz data? Harmen, what's your take?

Please let me know your opinions.

Xiang-Jun


PS. For the record, the $REDUCE_SUITE/data/formats/ folder also contains examples of several other commonly used PWM-like formats that Convert2PSAM can handle.

Code:
#transfac.dat
ID any_old_name_for_motif_1
BF species_name_for_motif_1
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
03      3      0      1      1      A
04      0      5      0      0      C
05      5      0      0      0      A
06      0      0      4      1      G
07      0      1      4      0      G
08      0      0      0      5      T
09      0      0      5      0      G
10      0      1      2      2      K
11      0      2      0      3      Y
12      1      0      3      1      G
XX
//

Code:
#jaspar_ex1.dat
 1  6  1  0 13  0  6  0 13 15  2  5
 4  0  0  0  1 15  0  9  4  0  3  5
 8 12  0  3  2  1 12  0  1  1  1  3
 5  0 17 15  2  2  0  9  0  2 12  5

Code:
#jaspar_ex2.dat
A  [ 1  6  1  0 13  0  6  0 13 15  2  5 ]
C  [ 4  0  0  0  1 15  0  9  4  0  3  5 ]
G  [ 8 12  0  3  2  1 12  0  1  1  1  3 ]
T  [ 5  0 17 15  2  2  0  9  0  2 12  5 ]

Convert2PSAM was created explicitly to handle such varied, real-world formats.
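How Convert2PSAM turns count matrices such as transfac.dat into a PSAM (pseudocounts, scaling) is not described here. The sketch below shows one simple heuristic under stated assumptions: read the count rows between the `P0` header and `XX`, add a pseudocount (0.5 is an arbitrary, assumed value), then scale each position so its most frequent base gets 1.0. The function names are hypothetical.

```python
from typing import List


def parse_transfac_counts(text: str) -> List[List[float]]:
    """Read count rows between the 'P0' header and 'XX' of a TRANSFAC-style block.

    Each data row looks like '01  1  2  2  0  S': a position number, counts
    for A, C, G, T (the order on the P0 line), then a consensus letter.
    """
    counts = []
    in_matrix = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "P0":
            if fields[1:5] != ["A", "C", "G", "T"]:
                raise ValueError("this sketch only handles the A C G T column order")
            in_matrix = True
        elif fields[0] in ("XX", "//"):
            in_matrix = False
        elif in_matrix:
            counts.append([float(x) for x in fields[1:5]])
    return counts


def counts_to_psam(counts: List[List[float]], pseudocount: float = 0.5) -> List[List[float]]:
    """Add an (assumed) pseudocount, then scale each position so its maximum is 1."""
    psam = []
    for row in counts:
        smoothed = [c + pseudocount for c in row]
        top = max(smoothed)
        psam.append([v / top for v in smoothed])
    return psam
```

The pseudocount keeps bases never observed at a position (zero counts) from getting an affinity of exactly zero, which would otherwise zero out every window containing them.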

10
General Discussion / Re: Error converting PWM to PSAM
« on: December 18, 2017, 02:14:05 pm »
Hi Jason,

Thanks for using the REDUCE Suite and for posting on the Forum.

The error message seems to hint at a PWM format variant that Convert2PSAM cannot handle. I'll look into the details, revise Convert2PSAM as necessary, and post back on the Forum, probably by tomorrow.

Best regards,

Xiang-Jun

11
General Discussion / Re: Error when generating logos in PDF format
« on: November 15, 2017, 11:09:40 pm »
As a follow-up, the REDUCE Suite has been updated to v2.2.4-2017nov16. The LogoGenerator bug for PDF output has been fixed. The obsolete GIF output has been removed to avoid a dependency on the convert program from ImageMagick. The default PNG format is the one to use with HTMLSummary-generated webpages. The LogoGenerator documentation has also been revised.

Some examples:

Code:
# By default, the output is in PNG format
LogoGenerator -file=$REDUCE_SUITE/data/formats/psam_ex.dat -logo=sample.png
# Using the -format=pdf option for PDF output
LogoGenerator -file=$REDUCE_SUITE/data/formats/psam_ex.dat -logo=sample.pdf -format=pdf
# Output in the raw EPS format with -format=eps
LogoGenerator -file=$REDUCE_SUITE/data/formats/psam_ex.dat -logo=sample.eps -format=eps

The LogoGenerator utility in the REDUCE Suite is a general-purpose, robust logo generator for DNA or RNA sequences. It creates a logo in the vector EPS format, which can easily be converted to other vector or raster image formats using numerous third-party tools. Internally, LogoGenerator relies on the widely available gs program (Ghostscript).

It is worth noting that on Mac OS X, the Preview application can directly open LogoGenerator-created EPS files and convert them to PDF. Similar tools should be available on Linux and Windows.
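For converting an EPS logo yourself with Ghostscript, here is a hedged Python sketch. It is not LogoGenerator's actual internal call, which isn't shown here. Because the "-dTextAlphaBits=4" option was identified as the cause of the PDF error, the sketch adds anti-aliasing options only for raster (PNG) output. The helper names, the `png16m` device choice and the default resolution are assumptions; the gs options themselves are standard Ghostscript flags.

```python
import shutil
import subprocess
from typing import List

# Ghostscript output devices: pdfwrite keeps vector output; png16m rasterizes.
DEVICES = {"pdf": "pdfwrite", "png": "png16m"}


def gs_command(eps_path: str, out_path: str, fmt: str, dpi: int = 150) -> List[str]:
    """Build a Ghostscript command line converting an EPS logo to PDF or PNG.

    Anti-aliasing flags (-dTextAlphaBits/-dGraphicsAlphaBits) are added only
    for raster output; they are deliberately left out of the PDF path.
    """
    if fmt not in DEVICES:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {sorted(DEVICES)}")
    cmd = ["gs", "-dSAFER", "-dBATCH", "-dNOPAUSE", "-dQUIET", "-dEPSCrop",
           f"-sDEVICE={DEVICES[fmt]}"]
    if fmt == "png":
        cmd += [f"-r{dpi}", "-dTextAlphaBits=4", "-dGraphicsAlphaBits=4"]
    cmd += [f"-sOutputFile={out_path}", eps_path]
    return cmd


def convert_eps(eps_path: str, out_path: str, fmt: str) -> None:
    """Run the conversion; requires the gs executable on PATH."""
    if shutil.which("gs") is None:
        raise FileNotFoundError("Ghostscript (gs) not found on PATH")
    subprocess.run(gs_command(eps_path, out_path, fmt), check=True)
```

For example, `convert_eps("sample.eps", "sample.pdf", "pdf")` would turn the raw EPS output of `LogoGenerator ... -format=eps` into a PDF; `-dEPSCrop` sizes the page to the EPS bounding box.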

Xiang-Jun

12
General Discussion / Re: Error when generating logos in PDF format
« on: November 15, 2017, 07:04:41 pm »
Hi Harmen,

Thanks for your quick feedback.

I'll update the code to remove the GIF output but keep the PDF option. A new release will be available on the download page late tonight.

Best regards,

Xiang-Jun


13
General Discussion / Re: Error when generating logos in PDF format
« on: November 15, 2017, 06:07:43 pm »
Hi Harmen,

Thanks for posting on the Forum!

Yes, I can reproduce the error when generating logos in PDF format. It is indeed due to the Ghostscript "-dTextAlphaBits=4" option you reported. I am using Ghostscript 9.21.

I remember taking the "-dTextAlphaBits=4" option from some docs/examples I read. Now that we know the problem, we have the following options:

  • Simply remove the "-dTextAlphaBits=4" option from the system call.
  • Or we can remove the support of the PDF output format (from the documentation).

While we are at it, I'd also like to remove the largely out-of-date GIF output format. Doing so also gets rid of the dependency on the convert program from ImageMagick.

What's your take? Please let me know, and I will update the code for a new release late tonight (or tomorrow).

Xiang-Jun

14
Documentation / Re: Set up the REDUCE Suite
« on: June 19, 2017, 04:39:58 pm »
Hi Rahul,

Thanks for your feedback. Step #5 should work as is if step #4 has been performed as advertised, which adds the bin/ directory to PATH. I've slightly refined the instruction for step #4 to make it clearer.

Executing 'bin/MatrixREDUCE -h' assumes one is at the $REDUCE_SUITE root directory.
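To check whether step #4 took effect, that is, whether the bin/ directory is on PATH so that `MatrixREDUCE -h` works from any directory, a small Python helper like the following could be used. It relies only on the standard library's `shutil.which`; the helper names are hypothetical, and it assumes the $REDUCE_SUITE environment variable points at the Suite's root directory.

```python
import os
import shutil
from typing import Optional


def find_reduce_tool(name: str = "MatrixREDUCE", path: Optional[str] = None) -> Optional[str]:
    """Return the full path of a REDUCE Suite executable if found on PATH, else None.

    Passing `path` searches a candidate PATH string without changing os.environ.
    """
    return shutil.which(name, path=path)


def suggest_path(reduce_suite_root: str) -> str:
    """PATH value with the Suite's bin/ directory prepended (the goal of step #4)."""
    bin_dir = os.path.join(reduce_suite_root, "bin")
    return bin_dir + os.pathsep + os.environ.get("PATH", "")


if __name__ == "__main__":
    root = os.environ.get("REDUCE_SUITE")
    location = find_reduce_tool()
    if location:
        print(f"MatrixREDUCE found at {location}; 'MatrixREDUCE -h' should work anywhere")
    elif root:
        print(f"Not on PATH; add {os.path.join(root, 'bin')} to PATH (step #4)")
    else:
        print("MatrixREDUCE is not on PATH and REDUCE_SUITE is not set")
```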

Xiang-Jun

15
General Discussion / Re: Affinity score calculation
« on: January 05, 2017, 01:30:05 pm »
Hi,

Thanks for using the REDUCE Suite and for posting your question(s) on the Forum.

The concept of affinity in the REDUCE Suite is simple, but technical. As is often the case, the idea can be best illustrated with a concrete example.

Let's suppose we have a PSAM (sample-psam.xml) as shown below:

Code:
<matrix_reduce>

<directionality>forward</directionality>
<psam_length>6</psam_length>

<psam>
# A            C            G            T
# +============+============+============+=======
  1            0.25         0.1          0.1   #1
  0.1          0.5          0.2          1     #2
  0.1          1            0.1          0.1   #3
  1            0.1          0.6          0.1   #4
  0.2          0.6          1            0.1   #5
  0.1          0.1          1            0.3   #6
</psam>
</matrix_reduce>

And a short base sequence (sample-seq.txt) as below:

Code:
>sample
GTCATGGT

Since the PSAM has a length of 6, and the single sequence has 8 bases, there are 8 - 6 + 1 = 3 sliding windows, as detailed below:

Code:
w1: GTCATG      --- affinity of w1: 0.1 * 1 * 1 * 1 * 0.1 * 1 = 0.01
w2:   TCATGG    --- affinity of w2: 0.1 * 0.5 * 0.1 * 0.1 * 1 * 1 = 0.0005
w3:     CATGGT  --- affinity of w3: 0.25 * 0.1 * 0.1 * 0.6 * 1 * 0.3 = 0.00045
---- sum of affinity = 0.01095

If you run:
Code:
AffinityProfile -seq=sample-seq.txt -psam=sample-psam.xml
you will find the following content in the default output file seq_psam.dat:
Code:
        sample-psam.xml
sample  0.01095

There are quite a few variations for the calculation of affinity in AffinityProfile, but the above example covers the essence. Since the REDUCE Suite is open source, you can and are encouraged to dive into the details.
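The sliding-window calculation for sample-psam.xml and sample-seq.txt can be reproduced with the minimal Python sketch below. It covers only the essence: forward strand only (matching `<directionality>forward</directionality>`), uppercase A/C/G/T input (any other character raises a KeyError), and a plain sum over windows. It does not model AffinityProfile's other variations, and the function names are hypothetical.

```python
from typing import Dict, List

BASE_INDEX: Dict[str, int] = {"A": 0, "C": 1, "G": 2, "T": 3}

# sample-psam.xml: one row per position, columns A, C, G, T
SAMPLE_PSAM: List[List[float]] = [
    [1.0, 0.25, 0.1, 0.1],
    [0.1, 0.5, 0.2, 1.0],
    [0.1, 1.0, 0.1, 0.1],
    [1.0, 0.1, 0.6, 0.1],
    [0.2, 0.6, 1.0, 0.1],
    [0.1, 0.1, 1.0, 0.3],
]


def window_affinities(seq: str, psam: List[List[float]]) -> List[float]:
    """Product of PSAM entries for each window along the forward strand."""
    width = len(psam)
    affinities = []
    for start in range(len(seq) - width + 1):
        product = 1.0
        for offset, base in enumerate(seq[start:start + width].upper()):
            product *= psam[offset][BASE_INDEX[base]]
        affinities.append(product)
    return affinities


def total_affinity(seq: str, psam: List[List[float]]) -> float:
    """Sum of window affinities: the per-sequence value in seq_psam.dat."""
    return sum(window_affinities(seq, psam))


if __name__ == "__main__":
    print(window_affinities("GTCATGGT", SAMPLE_PSAM))
    print(round(total_affinity("GTCATGGT", SAMPLE_PSAM), 10))  # 0.01095
```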

Hope this helps,

Xiang-Jun


Created and maintained by Dr. Xiang-Jun Lu [律祥俊]. See also http://forum.x3dna.org and http://x3dna.org