Messages - xiangjun

1
Hi Kate,

Sorry, no previous versions of the REDUCE_Suite are available for download.

May I know what you want to achieve with the original Convert2PSAM -source=v1 option, possibly with a concrete example?

Xiang-Jun

2
Documentation / Re: Other utility programs
« on: July 18, 2018, 07:53:14 am »
Hi,

Type ProcessFASTA -h for more info. Check the source code for technical details.

Best regards,

Xiang-Jun

3
General Discussion / Re: availability of FeatureREDUCE?
« on: July 17, 2018, 06:58:47 pm »
Thanks for posting your FeatureREDUCE question(s) on the REDUCE Suite Forum. Unfortunately (and as you noticed), FeatureREDUCE is not available from the basic REDUCE Suite which includes MatrixREDUCE/MotifREDUCE and some accessory programs. I was not involved in the development of FeatureREDUCE and its support (if any) is not covered by the Forum (I've made this point clear from the announcement page). Sorry for not being able to provide you with a more positive answer.

Xiang-Jun

4
General Discussion / Re: Error when generating logos in PDF format
« on: February 11, 2018, 09:53:30 am »
Quote
I can't directly read LogoGenerator-created EPS file on Linux. Am I missing something?

Could you be more specific? Please provide a concrete example so I (and others) can reproduce what you failed to achieve.

Best regards,

Xiang-Jun

5
General Discussion / Re: Error converting PWM to PSAM
« on: December 19, 2017, 10:42:12 pm »
Hi Jason,

I've updated the REDUCE Suite to version 2.2.5-2017dec19, in which Convert2PSAM has an additional source option for PFM. An example run on YDR146C (which you posted) is shown below:

Code:
Convert2PSAM -source=pfm -inp=$REDUCE_SUITE/data/formats/pfm_YDR146C.dat -psam=stdout
Please give it a try, and let me know if you have further problems.

Xiang-Jun

6
General Discussion / Re: Error converting PWM to PSAM
« on: December 19, 2017, 01:42:50 pm »
The PFM is row-wise, while the PWM format accepted by Convert2PSAM is column-wise, in order of A, C, G, and T. See $REDUCE_SUITE/data/formats/pwm_ex.dat for an example.

I'll revise Convert2PSAM to accept the PFM format, so you do not need to do extra work.
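
For readers who want to rearrange such a file by hand in the meantime, the row-wise to column-wise conversion could be sketched in Python roughly as follows. The function name `rows_to_columns` and the toy numbers are made up for illustration; this is not how Convert2PSAM works internally.

```python
def rows_to_columns(rows, order="ACGT"):
    """Turn {base: [v1, v2, ...]} (one row per base) into a list of
    per-position rows, each ordered as A, C, G, T."""
    lengths = {len(values) for values in rows.values()}
    if len(lengths) != 1:
        raise ValueError("all base rows must have the same length")
    return [[rows[base][i] for base in order] for i in range(lengths.pop())]


# Toy row-wise matrix with rows in A, T, G, C order (hypothetical numbers).
row_wise = {
    "A": [5, 0, 1],
    "T": [0, 0, 7],
    "G": [2, 8, 0],
    "C": [1, 0, 0],
}

column_wise = rows_to_columns(row_wise)
for position in column_wise:
    # One line per motif position, columns in A, C, G, T order.
    print("\t".join(str(v) for v in position))
```

Because the rows are looked up by base letter, the order in which they appear in the input file (A, T, G, C here) does not matter.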

Xiang-Jun

7
General Discussion / Re: Error converting PWM to PSAM
« on: December 19, 2017, 01:12:49 pm »
Hi Jason,

I've looked into the issue. As expected, it is indeed yet another PWM variant that needs special attention to be converted to PSAM.

One example (YDR146C_569.pwm) from Expert_PWMs.tar.gz is as below:

Code:
A 0.381537584575116 1.07668300283655 -800 -800 1.68965987938785 -800 -800 0.989220160345073
T -0.31034012061215 -3.01077985606558 -800 -800 -800 -800 -800 -2.01077983731055
G 0.395928676331139 -2.30451105912229 -800 -800 -800 2.39592867633114 -800 -1.30451104036726
C -0.982582949230903 0.502843879011055 2.39592867633114 2.39592867633114 -800 -800 2.39592867633114 0.280451460353898

It contains negative values, including -800, which is presumably a cutoff value. Entries in a PSAM, on the other hand, should all be positive. So we need a way to convert the negative values to positive ones.

In a similar case, the TAMO (as in the MacIsaac dataset) format distributed with the REDUCE Suite looks like the following:

Code:
Log-odds matrix for Motif   0 rGAA..TtctrGAA (0)
#        0         1         2         3         4         5         6         7         8         9        10        11        12        13
#A     0.743    -1.052     1.647     1.443    -1.558    -0.374    -3.255    -5.001    -0.793    -2.480     1.175    -3.678     1.635     1.629
#C    -1.105   -10.336    -8.324    -3.641     0.691     0.311    -1.463    -0.208     1.931    -1.053    -2.000   -10.641    -2.819    -4.350
#T    -3.868    -3.114    -4.032    -2.297     0.288    -0.426     1.428     1.320    -2.576     1.393    -5.066    -3.566    -5.030    -3.764
#G     0.967     2.100    -4.305    -1.267     0.140     0.632    -1.563    -1.879    -2.088    -2.285     0.357     2.321    -8.368    -4.399

Here, Convert2PSAM performs a 2**score transformation so that the scores become positive.
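
As a rough illustration of the idea (not the actual Convert2PSAM code), the 2**score transformation can be sketched in Python. The helper names `log_odds_to_positive` and `rescale_by_position` are hypothetical, and the second step, which rescales each position so that its best base equals 1, is an assumption added here to mimic the look of a PSAM.

```python
def log_odds_to_positive(scores_by_base):
    """Apply 2**score to every entry of a {base: [scores]} matrix."""
    return {base: [2.0 ** s for s in scores]
            for base, scores in scores_by_base.items()}


def rescale_by_position(matrix):
    """Assumed extra step: divide each position by its largest value."""
    bases = list(matrix)
    n = len(matrix[bases[0]])
    maxima = [max(matrix[b][i] for b in bases) for i in range(n)]
    return {b: [matrix[b][i] / maxima[i] for i in range(n)] for b in bases}


# First three positions of the TAMO example above.
tamo = {
    "A": [0.743, -1.052, 1.647],
    "C": [-1.105, -10.336, -8.324],
    "T": [-3.868, -3.114, -4.032],
    "G": [0.967, 2.100, -4.305],
}

positive = log_odds_to_positive(tamo)
scaled = rescale_by_position(positive)
for base in "ACGT":
    print(base, " ".join(f"{v:.4f}" for v in scaled[base]))
```

Since 2**x is positive for any finite x, even a very negative cutoff such as -800 maps to a tiny but positive number.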

Should we apply a similar transformation to the Expert_PWMs.tar.gz data? Harmen, what's your take?

Please let me know your opinions.

Xiang-Jun


PS. For the record, it is worth noting that the $REDUCE_SUITE/data/formats/ folder contains several other commonly used PWM-like files that can be handled by Convert2PSAM.

Code:
#transfac.dat
ID any_old_name_for_motif_1
BF species_name_for_motif_1
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
03      3      0      1      1      A
04      0      5      0      0      C
05      5      0      0      0      A
06      0      0      4      1      G
07      0      1      4      0      G
08      0      0      0      5      T
09      0      0      5      0      G
10      0      1      2      2      K
11      0      2      0      3      Y
12      1      0      3      1      G
XX
//

Code:
#jaspar_ex1.dat
 1  6  1  0 13  0  6  0 13 15  2  5
 4  0  0  0  1 15  0  9  4  0  3  5
 8 12  0  3  2  1 12  0  1  1  1  3
 5  0 17 15  2  2  0  9  0  2 12  5

Code:
#jaspar_ex2.dat
A  [ 1  6  1  0 13  0  6  0 13 15  2  5 ]
C  [ 4  0  0  0  1 15  0  9  4  0  3  5 ]
G  [ 8 12  0  3  2  1 12  0  1  1  1  3 ]
T  [ 5  0 17 15  2  2  0  9  0  2 12  5 ]

Convert2PSAM was created explicitly for such wild real-world cases.
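
To give a feel for what reading one of these variants involves, here is a small Python sketch that parses the bracketed layout of jaspar_ex2.dat into per-position counts. The function name `parse_bracketed_counts` is hypothetical, and the code is only an illustration, not the parser used by Convert2PSAM.

```python
import re


def parse_bracketed_counts(text):
    """Parse lines like 'A  [ 1  6  1 ]' into {base: [counts]}."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*([ACGT])\s*\[(.*)\]\s*$", line)
        if match:
            counts[match.group(1)] = [int(tok) for tok in match.group(2).split()]
    return counts


example = """\
A  [ 1  6  1  0 13  0  6  0 13 15  2  5 ]
C  [ 4  0  0  0  1 15  0  9  4  0  3  5 ]
G  [ 8 12  0  3  2  1 12  0  1  1  1  3 ]
T  [ 5  0 17 15  2  2  0  9  0  2 12  5 ]
"""

counts = parse_bracketed_counts(example)
# Regroup into one tuple per motif position, in A, C, G, T order.
positions = list(zip(*(counts[b] for b in "ACGT")))
print(positions[0])
```

A quick sanity check on such count matrices is that every position sums to the same number of aligned sites, which holds for this example.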

8
General Discussion / Re: Error converting PWM to PSAM
« on: December 18, 2017, 02:14:05 pm »
Hi Jason,

Thanks for using the REDUCE Suite and for posting on the Forum.

The error message seems to hint at a PWM format variant that Convert2PSAM cannot handle. I'll look into the details, and revise Convert2PSAM as necessary. I'll post back on the Forum, probably by tomorrow.

Best regards,

Xiang-Jun

9
General Discussion / Re: Error when generating logos in PDF format
« on: November 15, 2017, 11:09:40 pm »
As a follow-up, the REDUCE Suite has been updated to v2.2.4-2017nov16. The LogoGenerator bug for PDF output has been fixed. The obsolete GIF output has been removed to avoid a dependency on the convert program from ImageMagick. The default PNG format is the one to use with HTMLSummary-generated webpages. The LogoGenerator documentation has also been revised.

Some examples:

Code:
# By default, the output is in PNG format
LogoGenerator -file=$REDUCE_SUITE/data/formats/psam_ex.dat -logo=sample.png
# Using the -format=pdf option for PDF output
LogoGenerator -file=$REDUCE_SUITE/data/formats/psam_ex.dat -logo=sample.pdf -format=pdf
# Output in the raw EPS format with -format=eps
LogoGenerator -file=$REDUCE_SUITE/data/formats/psam_ex.dat -logo=sample.eps -format=eps

The LogoGenerator utility in the REDUCE Suite is a general-purpose, robust logo generator for DNA or RNA base sequences. It creates a logo in the vector EPS format, which can be easily converted to other vector or raster image formats using numerous third-party tools. Internally, LogoGenerator takes advantage of the widely available gs program (Ghostscript).

It is worth noting that on Mac OS X, the Preview program can directly read a LogoGenerator-created EPS file and convert it to PDF format. On Linux and Windows, the situation should be similar.

Xiang-Jun

10
General Discussion / Re: Error when generating logos in PDF format
« on: November 15, 2017, 07:04:41 pm »
Hi Harmen,

Thanks for your quick feedback.

I'll update the software code with 'gif' output removed, but keep the PDF option. A new release will be made available on the download page late tonight.

Best regards,

Xiang-Jun


11
General Discussion / Re: Error when generating logos in PDF format
« on: November 15, 2017, 06:07:43 pm »
Hi Harmen,

Thanks for posting on the Forum!

Yes, I can reproduce the error message with regard to generating the logos in PDF format. It is indeed due to the Ghostscript "-dTextAlphaBits=4" option you reported. I am using Ghostscript 9.21.

I recall taking the "-dTextAlphaBits=4" option from some docs/examples I read somewhere. Now that we know the problem, we have the following options:

  • Simply remove the "-dTextAlphaBits=4" option from the system call.
  • Or we can remove the support of the PDF output format (from the documentation).

While we are here, I'd also like to remove the largely out-of-date GIF output format. By doing so, we also get rid of the dependency on the convert program from ImageMagick.

What's your take? Please let me know, and I will update the code for a new release late tonight (or tomorrow).

Xiang-Jun

12
Documentation / Re: Set up the REDUCE Suite
« on: June 19, 2017, 04:39:58 pm »
Hi Rahul,

Thanks for your feedback. Step #5 should work as is if step #4 has been performed as advertised, which adds the bin/ directory to PATH. I've slightly refined the instruction for step #4 to make it clearer.

Executing 'bin/MatrixREDUCE -h' assumes one is at the $REDUCE_SUITE root directory.

Xiang-Jun

13
General Discussion / Re: Affinity score calculation
« on: January 05, 2017, 01:30:05 pm »
Hi,

Thanks for using the REDUCE Suite and for posting your question(s) on the Forum.

The concept of affinity in the REDUCE Suite is simple, but technical. As is often the case, the idea can be best illustrated with a concrete example.

Let's suppose we have a PSAM (sample-psam.xml) as shown below:

Code:
<matrix_reduce>

<directionality>forward</directionality>
<psam_length>6</psam_length>

<psam>
# A            C            G            T
# +============+============+============+=======
  1            0.25         0.1          0.1   #1
  0.1          0.5          0.2          1     #2
  0.1          1            0.1          0.1   #3
  1            0.1          0.6          0.1   #4
  0.2          0.6          1            0.1   #5
  0.1          0.1          1            0.3   #6
</psam>
</matrix_reduce>

And a short base sequence (sample-seq.txt) as below:

Code:
>sample
GTCATGGT

Since the PSAM has a length of 6, and the single sequence has 8 bases, there are three sliding windows, as detailed below:

Code:
w1: GTCATG    --- affinity of w1: 0.1 * 1 * 1 * 1 * 0.1 * 1 = 0.01
w2:  TCATGG   --- affinity of w2: 0.1 * 0.5 * 0.1 * 0.1 * 1 * 1 = 0.0005
w3:   CATGGT  --- affinity of w3: 0.25 * 0.1 * 0.1 * 0.6 * 1 * 0.3 = 0.00045
---- sum of affinity = 0.01095

If you run:
Code:
AffinityProfile -seq=sample-seq.txt -psam=sample-psam.xml
you will find the following content in the default output file seq_psam.dat:
Code:
        sample-psam.xml
sample  0.01095
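
The same arithmetic can be reproduced with a few lines of Python. This is only a sketch of the forward-strand, sliding-window product described above, with the PSAM typed in by hand; the function name `window_affinities` is made up, and the code is not taken from the AffinityProfile source.

```python
# The sample PSAM, one dict per position (rows #1 to #6 above).
PSAM = [
    {"A": 1.0, "C": 0.25, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.5, "G": 0.2, "T": 1.0},
    {"A": 0.1, "C": 1.0, "G": 0.1, "T": 0.1},
    {"A": 1.0, "C": 0.1, "G": 0.6, "T": 0.1},
    {"A": 0.2, "C": 0.6, "G": 1.0, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 1.0, "T": 0.3},
]


def window_affinities(sequence, psam):
    """Return (window, affinity) for every full-length window (forward strand)."""
    width = len(psam)
    result = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        affinity = 1.0
        for position, base in enumerate(window):
            affinity *= psam[position][base]  # multiply per-position weights
        result.append((window, affinity))
    return result


windows = window_affinities("GTCATGGT", PSAM)
for window, affinity in windows:
    print(f"{window}  {affinity:.5f}")

total = sum(affinity for _, affinity in windows)
print(f"sum of affinity = {total:.5f}")
```

Up to floating-point rounding, the three window values and their sum agree with the hand calculation and with the 0.01095 reported in seq_psam.dat.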

There are quite a few variations for the calculation of affinity in AffinityProfile, but the above example covers the essence. Since the REDUCE Suite is open source, you can and are encouraged to dive into the details.

Hope this helps,

Xiang-Jun


14
General Discussion / Re: What do the bases mean in PSAM?
« on: December 01, 2016, 10:56:56 am »
Dear Pan Shen,

Thanks for using the REDUCE Suite and for asking your questions on the Forum.

The W in the converted PSAM notation means A or T (Weak, since the A-T Watson-Crick pair has two H-bonds, compared to three in a G-C pair). Not surprisingly, S (for Strong) represents G or C.

More details on "Nucleic acid notation" can be found on Wikipedia, among many other online resources.
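
For quick reference, the standard IUPAC nucleotide codes can be written down as a small Python lookup table. The table reflects the general IUPAC convention; the helper name `expand` is hypothetical and has nothing to do with how the REDUCE Suite stores these symbols.

```python
# Standard IUPAC nucleotide codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",   # puRine
    "Y": "CT",   # pYrimidine
    "S": "CG",   # Strong (three H-bonds)
    "W": "AT",   # Weak (two H-bonds)
    "K": "GT",   # Keto
    "M": "AC",   # aMino
    "B": "CGT",  # not A
    "D": "AGT",  # not C
    "H": "ACT",  # not G
    "V": "ACG",  # not T
    "N": "ACGT", # any base
}


def expand(symbols):
    """Map each IUPAC symbol in a string to the bases it stands for."""
    return [IUPAC[s] for s in symbols.upper()]


print(expand("WS"))
```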

Hope this helps.

Xiang-Jun

15
Documentation / Other utility programs
« on: September 29, 2016, 01:27:04 pm »
The REDUCE Suite distribution also includes the following auxiliary programs. Simply typing the corresponding program name with -h (e.g., Convert2PSAM -h) should provide sufficient information to get started.

Convert2PSAM
As its name suggests, Convert2PSAM is a utility program that converts other commonly used motif (pattern) representations in nucleic acid sequences to PSAM, which is unique to the REDUCE Suite. It can also be used to standardize the various formats to a simplified PWM format for easy communication.

Topo2Dictfile
The default topological pattern mechanism can be used to specify sequence motifs in a compact, convenient, and flexible way. However, it defines the motifs implicitly, has a length limit (15 non-gap positions), and does not take the IUPAC degenerate symbols into account. As an example, X6 stands for exactly 4^6 = 4096 combinations, from AAAAAA, AAAAAC, ... to TTTTTT. Sometimes, we may need more control by specifying the motifs explicitly in a dictionary file, with arbitrary length and IUPAC symbols. Topo2Dictfile facilitates this by first generating a motif dictionary according to user-specified topological patterns, which can then be edited as needed, e.g., by deleting some motifs, adding more, or introducing IUPAC degeneracy symbols.
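
To make the X6 count concrete, the following Python sketch enumerates every motif of a given length over A, C, G, and T. It only illustrates the combinatorics; the function name `enumerate_motifs` is hypothetical, and the output is a plain list, not the dictionary file format written by Topo2Dictfile.

```python
from itertools import product


def enumerate_motifs(length, alphabet="ACGT"):
    """List all motifs of the given length, in alphabetical order."""
    return ["".join(bases) for bases in product(alphabet, repeat=length)]


motifs = enumerate_motifs(6)
print(len(motifs), motifs[0], motifs[1], motifs[-1])
```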

ProcessFASTA
ProcessFASTA is a simple utility program to process a sequence file in FASTA format, e.g., to select a list of sequences based on ids, convert to the reverse complement, combine id and sequence into one line, etc. While such functionalities are surely available in various heavy-duty toolboxes/environments (BioPerl, EMBOSS, BioConductor etc.), none fits our needs perfectly. We have thus developed this simple utility program mainly for our own convenience.
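
As an illustration of one of the operations mentioned, the reverse complement of a DNA sequence can be computed in Python as sketched below. The function name `reverse_complement` is made up, the sketch handles only the four plain bases, and it is not the ProcessFASTA implementation.

```python
# Translation table for complementing A/C/G/T in either case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Complement each base, then reverse the string."""
    return sequence.translate(COMPLEMENT)[::-1]


print(reverse_complement("GTCATGGT"))
```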

ProcessTdat
This is a simple utility program to process a tab-delimited text file, e.g., to extract a subset, perform log transformation, sort entries by id order, etc. It was created following the same idea as for ProcessFASTA.

ExtractWindows
A simple utility program to extract sequence fragments from a sequence file, probably of a chromosome.

psamdir2list
A Perl utility program to generate a list of the PSAMs in a given directory. The resultant list can be fed into AffinityProfile or Transfactivity.

Created and maintained by Dr. Xiang-Jun Lu [律祥俊]. See also http://forum.x3dna.org and http://x3dna.org