
Messages - xiangjun

Pages: 1 [2] 3
16
Documentation / Re: Other utility programs
« on: July 18, 2018, 07:53:14 am »
Hi,

Type ProcessFASTA -h for more info. Check the source code for technical details.

Best regards,

Xiang-Jun

17
General Discussion / Re: availability of FeatureREDUCE?
« on: July 17, 2018, 06:58:47 pm »
Thanks for posting your FeatureREDUCE question(s) on the REDUCE Suite Forum. Unfortunately (and as you noticed), FeatureREDUCE is not part of the basic REDUCE Suite, which includes MatrixREDUCE/MotifREDUCE and some accessory programs. I was not involved in the development of FeatureREDUCE, and its support (if any) is not covered by the Forum (I've made this point clear on the announcement page). Sorry for not being able to provide you with a more positive answer.

Xiang-Jun

18
General Discussion / Re: Error when generating logos in PDF format
« on: February 11, 2018, 09:53:30 am »
Quote
I can't directly read LogoGenerator-created EPS file on Linux. Am I missing something?

Could you be more specific? Please provide a concrete example so I (and others) can reproduce what you failed to achieve.

Best regards,

Xiang-Jun

19
General Discussion / Re: Error converting PWM to PSAM
« on: December 19, 2017, 10:42:12 pm »
Hi Jason,

I've updated the REDUCE Suite to version 2.2.5-2017dec19, in which Convert2PSAM has an additional source option for PFM. An example run on YDR146C (the matrix you posted) is shown below:

Code:
Convert2PSAM -source=pf -inp=$REDUCE_SUITE/data/formats/pfm_YDR146C.dat -psam=stdout
Please give it a try, and let me know if you have any further problems.

Xiang-Jun

20
General Discussion / Re: Error converting PWM to PSAM
« on: December 19, 2017, 01:42:50 pm »
The PFM is row-wise, while the PWM format accepted by Convert2PSAM is column-wise, in order of A, C, G, and T. See $REDUCE_SUITE/data/formats/pwm_ex.dat for an example.
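For illustration only, here is a minimal Python sketch of that row-to-column transposition (it is not Convert2PSAM's code). It assumes a hypothetical row-wise input in which each line starts with its base letter followed by one count per position; the toy counts are arbitrary. The output has one line per position in A, C, G, T order. Header or comment conventions of pwm_ex.dat are not reproduced, so compare with that file before feeding anything to Convert2PSAM.

```python
# Hypothetical sketch: transpose a row-wise count matrix (one line per base)
# into a column-wise layout (one line per position, in A, C, G, T order).

ROW_WISE = """\
A  1  6  1
C  4  0  0
G  8 12  0
T  5  0 17
"""


def row_wise_to_column_wise(text, order="ACGT"):
    """Return one [A, C, G, T] list of count strings per position."""
    rows = {}
    for line in text.strip().splitlines():
        label, *counts = line.split()
        rows[label.upper()] = counts
    if len({len(counts) for counts in rows.values()}) != 1:
        raise ValueError("base rows differ in length")
    # zip(*...) pairs up the i-th entry of every base row, i.e. position i.
    return [list(column) for column in zip(*(rows[base] for base in order))]


for position in row_wise_to_column_wise(ROW_WISE):
    print("\t".join(position))
```

Because rows are looked up by their base label, the input rows may come in any order (e.g., A, T, G, C) and still end up as A, C, G, T columns.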

I'll revise Convert2PSAM to accept the PFM format, so you do not need to do extra work.

Xiang-Jun

21
General Discussion / Re: Error converting PWM to PSAM
« on: December 19, 2017, 01:12:49 pm »
Hi Jason,

I've looked into the issue. As expected, it is yet another PWM variant that needs special attention to be converted to PSAM.

One example (YDR146C_569.pwm) from Expert_PWMs.tar.gz is shown below:

Code:
A 0.381537584575116 1.07668300283655 -800 -800 1.68965987938785 -800 -800 0.989220160345073
T -0.31034012061215 -3.01077985606558 -800 -800 -800 -800 -800 -2.01077983731055
G 0.395928676331139 -2.30451105912229 -800 -800 -800 2.39592867633114 -800 -1.30451104036726
C -0.982582949230903 0.502843879011055 2.39592867633114 2.39592867633114 -800 -800 2.39592867633114 0.280451460353898

It has negative values, including what is presumably a cutoff value of -800. Entries in a PSAM, on the other hand, must all be positive, so we need a way to convert the negative values to positive ones.

In a similar case, the TAMO format (as used for the MacIsaac dataset) distributed with the REDUCE Suite looks like the following:

Code:
Log-odds matrix for Motif   0 rGAA..TtctrGAA (0)
#        0         1         2         3         4         5         6         7         8         9        10        11        12        13
#A     0.743    -1.052     1.647     1.443    -1.558    -0.374    -3.255    -5.001    -0.793    -2.480     1.175    -3.678     1.635     1.629
#C    -1.105   -10.336    -8.324    -3.641     0.691     0.311    -1.463    -0.208     1.931    -1.053    -2.000   -10.641    -2.819    -4.350
#T    -3.868    -3.114    -4.032    -2.297     0.288    -0.426     1.428     1.320    -2.576     1.393    -5.066    -3.566    -5.030    -3.764
#G     0.967     2.100    -4.305    -1.267     0.140     0.632    -1.563    -1.879    -2.088    -2.285     0.357     2.321    -8.368    -4.399

Here, Convert2PSAM performs a 2**score transformation so that the scores become positive.
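A minimal Python sketch of what the same 2**score idea would look like on the YDR146C_569.pwm example above. Two assumptions, not confirmed in this thread: the scores are log2-based, and each position is afterwards rescaled so its largest weight is 1 (as in typical PSAMs). Note that the rows come in A, T, G, C order and are reordered to A, C, G, T. The -800 cutoff becomes 2**-800, a tiny but still positive number.

```python
# Sketch only: assumes log2-based scores and max-normalization per position.

# YDR146C_569.pwm as posted; note the A, T, G, C row order.
RAW = """\
A 0.381537584575116 1.07668300283655 -800 -800 1.68965987938785 -800 -800 0.989220160345073
T -0.31034012061215 -3.01077985606558 -800 -800 -800 -800 -800 -2.01077983731055
G 0.395928676331139 -2.30451105912229 -800 -800 -800 2.39592867633114 -800 -1.30451104036726
C -0.982582949230903 0.502843879011055 2.39592867633114 2.39592867633114 -800 -800 2.39592867633114 0.280451460353898
"""


def parse_rows(text):
    """Map each base label to its list of per-position log scores."""
    rows = {}
    for line in text.strip().splitlines():
        label, *values = line.split()
        rows[label.upper()] = [float(v) for v in values]
    return rows


def log_scores_to_psam(rows, order="ACGT"):
    """Apply 2**score, then scale each position so its largest weight is 1."""
    length = len(rows[order[0]])
    psam = []
    for pos in range(length):
        weights = [2.0 ** rows[base][pos] for base in order]
        top = max(weights)
        psam.append([w / top for w in weights])
    return psam


if __name__ == "__main__":
    print("#  A  C  G  T")
    for pos, row in enumerate(log_scores_to_psam(parse_rows(RAW)), start=1):
        print(pos, " ".join(f"{w:.4g}" for w in row))
```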

Should we apply a similar transformation to the Expert_PWMs.tar.gz data? Harmen, what's your take?

Please let me know your opinions.

Xiang-Jun


PS. For the record, the $REDUCE_SUITE/data/formats/ folder contains several other commonly used PWM-like files that Convert2PSAM can handle.

Code:
#transfac.dat
ID any_old_name_for_motif_1
BF species_name_for_motif_1
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
03      3      0      1      1      A
04      0      5      0      0      C
05      5      0      0      0      A
06      0      0      4      1      G
07      0      1      4      0      G
08      0      0      0      5      T
09      0      0      5      0      G
10      0      1      2      2      K
11      0      2      0      3      Y
12      1      0      3      1      G
XX
//

Code:
#jaspar_ex1.dat
 1  6  1  0 13  0  6  0 13 15  2  5
 4  0  0  0  1 15  0  9  4  0  3  5
 8 12  0  3  2  1 12  0  1  1  1  3
 5  0 17 15  2  2  0  9  0  2 12  5

Code:
#jaspar_ex2.dat
A  [ 1  6  1  0 13  0  6  0 13 15  2  5 ]
C  [ 4  0  0  0  1 15  0  9  4  0  3  5 ]
G  [ 8 12  0  3  2  1 12  0  1  1  1  3 ]
T  [ 5  0 17 15  2  2  0  9  0  2 12  5 ]

Convert2PSAM was created explicitly to handle such wild, real-world cases.
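As a rough illustration of what handling such formats involves, here is a Python sketch that extracts the count matrix from a TRANSFAC-style record like transfac.dat above. It assumes that the P0 line names the base columns, that each following line holds a position number, the counts, and an optional consensus letter, and that the matrix ends at the first XX or // line. It is not Convert2PSAM's parser.

```python
# Sketch: read the count matrix of a TRANSFAC-style record into
# one [A, C, G, T] list per position.

RECORD = """\
ID any_old_name_for_motif_1
BF species_name_for_motif_1
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
03      3      0      1      1      A
04      0      5      0      0      C
05      5      0      0      0      A
06      0      0      4      1      G
07      0      1      4      0      G
08      0      0      0      5      T
09      0      0      5      0      G
10      0      1      2      2      K
11      0      2      0      3      Y
12      1      0      3      1      G
XX
//
"""


def parse_transfac(text, order="ACGT"):
    bases, matrix, in_matrix = [], [], False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "P0":
            bases, in_matrix = fields[1:], True
        elif fields[0] in ("XX", "//"):
            if in_matrix:
                break  # end of the matrix block
        elif in_matrix:
            # fields[0] is the position number; the trailing consensus
            # letter (if any) lies beyond the count columns and is ignored.
            values = (float(x) for x in fields[1:1 + len(bases)])
            counts = dict(zip(bases, values))
            matrix.append([counts[base] for base in order])
    return matrix


for row in parse_transfac(RECORD):
    print(" ".join(f"{c:g}" for c in row))
```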

22
General Discussion / Re: Error converting PWM to PSAM
« on: December 18, 2017, 02:14:05 pm »
Hi Jason,

Thanks for using the REDUCE Suite and for posting on the Forum.

The error message seems to hint at a PWM format variant that Convert2PSAM cannot handle. I'll look into the details, revise Convert2PSAM as necessary, and post back on the Forum, probably by tomorrow.

Best regards,

Xiang-Jun

23
General Discussion / Re: Error when generating logos in PDF format
« on: November 15, 2017, 11:09:40 pm »
As a follow-up, the REDUCE Suite has been updated to v2.2.4-2017nov16. The LogoGenerator bug for PDF output has been fixed. The obsolete GIF output has been removed to avoid a dependency on the convert program from ImageMagick. The default PNG format is the one to use with HTMLSummary-generated webpages. The LogoGenerator documentation has also been revised.

Some examples:

Code:
# By default, the output is in PNG format
LogoGenerator -file=$REDUCE_SUITE/data/formats/psam_ex.dat -logo=sample.png
# Use the -format=pdf option for PDF output
LogoGenerator -file=$REDUCE_SUITE/data/formats/psam_ex.dat -logo=sample.pdf -format=pdf
# Output in the raw EPS format with -format=eps
LogoGenerator -file=$REDUCE_SUITE/data/formats/psam_ex.dat -logo=sample.eps -format=eps

The LogoGenerator utility in the REDUCE Suite is a general-purpose, robust logo generator for DNA or RNA base sequences. It creates a logo in vector EPS format, which can easily be converted to other vector or raster image formats using numerous third-party tools. Internally, LogoGenerator takes advantage of the widely available gs program (Ghostscript).

It is worth noting that on Mac OS X, the Preview application can directly read LogoGenerator-created EPS files and convert them to PDF format. On Linux and Windows, the situation should be similar.

Xiang-Jun

24
General Discussion / Re: Error when generating logos in PDF format
« on: November 15, 2017, 07:04:41 pm »
Hi Harmen,

Thanks for your quick feedback.

I'll update the code to remove the 'gif' output but keep the PDF option. A new release will be made available on the download page late tonight.

Best regards,

Xiang-Jun


25
General Discussion / Re: Error when generating logos in PDF format
« on: November 15, 2017, 06:07:43 pm »
Hi Harmen,

Thanks for posting on the Forum!

Yes, I can reproduce the error message with regard to generating the logos in PDF format. It is indeed due to the Ghostscript "-dTextAlphaBits=4" option you reported. I am using Ghostscript 9.21.

I remember picking up the "-dTextAlphaBits=4" option from documentation or examples somewhere. Now that we know the problem, we have the following options:

  • Simply remove the "-dTextAlphaBits=4" option from the system call.
  • Or drop support for the PDF output format (and remove it from the documentation).
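As far as I understand the Ghostscript documentation, -dTextAlphaBits (and its companion -dGraphicsAlphaBits) control anti-aliasing for raster output devices such as PNG, whereas PDF output goes through the pdfwrite device, which has no use for them; that would explain why the option breaks PDF generation with the Ghostscript version mentioned above. A minimal Python sketch of a hypothetical wrapper that adds the option only for raster output follows. It is not LogoGenerator's actual system call; the file names and the 300 dpi default are assumptions.

```python
# Hypothetical wrapper (not LogoGenerator's code): build a Ghostscript command
# that converts an EPS logo to PDF or PNG.
import os
import shutil
import subprocess


def gs_command(eps_path, out_path, fmt="png", dpi=300):
    """Return the gs argument list; anti-aliasing flags only for raster output."""
    cmd = ["gs", "-q", "-dSAFER", "-dBATCH", "-dNOPAUSE", "-dEPSCrop"]
    if fmt == "pdf":
        cmd.append("-sDEVICE=pdfwrite")  # vector output: no alpha-bits options
    elif fmt == "png":
        cmd += ["-sDEVICE=png16m", f"-r{dpi}",
                "-dTextAlphaBits=4", "-dGraphicsAlphaBits=4"]
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return cmd + [f"-sOutputFile={out_path}", eps_path]


if __name__ == "__main__":
    cmd = gs_command("sample.eps", "sample.pdf", fmt="pdf")
    print(" ".join(cmd))
    # Only run it when Ghostscript and the input file are actually present.
    if shutil.which("gs") and os.path.exists("sample.eps"):
        subprocess.run(cmd, check=True)
```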

While we are here, I'd also like to remove the largely outdated GIF output format. Doing so also gets rid of the dependency on the convert program from ImageMagick.

What's your take? Please let me know, and I will update the code for a new release late tonight (or tomorrow).

Xiang-Jun

26
Documentation / Re: Set up the REDUCE Suite
« on: June 19, 2017, 04:39:58 pm »
Hi Rahul,

Thanks for your feedback. Step #5 should work as is once step #4, which adds the bin/ directory to PATH, has been performed as described. I've slightly refined the instruction for step #4 to make it clearer.

Running 'bin/MatrixREDUCE -h' (with the explicit bin/ prefix) assumes you are in the $REDUCE_SUITE root directory.
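A small Python sketch for checking whether step #4 took effect: it looks up MatrixREDUCE on PATH with shutil.which and compares the directory it was found in with $REDUCE_SUITE/bin. It assumes REDUCE_SUITE is set as an environment variable pointing at the suite's root directory, as in the examples on this Forum.

```python
# Sketch: is the REDUCE Suite bin/ directory on PATH?
import os
import shutil


def check_setup(program="MatrixREDUCE"):
    """Report where `program` resolves on PATH and whether that is $REDUCE_SUITE/bin."""
    found = shutil.which(program)
    root = os.environ.get("REDUCE_SUITE")
    expected_bin = os.path.join(root, "bin") if root else None
    in_suite_bin = bool(
        found and expected_bin
        and os.path.realpath(os.path.dirname(found)) == os.path.realpath(expected_bin)
    )
    return {"found": found, "REDUCE_SUITE": root, "in_suite_bin": in_suite_bin}


if __name__ == "__main__":
    print(check_setup())
```

If "found" is None, PATH does not include the bin/ directory yet, and 'MatrixREDUCE -h' only works with the explicit bin/ prefix from the root directory.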

Xiang-Jun

27
General Discussion / Re: Affinity score calculation
« on: January 05, 2017, 01:30:05 pm »
Hi,

Thanks for using the REDUCE Suite and for posting your question(s) on the Forum.

The concept of affinity in the REDUCE Suite is simple, but technical. As is often the case, the idea can be best illustrated with a concrete example.

Let's suppose we have a PSAM (sample-psam.xml) as shown below:

Code:
<matrix_reduce>

<directionality>forward</directionality>
<psam_length>6</psam_length>

<psam>
# A            C            G            T
# +============+============+============+=======
  1            0.25         0.1          0.1   #1
  0.1          0.5          0.2          1     #2
  0.1          1            0.1          0.1   #3
  1            0.1          0.6          0.1   #4
  0.2          0.6          1            0.1   #5
  0.1          0.1          1            0.3   #6
</psam>
</matrix_reduce>

And a short base sequence (sample-seq.txt) as below:

Code:
>sample
GTCATGGT

Since the PSAM has a length of 6, and the single sequence has 8 bases, there are three sliding windows, as detailed below:

Code:
w1: GTCATG      --- affinity of w1: 0.1 * 1 * 1 * 1 * 0.1 * 1 = 0.01
w2:   TCATGG    --- affinity of w2: 0.1 * 0.5 * 0.1 * 0.1 * 1 * 1 = 0.0005
w3:     CATGGT  --- affinity of w3: 0.25 * 0.1 * 0.1 * 0.6 * 1 * 0.3 = 0.00045
---- sum of affinity = 0.01095
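The same calculation can be sketched in a few lines of Python: slide the PSAM along the forward strand, multiply the per-position weights within each window, and sum over windows. This covers only the basic case illustrated here (forward directionality, A/C/G/T bases only); the variations supported by AffinityProfile are not modeled.

```python
# Sketch of the basic sliding-window affinity calculation (forward strand only).

PSAM = [  # columns: A, C, G, T; one row per PSAM position
    [1.0, 0.25, 0.1, 0.1],
    [0.1, 0.5, 0.2, 1.0],
    [0.1, 1.0, 0.1, 0.1],
    [1.0, 0.1, 0.6, 0.1],
    [0.2, 0.6, 1.0, 0.1],
    [0.1, 0.1, 1.0, 0.3],
]
INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def window_affinities(seq, psam):
    """Return (window, product of per-position weights) for every full window."""
    seq = seq.upper()
    width = len(psam)
    result = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        affinity = 1.0
        for pos, base in enumerate(window):
            affinity *= psam[pos][INDEX[base]]  # KeyError for non-ACGT bases
        result.append((window, affinity))
    return result


windows = window_affinities("GTCATGGT", PSAM)
for window, affinity in windows:
    print(window, f"{affinity:.5g}")
print("sum", f"{sum(a for _, a in windows):.5g}")
```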

If you run:
Code:
AffinityProfile -seq=sample-seq.txt -psam=sample-psam.xml
you will find the following content in the default output file seq_psam.dat:
Code:
        sample-psam.xml
sample  0.01095

There are quite a few variations for the calculation of affinity in AffinityProfile, but the above example covers the essence. Since the REDUCE Suite is open source, you can and are encouraged to dive into the details.

Hope this helps,

Xiang-Jun


28
General Discussion / Re: What do the bases mean in PSAM?
« on: December 01, 2016, 10:56:56 am »
Dear Pan Shen,

Thanks for using the REDUCE Suite and for asking your questions on the Forum.

The W in the converted PSAM notation means A or T (Weak, since an A-T Watson-Crick pair has two H-bonds, compared to three in a G-C pair). Not surprisingly, S (for Strong) represents G or C.

More details on "Nucleic acid notation" can be found on Wikipedia, among many other online resources.
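For quick reference, a Python sketch of the standard IUPAC nucleotide codes (W and S among them), with two small helpers: one tests whether a base matches a code, the other expands a degenerate motif into all the concrete sequences it stands for. The helper names are illustrative only.

```python
# IUPAC nucleotide codes (standard table) and two small helpers.
from itertools import product

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT",   # puRine, pYrimidine
    "S": "CG", "W": "AT",   # Strong (3 H-bonds), Weak (2 H-bonds)
    "K": "GT", "M": "AC",   # Keto, aMino
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",  # not A, C, G, T respectively
    "N": "ACGT",
}


def matches(code, base):
    """True if the concrete base is one of those the IUPAC code allows."""
    return base.upper() in IUPAC[code.upper()]


def expand(motif):
    """Every concrete sequence encoded by an IUPAC motif."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in motif.upper()))]


print(expand("AWS"))
```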

Hope this helps.

Xiang-Jun

29
Documentation / Other utility programs
« on: September 29, 2016, 01:27:04 pm »
The REDUCE Suite distribution also includes the following auxiliary programs. Simply typing the corresponding program name with -h (e.g., Convert2PSAM -h) should provide sufficient information to get started.

Convert2PSAM
As its name suggests, Convert2PSAM is a utility program that converts other commonly used motif (pattern) representations in nucleic acid sequences to PSAM, which is unique to the REDUCE Suite. It can also be used to standardize the various formats to a simplified PWM format for easy communication.

Topo2Dictfile
The default topological pattern mechanism can be used to specify sequence motifs in a compact, convenient, and flexible way. However, it defines the motifs implicitly, has a length limit (15 non-gap positions), and does not take IUPAC degenerate symbols into consideration. For example, X6 stands for exactly 4^6 = 4096 combinations, from AAAAAA, AAAAAC, ... to TTTTTT. Sometimes we need more control, specifying the motifs explicitly in a dictionary file, with arbitrary length and IUPAC symbols. Topo2Dictfile facilitates this: it first generates a motif dictionary according to user-specified topological patterns, which can then be edited as needed, e.g., by deleting some motifs, adding more, or introducing IUPAC degeneracy symbols.
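To make the X6 example concrete, here is a Python sketch that writes out explicitly the 4096 motifs implied by X6, in the order given above, followed by a hand-editing step. It only handles the plain Xn case; the full topological pattern syntax and Topo2Dictfile's output file format are not modeled, and the edits shown (dropping AAAAAA, adding GATWAG) are hypothetical choices.

```python
# Sketch: the explicit motif list implied by the pattern X6, i.e. every
# length-6 word over A, C, G, T in lexicographic order.
from itertools import product


def expand_xn(n):
    """All 4**n words of length n, from 'A'*n to 'T'*n."""
    return ["".join(word) for word in product("ACGT", repeat=n)]


dictionary = expand_xn(6)
print(len(dictionary), dictionary[:2], dictionary[-1])

# Hand-editing step, with hypothetical choices: drop a motif, add an IUPAC one.
dictionary.remove("AAAAAA")
dictionary.append("GATWAG")
```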

ProcessFASTA
ProcessFASTA is a simple utility program to process a sequence file in FASTA format, e.g., to select a list of sequences based on ids, convert sequences to their reverse complement, or combine id and sequence onto one line. While such functionality is surely available in various heavy-duty toolboxes/environments (BioPerl, EMBOSS, Bioconductor, etc.), none fits our needs perfectly. We have thus developed this simple utility program mainly for our own convenience.
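A minimal Python sketch of the three operations just listed (select by id, reverse-complement, one line per record). It is not ProcessFASTA itself and does not mirror its command-line options; it assumes the id is the first word after '>' and complements only A, C, G, T and N (other IUPAC symbols pass through unchanged).

```python
# Sketch of ProcessFASTA-style operations (not ProcessFASTA itself).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(text):
    """Return a list of (id, sequence); the id is the first word after '>'."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


fasta = ">sample\nGTCA\nTGGT\n>other\nAAAC\n"
wanted = {"sample"}
for name, seq in read_fasta(fasta):
    if name in wanted:
        print(f"{name}\t{reverse_complement(seq)}")
```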

ProcessTdat
This is a simple utility program to process a tab-delimited text file, e.g., to extract a subset, perform a log transformation, or sort entries by id order. It was created following the same idea as ProcessFASTA.
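In the same spirit, a Python sketch of those table operations: keep a subset of rows, order them by a given id list, and log2-transform the values. It is not ProcessTdat's implementation; it assumes a header line, ids in the first column, and strictly positive numeric values, and the toy table is made up.

```python
# Sketch of ProcessTdat-style steps (not its actual options).
import math


def process_tdat(text, id_order=None, transform=math.log2):
    """Return (header, [(id, transformed values)]) in id_order, dropping unknown ids."""
    lines = text.strip().splitlines()
    header, rows = lines[0], {}
    for line in lines[1:]:
        key, *values = line.split("\t")
        rows[key] = [transform(float(v)) for v in values]
    keys = id_order if id_order is not None else list(rows)
    return header, [(key, rows[key]) for key in keys if key in rows]


table = "id\tt0\tt1\ngeneB\t4\t1\ngeneA\t8\t2\ngeneC\t16\t0.5\n"
header, subset = process_tdat(table, id_order=["geneA", "geneB"])
print(header)
for key, values in subset:
    print(key, *values, sep="\t")
```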

ExtractWindows
A simple utility program to extract sequence fragments from a sequence file, typically that of a chromosome.
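A hypothetical Python sketch of the kind of extraction meant here: cutting a fragment by coordinates, or tiling a sequence into fixed-width windows. The 1-based inclusive coordinate convention and the width/step parameters are assumptions; ExtractWindows' real options may differ.

```python
# Hypothetical sketch: fragment extraction by coordinates and window tiling.


def extract(seq, start, end):
    """Fragment from 1-based position start to end, inclusive."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates out of range")
    return seq[start - 1:end]


def windows(seq, width, step):
    """Yield (1-based start, fragment) for each full-length window."""
    for i in range(0, len(seq) - width + 1, step):
        yield i + 1, seq[i:i + width]


chromosome = "ACGTACGTAC"  # toy stand-in for a chromosome sequence
print(extract(chromosome, 3, 6))
for start, fragment in windows(chromosome, width=4, step=3):
    print(start, fragment)
```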

psamdir2list
A Perl utility program to generate a list of the PSAMs in a given directory. The resulting list can be fed into AffinityProfile or Transfactivity.

30
Documentation / Transfactivity
« on: September 29, 2016, 01:04:45 pm »
Transfactivity is a utility program that performs multiple linear regression of measurements (gene expression or binding data) against affinities. As with AffinityProfile, the affinities can be deduced from a list of PSAMs or IUPAC motifs, or from a single PSAM (-psam=one_PSAM_file) or IUPAC motif (-motif=one_IUPAC_motif) specified directly on the command line. The PSAMs can come from a MatrixREDUCE or MotifREDUCE run, or from a collection of pseudo-PSAMs from the literature (as in the $REDUCE_SUITE/data/PSAMs/ directory).

Transfactivity [options] -sequence=seqfile -measurement=measfile \
                         -psam=one_PSAM_file | -psam_list=list_of_PSAMs |
                         -motif=one_IUPAC_motif | -motif_list=list_of_motifs

  Required parameters:
    -sequence=seqfile      --- sequence file in FASTA format
    -measurement=measfile  --- measurement data file in tab-delimited format
    -psam=one_PSAM_file    --- file name of one PSAM
    -psam_list=list_of_PSAMs --- file name containing a list of PSAMs
    -motif=IUPAC_motif     --- one IUPAC motif
    -motif_list=list_of_motifs --- file name containing a list of IUPAC motifs

  Optional parameters:
    [-damid]               --- short-hand form for -motif=GATC
    [-output=dir_name]     --- path to the output directory (./)
    [-copy]                --- copy CSS, JavaScript and image files to the above
                               output directory to make the HTML self-contained
    [-univariate]          --- switch to run univariate fit only
    [-acgt]                ---  i.e., -motif_list=$REDUCE_SUITE/data/acgt.dat
    [-resid_file=file_name] --- name of residuals
    [-strand=integer]      ---  1 |+1 |F | L for leading strand;
                                2 |+2 |B     for both strands;
                               -1 | R |C     for reverse complementary;
                                   with -motif, default to leading strand
                                   with -psam, default to PSAM setting
    [-runlog=[stderr|stdout|file]]
                           --- direct running diagnostics message to stderr,
                               stdout or a specific file (stderr)
    [-help]                --- print out this help message

Usage:
    Transfactivity \
       -measurement=$REDUCE_SUITE/data/mRNA_expression/Spellman1998AlphaTimeCourse.tsv \
       -sequence=$REDUCE_SUITE/data/sequence/YeastUpstream.fasta \
       -psam_list=psams.list

    Transfactivity \
       -measurement=$REDUCE_SUITE/data/mRNA_expression/Spellman1998AlphaTimeCourse.tsv \
       -sequence=$REDUCE_SUITE/data/sequence/YeastUpstream.fasta \
       -motif_list=motifs.list
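For readers who want the statistical idea in code, here is a Python sketch of a generic ordinary least-squares fit of one measurement vector against several affinity columns (measurement ≈ b0 + b1*affinity1 + ... + bk*affinityk), solved via the normal equations with Gauss-Jordan elimination. It is a textbook OLS sketch, not Transfactivity's implementation, and the toy data are made up; a univariate fit (cf. -univariate) is the same call with a single affinity column.

```python
# Generic OLS sketch (not Transfactivity's code).


def ols(X, y):
    """X: one row of affinities per gene; y: measurements.
    Returns [intercept, coef_1, ..., coef_k]."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Normal equations: (A^T A) b = A^T y, as an augmented k x (k+1) matrix.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("affinity columns are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:  # eliminate this column from every other row
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


# Toy data: 5 genes, 2 affinity columns; measurements built as 1 + 2*x1 + 3*x2.
affinities = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
expression = [1 + 2 * a + 3 * b for a, b in affinities]
print(ols(affinities, expression))  # approximately [1, 2, 3]
```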


Note:

Given a directory that contains all the PSAMs one is interested in, the PSAM-list file can be conveniently generated with the Perl script "psamdir2list". This trick applies to Transfactivity here as well as to AffinityProfile.

For example, the PSAM list in $REDUCE_SUITE/examples/Transfactivity/MacIsaac.list was generated as:

Code:
# Within directory $REDUCE_SUITE/examples/Transfactivity
psamdir2list ../../data/PSAMs/MacIsaac MacIsaac.list

As another example, the Jaspar PSAM list can be generated as:

Code:
psamdir2list $REDUCE_SUITE/data/PSAMs/Jaspar jaspar_psam.lst
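For those without Perl at hand, a rough Python equivalent of the idea behind psamdir2list, under stated assumptions: the list file holds one PSAM path per line, and PSAM files are recognized by a ".xml" suffix (pass suffix="" to take every file). Both are guesses rather than the script's documented behavior, so compare the result with $REDUCE_SUITE/examples/Transfactivity/MacIsaac.list before use.

```python
# Hypothetical Python analogue of psamdir2list (assumed list format: one path per line).
import os
import sys


def psam_list(directory, suffix=".xml"):
    """Sorted paths of files in `directory` ending with `suffix`."""
    names = sorted(n for n in os.listdir(directory) if n.endswith(suffix))
    return [os.path.join(directory, n) for n in names]


def write_list(directory, list_file, suffix=".xml"):
    with open(list_file, "w") as out:
        for path in psam_list(directory, suffix):
            out.write(path + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    write_list(sys.argv[1], sys.argv[2])
```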

Created and maintained by Dr. Xiang-Jun Lu [律祥俊]. See also http://forum.x3dna.org and http://x3dna.org