Recent Posts

General Discussion / Re: Affinity score calculation
« Last post by JonathanCan on September 11, 2017, 02:29:30 am »

Can you explain how the result of AffinityProbe is calculated (in what range)?


Thanks for explaining, Xiang-Jun!

Documentation / Re: Set up the REDUCE Suite
« Last post by xiangjun on June 19, 2017, 04:39:58 pm »
Hi Rahul,

Thanks for your feedback. Step #5 should work as is if step #4 has been performed as advertised, which adds the bin/ directory to PATH. I've slightly refined the instruction for step #4 to make it clearer.

Executing 'bin/MatrixREDUCE -h' assumes one is at the $REDUCE_SUITE root directory.

Documentation / Re: Set up the REDUCE Suite
« Last post by rrdutta on June 19, 2017, 04:33:00 pm »
Hello all,

On step 5 of REDUCE_Suite_setup, try executing 'bin/MatrixREDUCE -h' in the REDUCE-Suite-v2.2 directory if entering 'MatrixREDUCE' into the command line does not work after you have switched into the 'bin' directory.


General Discussion / Re: Affinity score calculation
« Last post by xiangjun on January 05, 2017, 01:30:05 pm »

Thanks for using the REDUCE Suite and for posting your question(s) on the Forum.

The concept of affinity in the REDUCE Suite is simple, but technical. As is often the case, the idea can be best illustrated with a concrete example.

Let's suppose we have a PSAM (sample-psam.xml) as shown below:

Code: [Select]


# A            C            G            T
# +============+============+============+=======
  1            0.25         0.1          0.1   #1
  0.1          0.5          0.2          1     #2
  0.1          1            0.1          0.1   #3
  1            0.1          0.6          0.1   #4
  0.2          0.6          1            0.1   #5
  0.1          0.1          1            0.3   #6

And a short base sequence (sample-seq.txt) as below:

Code: [Select]
>sample
GTCATGGT

Since the PSAM has a length of 6, and the single sequence has 8 bases, there are three sliding windows, as detailed below:

Code: [Select]
w1: GTCATG      --- affinity of w1: 0.1 * 1 * 1 * 1 * 0.1 * 1 = 0.01
w2:   TCATGG    --- affinity of w2: 0.1 * 0.5 * 0.1 * 0.1 * 1 * 1 = 0.0005
w3:     CATGGT  --- affinity of w3: 0.25 * 0.1 * 0.1 * 0.6 * 1 * 0.3 = 0.00045
---- sum of affinity = 0.01095
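
The sliding-window calculation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AffinityProfile implementation: the names PSAM, window_affinities, and total_affinity are hypothetical, only the leading strand is scanned, and the matrix is typed in by hand rather than parsed from sample-psam.xml.

```python
# Hypothetical sketch: sliding-window affinity for the sample PSAM.
# One dict per PSAM position, copied from the matrix shown in the post.
PSAM = [
    {"A": 1.0, "C": 0.25, "G": 0.1, "T": 0.1},  # position 1
    {"A": 0.1, "C": 0.5,  "G": 0.2, "T": 1.0},  # position 2
    {"A": 0.1, "C": 1.0,  "G": 0.1, "T": 0.1},  # position 3
    {"A": 1.0, "C": 0.1,  "G": 0.6, "T": 0.1},  # position 4
    {"A": 0.2, "C": 0.6,  "G": 1.0, "T": 0.1},  # position 5
    {"A": 0.1, "C": 0.1,  "G": 1.0, "T": 0.3},  # position 6
]


def window_affinities(seq, psam):
    """Return (window, affinity) pairs for every full-length window of seq."""
    width = len(psam)
    results = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        affinity = 1.0
        # Multiply the weight of the observed base at each PSAM position.
        for weights, base in zip(psam, window):
            affinity *= weights[base]
        results.append((window, affinity))
    return results


def total_affinity(seq, psam):
    """Sum the affinities over all sliding windows."""
    return sum(affinity for _, affinity in window_affinities(seq, psam))


for window, affinity in window_affinities("GTCATGGT", PSAM):
    print(window, round(affinity, 5))
print(round(total_affinity("GTCATGGT", PSAM), 5))  # prints 0.01095
```

Each window contributes the product of six weights, so a single poorly matching base (weight 0.1) already reduces that window's affinity tenfold.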

If you run:
Code: [Select]
AffinityProfile -seq=sample-seq.txt -psam=sample-psam.xml
you will find the following content in the default output file seq_psam.dat:
Code: [Select]
sample  0.01095

There are quite a few variations for the calculation of affinity in AffinityProfile, but the above example covers the essence. Since the REDUCE Suite is open source, you can and are encouraged to dive into the details.

Hope this helps,


General Discussion / Affinity score calculation
« Last post by kubranarci on January 05, 2017, 02:55:20 am »

Can you explain how the result of AffinityProbe is calculated (in what range)?

General Discussion / Re: What do the bases mean in PSAM?
« Last post by melodypluto on December 05, 2016, 09:33:10 pm »
Thank you very much! ;D
General Discussion / Re: What do the bases mean in PSAM?
« Last post by xiangjun on December 01, 2016, 10:56:56 am »
Dear Pan Shen,

Thanks for using the REDUCE Suite and for asking your questions on the Forum.

The W in the converted PSAM notation means A or T (Weak, since the A-T Watson-Crick pair has two H-bonds, compared to three in a G-C pair). Not surprisingly, S (for Strong) represents G or C.

More details on "Nucleic acid notation" can be found on Wikipedia, among many other online resources.
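
For quick reference, the standard IUPAC nucleotide codes can be written down as a small lookup table. The Python below is only a sketch; the names IUPAC and expand are made up for illustration and are not part of the REDUCE Suite.

```python
# Standard IUPAC nucleotide codes: each symbol maps to the bases it may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",   # puRine
    "Y": "CT",   # pYrimidine
    "S": "CG",   # Strong (three H-bonds)
    "W": "AT",   # Weak (two H-bonds)
    "K": "GT",   # Keto
    "M": "AC",   # aMino
    "B": "CGT",  # not A
    "D": "AGT",  # not C
    "H": "ACT",  # not G
    "V": "ACG",  # not T
    "N": "ACGT", # any base
}


def expand(symbol):
    """Return the set of plain bases a (possibly degenerate) symbol stands for."""
    return set(IUPAC[symbol.upper()])


print(sorted(expand("W")))  # ['A', 'T']
print(sorted(expand("S")))  # ['C', 'G']
```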

Hope this helps.

General Discussion / What do the bases mean in PSAM?
« Last post by melodypluto on December 01, 2016, 01:20:13 am »
Dear administrator,
I used Convert2PSAM to convert a PWM into a PSAM.
CODE: Convert2PSAM  -source=PW -inpfile=/mnt/tools/REDUCE-Suite-v2.2/data/formats/pwm_ex.dat -psamfile=psam=pw2psam.xml
However, some bases are named W instead of one of A, C, G, or T.
I therefore wonder what W means in a PSAM. Are there other types of bases in a PSAM? If so, what do they mean?
Could you show me the details about the bases in a PSAM?

Thank you,
Pan Shen
Documentation / Other utility programs
« Last post by xiangjun on September 29, 2016, 01:27:04 pm »
The REDUCE Suite distribution also includes the following auxiliary programs. Simply typing the corresponding program name with -h (e.g., Convert2PSAM -h) should provide sufficient information to get one started.

As its name suggests, Convert2PSAM is a utility program that converts other commonly used motif (pattern) representations in nucleic acid sequences to PSAM, which is unique to the REDUCE Suite. It can also be used to standardize the various formats to a simplified PWM format for easy communication.

The default topological pattern mechanism can be used to specify sequence motifs in a compact, convenient, and flexible way. However, it defines the motifs implicitly, has a length limit (15 non-gap positions), and does not take the IUPAC degenerate symbols into consideration. As an example, X6 stands for exactly 4^6 = 4096 combinations, from AAAAAA, AAAAAC, ... to TTTTTT. Sometimes, we may need more control by specifying the motifs explicitly in a dictionary file, with arbitrary length and IUPAC symbols. This can be facilitated by Topo2Dictfile: first generate a motif dictionary according to user-specified topological patterns, and then edit it as needed, e.g., by deleting some motifs, adding more, or introducing IUPAC degeneracy symbols.
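
To make the X6 count concrete, the following Python sketch enumerates every 6-mer over A, C, G, T. It only illustrates the combinatorics; the function name enumerate_kmers is hypothetical, and the result is a plain list, not the dictionary file format that Topo2Dictfile writes.

```python
from itertools import product


def enumerate_kmers(k, alphabet="ACGT"):
    """Return all len(alphabet)**k words of length k, in alphabetical order."""
    return ["".join(letters) for letters in product(alphabet, repeat=k)]


motifs = enumerate_kmers(6)
print(len(motifs), motifs[0], motifs[1], motifs[-1])  # 4096 AAAAAA AAAAAC TTTTTT
```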

ProcessFASTA is a simple utility program to process a sequence file in FASTA format, e.g., to select a list of sequences based on ids, convert sequences to their reverse complement, or combine id and sequence into one line. While such functionalities are surely available in various heavy-duty toolboxes/environments (BioPerl, EMBOSS, BioConductor etc.), none fits our needs perfectly. We have thus developed this simple utility program mainly for our own convenience.
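
As an illustration of one of the operations mentioned above, a reverse complement can be computed as in the Python sketch below. This is not how ProcessFASTA is implemented; the names COMPLEMENT and reverse_complement are made up for the example, and only the four plain bases are handled.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("GTCATGGT"))  # ACCATGAC
```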

This is a simple utility program to process a tab-delimited text file, e.g., to extract a subset, perform a log transformation, or sort entries by id. It was created following the same idea as for ProcessFASTA.

A simple utility program to extract sequence fragments from a sequence file, probably of a chromosome.

psamdir2list is a Perl utility program that generates a list of the PSAMs in a given directory. The resultant list can be fed into AffinityProfile or Transfactivity.

Documentation / Transfactivity
« Last post by xiangjun on September 29, 2016, 01:04:45 pm »
Transfactivity is a utility program that performs multiple-linear regressions of measurements (gene expression or binding data) against affinities. As with AffinityProfile, the affinities can be deduced either from a list of PSAMs or IUPAC motifs, or a single PSAM (-psam=one_PSAM_file) or an IUPAC motif (-motif=one_IUPAC_motif) specified directly on the command-line. The PSAMs can be from a MatrixREDUCE or MotifREDUCE run, or a collection of pseudo-PSAMs from literature (as in the $REDUCE_SUITE/data/PSAMs/ directory).
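
The idea of regressing measurements against affinities can be illustrated with the simplest case, an ordinary least-squares fit with a single predictor. The Python below is a sketch under stated assumptions, not Transfactivity's code: the function name fit_line is hypothetical, the numbers are made-up toy values, and a true multiple regression would fit several affinity columns simultaneously.

```python
def fit_line(affinities, measurements):
    """Ordinary least squares for measurement = intercept + slope * affinity."""
    n = len(affinities)
    if n != len(measurements) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(affinities) / n
    mean_y = sum(measurements) / n
    # Sum of squared deviations of the predictor.
    sxx = sum((x - mean_x) ** 2 for x in affinities)
    if sxx == 0:
        raise ValueError("all affinities are identical; slope is undefined")
    # Sum of cross-products between predictor and response deviations.
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(affinities, measurements))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (made up): one affinity and one measurement per sequence.
toy_affinities = [0.0, 1.0, 2.0, 3.0]
toy_measurements = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(toy_affinities, toy_measurements)
print(slope, intercept)  # 2.0 1.0
```

In this picture the fitted slope plays the role of the coefficient associated with one PSAM or motif.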

Transfactivity [options] -sequence=seqfile -measurement=measfile \
                         -psam=one_PSAM_file | -psam_list=list_of_PSAMs |
                         -motif=one_IUPAC_motif | -motif_list=list_of_motifs

  Required parameters:
    -sequence=seqfile      --- sequence file in FASTA format
    -measurement=measfile  --- measurement data file in tab-delimited format
    -psam=one_PSAM_file    --- file name of one PSAM
    -psam_list=list_of_PSAMs --- file name containing a list of PSAMs
    -motif=IUPAC_motif     --- one IUPAC motif
    -motif_list=list_of_motifs --- file name containing a list of IUPAC motifs

  Optional parameters:
    [-damid]               --- short-hand form for -motif=GATC
    [-output=dir_name]     --- path to the output directory (./)
    [-copy]                --- copy CSS, JavaScript and image files to the above
                               output directory to make the HTML self-contained
    [-univariate]          --- switch to run univariate fit only
    [-acgt]                ---  i.e., -motif_list=$REDUCE_SUITE/data/acgt.dat
    [-resid_file=file_name] --- name of residuals
    [-strand=integer]      ---  1 |+1 |F | L for leading strand;
                                2 |+2 |B     for both strands;
                               -1 | R |C     for reverse complementary;
                                   with -motif, default to leading strand
                                   with -psam, default to PSAM setting
                           --- direct running diagnostics message to stderr,
                               stdout or a specific file (stderr)
    [-help]                --- print out this help message

    Transfactivity \
       -measurement=$REDUCE_SUITE/data/mRNA_expression/Spellman1998AlphaTimeCourse.tsv \
       -sequence=$REDUCE_SUITE/data/sequence/YeastUpstream.fasta \

    Transfactivity \
       -measurement=$REDUCE_SUITE/data/mRNA_expression/Spellman1998AlphaTimeCourse.tsv \
       -sequence=$REDUCE_SUITE/data/sequence/YeastUpstream.fasta \


Given a directory that contains all the PSAMs one is interested in, the PSAM-list file can be conveniently generated with the Perl script "psamdir2list". This trick applies to Transfactivity here as well as to AffinityProfile.

For example, the PSAM list in $REDUCE_SUITE/examples/Transfactivity/MacIsaac.list was generated as:

Code: [Select]
# Within directory $REDUCE_SUITE/examples/Transfactivity
psamdir2list ../../data/PSAMs/MacIsaac MacIsaac.list

As another example, the Jaspar PSAM list can be generated as:

Code: [Select]
psamdir2list $REDUCE_SUITE/data/PSAMs/Jaspar jaspar_psam.lst
Created and maintained by Dr. Xiang-Jun Lu [律祥俊].