Author Topic: OptimizePSAM (Read 591011 times)

xiangjun · September 29, 2016, 12:41:18 pm

This program performs a single-point optimization of either an initial (pseudo-) PSAM or a seed motif against a measurement file and a sequence file. Internally, it uses exactly the same Levenberg-Marquardt non-linear least-squares fitting algorithm as MatrixREDUCE.

OptimizePSAM [options] -sequence=seqfile -measurement=measfile \
                       -psam=PSAM_file | -motif=IUPAC_Motif

  Required parameters:
    -sequence=file_name    --- name of a FASTA sequence file
    -meas=measfile         --- measurement (expression/binding) file in tab-delimited format
    -psam=PSAM_file        --- PSAM file to be optimized
    -motif=seed_motif      --- Seed IUPAC motif sequence to be optimized

  Optional parameters:
    [-output=dir_name]     --- path to the output directory (./)
    [-p_value=float]       --- threshold to decide if optimized PSAM is significant (0.001)
    [-filename=file]       --- name of the optimized PSAM
    [-strand=integer]      ---  1 |+1 |F | L for leading strand (1);
                                2 |+2 |B     for both strands;
                               -1 | R |C     for reverse complementary;
                                0 | A |D     auto-detection (check 1 and 2)
    [-runlog=[stderr|stdout|file]]
                           --- direct running diagnostics message to stderr,
                               stdout or a file (stderr)
    [-help]                --- print out this help message

  Usage:
    OptimizePSAM \
       -measurement=$REDUCE_SUITE/data/mRNA_expression/Spellman1998AlphaTimeCourse.tsv \
       -sequence=$REDUCE_SUITE/data/sequence/YeastUpstream.fasta \
       -motif=ACGCGT -file=ACGCGT.psam

Notes:

In PSAM format, the initial seed motif ACGCGT is expressed as follows:

# A            C            G            T             # no. opt
# +============+============+============+============ # ==+===+==
  1            0            0            0             #   1   A
  0            1            0            0             #   2   C
  0            0            1            0             #   3   G
  0            1            0            0             #   4   C
  0            0            1            0             #   5   G
  0            0            0            1             #   6   T
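
The mapping from a seed motif to the 0/1 table above can be sketched in a few lines of Python. This is only an illustration, not OptimizePSAM's own code: the function name seed_to_psam is hypothetical, and giving every base allowed by a degenerate IUPAC code a weight of 1 is an assumption (the help text does not say how the program treats such codes).

```python
# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES = "ACGT"  # column order used in the PSAM tables


def seed_to_psam(motif):
    """Return one row of four weights (A, C, G, T) per motif position.

    Assumption: every base allowed by the IUPAC code gets weight 1,
    every other base gets weight 0.
    """
    rows = []
    for code in motif.upper():
        allowed = IUPAC[code]
        rows.append([1.0 if base in allowed else 0.0 for base in BASES])
    return rows


if __name__ == "__main__":
    for position, row in enumerate(seed_to_psam("ACGCGT"), start=1):
        print(position, row)
```

For ACGCGT this produces the same six rows as the table: a single 1 in the column of the seed base and 0 elsewhere.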

The optimized PSAM in file ACGCGT.psam is as follows. Note how the weights (Ws) change from the 1s and 0s of the initial seed motif (above) to fractional values in the optimized PSAM, with the largest weight at each position equal to 1.

# A            C            G            T             # no. opt
# +============+============+============+============ # ==+===+==
  1            0.143386     0.156974     0.332267      #   1   A
  2.38995e-06  1            0.133257     6.623e-18     #   2   C
  0.203947     0.0136109    1            6.43632e-11   #   3   G
  3.39323e-14  1            4.38946e-14  3.19148e-17   #   4   C
  0.0655988    0.122631     1            1.13119e-13   #   5   G
  0.422826     0.221149     0.182984     1             #   6   T
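
To make the optimized table easier to work with, here is a minimal Python sketch that reads the PSAM text above, confirms that each position has a maximum weight of 1, and scores a 6-mer. The helper names (parse_psam, window_affinity, sequence_affinity) are hypothetical. Scoring a window as the product of its per-position weights, summed over forward-strand windows, is my assumption about the PSAM model; it is not taken from this post.

```python
import math

BASES = "ACGT"  # column order of the PSAM table

PSAM_TEXT = """\
# A            C            G            T             # no. opt
# +============+============+============+============ # ==+===+==
  1            0.143386     0.156974     0.332267      #   1   A
  2.38995e-06  1            0.133257     6.623e-18     #   2   C
  0.203947     0.0136109    1            6.43632e-11   #   3   G
  3.39323e-14  1            4.38946e-14  3.19148e-17   #   4   C
  0.0655988    0.122631     1            1.13119e-13   #   5   G
  0.422826     0.221149     0.182984     1             #   6   T
"""


def parse_psam(text):
    """Parse PSAM text into a list of [wA, wC, wG, wT] rows."""
    rows = []
    for line in text.splitlines():
        body = line.split("#", 1)[0].strip()  # drop comments and header lines
        if body:
            rows.append([float(field) for field in body.split()])
    return rows


def window_affinity(psam, window):
    """Assumed model: product of the weights of the bases in the window."""
    return math.prod(row[BASES.index(base)] for row, base in zip(psam, window))


def sequence_affinity(psam, seq):
    """Sum of window affinities over all forward-strand windows."""
    k = len(psam)
    return sum(window_affinity(psam, seq[i:i + k]) for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    psam = parse_psam(PSAM_TEXT)
    print("".join(BASES[row.index(max(row))] for row in psam))  # consensus
    print(window_affinity(psam, "TCGCGT"))
```

Under this assumed model the consensus ACGCGT scores 1, and a single substitution scales the score by the weight of the substituted base (for example, T at position 1 gives 0.332267).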

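As shown in the following diagnostic message from running OptimizePSAM, this optimization step increases the fitted R² (r2 in the log) from 0.0414328 for the seed motif to 0.0552883 for the optimized PSAM, and the PSAM is significant (E-value 4.19638e-76, below the 0.001 cutoff).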

Best seed experiment:
   number of tested candidate experiments: 18
   intercept: coef=-0.12248   t-value=-18.4713   p-value=5.77026e-74
   slope:     coef=+0.363975   t-value=+15.4353   p-value=1.18198e-52
   r2=0.0414328   SSY=1323.85   SSE=1269   SSR=54.8506
   matches[matched-ids/total-ids]: 348[307/5514]   experiment: alpha_factor_release_sample016 [4]
       and of sequence on forward strand
Optimizing:
     20 (1250.65): converged with gradient: 0.0349271 <= 0.05
PSAM linear fit statistics:
   intercept: coef=-0.186363   t-value=-23.1987   p-value=1.12291e-113
   slope:     coef=+0.325095   t-value=+17.9606   p-value=3.83258e-70
   r2=0.0552883   SSY=1323.85   SSE=1250.65   SSR=73.1932
Checking PSAM significance:
   |r|=0.235135   r0=0.0688157   sigma=0.00888813   t_value=18.7125
   E-value=4.19638e-76
   This PSAM is significant (E-value smaller than specific cutoff of 0.001)
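
The numbers in this log can be cross-checked with a short Python sketch. The values are copied from the log above. Two relations are used: r2 = SSR/SSY with SSY = SSE + SSR, which is the standard decomposition for a linear fit, and t_value = (|r| - r0)/sigma, which is only inferred from the printed numbers. The log does not define r0 or sigma, so that second relation should be read as an assumption rather than as documented behavior.

```python
import math

# Values copied from the diagnostic log above.
seed = {"r2": 0.0414328, "SSY": 1323.85, "SSE": 1269.0, "SSR": 54.8506}
optimized = {"r2": 0.0552883, "SSY": 1323.85, "SSE": 1250.65, "SSR": 73.1932}
significance = {"abs_r": 0.235135, "r0": 0.0688157,
                "sigma": 0.00888813, "t_value": 18.7125}


def r2_from_sums(fit):
    """Coefficient of determination as SSR / SSY."""
    return fit["SSR"] / fit["SSY"]


def decomposition_gap(fit):
    """SSY - (SSE + SSR); close to zero up to the log's rounding."""
    return fit["SSY"] - (fit["SSE"] + fit["SSR"])


def inferred_t(sig):
    """Assumed relation, inferred from the log: (|r| - r0) / sigma."""
    return (sig["abs_r"] - sig["r0"]) / sig["sigma"]


if __name__ == "__main__":
    for name, fit in (("seed", seed), ("optimized", optimized)):
        print(name, r2_from_sums(fit), decomposition_gap(fit))
    print("|r| from r2:", math.sqrt(optimized["r2"]))
    print("inferred t:", inferred_t(significance))
    print("gain in r2:", optimized["r2"] - seed["r2"])
```

Within the rounding of the printed values, both fits satisfy r2 = SSR/SSY, |r| equals the square root of the optimized r2, and the inferred t-value matches the reported 18.7125.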
