
xiangjun

OptimizePSAM
« on: September 29, 2016, 12:41:18 pm »
This program performs a single-point optimization of either an initial (pseudo-) PSAM or a seed motif against a measurement file and a sequence file. Internally, it uses the same Levenberg-Marquardt nonlinear least-squares fitting algorithm as MatrixREDUCE.

OptimizePSAM [options] -sequence=seqfile -measurement=measfile \
                       -psam=PSAM_file | -motif=IUPAC_Motif

  Required parameters:
    -sequence=file_name    --- name of a FASTA sequence file
    -measurement=measfile  --- measurement (expression/binding) file in tab-delimited format
    -psam=PSAM_file        --- PSAM file to be optimized (give either -psam or -motif)
    -motif=seed_motif      --- seed IUPAC motif sequence to be optimized

  Optional parameters:
    [-output=dir_name]     --- path to the output directory (./)
    [-p_value=float]       --- threshold to decide if optimized PSAM is significant (0.001)
    [-filename=file]       --- name of the optimized PSAM
    [-strand=integer]      ---  1 |+1 |F | L for leading strand (1);
                                2 |+2 |B     for both strands;
                               -1 | R |C     for reverse complement strand;
                                0 | A |D     auto-detection (check 1 and 2)
    [-runlog=[stderr|stdout|file]]
                           --- direct run-time diagnostic messages to stderr,
                               stdout, or a file (stderr)
    [-help]                --- print out this help message

  Usage:
    OptimizePSAM \
       -measurement=$REDUCE_SUITE/data/mRNA_expression/Spellman1998AlphaTimeCourse.tsv \
       -sequence=$REDUCE_SUITE/data/sequence/YeastUpstream.fasta \
       -motif=ACGCGT -filename=ACGCGT.psam
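
For illustration only, the relationship between a seed motif and the PSAM text format listed in the Notes can be sketched in Python. This is not code from the REDUCE Suite: seed_to_psam, parse_psam, and consensus are hypothetical helper names. Two things are assumptions here: that a degenerate IUPAC code gets weight 1 for every base it allows, and that "#" starts a comment in a PSAM file.

```python
# Hypothetical helpers for illustration; not part of the REDUCE Suite.
BASES = "ACGT"

# Standard IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def seed_to_psam(motif):
    """Return one [wA, wC, wG, wT] row per position of an IUPAC seed.

    Assumption: every allowed base gets weight 1, all others weight 0.
    """
    rows = []
    for code in motif.upper():
        allowed = IUPAC[code]  # raises KeyError for a non-IUPAC character
        rows.append([1.0 if base in allowed else 0.0 for base in BASES])
    return rows

def parse_psam(text):
    """Read the numeric rows of a PSAM, assuming '#' starts a comment."""
    rows = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) == 4:
            rows.append([float(value) for value in fields])
    return rows

def consensus(rows):
    """Pick the base with the largest weight at each position."""
    return "".join(BASES[row.index(max(row))] for row in rows)

# The optimized PSAM from the Notes, copied verbatim.
OPTIMIZED = """
# A            C            G            T             # no. opt
# +============+============+============+============ # ==+===+==
  1            0.143386     0.156974     0.332267      #   1   A
  2.38995e-06  1            0.133257     6.623e-18     #   2   C
  0.203947     0.0136109    1            6.43632e-11   #   3   G
  3.39323e-14  1            4.38946e-14  3.19148e-17   #   4   C
  0.0655988    0.122631     1            1.13119e-13   #   5   G
  0.422826     0.221149     0.182984     1             #   6   T
"""

if __name__ == "__main__":
    for row in seed_to_psam("ACGCGT"):
        print(row)
    optimized = parse_psam(OPTIMIZED)
    print("every position peaks at 1:", all(max(r) == 1.0 for r in optimized))
    print("consensus:", consensus(optimized))
```

Under these assumptions, the rows built from ACGCGT match the 0/1 matrix shown in the Notes, and the highest-weight base at each position of the optimized PSAM still spells out the seed.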


Notes:

  • In PSAM format, the initial seed motif ACGCGT is expressed as follows:
    # A            C            G            T             # no. opt
    # +============+============+============+============ # ==+===+==
      1            0            0            0             #   1   A
      0            1            0            0             #   2   C
      0            0            1            0             #   3   G
      0            1            0            0             #   4   C
      0            0            1            0             #   5   G
      0            0            0            1             #   6   T

  • The optimized PSAM in file ACGCGT.psam is as follows. Note how the weights (Ws) change from the 1s and 0s of the initial seed motif (above) to fractional values, with the maximum at each position equal to 1.
    # A            C            G            T             # no. opt
    # +============+============+============+============ # ==+===+==
      1            0.143386     0.156974     0.332267      #   1   A
      2.38995e-06  1            0.133257     6.623e-18     #   2   C
      0.203947     0.0136109    1            6.43632e-11   #   3   G
      3.39323e-14  1            4.38946e-14  3.19148e-17   #   4   C
      0.0655988    0.122631     1            1.13119e-13   #   5   G
      0.422826     0.221149     0.182984     1             #   6   T

  • As shown in the following diagnostic output from OptimizePSAM, the optimization step increases the fitted r2 from 0.0414328 (best seed fit) to 0.0552883, and the optimized PSAM is significant (E-value of 4.19638e-76, below the 0.001 cutoff).
    Best seed experiment:
       number of tested candidate experiments: 18
       intercept: coef=-0.12248   t-value=-18.4713   p-value=5.77026e-74
       slope:     coef=+0.363975   t-value=+15.4353   p-value=1.18198e-52
       r2=0.0414328   SSY=1323.85   SSE=1269   SSR=54.8506
       matches[matched-ids/total-ids]: 348[307/5514]   experiment: alpha_factor_release_sample016 [4]
           and of sequence on forward strand
    Optimizing:
         20 (1250.65): converged with gradient: 0.0349271 <= 0.05
    PSAM linear fit statistics:
       intercept: coef=-0.186363   t-value=-23.1987   p-value=1.12291e-113
       slope:     coef=+0.325095   t-value=+17.9606   p-value=3.83258e-70
       r2=0.0552883   SSY=1323.85   SSE=1250.65   SSR=73.1932
    Checking PSAM significance:
       |r|=0.235135   r0=0.0688157   sigma=0.00888813   t_value=18.7125
       E-value=4.19638e-76
       This PSAM is significant (E-value smaller than specific cutoff of 0.001)
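
The figures in the diagnostic output above can be cross-checked with a few lines of Python. The sketch below only re-does arithmetic on the printed values; the helper names are hypothetical. The relation t = (|r| - r0) / sigma is an observation that happens to reproduce the printed t_value, not a documented formula, and what r0 and sigma represent is not stated in the output.

```python
import math

# Values copied from the diagnostic output above.
seed_fit = {"r2": 0.0414328, "SSY": 1323.85, "SSE": 1269.0, "SSR": 54.8506}
psam_fit = {"r2": 0.0552883, "SSY": 1323.85, "SSE": 1250.65, "SSR": 73.1932}
significance = {"abs_r": 0.235135, "r0": 0.0688157,
                "sigma": 0.00888813, "t_value": 18.7125}

def r2_from_sums(fit):
    """Coefficient of determination computed as SSR / SSY."""
    return fit["SSR"] / fit["SSY"]

def sums_balance(fit, tol=0.01):
    """True if SSY equals SSE + SSR within the rounding of the printed values."""
    return abs(fit["SSY"] - (fit["SSE"] + fit["SSR"])) <= tol

def t_from_r(sig):
    """Assumed relation (inferred from the numbers): t = (|r| - r0) / sigma."""
    return (sig["abs_r"] - sig["r0"]) / sig["sigma"]

if __name__ == "__main__":
    for name, fit in (("seed", seed_fit), ("optimized PSAM", psam_fit)):
        print(name, "r2 =", round(r2_from_sums(fit), 6),
              "SSY = SSE + SSR:", sums_balance(fit))
    print("|r| from r2:", round(math.sqrt(psam_fit["r2"]), 6))
    print("t from |r|, r0, sigma:", round(t_from_r(significance), 4))
```

Both r2 values agree with SSR / SSY to within rounding, and |r| is the square root of the optimized r2, so the reported improvement corresponds to SSE dropping from 1269 to 1250.65 at a fixed SSY.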


 

Created and maintained by Dr. Xiang-Jun Lu [律祥俊]. See also http://forum.x3dna.org and http://x3dna.org