This program performs a single-point optimization of either an initial (pseudo-)PSAM or a seed motif against the measurement file and the sequence file. Internally, it uses exactly the same Levenberg-Marquardt nonlinear least-squares fitting algorithm as MatrixREDUCE.
OptimizePSAM [options] -sequence=seqfile -measurement=measfile \
-psam=PSAM_file | -motif=IUPAC_Motif
Required parameters:
-sequence=file_name --- name of a FASTA sequence file
-measurement=measfile --- measurement (expression/binding) file in tab-delimited format
-psam=PSAM_file --- PSAM file to be optimized (alternative to -motif)
-motif=seed_motif --- seed IUPAC motif sequence to be optimized (alternative to -psam)
Optional parameters:
[-output=dir_name] --- path to the output directory (./)
[-p_value=float] --- threshold to decide if optimized PSAM is significant (0.001)
[-filename=file] --- name of the output file for the optimized PSAM
[-strand=integer] --- 1 | +1 | F | L for leading strand (1);
2 | +2 | B for both strands;
-1 | R | C for reverse complement strand;
0 | A | D for auto-detection (check 1 and 2)
[-runlog=[stderr|stdout|file]]
--- direct running diagnostics message to stderr,
stdout or a file (stderr)
[-help] --- print out this help message
Usage:
OptimizePSAM \
-measurement=$REDUCE_SUITE/data/mRNA_expression/Spellman1998AlphaTimeCourse.tsv \
-sequence=$REDUCE_SUITE/data/sequence/YeastUpstream.fasta \
-motif=ACGCGT -filename=ACGCGT.psam
Notes:
- In PSAM format, the initial seed motif ACGCGT is expressed as follows:
# A C G T # no. opt
# +============+============+============+============ # ==+===+==
1 0 0 0 # 1 A
0 1 0 0 # 2 C
0 0 1 0 # 3 G
0 1 0 0 # 4 C
0 0 1 0 # 5 G
0 0 0 1 # 6 T
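The mapping from a seed motif to its initial 0/1 matrix can be sketched in a few lines of Python. This is an illustration only, not code from the REDUCE Suite: the helper name seed_psam is hypothetical, and the treatment of degenerate IUPAC symbols (a 1 for every allowed base) is an assumption that this help text does not confirm.

```python
# Hypothetical helper, not part of the REDUCE Suite.
# Standard IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def seed_psam(motif):
    """Return one [A, C, G, T] row of 0/1 weights per motif position.

    Assumption: a degenerate symbol gets a 1 for each base it allows.
    """
    rows = []
    for symbol in motif.upper():
        allowed = IUPAC[symbol]
        rows.append([1 if base in allowed else 0 for base in "ACGT"])
    return rows

if __name__ == "__main__":
    motif = "ACGCGT"
    for position, (row, symbol) in enumerate(zip(seed_psam(motif), motif), 1):
        print(*row, "#", position, symbol)
```

For the seed ACGCGT the rows match the matrix listed above.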
- The optimized PSAM in file ACGCGT.psam is as follows. Note how the weights (Ws) change from the 1s and 0s of the initial seed motif (above) to fractional values, with a maximum of 1 at each position of the optimized PSAM.
# A C G T # no. opt
# +============+============+============+============ # ==+===+==
1 0.143386 0.156974 0.332267 # 1 A
2.38995e-06 1 0.133257 6.623e-18 # 2 C
0.203947 0.0136109 1 6.43632e-11 # 3 G
3.39323e-14 1 4.38946e-14 3.19148e-17 # 4 C
0.0655988 0.122631 1 1.13119e-13 # 5 G
0.422826 0.221149 0.182984 1 # 6 T
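A short Python sketch can confirm the two properties of the optimized matrix: every position has a maximum weight of 1, and the best base at each position still spells the seed ACGCGT. The parser below assumes only the layout shown in this listing (four numeric columns, with everything after a # treated as a comment). The function names are hypothetical, and scoring a window as the product of its per-position weights is an assumption about how a PSAM is applied, not something this help text states.

```python
# Hypothetical helpers, not part of the REDUCE Suite.
PSAM_TEXT = """\
# A C G T # no. opt
1 0.143386 0.156974 0.332267 # 1 A
2.38995e-06 1 0.133257 6.623e-18 # 2 C
0.203947 0.0136109 1 6.43632e-11 # 3 G
3.39323e-14 1 4.38946e-14 3.19148e-17 # 4 C
0.0655988 0.122631 1 1.13119e-13 # 5 G
0.422826 0.221149 0.182984 1 # 6 T
"""

def parse_psam(text):
    """Read rows of four weights (A, C, G, T); text after '#' is ignored."""
    rows = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) == 4:
            rows.append([float(value) for value in fields])
    return rows

def consensus(psam):
    """Pick the highest-weight base at each position."""
    return "".join("ACGT"[row.index(max(row))] for row in psam)

def window_affinity(psam, window):
    """Assumed scoring rule: product of the weights of the observed bases."""
    score = 1.0
    for row, base in zip(psam, window):
        score *= row["ACGT".index(base)]
    return score

if __name__ == "__main__":
    psam = parse_psam(PSAM_TEXT)
    print(all(max(row) == 1.0 for row in psam))
    print(consensus(psam))
    print(window_affinity(psam, "TCGCGT"))
```

Under the assumed scoring rule, a window that differs from the consensus only at position 1 (T instead of A) scores the position-1 weight for T, while the consensus window itself scores 1.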
- As shown in the following diagnostic message from running OptimizePSAM, this optimization step increases the fitted r2 (R-squared) from 0.0414328 to 0.0552883, and the resulting PSAM is significant (E-value=4.19638e-76, below the 0.001 cutoff).
Best seed experiment:
number of tested candidate experiments: 18
intercept: coef=-0.12248 t-value=-18.4713 p-value=5.77026e-74
slope: coef=+0.363975 t-value=+15.4353 p-value=1.18198e-52
r2=0.0414328 SSY=1323.85 SSE=1269 SSR=54.8506
matches[matched-ids/total-ids]: 348[307/5514] experiment: alpha_factor_release_sample016 [4]
and of sequence on forward strand
Optimizing:
20 (1250.65): converged with gradient: 0.0349271 <= 0.05
PSAM linear fit statistics:
intercept: coef=-0.186363 t-value=-23.1987 p-value=1.12291e-113
slope: coef=+0.325095 t-value=+17.9606 p-value=3.83258e-70
r2=0.0552883 SSY=1323.85 SSE=1250.65 SSR=73.1932
Checking PSAM significance:
|r|=0.235135 r0=0.0688157 sigma=0.00888813 t_value=18.7125
E-value=4.19638e-76
This PSAM is significant (E-value smaller than specific cutoff of 0.001)
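The figures in this log are internally consistent, which a few lines of Python can check. The relations used here are inferred from the printed values rather than taken from the program's documentation: r2 = SSR/SSY, |r| = sqrt(r2), and t_value = (|r| - r0)/sigma. Treat them as a reading aid under those assumptions, not as a description of how OptimizePSAM computes its statistics.

```python
import math

# Values copied from the "PSAM linear fit statistics" and
# "Checking PSAM significance" lines of the log above.
SSY, SSE, SSR = 1323.85, 1250.65, 73.1932
R0, SIGMA = 0.0688157, 0.00888813

# Assumed relation: fraction of variance explained by the fit.
r2 = SSR / SSY

# Assumed relation: |r| is the square root of r2.
abs_r = math.sqrt(r2)

# Assumed relation, inferred from the printed numbers:
# the t-value is the excess of |r| over r0 in units of sigma.
t_value = (abs_r - R0) / SIGMA

if __name__ == "__main__":
    print(f"r2      = {r2:.6f}")
    print(f"|r|     = {abs_r:.6f}")
    print(f"t_value = {t_value:.3f}")
```

Up to rounding of the printed inputs, the results agree with the logged r2=0.0552883, |r|=0.235135 and t_value=18.7125.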