General Discussion / Re: What do the bases mean in PSAM?
« Last post by melodypluto on December 05, 2016, 09:33:10 pm »
Thank you very much!
Transfactivity [options] -sequence=seqfile -measurement=measfile \
-psam=one_PSAM_file | -psam_list=list_of_PSAMs |
-motif=one_IUPAC_motif | -motif_list=list_of_motifs
Required parameters:
-sequence=seqfile --- sequence file in FASTA format
-measurement=measfile --- measurement data file in tab-delimited format
-psam=one_PSAM_file --- file name of one PSAM
-psam_list=list_of_PSAMs --- file name containing a list of PSAMs
-motif=IUPAC_motif --- one IUPAC motif
-motif_list=list_of_motifs --- file name containing a list of IUPAC motifs
Optional parameters:
[-damid] --- short-hand form for -motif=GATC
[-output=dir_name] --- path to the output directory (./)
[-copy] --- copy CSS, JavaScript and image files to the above
output directory to make the HTML self-contained
[-univariate] --- switch to run univariate fit only
[-acgt] --- i.e., -motif_list=$REDUCE_SUITE/data/acgt.dat
[-resid_file=file_name] --- name of the residuals file
[-strand=integer] --- 1 | +1 | F | L for leading strand;
2 | +2 | B for both strands;
-1 | R | C for reverse-complement strand;
with -motif, defaults to leading strand
with -psam, defaults to PSAM setting
[-runlog=[stderr|stdout|file]]
--- direct running diagnostics message to stderr,
stdout or a specific file (stderr)
[-help] --- print out this help message
Usage:
Transfactivity \
-measurement=$REDUCE_SUITE/data/mRNA_expression/Spellman1998AlphaTimeCourse.tsv \
-sequence=$REDUCE_SUITE/data/sequence/YeastUpstream.fasta \
-psam_list=psams.list
Transfactivity \
-measurement=$REDUCE_SUITE/data/mRNA_expression/Spellman1998AlphaTimeCourse.tsv \
-sequence=$REDUCE_SUITE/data/sequence/YeastUpstream.fasta \
-motif_list=motifs.list
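As a rough illustration of what a univariate fit involves, the Python sketch below regresses one measurement vector on one affinity vector by ordinary least squares and reports the intercept, slope, and r2. It is not Transfactivity's own code; the function name univariate_fit and the toy numbers are assumptions made up for this example.

```python
def univariate_fit(affinity, measurement):
    """Ordinary least squares of measurement on a single affinity profile.

    Hypothetical helper for illustration only; not part of the REDUCE Suite.
    """
    n = len(affinity)
    if n != len(measurement) or n < 2:
        raise ValueError("need two equally long vectors with at least 2 points")
    mean_x = sum(affinity) / n
    mean_y = sum(measurement) / n
    sxx = sum((x - mean_x) ** 2 for x in affinity)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(affinity, measurement))
    ssy = sum((y - mean_y) ** 2 for y in measurement)  # total sum of squares
    if sxx == 0:
        raise ValueError("affinity values are all identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares after the fit
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(affinity, measurement))
    r2 = 1.0 - sse / ssy if ssy > 0 else 0.0
    return {"intercept": intercept, "slope": slope, "r2": r2,
            "SSY": ssy, "SSE": sse, "SSR": ssy - sse}


# Toy data (made up): one affinity value and one measurement per gene.
fit = univariate_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(fit)
```

A multivariate fit would regress on several affinity profiles at once; this sketch covers only the single-profile case.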
AffinityProfile [options] -sequence=seqfile \
-psam=one_PSAM_file | -psam_list=list_of_PSAMs |
-motif=one_IUPAC_motif | -motif_list=list_of_motifs
Required parameters:
-sequence=file_name --- name of sequence file in FASTA format
-psam=one_PSAM_file --- file name of one PSAM
-psam_list=list_of_PSAMs --- file name containing a list of PSAMs
-motif=IUPAC_motif --- one IUPAC motif
-motif_list=list_of_motifs --- file name containing a list of IUPAC motifs
Optional parameters:
[-threshold=float] --- threshold of affinity for output (0.0)
[-output=dir_name] --- path to the output directory (./)
[-prefix=string] --- prepended to output profile name (aff_)
[-affsum=string] --- file of total affinity per sequence (seq_psam.dat)
[-detail] --- also output detailed affinity along each sequence
[-ids=string] --- a ',' or ';' delimited list of IDs
[-column] --- used with -ids, set profile column-wise for each id
[-normalize] --- linearly rescale each PSAM's profile so that its maximum is 1.0
[-strand=integer] --- 1 | +1 | F | L for leading strand;
2 | +2 | B for both strands;
-1 | R | C for reverse-complement strand;
with -motif, defaults to leading strand
with -psam, defaults to PSAM setting
Usage:
(1) AffinityProfile -sequence=$REDUCE_SUITE/data/sequence/YeastUpstream.fasta \
-psam_list=psams.list
(2) AffinityProfile -sequence=$REDUCE_SUITE/data/sequence/YeastUpstream.fasta \
-motif=aaaccct
(3) AffinityProfile -sequence=$REDUCE_SUITE/data/sequence/YeastUpstream.fasta \
-motif=aaaga -ids='YAL001C;YAL002W' -column
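To make the idea of an affinity profile concrete, here is a minimal Python sketch. It assumes that the affinity of one window is the product of the per-position PSAM weights, and that the per-sequence total is the sum over all windows. The way the strand codes are combined (1, 2, -1) is also an assumption, as are the function names and the two-position toy PSAM; none of this is taken from AffinityProfile's source.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def window_affinities(seq, psam):
    """Affinity of every window: product of the weights of the bases it contains.

    psam is a list with one dict per position, mapping base -> weight.
    Characters other than A/C/G/T get weight 0 in this sketch.
    """
    seq = seq.upper()
    width = len(psam)
    profile = []
    for start in range(len(seq) - width + 1):
        affinity = 1.0
        for offset, weights in enumerate(psam):
            affinity *= weights.get(seq[start + offset], 0.0)
        profile.append(affinity)
    return profile


def total_affinity(seq, psam, strand=1):
    """Sum of window affinities on the chosen strand(s) (assumed semantics)."""
    forward = sum(window_affinities(seq, psam))
    reverse = sum(window_affinities(reverse_complement(seq), psam))
    if strand == 1:
        return forward
    if strand == -1:
        return reverse
    if strand == 2:
        return forward + reverse
    raise ValueError("strand must be 1, 2 or -1 in this sketch")


# Hypothetical two-position PSAM, for illustration only.
toy_psam = [
    {"A": 1.0, "C": 0.5, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.1},
]
print(window_affinities("ACCT", toy_psam))
print(total_affinity("ACCT", toy_psam, strand=2))
```

The list returned by window_affinities corresponds loosely to what -detail would report along a sequence, and total_affinity to one entry of the -affsum file.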
OptimizePSAM [options] -sequence=seqfile -measurement=measfile \
-psam=PSAM_file | -motif=IUPAC_Motif
Required parameters:
-sequence=file_name --- name of a FASTA sequence file
-meas=measfile --- measurement (expression/binding) file in tab-delimited format
-psam=PSAM_file --- PSAM file to be optimized
-motif=seed_motif --- Seed IUPAC motif sequence to be optimized
Optional parameters:
[-output=dir_name] --- path to the output directory (./)
[-p_value=float] --- threshold to decide if optimized PSAM is significant (0.001)
[-filename=file] --- name of the optimized PSAM
[-strand=integer] --- 1 | +1 | F | L for leading strand (1);
2 | +2 | B for both strands;
-1 | R | C for reverse-complement strand;
0 | A | D for auto-detection (check 1 and 2)
[-runlog=[stderr|stdout|file]]
--- direct running diagnostics message to stderr,
stdout or a file (stderr)
[-help] --- print out this help message
Usage:
OptimizePSAM \
-measurement=$REDUCE_SUITE/data/mRNA_expression/Spellman1998AlphaTimeCourse.tsv \
-sequence=$REDUCE_SUITE/data/sequence/YeastUpstream.fasta \
-motif=ACGCGT -file=ACGCGT.psam
#             A            C            G            T # no. opt
# +============+============+============+============ # ==+===+==
              1            0            0            0 #   1   A
              0            1            0            0 #   2   C
              0            0            1            0 #   3   G
              0            1            0            0 #   4   C
              0            0            1            0 #   5   G
              0            0            0            1 #   6   T
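The matrix just above has exactly one 1 per row, in the column of the corresponding base of the seed motif ACGCGT. A minimal Python sketch of that encoding follows; the function name one_hot_seed is made up, and the sketch handles only A, C, G and T (how the tool encodes degenerate IUPAC symbols is not shown in this post).

```python
BASES = "ACGT"


def one_hot_seed(motif):
    """Return one row per motif position with a 1 in the column of its base."""
    rows = []
    for base in motif.upper():
        if base not in BASES:
            raise ValueError(
                f"unsupported symbol {base!r}; this sketch handles A, C, G, T only")
        rows.append([1 if base == column else 0 for column in BASES])
    return rows


seed = one_hot_seed("ACGCGT")
# Print in the same row order as the matrix above: weights, position, base.
for number, (row, base) in enumerate(zip(seed, "ACGCGT"), start=1):
    print(*row, "#", number, base)
```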
#             A            C            G            T # no. opt
# +============+============+============+============ # ==+===+==
              1     0.143386     0.156974     0.332267 #   1   A
    2.38995e-06            1     0.133257    6.623e-18 #   2   C
       0.203947    0.0136109            1  6.43632e-11 #   3   G
    3.39323e-14            1  4.38946e-14  3.19148e-17 #   4   C
      0.0655988     0.122631            1  1.13119e-13 #   5   G
       0.422826     0.221149     0.182984            1 #   6   T
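One way to read the optimized matrix above: in each row the best base has weight 1 and every other base has a weight between 0 and 1 relative to it. The Python sketch below picks the highest-weight base per position (the "opt" column) and converts each weight w to -ln(w). Treating -ln(w) as a free-energy penalty in units of RT, i.e. w = exp(-ddG/RT), is an assumption here and is not stated in the output itself; the function names are made up.

```python
import math

BASES = "ACGT"

# Weights copied from the optimized matrix above (columns A, C, G, T).
OPTIMIZED = [
    [1, 0.143386, 0.156974, 0.332267],
    [2.38995e-06, 1, 0.133257, 6.623e-18],
    [0.203947, 0.0136109, 1, 6.43632e-11],
    [3.39323e-14, 1, 4.38946e-14, 3.19148e-17],
    [0.0655988, 0.122631, 1, 1.13119e-13],
    [0.422826, 0.221149, 0.182984, 1],
]


def consensus(psam):
    """Highest-weight base at each position."""
    return "".join(BASES[row.index(max(row))] for row in psam)


def neg_log_weights(psam):
    """-ln(w) per entry; 0 for the best base, larger for disfavoured bases."""
    return [[-math.log(w) if w > 0 else math.inf for w in row] for row in psam]


print(consensus(OPTIMIZED))
for row in neg_log_weights(OPTIMIZED):
    print("  ".join(f"{value:8.3f}" for value in row))
```

Under this reading, positions 2 and 4 are the most specific (mismatches are weighted close to zero), while positions 1 and 6 tolerate substitutions fairly well.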
Best seed experiment:
number of tested candidate experiments: 18
intercept: coef=-0.12248 t-value=-18.4713 p-value=5.77026e-74
slope: coef=+0.363975 t-value=+15.4353 p-value=1.18198e-52
r2=0.0414328 SSY=1323.85 SSE=1269 SSR=54.8506
matches[matched-ids/total-ids]: 348[307/5514] experiment: alpha_factor_release_sample016 [4]
and of sequence on forward strand
Optimizing:
20 (1250.65): converged with gradient: 0.0349271 <= 0.05
PSAM linear fit statistics:
intercept: coef=-0.186363 t-value=-23.1987 p-value=1.12291e-113
slope: coef=+0.325095 t-value=+17.9606 p-value=3.83258e-70
r2=0.0552883 SSY=1323.85 SSE=1250.65 SSR=73.1932
Checking PSAM significance:
|r|=0.235135 r0=0.0688157 sigma=0.00888813 t_value=18.7125
E-value=4.19638e-76
This PSAM is significant (E-value smaller than specific cutoff of 0.001)
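The statistics in the log above can be cross-checked against each other. The Python sketch below only uses numbers copied from the log and tests three relations that they appear to satisfy: r2 = SSR/SSY, SSR = SSY - SSE, and t_value = (|r| - r0)/sigma with |r| = sqrt(r2). These relations are inferred from the printed values, not from the program's documentation, and the meaning of r0 and sigma is not stated in the output.

```python
import math

# Values copied from the run log above.
fits = {
    "seed": {"r2": 0.0414328, "SSY": 1323.85, "SSE": 1269.0, "SSR": 54.8506},
    "optimized": {"r2": 0.0552883, "SSY": 1323.85, "SSE": 1250.65, "SSR": 73.1932},
}

for name, fit in fits.items():
    r2_from_sums = fit["SSR"] / fit["SSY"]
    ssr_from_diff = fit["SSY"] - fit["SSE"]  # SSE is printed with few digits
    print(f"{name}: SSR/SSY = {r2_from_sums:.7f} (reported r2 = {fit['r2']})")
    print(f"{name}: SSY-SSE = {ssr_from_diff:.4f} (reported SSR = {fit['SSR']})")

# Significance check of the optimized PSAM.
abs_r, r0, sigma, t_reported = 0.235135, 0.0688157, 0.00888813, 18.7125
r_from_r2 = math.sqrt(fits["optimized"]["r2"])
t_from_r = (abs_r - r0) / sigma
print(f"sqrt(r2) = {r_from_r2:.6f} (reported |r| = {abs_r})")
print(f"(|r| - r0) / sigma = {t_from_r:.4f} (reported t_value = {t_reported})")
```

The optimization step raised r2 from 0.0414328 for the seed to 0.0552883 for the optimized PSAM while SSY stayed the same, which is consistent with SSE dropping from 1269 to 1250.65.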
HTMLSummary [options] [-file=HTMLFile]
Required parameters:
none
Optional parameters:
[-output=dir_name] --- path to a MatrixREDUCE/MotifREDUCE run directory,
used both as input and output for HTML summary (./)
[-copy] --- copy CSS, JavaScript and associated image files to
the output directory to make the HTML self-contained
[-rc] --- logo based on reverse complementary strand
[-width=ThumbnailImageWidthInPixel] (145)
[-height=ThumbnailImageHeightInPixel] (90)
[-psam_list=list_of_PSAMs] --- generate a summary LOGO page for the list
of PSAMs in the file
[-file=HTMLFile] --- HTML output file name (index.html)
Usage: (following a MatrixREDUCE run)
HTMLSummary
HTMLSummary -psam=psams.list -file=psams.html
MatrixREDUCE [options] -sequence=seqfile -measurement=measfile
Required parameters:
-sequence=seqfile --- sequence file in FASTA format
-meas=measfile --- measurement (expression/binding) file in tab-delimited format
Optional parameters:
[-topo_list=topofile] --- name of topology file (up_to_octamers)
[-topo=topology] --- single topology pattern, e.g., X3--X4
[-multifit] --- switch to seed/optimize using all experiments
[added based on code from Pilar -- thanks!]
[-dicfile=file] --- list of motifs to check against. IUPAC wild cards
allowed; no length limit
[-ntop=integer] --- number of top seed motifs to print out (10)
[-iupac_pos=integer] --- number of positions to check for IUPAC degeneracy (0)
[-iupac_sym=string] --- IUPAC symbols to check against ('KMRSWYBDHVN')
[-output=dir_name] --- path to the output directory (./)
[-p_value=float] --- threshold to stop looking for new PSAMs (0.001)
[-max_motif=integer] --- maximum # of PSAMs to search (20)
[-strand=integer] --- 1 | +1 | F | L for leading strand;
2 | +2 | B for both strands;
-1 | R | C for reverse-complement strand;
0 | A | D for auto-detection (check 1 and 2)
[-runlog=[stderr|stdout|file]]
--- direct running diagnostics message to stderr,
stdout or a specific file (stderr)
[-help] --- print out this help message
Usage:
mkdir -p results
MatrixREDUCE \
-meas=$REDUCE_SUITE/data/mRNA_expression/Spellman1998AlphaTimeCourse.tsv \
-sequence=$REDUCE_SUITE/data/sequence/YeastUpstream.fasta \
-topo_list=$REDUCE_SUITE/data/topology/up_to_octamers -o=results
HTMLSummary -o=results
mkdir -p X6
MatrixREDUCE \
-meas=$REDUCE_SUITE/data/mRNA_expression/Spellman1998AlphaTimeCourse.tsv \
-sequence=$REDUCE_SUITE/data/sequence/YeastUpstream.fasta \
-topo=X6 -o=X6
HTMLSummary -c -o=X6
MotifREDUCE [options] -sequence=seqfile -measurement=measfile
Required parameters:
-sequence=seqfile --- sequence file in FASTA format
-meas=measfile --- measurement (expression/binding) file in tab-delimited format
Optional parameters:
[-topo_list=topofile] --- name of topology file (up_to_octamers)
[-topo=topology] --- single topology pattern, e.g., X3--X4
[-dicfile=file] --- list of motifs to check against. IUPAC wild cards
allowed; no length limit
[-ntop=integer] --- number of top seed motifs to print out (10)
[-iupac_pos=integer] --- number of positions to check for IUPAC degeneracy (0)
[-iupac_sym=string] --- IUPAC symbols to check against ('KMRSWYBDHVN')
[-output=dir_name] --- path to the output directory (./)
[-p_value=float] --- threshold to stop looking for new motifs (0.001)
[-max_motif=integer] --- maximum # of motifs to search (20)
[-strand=integer] --- 1 | +1 | F | L for leading strand;
2 | +2 | B for both strands;
-1 | R | C for reverse-complement strand;
0 | A | D for auto-detection (check 1 and 2)
[-runlog=[stderr|stdout|file]]
--- direct running diagnostics message to stderr,
stdout or a specific file (stderr)
[-help] --- print out this help message
Usage:
mkdir -p results # use topology file (up_to_heptamers)
MotifREDUCE \
-meas=$REDUCE_SUITE/examples/MotifREDUCE/yeast_sample.csv \
-sequence=$REDUCE_SUITE/examples/MotifREDUCE/genome5pns600.fasta \
-topo_list=$REDUCE_SUITE/examples/MotifREDUCE/up_to_heptamers \
-o=results
HTMLSummary -c -o=results
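The -dicfile and -iupac_sym options above refer to IUPAC wild cards such as 'KMRSWYBDHVN'. For readers unfamiliar with these symbols, the Python sketch below expands an IUPAC motif into a regular expression and counts its (possibly overlapping) occurrences in a sequence. The symbol table is the standard IUPAC nucleotide code; the function names are made up, and counting overlapping matches on the forward strand only is an assumption, not a description of how MotifREDUCE counts.

```python
import re

# Standard IUPAC nucleotide codes: symbol -> bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "K": "GT", "M": "AC", "R": "AG", "S": "CG", "W": "AT", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif into a regular-expression string."""
    parts = []
    for symbol in motif.upper():
        bases = IUPAC[symbol]  # KeyError for symbols outside the table
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def count_matches(motif, sequence):
    """Count forward-strand occurrences, including overlapping ones."""
    # A zero-width lookahead matches at every start position where the motif fits.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return sum(1 for _ in pattern.finditer(sequence.upper()))


print(iupac_to_regex("ACGN"))
print(count_matches("RCGT", "ACGTGCGT"))
```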