Author Topic: Summary of the REDUCE Suite v2.2 programs  (Read 96158 times)

xiangjun

Summary of the REDUCE Suite v2.2 programs
« on: July 29, 2016, 04:08:51 pm »
The REDUCE Suite v2.2 contains a total of 12 programs, as outlined below. The software is distributed with full source code in ANSI C. For the benefit of users, precompiled binaries for the most common Linux, Mac OS X, and Windows operating systems are also provided. A command-line help message is available for each program via the -h option (e.g., LogoGenerator -h); it also includes sample usages to get you started. If you have any questions about using the Suite, please do not hesitate to post them on the Forum.

Motif discovery and model building:

  • MotifREDUCE — An algorithm that builds a motif-based multivariate linear model. REDUCE is an acronym that stands for Regulatory Element Detection Using Correlation with Expression. Based on a simple model for transcriptional regulation by independently acting transcription factors (Bussemaker et al., 2001), REDUCE makes it possible to discover regulatory motifs based on a single microarray experiment. MotifREDUCE is a robust and efficient reimplementation of the original REDUCE algorithm. Required inputs are (i) a genome-wide set of measurements (mRNA expression log-ratios or ChIP fold-enrichments) and (ii) a nucleotide sequence associated with each measurement (e.g., upstream promoter sequence). Outputs are (i) a set of cis-regulatory oligonucleotide motifs, and (ii) the corresponding regression coefficients.
  • MatrixREDUCE — A more sophisticated algorithm that builds a multivariate linear model based on weight matrices (Foat et al., 2005, 2006). Required inputs are the same as for MotifREDUCE: (i) a genome-wide set of measurements (mRNA expression log-ratios or ChIP fold-enrichments) and (ii) a nucleotide sequence associated with each measurement (e.g., upstream promoter sequence). Outputs include (i) the binding specificity, in the form of a position-specific affinity matrix (PSAM), and (ii) the condition-specific concentration/activity for each of a set of trans-acting factors (TFs).
  • OptimizePSAM — Fits PSAM parameters and coefficients for a single-TF model. MatrixREDUCE makes iterative calls to this program to build a multivariate model.
  • Transfactivity — Fits a multivariate linear model to one or more genome-wide sets of measurements. In contrast to MotifREDUCE/MatrixREDUCE, motifs/PSAMs are not inferred from the data but are instead provided as inputs. This is useful for inferring changes in the (hidden) regulatory activity of one or more TFs of known binding specificity. Transfactivity is a contraction of "trans-factor" and "activity".
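
To make the idea behind the motif-based linear model concrete, here is a minimal Python sketch that regresses expression log-ratios on the number of occurrences of a single motif. It is only an illustration and not the Suite's implementation: the function names, the toy data, and the simplifications (one motif, forward strand only, ordinary least squares) are all assumptions made for this example.

```python
def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps (forward strand only)."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


def fit_single_motif(seqs, log_ratios, motif):
    """Ordinary least-squares fit of: log_ratio = intercept + slope * motif_count."""
    counts = [count_motif(s, motif) for s in seqs]
    n = len(counts)
    mean_x = sum(counts) / n
    mean_y = sum(log_ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, log_ratios))
    slope = sxy / sxx if sxx else 0.0  # no variation in counts: nothing to fit
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data (hypothetical): four promoter sequences and their log-ratios.
seqs = ["ACGCGTAA", "TTTTTTTT", "ACGCGTACGCGT", "GGGGACGC"]
log_ratios = [1.0, 0.0, 2.0, 0.0]
intercept, slope = fit_single_motif(seqs, log_ratios, "ACGCGT")
print(intercept, slope)
```

In this picture the slope plays the role of the regression coefficient reported for a motif. A multivariate model would include several motifs as predictors at once.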

Visualization of results:
  • HTMLSummary — A utility for visualizing the result of a MatrixREDUCE or MotifREDUCE run in HTML format.
  • LogoGenerator — A versatile and robust command-line tool that generates logo images in a variety of styles (raw data, frequency, conventional bit information, or affinity logo in ΔΔG). The input can be a PSAM or a multiple sequence alignment file in either FASTA or flat format. The output logo image is in EPS format and is converted to PNG format by default for display in a web page (as from HTMLSummary), using the widely and freely available Ghostscript tool gs. Other supported image formats include PDF, JPEG, and GIF (further utilizing the convert utility program from ImageMagick).

Affinity-based sequence analysis:
  • AffinityProfile — Convert one or more DNA/RNA sequences to single-nucleotide resolution affinity profiles or a total regional affinity. A set of motifs and/or PSAMs is required as input.
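
As a rough illustration of what a single-nucleotide resolution affinity profile means, the Python sketch below slides a PSAM along a sequence and multiplies the per-position relative affinities within each window; summing the profile gives a total regional affinity. This is a sketch under stated assumptions, not the program's actual code: the PSAM values are made up, the dictionary layout is hypothetical, and only the forward strand is scored.

```python
def affinity_profile(seq, psam):
    """Relative affinity of every window: the product of per-position weights."""
    width = len(psam)
    profile = []
    for start in range(len(seq) - width + 1):
        affinity = 1.0
        for offset, weights in enumerate(psam):
            affinity *= weights[seq[start + offset]]
        profile.append(affinity)
    return profile


# Toy 3-position PSAM (hypothetical values); the best base at each position has weight 1.
toy_psam = [
    {"A": 1.0, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.2, "C": 1.0, "G": 0.2, "T": 0.2},
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.5},
]

profile = affinity_profile("AACGT", toy_psam)  # one value per window start
total = sum(profile)                           # total regional affinity
print(profile, total)
```

The window "ACG" matches the optimal base at every position and therefore scores 1, while the other windows score much lower.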

Miscellaneous utilities:
  • Convert2PSAM — Convert commonly used motif (pattern) representations of nucleic acid sequences to PSAM format, which is unique to the REDUCE Suite. It also serves to standardize the various formats to a simplified PWM format for easy communication.
  • Topo2Dictfile — Generate a motif dictionary file according to user-specified topological patterns, allowing for easy user manipulation (deleting/adding specific motifs, introducing IUPAC degeneracy symbols, etc.).
  • ProcessFASTA — Process a sequence file in FASTA format to select a list of sequences based on their IDs, convert to reverse complement, combine ID and sequence into a single line, etc.
  • ProcessTdat — Manipulate tab-delimited measurement files (extract a subset of experiments, perform log-transformation, sort entries by ID, etc.).
  • ExtractWindows — Extract subsequences from larger sequences (e.g., a chromosome), based on a set of start/end coordinates.
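
For readers less familiar with these sequence operations, the following Python sketch shows the two simplest ones mentioned above: taking a reverse complement and extracting a subsequence from start/end coordinates. The function names are hypothetical, and the 1-based inclusive coordinate convention is an assumption for this example; check each program's -h output for the convention it actually uses.

```python
# Translation table for DNA bases; other characters (e.g., N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_window(seq, start, end):
    """Extract seq[start..end], with 1-based inclusive coordinates (assumed convention)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates fall outside the sequence")
    return seq[start - 1:end]


chrom = "AACCGGTTAC"                   # toy "chromosome"
window = extract_window(chrom, 3, 6)   # bases 3 through 6
rc = reverse_complement("AACG")
print(window, rc)
```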

 

Created and maintained by Dr. Xiang-Jun Lu [律祥俊]. See also http://forum.x3dna.org and http://x3dna.org