General Category => Documentation => Topic started by: xiangjun on September 29, 2016, 01:27:04 pm

Title: Other utility programs
Post by: xiangjun on September 29, 2016, 01:27:04 pm
The REDUCE Suite distribution also includes the following auxiliary programs. Simply typing the corresponding program name with -h (e.g., Convert2PSAM -h) should provide sufficient information to get one started.

As its name suggests, Convert2PSAM is a utility program that converts other commonly used motif (pattern) representations in nucleic acid sequences to PSAM, which is unique to the REDUCE Suite. It can also be used to standardize the various formats to a simplified PWM format for easy communication.

The default topological pattern mechanism can be used to specify sequence motifs in a compact, convenient, and flexible way. However, it defines the motifs implicitly, has a length limit (15 non-gap positions), and does not take the IUPAC degenerate symbols into consideration. As an example, X6 stands for exactly 4^6 = 4096 combinations, from AAAAAA, AAAAAC, ... to TTTTTT. Sometimes, we may need more control by specifying the motifs explicitly in a dictionary file, with arbitrary length and IUPAC symbols. This can be facilitated by Topo2Dictfile: first generate a motif dictionary according to user-specified topological patterns, and then edit it as needed, e.g., deleting some motifs, adding more, or introducing IUPAC degeneracy symbols.
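
To make the X6 count concrete, here is a minimal Python sketch (not part of the REDUCE Suite, and not how Topo2Dictfile is implemented) that enumerates all ungapped motifs of a given length and expands a motif written with IUPAC degenerate symbols. The function names enumerate_kmers and expand_iupac are hypothetical, and the sketch makes no attempt to reproduce the actual dictionary file format, which is not described in this post.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def enumerate_kmers(k):
    """Return all 4**k ungapped motifs of length k, in alphabetical order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def expand_iupac(motif):
    """Return every plain A/C/G/T sequence matched by an IUPAC motif."""
    choices = (IUPAC[ch] for ch in motif.upper())
    return ["".join(p) for p in product(*choices)]


kmers = enumerate_kmers(6)
print(len(kmers), kmers[0], kmers[1], kmers[-1])  # 4096 AAAAAA AAAAAC TTTTTT
print(expand_iupac("ACGWR"))  # W = A or T, R = A or G
```

The exponential growth (4^k motifs for k non-gap positions) is one reason an explicit, hand-edited dictionary can be more practical for long or degenerate motifs.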

ProcessFASTA is a simple utility program to process a sequence file in FASTA format, e.g., to select a list of sequences based on ids, convert sequences to their reverse complement, or combine id and sequence into one line. While such functionalities are surely available in various heavy-duty toolboxes/environments (BioPerl, EMBOSS, BioConductor etc.), none fits our needs perfectly. We have thus developed this simple utility program mainly for our own convenience.
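
For readers who want to see what these operations amount to, the following is a rough Python sketch of the same kind of processing: selecting sequences by id, taking the reverse complement, and writing id and sequence on one line. It is only an illustration under my own assumptions and does not reflect ProcessFASTA's options or source code (use ProcessFASTA -h for those). The names read_fasta and reverse_complement are hypothetical, and the id is assumed to be the first whitespace-delimited word of the header line.

```python
def read_fasta(lines):
    """Yield (id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Assumption: the id is the first word after '>'.
            seq_id, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


# Translation table for the four bases (other symbols are left unchanged).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


example = """>seq1 first example
ACGTT
GGA
>seq2
TTTTCC
""".splitlines()

wanted = {"seq1"}  # ids to keep
for seq_id, seq in read_fasta(example):
    if seq_id in wanted:
        # One line per record: id, sequence, reverse complement.
        print(seq_id, seq, reverse_complement(seq), sep="\t")
```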

This is a simple utility program to process a tab-delimited text file, e.g., to extract a subset, perform log transformation, or sort entries by id. It was created following the same idea as ProcessFASTA.
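
As an illustration of the three operations just mentioned, here is a small Python sketch, written under my own assumptions rather than taken from the program itself. It assumes a header row, ids in the first column, strictly positive numeric values in the remaining columns, and a base-2 logarithm; the function name process_table is hypothetical.

```python
import csv
import io
import math


def process_table(text, keep_ids=None):
    """Subset rows by id, log2-transform the values, and sort by id.

    Assumes the first row is a header and the first column holds the id.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = []
    for row in reader:
        if not row:
            continue
        row_id, values = row[0], row[1:]
        if keep_ids is not None and row_id not in keep_ids:
            continue
        # Assumption: all values are positive, so log2 is defined.
        rows.append([row_id] + [math.log2(float(v)) for v in values])
    rows.sort(key=lambda r: r[0])
    return header, rows


table = "id\tsample1\tsample2\ngeneB\t8\t2\ngeneA\t4\t1\ngeneC\t16\t32\n"
header, rows = process_table(table, keep_ids={"geneA", "geneB"})
for row in rows:
    print("\t".join(str(x) for x in row))
```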

A simple utility program to extract sequence fragments from a sequence file, typically that of a chromosome.

A Perl utility program to generate a list of PSAMs in a given directory. The resultant list can be fed into AffinityProfile or Transfactivity.
Title: Re: Other utility programs
Post by: Millson on July 18, 2018, 07:45:10 am
Hi Xiangjun, do you have more resources to share on ProcessFASTA? Cheers.
Title: Re: Other utility programs
Post by: xiangjun on July 18, 2018, 07:53:14 am

Type ProcessFASTA -h for more info. Check the source code for technical details.

Best regards,