Topic: Multicollinearity

amathelier

Multicollinearity
« on: September 07, 2018, 07:13:59 am »
Hello,

I would like to use the Transfactivity tool from the REDUCE Suite. One concern I have is that, with a large set of matrices (PWMs), some will be very similar, so their scores at promoter regions will be highly correlated. This introduces multicollinearity into the multiple linear regression analysis, and the activities inferred for the TFs associated with the similar PWMs might then be wrong. Any insight or advice on how to tackle that? It does not seem that this was taken into consideration in previous uses of the tool.

Thanks
Best
AM

xiangjun

Re: Multicollinearity
« Reply #1 on: September 07, 2018, 09:32:52 pm »
Hi,

Thanks for using the REDUCE Suite and for posting your questions on the Forum.

As can be seen from the source code, Transfactivity checks for degeneracy in input data using SVD. However, multicollinearity is not checked by the program, as you already noticed. Users need to deal with the multicollinearity issue using other tools.
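As one illustration of what such an external check could look like, here is a minimal sketch that screens the columns of an affinity matrix for highly correlated pairs. It is not part of Transfactivity or the REDUCE Suite; the column names, the toy values, and the 0.9 threshold are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical per-promoter affinity scores, one list per PWM (toy data).
affinities = {
    "PWM_A": [0.10, 0.80, 0.30, 0.90, 0.20, 0.60],
    "PWM_A_variant": [0.12, 0.79, 0.33, 0.88, 0.18, 0.63],
    "PWM_B": [0.70, 0.20, 0.90, 0.30, 0.10, 0.50],
}

flagged = collinear_pairs(affinities)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

Pairwise correlation only catches collinearity between two columns at a time; a column that is nearly a combination of several others would need a different diagnostic, such as variance inflation factors or the condition number of the matrix.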

Best regards,

Xiang-Jun

hjb2004

Re: Multicollinearity
« Reply #2 on: September 10, 2018, 01:55:26 pm »
Dear Anthony,

Following up on Xiang-Jun's reply, you are correct that the Transfactivity program does not explicitly deal with collinearity. This should not be a problem when Transfactivity is used to infer TF activities for additional expression profiles using one or more PSAMs generated by MatrixREDUCE, as the stepwise PSAM discovery implemented by MatrixREDUCE was explicitly designed to make the PSAMs distinct from each other. In other words, when AffinityProfile is used with a set of PSAMs discovered by MatrixREDUCE to create a matrix containing total affinities for each sequence (which is also the first step performed by Transfactivity), the columns of that matrix will be close to orthogonal. The values of the regression coefficients in a multi-PSAM linear regression will then be close to those obtained in separate single-PSAM fits.
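To make the orthogonality argument concrete, here is a small toy sketch (not REDUCE Suite code; all vectors are invented for illustration) comparing single-predictor and joint least-squares fits without an intercept. With orthogonal columns the joint coefficients equal the single-fit ones; with two nearly identical columns they diverge sharply.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def single_fit(x, y):
    """Slope of a no-intercept least-squares fit y ~ b * x."""
    return dot(x, y) / dot(x, x)


def joint_fit(x1, x2, y):
    """Coefficients of a no-intercept fit y ~ b1 * x1 + b2 * x2,
    obtained by solving the 2x2 normal equations directly."""
    a11, a12, a22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    c1, c2 = dot(x1, y), dot(x2, y)
    det = a11 * a22 - a12 * a12
    return (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det


# Toy data: x1 and x2 are exactly orthogonal; x3 is nearly a copy of x1.
x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 1.0, -1.0, -1.0]
x3 = [1.0, -1.0, 1.0, -0.8]
y = [3.1, -0.9, 1.2, -2.8]

b1_single = single_fit(x1, y)
b2_single = single_fit(x2, y)
b3_single = single_fit(x3, y)

orth_b1, orth_b2 = joint_fit(x1, x2, y)  # orthogonal pair
coll_b1, coll_b3 = joint_fit(x1, x3, y)  # nearly collinear pair

print("single fits:     ", b1_single, b2_single, b3_single)
print("joint, orthogonal:", orth_b1, orth_b2)
print("joint, collinear: ", coll_b1, coll_b3)
```

In the collinear case the two single-fit slopes are both close to 2, while the joint fit splits them into a large positive and a large negative coefficient, which is the instability described above.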

Things are potentially different, however, when Transfactivity is used with a set of PSAMs obtained from another source such as Jaspar. In that case, there is no guarantee that the columns of the affinity matrix created by AffinityProfile are independent of each other, and the regression could indeed become unstable due to collinearity. We dealt with exactly this situation in two of our lab’s previous papers. In the first case, we implemented L2-penalized regression in R, with a design matrix generated by AffinityProfile, to deal with collinearity when inferring protein-level activities for a large number of yeast transcription factors (Lee et al., Mol Syst Biol 2010; https://www.ncbi.nlm.nih.gov/pubmed/20865005). In the second case, when we were doing the same for human transcription factors based on a collection of PWMs from Jaspar, we also did some additional preprocessing on the design matrix in R (Lee et al., PNAS 2014; https://www.ncbi.nlm.nih.gov/pubmed/24706889; see supplemental methods).
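For readers unfamiliar with L2-penalized (ridge) regression, the following is a generic minimal sketch of the idea, not the R code used in either paper. The design matrix, the response, and the penalty value lam = 1.0 are toy assumptions; in practice the penalty would be chosen by something like cross-validation.

```python
def ridge(X, y, lam):
    """Solve (X^T X + lam * I) b = X^T y for b.

    X is a list of rows (one row per sequence, one column per PWM),
    y is the response vector, and lam >= 0 is the L2 penalty.
    No intercept is fitted; lam = 0 gives ordinary least squares.
    """
    p = len(X[0])
    # Augmented system [X^T X + lam * I | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
         for j in range(p)]
        + [sum(row[i] * yk for row, yk in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Toy design matrix with two nearly identical columns (hypothetical affinities).
X = [[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -0.8]]
y = [3.1, -0.9, 1.2, -2.8]

ols = ridge(X, y, 0.0)  # unpenalized: large coefficients of opposite sign
pen = ridge(X, y, 1.0)  # penalized: the effect is shared between the columns

print("OLS coefficients:  ", ols)
print("ridge coefficients:", pen)
```

The penalty shrinks the coefficients and distributes the shared signal across the correlated columns rather than letting them cancel each other, at the cost of some bias.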

I hope this is useful.

Best regards,
Harmen

 

Created and maintained by Dr. Xiang-Jun Lu [律祥俊]. See also http://forum.x3dna.org and http://x3dna.org