Haynes Lab:Notebook/Synthetic Biology and Bioinformatics for Predictable Control of Therapeutic Genes/2012/07/23: Difference between revisions

From OpenWetWare

Revision as of 00:07, 24 July 2012


Filter the RefSeq List

  • After much trial and error, I was finally able to produce a "cleaned up" RefSeq list.
  • The problem with RefSeq is that a gene can have multiple mRNA IDs assigned to the same genomic interval. All of those extra mRNA IDs belong to the same gene, so I needed to get rid of them.
  • A macro called DeleteDuplicateRows was run on the list in Excel VBA.
  • The macro asks Excel to read through the selected column (column 2, the start column in this case) and delete the entire row (chromosome, start, end, mRNA ID, and strand) whenever it recognizes that the row's value in that column is not unique.


DeleteDuplicateRows Code: http://www.cpearson.com/excel/deleting.htm
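
For readers without Excel, the filtering step described above can be sketched in Python. This is only an illustration of the idea, not the linked VBA macro: the function name, the tuple layout of a row, and the example rows are all hypothetical, and the sketch assumes that the first row carrying a given start value is the one that is kept.

```python
from typing import Iterable, List, Sequence, Tuple

# Assumed row layout: (chromosome, start, end, mRNA ID, strand)
Row = Tuple[str, int, int, str, str]


def delete_duplicate_rows(rows: Iterable[Row],
                          key_columns: Sequence[int] = (1,)) -> List[Row]:
    """Keep the first row for each distinct key and drop later repeats.

    key_columns holds 0-based column indexes; the default (1,) is the
    start column, i.e. "column 2" in the spreadsheet.
    """
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[i] for i in key_columns)
        if key in seen:
            continue  # value already seen in the key column: drop the whole row
        seen.add(key)
        kept.append(row)
    return kept


# Hypothetical example rows, not taken from the actual RefSeq list.
example_rows = [
    ("chr1", 1000, 5000, "NM_000001", "+"),
    ("chr1", 1000, 5000, "NM_000002", "+"),  # same interval, extra mRNA ID
    ("chr2", 7000, 9000, "NM_000003", "-"),
]
filtered = delete_duplicate_rows(example_rows)
```

Keying on the start column alone mirrors the description above. Passing key_columns=(0, 1) would also compare the chromosome, which may be safer if two genes on different chromosomes happen to share a start coordinate.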

  • The end result is a filtered list nearly half the size of the original.
  • If you would like a copy of this list, please contact carlyhom91@gmail.com.