Abhishek Tiwari:TEXT MINING

From OpenWetWare
Revision as of 00:14, 9 September 2006 by Abhishektiwari (talk | contribs)


Text Mining

Oxford Bioinformatics Volume 22 | Number 18 | 15 September 2006

  • Text similarity: an alternative way to search MEDLINE

Garner, Harold et al. have created and optimized a hybrid search system for MEDLINE that takes natural text as input and delivers results with high precision and recall. The system works in two passes: a fast, low-sensitivity, weighted keyword-based algorithm first casts a wide net to gather an initial set of literature, and a sentence-alignment based similarity algorithm then rank-orders those results. The authors describe the combination as sensitive, fast and easy to use. The literature searching algorithms are implemented in a system called eTBLAST, a search engine for biomedical literature that works very differently from PubMed. While PubMed searches for "keywords", eTBLAST lets you input an entire paragraph and returns the MEDLINE abstracts that are similar to it. This is something like PubMed's "Related Articles" feature, but it runs on your own set of interests. You no longer have to guess whether your set of keywords has found all the right papers, or sort through hundreds of papers you don't care about to find the handful you were looking for.
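The two-pass idea described above can be illustrated with a minimal Python sketch. This is not eTBLAST's actual algorithm: the keyword score here is a plain count of shared words, and Python's `difflib.SequenceMatcher` stands in for the sentence-alignment step. The function names, the stopword list and the `first_pass_size` parameter are all assumptions made for illustration.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# Small illustrative stopword list (an assumption, not eTBLAST's list).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "that", "with"}


def tokenize(text):
    """Lowercase the text and return its words, minus stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keyword_score(query_tokens, doc_tokens):
    """First pass: count how many query words also occur in the document."""
    doc_counts = Counter(doc_tokens)
    return sum(min(n, doc_counts[w]) for w, n in Counter(query_tokens).items())


def alignment_score(query, doc):
    """Second pass: for each query sentence, take its best match among the
    document's sentences (word-level SequenceMatcher ratio), then average."""
    q_sents = [t for t in (tokenize(s) for s in split_sentences(query)) if t]
    d_sents = [t for t in (tokenize(s) for s in split_sentences(doc)) if t]
    if not q_sents or not d_sents:
        return 0.0
    best = [max(SequenceMatcher(None, q, d).ratio() for d in d_sents) for q in q_sents]
    return sum(best) / len(best)


def search(query, abstracts, first_pass_size=50):
    """Gather candidates by keyword overlap, then re-rank them by alignment."""
    q_tokens = tokenize(query)
    scored = [(keyword_score(q_tokens, tokenize(a)), a) for a in abstracts]
    scored = [(s, a) for s, a in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    candidates = [a for _, a in scored[:first_pass_size]]
    return sorted(candidates, key=lambda a: alignment_score(query, a), reverse=True)
```

The cheap first pass keeps the expensive sentence comparison limited to a small candidate set, which is the point of the hybrid design: the wide net favours recall, and the re-ranking favours precision.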

Oxford Bioinformatics Volume 22 | Number 17 | 1 September 2006

  • Combination of text-mining algorithms increases the performance

In this paper, Malik, Rainer et al. show that combining different algorithms and their outcomes improves the results significantly. The method is implemented in CONAN, a text mining system that combines different programs and their outcomes. Its methods include tagging gene/protein names, finding interaction and mutation data, tagging biological concepts, and linking to MeSH and Gene Ontology terms. CONAN can automatically extract and display protein/gene names, protein point mutations, protein-protein interactions and biologically interesting keywords. The authors present the applications in which CONAN is integrated: a command-line tool to query CONAN, a web server, and the integration of protein-protein interaction data into a human gene interaction network. With the integration into a human gene interaction network, "hidden" information can also be extracted. CONAN was developed by integrating some of the newest and most interesting algorithms and methods into one framework.
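One simple way to combine the output of several text-mining programs is a vote over their annotations. The Python sketch below is a hypothetical illustration only: the summary above does not say how CONAN merges results, so the `combine_annotations` function, the `(start, end, label)` span format and the `min_votes` threshold are all assumptions.

```python
from collections import Counter


def combine_annotations(tagger_outputs, min_votes=2):
    """Keep each annotation that at least `min_votes` taggers agree on.

    tagger_outputs: one collection per tagger, each holding
    (start, end, label) tuples that mark a span of the text.
    """
    votes = Counter()
    for spans in tagger_outputs:
        # set() ensures a tagger cannot vote twice for the same span.
        votes.update(set(spans))
    return sorted(span for span, n in votes.items() if n >= min_votes)
```

With `min_votes=1` the function returns the union of all annotations, which favours recall; raising the threshold keeps only spans that several programs confirm, which favours precision.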