The introduction of high-throughput methods has transformed biology into a data-rich science. Knowledge about biological entities and processes has traditionally been acquired by thousands of scientists through decades of experimentation and analysis. The current abundance of biomedical data is accompanied by the creation and quick dissemination of new information. Much of this information and knowledge, however, is represented only in text form: in the biomedical literature, lab notebooks, Web pages, and other sources. Researchers' need to find relevant information in the vast amounts of text has created a surge of interest in automated text analysis.
In this book, Hagit Shatkay and Mark Craven offer a concise and accessible introduction to key ideas in biomedical text mining. The chapters cover such topics as the relevant sources of biomedical text; text-analysis methods in natural language processing; the tasks of information extraction, information retrieval, and text categorization; and methods for empirically assessing text-mining systems. Finally, the authors describe several applications that recognize entities in text and link them to other entities and data resources, support the curation of structured databases, and make use of text to enable further prediction and discovery.