Accurate systems for extracting Protein-Protein Interactions (PPIs)
automatically from biomedical articles can help accelerate biomedical research.
Biomedical Informatics researchers are collaborating to provide metaservices
and advance the state of the art in PPI extraction. One problem often neglected by
current Natural Language Processing systems is the characteristic complexity of
the sentences in biomedical literature.
Automatically extracting organization names from the affiliation sentences of
articles related to biomedicine is of great interest to the pharmaceutical
marketing industry, health care funding agencies and public health officials.
It would also help other scientists normalize author names, automatically
create citations, index articles, and identify potential resources or
collaborators.
Social Network Analysis (SNA) of organizations can attract great interest
from government agencies and scientists for its ability to boost translational
research and accelerate the process of converting research to care. For SNA of
a particular disease area, we need to identify the key research groups in that
area by mining the affiliation information from PubMed. This involves not only
recognizing the organization names in the affiliation string, but also
resolving ambiguities so that each article is linked to a unique organization.
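
To make the two steps concrete, the following is a minimal sketch of
recognizing organization-name candidates in an affiliation string and mapping
name variants to a common key. It is an illustration under stated assumptions,
not the method described here: the keyword list, the stopword list, and the
function names extract_org_candidates and normalize_org are all hypothetical,
and a real system would need a much richer lexicon and disambiguation model.

```python
import re
from typing import List

# Hypothetical keyword list (assumption); real affiliations need a richer lexicon.
ORG_KEYWORDS = ("university", "institute", "hospital", "college",
                "center", "centre", "laboratory", "school")


def extract_org_candidates(affiliation: str) -> List[str]:
    """Return comma-separated segments that look like organization names."""
    # Remove e-mail addresses, which often trail the affiliation string.
    cleaned = re.sub(r"\S+@\S+", "", affiliation)
    # Split on commas and trim surrounding spaces and punctuation.
    segments = [s.strip(" .;") for s in cleaned.split(",")]
    # Keep segments containing at least one organization keyword.
    return [s for s in segments if any(k in s.lower() for k in ORG_KEYWORDS)]


def normalize_org(name: str) -> str:
    """Map surface variants of a name to one key (a crude stand-in for disambiguation)."""
    key = re.sub(r"[^a-z0-9 ]", " ", name.lower())   # lowercase, drop punctuation
    key = re.sub(r"\b(the|of|for|and)\b", " ", key)  # drop a few function words
    return " ".join(key.split())                     # collapse whitespace
```

With this sketch, "The University of Arizona" and "University of Arizona."
normalize to the same key, so articles carrying either variant would be grouped
under one organization. Keyword matching alone will miss organizations whose
names contain none of the listed words.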
The complexity of sentences characteristic of biomedical articles poses a
challenge to natural language parsers, which are typically trained on
large-scale corpora of non-technical text. We propose a text simplification
process, bioSimplify, that seeks to reduce the complexity of sentences in
biomedical abstracts in order to improve the performance of syntactic parsers
on the processed sentences. Syntactic parsing is typically one of the first
steps in a text mining pipeline. Thus, any improvement in parsing performance
would propagate to all subsequent processing steps.
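
As an illustration of the general idea of simplifying a sentence before it
reaches a parser, the sketch below removes parenthesized and bracketed material,
which is common in biomedical abstracts. This is an assumed, simplified
preprocessing step chosen for the example; it is not the bioSimplify algorithm,
whose actual rules are not given in this text, and the function name
strip_parentheticals is hypothetical.

```python
import re

# Matches one innermost "(...)" or "[...]" group plus any whitespace before it.
PAREN = re.compile(r"\s*(?:\([^()]*\)|\[[^\[\]]*\])")


def strip_parentheticals(sentence: str) -> str:
    """Remove parenthesized and bracketed spans, including nested ones."""
    previous = None
    # Repeat until nothing changes, so nested groups are peeled from the inside out.
    while previous != sentence:
        previous = sentence
        sentence = PAREN.sub("", sentence)
    # Collapse any leftover runs of whitespace.
    return " ".join(sentence.split())
```

For example, "The p53 protein (also known as TP53) binds MDM2 [12]." becomes
"The p53 protein binds MDM2.", a shorter sentence that a parser trained on
non-technical text may handle more reliably. The removed spans could be kept
aside and processed separately if they carry information needed downstream.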