Quality of Search Results - Validating Protein-Protein and Host Gene-Virus Interactions using NLPCORE
As we round out our product features and improve our platform and our core text-mining and entity-extraction algorithms, we have also focused on validating the quality of our search results. Thanks to our collaborators at the University of Washington and the Center for Infectious Disease Research (CIDR), we chose two representative life sciences data sets: the most commonly used Protein-Protein interaction set, and a set of experimentally discovered Host Gene-Virus interactions. Using these sets, we were able to validate not only a high recall rate but also good precision, by identifying interactions that were not otherwise mentioned in the experimentally discovered set.
Here is a link to our draft Application Note, which we intend to revise early in the new year with the latest iteration of our core algorithms. And here is the link to its supplementary information, which has more detail on our methods and the data sets used in our tests.