Reflecting upon 2018 and wishing a great 2019!
It has been a hectic year at NLPCORE with quite a few developments, achievements and learning experiences. As the year draws to a close, it is time to take a moment to reflect on what we achieved and where we fell short, convey our gratitude to all those who continue to support us, and plan our year ahead.
A new look - https://nlpcore.com/
We had been maintaining our blog posts (such as this one) at our nlpcore.io website, which also served as our corporate portal. We took some time to put our heads together, designed a simple portal interface, drafted and reviewed text and images, and migrated to a formal customer-focused website at https://nlpcore.com. All existing content - be it our beta website for life sciences, our blogs, or our developer APIs, samples and documentation - is now easily accessible from the new site with just one click. We invite you to explore and send us your feedback.
Significant Architecture Improvements
It is the core of what we do (https://nlpcore.com/blog_rest.html#content2-2a) - availability, scale and throughput, with multiple projects and multiple users working against millions of documents worth terabytes of data being crawled, indexed and searched through neural networks concurrently! We continually experiment and strive to optimize both our computing resources and software components to arrive at maximum availability, throughput and scale.
To this end, we improved our existing DFS implementation (https://nlpcore.com/blog_dfs.html#content2-26) to support replicas that synchronize updates lazily but at a regular interval. This not only helped us create segregated project-specific portals (such as alpha.nlpcore.com or beta.nlpcore.com), it also let us move the crawling and indexing store and the run-time search store onto separate hardware resources. As a result, the compute-intensive search neural networks are now co-located with a local index store (a replicated copy of the one being continually indexed!).
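The lazy replication idea described above can be illustrated with a minimal sketch. This is not our DFS code: the class names `PrimaryStore` and `LazyReplica`, the in-memory dictionaries and the append-only update log are all assumptions made for illustration. In a real deployment `sync()` would be driven by a timer at the regular interval; here it is called explicitly.

```python
class PrimaryStore:
    """Hypothetical indexing store that records every update in an append-only log."""

    def __init__(self):
        self.data = {}
        self.log = []  # list of (key, value) updates, in write order

    def put(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class LazyReplica:
    """Hypothetical search-side copy that catches up only when sync() is called."""

    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0  # number of log entries already applied to this replica

    def sync(self):
        # Apply only the updates written since the previous sync.
        pending = self.primary.log[self.applied:]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)
        return len(pending)
```

Between two calls to `sync()` the replica serves slightly stale data, which is the trade-off that allows search to run against a local copy while indexing continues elsewhere.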
Next, to improve the concurrency of our search nodes, we borrowed the message queue approach and applied it to our query processing: each query processes only a small subset of its documents at a time, and all queries take turns round-robin in a queue. For example, if a keyword search required us to compute our proximity/part-of-speech/dictionary based knowledge graph from 10,000 documents (a count looked up from the full-text reverse index), then at any time this search would run the neural network on only 100 or so documents and then return control to the queue manager, letting the other search threads process their own batches of 100 or so documents. This significantly improved the concurrency of our search experience.
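The batching scheme above can be sketched roughly as follows. This is a simplified, single-threaded illustration rather than our actual queue manager: the names `query_batches` and `run_round_robin`, the `process` callback and the fixed `BATCH_SIZE` of 100 are assumptions chosen to mirror the example in the text.

```python
from collections import deque

BATCH_SIZE = 100  # assumed batch size, mirroring the "100 or so documents" example


def query_batches(query_id, doc_ids, process, batch_size=BATCH_SIZE):
    """Process one batch of documents per step, then yield control back."""
    for start in range(0, len(doc_ids), batch_size):
        batch = doc_ids[start:start + batch_size]
        process(query_id, batch)  # stand-in for the neural network pass
        yield len(batch)


def run_round_robin(queries, process, batch_size=BATCH_SIZE):
    """Give each pending query one batch per turn until all are finished.

    Returns the schedule as a list of (query_id, documents_processed) turns.
    """
    queue = deque(
        (qid, query_batches(qid, docs, process, batch_size))
        for qid, docs in queries.items()
    )
    schedule = []
    while queue:
        qid, batches = queue.popleft()
        try:
            processed = next(batches)
        except StopIteration:
            continue  # this query has no documents left; drop it from the queue
        schedule.append((qid, processed))
        queue.append((qid, batches))  # back of the line for its next turn
    return schedule
```

With one query over 250 documents and another over 100, the turns interleave, so the shorter query finishes after its first turn instead of waiting behind the longer one.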
The third quarter of 2018 brought us great validation of years of effort - software, technical write-ups, legal discussions and arbitration all culminated in the US Patent Office granting us our core technology patent (#10,102,274 - https://patents.google.com/patent/US10102274B2). We have already filed continuations for it to include all our recent innovations.
As researchers started providing feedback directly in our results, we wanted to capture that feedback against each individual's profile so that we can reflect it in their own searches, collate it with feedback from others, rank it based on the contributor's profile (subject matter expert, project lead or casual user) and continually improve the quality of results for all users. To do so, we incorporated a complete authentication and role-based security model leveraging KeyCloak (https://www.keycloak.org/), an open source identity and access management platform.
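As a rough illustration of ranking feedback by profile, the sketch below weights each vote by the role of the person who cast it. The weights in `ROLE_WEIGHTS`, the role identifiers and the `(result_id, role, vote)` tuple shape are hypothetical values chosen for this example; they are not our production settings and are unrelated to the KeyCloak API.

```python
from collections import defaultdict

# Hypothetical weights: feedback from more specialized roles counts for more.
ROLE_WEIGHTS = {
    "subject_matter_expert": 3.0,
    "project_lead": 2.0,
    "casual_user": 1.0,
}


def aggregate_feedback(feedback):
    """Combine votes into a ranked list of (result_id, score).

    `feedback` is an iterable of (result_id, role, vote) with vote = +1 or -1.
    """
    scores = defaultdict(float)
    for result_id, role, vote in feedback:
        scores[result_id] += ROLE_WEIGHTS[role] * vote
    # Highest combined score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```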
A new user-interface
As we worked closely with researchers from the Center for Global Infectious Disease Research (https://www.seattlechildrens.org/research/centers-programs/global-infectious-disease-research/), we learned a great deal about their specific requirements. Articles often have verbose descriptions in the "discussion" section, but these may not be as relevant or precise as the references mentioned in the "abstract", "results" or "methods and materials" sections. Researchers need a simple way to compare a particular protein interaction in one cell type or organ (say, liver) against the same interaction in another cell type or organ (say, brain). They need to sift through or prune results, collapsing various nodes (as synonyms) while retaining their references (edges) to neighboring nodes. We were able to accommodate many such requirements in a generic fashion that works across all types of searches and search results.
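Collapsing synonym nodes while keeping their edges can be sketched on a simple undirected edge set. The function name `collapse_synonyms` and the representation of the graph as a set of node-name pairs are assumptions for illustration; our results graph carries more information per edge, such as the annotated references.

```python
def collapse_synonyms(edges, synonyms, canonical):
    """Merge every node in `synonyms` into `canonical`, keeping neighbor edges.

    `edges` is a set of (node, node) pairs treated as undirected.
    """
    merged = set()
    for a, b in edges:
        a = canonical if a in synonyms else a
        b = canonical if b in synonyms else b
        if a != b:  # skip self-loops created when two synonyms were linked
            merged.add(tuple(sorted((a, b))))
    return merged
```

For example, merging the nodes "p53" and "TP53" into a single "TP53" node keeps the edges each of them had to its other neighbors and drops only the edge that linked the two synonyms to each other.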
Having improved our architecture to make it more concurrent and added many more user interface features, we came to a difficult decision point for our UX platform: continue to invest in our existing web application framework and work around its limitations, or start from scratch and leverage our asynchronous platform to its fullest potential. At the same time, we also decided to showcase our ability to extract detailed product (bioentities and reagents) usage information for any given vendor along with the annotated references. This gave us an opportunity to start a fresh project with minimal framework overhead.
The migration has taken us some time and still remains a work in progress, but results are showing through (here is a brief video showcasing product usage data extracted from published research - https://youtu.be/R4lrFRs-Uy4). We surface partial results as soon as they become available, and we load some search options (such as related products, references or advanced search) either immediately or with a delay, using background processing in our platform and associated APIs. We are also taking the opportunity to refresh, consolidate and simplify our user interface so that a new user can not only get to a first set of default results quickly, but can also explore, sift through, prune or navigate the results graph with simple point-and-click controls.
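Surfacing partial results as they become available can be illustrated with a small asyncio sketch. The section names, the simulated delays and the `fetch_section`, `search` and `run_demo` functions are all hypothetical stand-ins for background work; this is not our API.

```python
import asyncio


async def fetch_section(name, delay):
    """Stand-in for a background computation that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return name, f"{name} ready"


async def search(on_partial):
    """Start all sections at once and report each one as soon as it finishes."""
    tasks = [
        asyncio.create_task(fetch_section("default_results", 0.0)),
        asyncio.create_task(fetch_section("related_products", 0.2)),
        asyncio.create_task(fetch_section("references", 0.1)),
    ]
    for finished in asyncio.as_completed(tasks):
        name, payload = await finished
        on_partial(name, payload)  # e.g. push this section to the page


def run_demo():
    """Run one search and return the section names in arrival order."""
    arrived = []
    asyncio.run(search(lambda name, payload: arrived.append(name)))
    return arrived
```

The default results arrive first because they finish first, not because they were requested first; slower sections fill in afterwards without blocking the page.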
Looking ahead in 2019
It has been a tremendous year for us, rounding out a number of key initiatives we carried forward from years past - the US patent, a web-scale architecture and platform, an innovative user interface for biologists, and product usage data for life sciences manufacturers and marketplaces. We regrouped and refined our business focus for 2019 around the following key initiatives.
Invest in specific life sciences use-cases for researchers through strategic partners to build case studies, publish technical papers, expand user base and establish brand presence
Attract life sciences manufacturers and marketplaces to sign up for subscriptions to their product usage data feed, updated routinely with newly published research
Offer "A Last Mile to AI" software platform that complements offerings from Amazon AWS, Microsoft Azure or Google App Engine as a low-barrier superset for third-party solution developers
We have major initiatives underway on all of the above fronts thanks to our enthusiastic and supportive partners, and we are looking forward to a great 2019 ahead of us. We wish you all the very best of holidays and a happy 2019!