A personal relationship with medical excellence

Industry

Technology

Akka, AngularJS, Apache Cordova, Elasticsearch, Ionic Framework, MLlib, Play Framework, Scala, Spark

Introduction

NashTech worked alongside FitFyles on their raw data. NashTech built a data pipeline in Apache Spark, chosen for its ability to seamlessly integrate different data sources, and delivered it within six sprints.

FitFyles is a service that is disrupting the healthcare space for health seekers. It puts ownership of medical records and their analysis back into the health seeker’s own hands, instead of leaving them dependent on clinics, hospitals, and individual doctors. A health seeker’s information is usually scattered across various places as islands of unrelated information. FitFyles not only aggregates that information at the click of a picture but also transcribes it into usable information.

The challenge

FitFyles wanted to take the next step with all the data it had collected by offering the health seeker a comparison of their prescription with those of others with a similar profile. They called it “Third Opinion”. This would allow the health seeker to benchmark their prescription against those of their peers, and to seek a second opinion or ask their doctors more informed questions if needed. The major challenges for this feature were:

Terabyte-scale data volume: There are over 50 million unique prescriptions in over 1.5 billion user-generated medical record entries.

Need for fast processing performance: The prescription data required quick matching against the drug database and other clinical conditions to eliminate false matches or recommendations.

Diverse and complex analytics algorithm needs: As part of the verification process, the member-input data needed to be normalised (e.g. removal of stop words, lower-case conversion), de-duplicated, and aggregated by a wide array of machine learning algorithms.
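To make the normalisation and de-duplication steps concrete, here is a minimal sketch in plain Python. It is an illustration only and not the Spark pipeline NashTech built: the stop-word list, the function names `normalise` and `deduplicate`, and the sample entries are all assumptions made for this example.

```python
import re

# Hypothetical stop-word list; the list used in the real pipeline is not
# given in the case study.
STOP_WORDS = {"the", "a", "an", "of", "for", "and", "to", "with"}


def normalise(entry: str) -> str:
    """Lower-case an entry, strip punctuation, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", entry.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def deduplicate(entries):
    """Keep the first occurrence of each normalised entry, in input order."""
    seen = set()
    unique = []
    for entry in entries:
        key = normalise(entry)
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Made-up sample entries, for illustration only.
raw = ["Paracetamol 500mg", "  the PARACETAMOL 500MG ", "Ibuprofen, 200 mg"]
print(deduplicate(raw))  # ['paracetamol 500mg', 'ibuprofen 200 mg']
```

At the volumes described above, the same per-entry logic would have to run in a distributed fashion rather than in a single loop, which is the kind of parallelisable operation that motivated a distributed engine.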

The solution

NashTech worked alongside FitFyles on their raw data. NashTech built the pipeline in Apache Spark because of Spark’s ability to seamlessly integrate different data sources, the availability of data processing libraries in MLlib and GraphX, fast performance that avoids slow table joins, and the ability to significantly speed up operations that can be parallelised in a distributed fashion. The data pipeline and its analysis dashboard were built within six sprints and have been a massive hit with users of the platform.
