OSCON 2016 - Introducing Intel's Trusted Analytics Platform (TAP)

6/07/2016

TAP is Intel's new open source platform for performing data analysis and building data-driven apps over big data stores. As of this writing, the big data platform of choice is the Cloudera Hadoop distribution. TAP tries to solve the problem of having to deal with many disparate platforms, frameworks, and processes for handling tasks around big data. It brings these different platforms under one roof, at least for management purposes. However, TAP's biggest value proposition is its solution to a common issue: data scientists write code in Python or R that then has to be productionized (rewritten) to run in a distributed manner. Intel's solution to this problem is the Analytics Toolkit (ATK).
TAP ATK allows data scientists to write regular-looking Python data science code in a Jupyter Notebook that is automatically translated into Spark jobs and run on a Spark cluster. The scientist can then develop a model and publish it, making it available as a REST API for clients to consume. To take advantage of the automatic translation to Spark, TAP ATK code must use the ATK dataframe (instead of the pandas dataframe) and the ATK machine learning library (instead of scikit-learn). Even so, ATK attempts to maintain compatibility with the tools data scientists are already familiar with.
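To make the "publish a model as a REST API" step more concrete, here is a minimal sketch of what a client call could look like. It is not taken from the TAP documentation: the `/score` path, the `{"features": [...]}` payload shape, and the function names `build_score_request` and `score` are all hypothetical, chosen only for illustration. It uses nothing beyond the Python standard library.

```python
import json
import urllib.request


def build_score_request(base_url, features):
    """Build (but do not send) a POST request for a scoring endpoint.

    ASSUMPTION: the "/score" path and the {"features": [...]} payload
    are hypothetical; a real published model may expose something else.
    """
    url = base_url.rstrip("/") + "/score"
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score(base_url, features, timeout=10):
    """Send the request and decode the JSON response body."""
    request = build_score_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes the client easy to check without a running service. A caller would pass the base URL of the published model and a list of feature values to `score`.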
Examples of problems solved with the TAP ATK are typical machine learning problems, such as outlier detection of item placement at Levi stores and prediction of ER readmissions and ER workload. In the Levi store example, TAP was used to ingest the events, filter them, store them in HDFS, and create the machine learning model.
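For readers unfamiliar with outlier detection, the following is a small single-machine sketch of one common approach, the modified z-score based on the median absolute deviation (MAD). This is not the method used in the Levi example, which the talk did not detail, and it does not use the ATK. The function name `mad_outliers`, the 3.5 threshold, and the sample readings are assumptions made for illustration.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold.

    Illustrative only: a plain-Python stand-in, not TAP or ATK code.
    """
    center = median(values)
    deviations = [abs(v - center) for v in values]
    mad = median(deviations)
    if mad == 0:
        # More than half the values are identical, so the MAD gives no
        # scale to compare against; this sketch reports nothing then.
        return []
    # 0.6745 rescales the MAD so scores are comparable to z-scores
    # under a normal distribution.
    return [i for i, d in enumerate(deviations) if 0.6745 * d / mad > threshold]


# Hypothetical readings, e.g. distance of an item from its expected shelf.
readings = [10, 11, 9, 10, 12, 10, 48]
print(mad_outliers(readings))
```

A median-based score is used here instead of mean and standard deviation because a single extreme value shifts the median far less than it shifts the mean.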
[Image: the functions/tools that make up TAP]
More information can be found at:
