TAP includes a complete pipeline for predictive analytics, along with scalable algorithms and in-memory machine learning engines, to address a broad variety of enterprise use cases. The TAP Analytics Layer is built around a flexible, plug-in-based architecture accessed through an API layer, which translates function calls from the analytics tools data scientists use into calls to the underlying data-wrangling and machine learning algorithms. This plug-in architecture allows system operators to expand TAP's capabilities and automatically expose user-accessible APIs for newly added functions.
For application developers who want to easily incorporate the results of advanced analytics into their applications, TAP treats predictive models (the product of a data scientist's work) as components. With TAP's framework, the models produced are accessible to any REST-enabled application, which can run from virtually anywhere, on any web-enabled platform. Because applications share the same integrated TAP framework and APIs, rebuilding them to accommodate specific languages is unnecessary. Additionally, this approach allows data scientists to evolve and improve the performance of models over time without significant changes to the applications that rely upon them.
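The model-as-component idea above can be sketched from the application developer's side. The endpoint URL and the JSON payload schema below are illustrative assumptions, not TAP's actual API; the point is that scoring reduces to an ordinary REST call, independent of the language the model was built in.

```python
import json
from urllib import request

# Hypothetical scoring endpoint; the real URL and schema depend on how
# the model was deployed in TAP.
SCORING_URL = "http://tap-scoring-engine.example.com/v1/score"

def build_scoring_request(features):
    """Package a feature vector as a JSON POST request to the scoring service."""
    body = json.dumps({"records": [features]}).encode("utf-8")
    return request.Request(
        SCORING_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )

def parse_scoring_response(raw_json):
    """Extract the predictions list from a JSON response body."""
    return json.loads(raw_json)["predictions"]

req = build_scoring_request({"temperature": 21.5, "vibration": 0.03})
print(req.get_method())                                   # POST
print(parse_scoring_response('{"predictions": [0.87]}'))  # [0.87]
```

Because the application only depends on this HTTP contract, the data scientist can retrain or replace the model behind the endpoint without the application changing.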
Batch and Stream Data Ingestion
TAP is data agnostic: it can handle data of any type, from real-time telemetry streams to massive, distributed Big Data stored in the cloud. TAP also provides access to message brokers that can route data requests to different message queues or data stores via name nodes.
TAP supports common ingestion protocols, including MQTT, WebSockets (WS) and HTTP REST APIs. TAP also supports message queues based on Gearpump, Apache Kafka and RabbitMQ. These ingestion methods are highly scalable and can be configured elastically based on the data volume a particular use case requires.
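To make the routing role of these brokers concrete, the sketch below implements standard MQTT topic-filter matching, the mechanism an MQTT broker uses to decide which subscribers receive an ingested message ('+' matches exactly one topic level, '#' matches all remaining levels). This is the protocol's own semantics, not TAP-specific code.

```python
def topic_matches(filter_, topic):
    """Check an MQTT topic name against a subscription filter.

    '+' matches exactly one level; '#' matches any number of
    remaining levels (including zero beyond its position).
    """
    f_parts = filter_.split("/")
    t_parts = topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":
            return True
        if i >= len(t_parts):
            return False
        if f != "+" and f != t_parts[i]:
            return False
    return len(f_parts) == len(t_parts)

print(topic_matches("sensors/+/temp", "sensors/rig1/temp"))  # True
print(topic_matches("sensors/#", "sensors/rig1/temp/raw"))   # True
print(topic_matches("sensors/+/temp", "sensors/rig1/hum"))   # False
```

A broker applies this kind of match to fan each incoming telemetry message out to the queues or data stores whose subscriptions it satisfies.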
Data Flows (Pipelines)
TAP enables the creation of pipelines in which complex services from multiple categories are joined flexibly by simple connectors. For example, a service may ingest a stream over MQTT, transform a raw field using a Gearpump service, persist it to a Kafka topic and score the result using a complex model previously trained on historical data stored in a Hadoop cluster. TAP also includes a scoring pipeline that makes it easy to hand off the data pipelines data scientists use during model development to application developers, who use them to build analytics-driven products and services. The scoring pipeline leverages Python User Defined Functions (UDFs) so that data processing and feature extraction can be deployed in the same flow as the scoring engine. Together, these capabilities allow application developers to deliver prediction results from the scoring engine more efficiently and flexibly in production.
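A minimal sketch of the UDF-style scoring pipeline described above: each stage is an ordinary Python function, and the pipeline is just their composition. The stage names, the telemetry format and the threshold "model" are illustrative stand-ins; in TAP the ingestion, transformation and persistence stages would be backed by services such as MQTT, Gearpump and Kafka.

```python
def parse_record(raw):
    """Ingest: decode one raw 'timestamp,value' telemetry line."""
    ts, value = raw.split(",")
    return {"ts": ts, "value": float(value)}

def extract_features(record):
    """Transform: derive the features the model expects."""
    record["value_squared"] = record["value"] ** 2
    return record

def score(record):
    """Score: apply a toy stand-in for a previously trained model."""
    return 1 if record["value_squared"] > 100 else 0

def run_pipeline(raw, stages):
    """Thread one record through the pipeline stages in order."""
    result = raw
    for stage in stages:
        result = stage(result)
    return result

pipeline = [parse_record, extract_features, score]
print(run_pipeline("2016-01-01T00:00:00,12.0", pipeline))  # 1
print(run_pipeline("2016-01-01T00:00:00,2.0", pipeline))   # 0
```

Keeping feature extraction in the same flow as scoring, as the scoring pipeline does, guarantees that production records are transformed exactly as they were during model development.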
TAP provides an extensible framework that allows new commands and algorithms to be added to the application as plug-ins. Plug-ins are written in Scala, a functional, script-like programming language, and do not require the author to have a deep understanding of the REST server, the execution engine, or any of their libraries. The framework generates the code necessary to call the plug-in through the REST API, as well as the corresponding Python client code, leaving the author free to focus on the execution logic.
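The division of labor this describes, where the author supplies only execution logic and the framework handles registration and exposure, can be sketched with a simple registry pattern. TAP's actual framework is Scala; the Python below is only an illustration of the pattern, and the command name and function are hypothetical.

```python
# Registry mapping command names to execution logic. In TAP, the framework
# layer (not the plug-in author) would expose each entry over REST and
# generate the matching Python client code.
PLUGINS = {}

def plugin(name):
    """Decorator that registers a command under a name the framework can expose."""
    def register(fn):
        PLUGINS[name] = fn
        return fn
    return register

@plugin("frame/drop_duplicates")   # hypothetical command name
def drop_duplicates(rows):
    """Execution logic only: remove repeated rows, preserving order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

# The framework would dispatch an incoming REST call to the registered plug-in:
result = PLUGINS["frame/drop_duplicates"]([{"a": 1}, {"a": 1}, {"a": 2}])
print(result)  # [{'a': 1}, {'a': 2}]
```

The author's code never touches the REST server or execution engine; it only needs to conform to the registration contract.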
For machine learning, TAP incorporates two hardware-accelerated libraries: the Intel Math Kernel Library (Intel MKL) and the Intel Data Analytics Acceleration Library (Intel DAAL). Intel MKL accelerates math-processing routines, increasing application performance and reducing development time. It includes highly vectorized and threaded linear algebra, Fast Fourier Transform (FFT), vector math and statistics functions. These functions scale automatically across Intel processor architectures by selecting the best code path for each processor generation. In addition, TAP integrates Intel DAAL with the TAP Analytics Toolkit (ATK), removing many of the complexities of supporting different languages. ATK also extends Intel DAAL to persist analytical models in its metastore, delivering a consistent end-to-end pipeline for data scientists with support for a wide array of machine learning algorithms. The performance of Intel MKL and Intel DAAL can be tuned by setting the maximum number of parallel threads. On multi-core systems, users can bind threads to CPU cores by setting an affinity mask; processor affinity improves performance by preventing threads from migrating from core to core. More details on tuning options can be found at https://software.intel.com/en-us/mkl-for-linux-userguide. The graph below illustrates the significant performance gains realized by using Intel DAAL and Intel MKL compared to Apache Spark MLlib.
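As a sketch of the tuning knobs mentioned above: Intel MKL reads its thread count from the `MKL_NUM_THREADS` environment variable, and the Intel OpenMP runtime reads thread-to-core pinning from `KMP_AFFINITY`. Both are read when the library loads, so they must be set before importing an MKL-backed package. The specific values below are illustrative; appropriate settings depend on the machine and workload.

```python
import os

# Cap the number of parallel threads MKL may use (illustrative value).
os.environ["MKL_NUM_THREADS"] = "4"

# Pin threads to cores via the Intel OpenMP runtime, preventing
# core-to-core migration (illustrative policy).
os.environ["KMP_AFFINITY"] = "granularity=fine,compact"

# import numpy as np   # an MKL-backed build imported here would honor
#                      # the settings above

print(os.environ["MKL_NUM_THREADS"])  # 4
```

Equivalently, these variables can be exported in the shell before launching the application; the user guide linked above covers the full set of threading and affinity options.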