SPEED. SCALE. SIMPLICITY.


●     Highly automated legacy ETL conversion
●     Visual Step-by-Step debugger
●     High Performance Spark code
●     Configuration & Deployment

REDEFINING ETL FOR THE HYBRID CLOUD

Legacy Stack

Data Lakes

Hybrid Cloud Architecture

VISUAL DESIGNER

ETL · ML · Notebook

Containerized Deployment

Unified Data Lake

Complete Data Lifecycle

DATA INGESTION

Utilize all data

Ingest and manage datasets from any source. Choose from multiple flat file formats, databases and SaaS systems.

EXPLORE & PREPARE

Use the spreadsheet view to understand your data, with statistics about its quality and distribution. Use prepare functions to modify data, choosing from over 200 built-in functions, or write any SQL expression.

JOIN & TRANSFORM

Use visual transform functions and custom SQL operators to combine and transform data into the shape you want.

We support the complete Spark SQL dialect and SQL:2003.

AUTO TRAIN AND DEPLOY ML MODELS

AutoML trains the best-fit predictive model on your data and incorporates it into the same transform pipelines.

We automate feature generation, model selection and hyper-parameter tuning.

CUSTOM SCRIPTS & NOTEBOOK PROGRAMMING

Script Node comes with an inbuilt Jupyter notebook connected to the same Spark session as the workflow.

Use the notebook to experiment and generate a function.

Use the final function as a script in your workflow. Scripts can be saved and reused.

DEPLOY PRODUCTION PIPELINES

We generate the code, configuration and scheduling required to run production ETL on your open source stack.

DATA INGESTION

From File or Databases

Data Ingestion reads data from files in cloud storage such as S3; various file formats are supported.

You can also pull data from a database using the JDBC connector.

UNDERSTAND AND PREPARE DATA

Use the spreadsheet view to understand your data, with statistics about its quality and distribution. Use prepare functions to modify data, choosing from over 200 built-in functions, or write any SQL expression.

ETL AND TRANSFORM DATA WITH SQL

Use SQL Queries and SQL operators to combine and transform data into the shape you want.

We support the complete Spark SQL dialect and SQL:2003.

BUILD ML PIPELINES

ML Pipeline builder allows you to build simple pipelines or pipelines with hyper-parameter tuning.

See the features you are generating immediately, including statistics on the created columns.

NOTEBOOK PROGRAMMING AND CUSTOM SCRIPTS

Script Node comes with an inbuilt Jupyter notebook connected to the same Spark session as the workflow.

Use the notebook to experiment and generate a function.

Use the final function as a script in your workflow. Scripts can be saved and reused.

DEPLOY ML MODELS

Powerful and flexible deployment as a model, web service or pod. Versioning and monitoring make it reliable. High performance with low latency and high concurrency.

CATALOG - COLLABORATE WITH VERSIONING

The Catalog allows teams to collaborate on shared projects where you can share datasets, workflows and even scripts.

Versioning ensures that no information is lost across multiple updates when collaborating.

Complete Development Lifecycle

Build Workflows

Interactive development & debugging with incremental execution to build workflows on Spark

Execute and Schedule

We generate code, configuration and scheduling for test and production environments

Get Support

We help you convert to the modern architecture while keeping your ETL workflows performant and correct.