# Data transformation pipeline
Redis Data Integration’s data transformation capability lets you transform your data beyond the default translation of source types to Redis types. The transformation involves no coding. Instead, it is described in a set of human-readable YAML files, one per source table.

The ingested format and types differ from one source to another. Currently, the only supported source is Debezium. The first transformation, from Debezium types to native JSON with Redis types, is done automatically without any need for user instructions. This JSON is then passed on to the user-defined transformation pipeline.
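For orientation, a Debezium change record has the general shape sketched below. This is an illustrative, abbreviated record, not a verified example; the exact envelope depends on the connector. With the `row_format` option described under Fields below, `data_only` passes only the payload (the row state) to the pipeline, while `full` passes the complete change record.

```yaml
# Abbreviated sketch of a Debezium change record (illustrative only).
op: c            # operation: c - create, u - update, d - delete
before: null     # row state before the change (null for a create)
after:           # row state after the change; the payload
  empno: 1
  fname: John
  lname: Doe
source:          # connector metadata (abbreviated)
  table: emp
```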
Each job describes the transformation logic to perform on data from a single source. The source is typically a database table or collection and is specified as the full name of that table or collection. The job may include filtering logic to skip data that matches a condition. Other logical steps in the job transform the data into the desired output, which is stored in Redis (as a hash or as JSON).
## Jobs
Each job is defined in a separate YAML file. All of these files are uploaded to Redis Data Integration using the `deploy` command (see Deploy configuration below). If you are using the `scaffold` command, place the job files in the `jobs` folder.
## Job YAML structure

### Fields
- `source`: This section describes the table that the job operates on:

  - `server_name`: Logical server name (optional). Corresponds to the `debezium.source.topic.prefix` property specified in Debezium Server’s `application.properties` config file
  - `db`: DB name (optional)
  - `schema`: DB schema (optional)
  - `table`: DB table
  - `row_format`: Format of the data to be transformed: `data_only` (default) - only the payload; `full` - the complete change record

  Note: Any reference to the `server_name`, `db`, `schema`, and `table` properties is treated as case insensitive by default. This can be changed by setting `case_insensitive` to `false`.
- `transform`: This section includes a series of blocks that the data should pass through. See the documentation of the supported blocks and JMESPath custom functions. A sketch combining the `source` and `transform` sections appears after this list.
- `output`: This section lists the outputs that the data should be written to:
  - Cassandra:
    - `uses`: `cassandra.write`: write into a Cassandra data store
    - `with`:
      - `connection`: Connection name
      - `keyspace`: Keyspace
      - `table`: Target table
      - `keys`: Array of key columns
      - `mapping`: Array of mapping columns
      - `opcode_field`: Name of the field in the payload that holds the operation (`c` - create, `d` - delete, `u` - update) for this record in the DB
  - Redis:
    - `uses`: `redis.write`: write to a Redis data structure
    - `with`:
      - `connection`: Connection name
      - `key`: Lets you override the key of the record by applying custom logic:
        - `expression`: Expression to execute
        - `language`: Expression language, JMESPath or SQL
  - SQL:
    - `uses`: `relational.write`: write into a SQL-compatible data store (see the sketch after the example below)
    - `with`:
      - `connection`: Connection name
      - `schema`: Schema
      - `table`: Target table
      - `keys`: Array of key columns
      - `mapping`: Array of mapping columns
      - `opcode_field`: Name of the field in the payload that holds the operation (`c` - create, `d` - delete, `u` - update) for this record in the DB
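As referenced above, here is a sketch of how the `source` and `transform` sections might be combined in a single job file. The `rename_field` block also appears in the example below; the `filter` block and its `language`/`expression` parameters are assumptions based on the filtering logic mentioned earlier, so check the supported-blocks documentation for the exact names.

```yaml
source:
  server_name: redislabs   # optional logical server name
  schema: dbo
  table: emp
  row_format: data_only    # default; use full for the complete change record
  case_insensitive: false  # match table/schema names case sensitively
transform:
  # Hypothetical filtering block: skip rows that match a condition.
  # Block name and parameters are assumptions, not verified syntax.
  - uses: filter
    with:
      language: sql
      expression: age >= 18
  # rename_field is shown in the example below.
  - uses: rename_field
    with:
      from_field: fname
      to_field: first_name
```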
### Notes

- `source` is required.
- Either `transform` or `key` (or both) should be specified.
## Example

This example shows how to rename a field (`fname` to `first_name`) in a given table (`emp`) using the `rename_field` block. It also demonstrates how to set the key of the record explicitly instead of relying on the default logic.
`redislabs.dbo.emp.yaml`:

```yaml
source:
  server_name: redislabs
  schema: dbo
  table: emp
transform:
  - uses: rename_field
    with:
      from_field: fname
      to_field: first_name
output:
  - uses: redis.write
    with:
      connection: target
      key:
        expression: concat(['emp:fname:',fname,':lname:',lname])
        language: jmespath
```
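The example above writes to Redis. For comparison, here is a hypothetical `output` section targeting a SQL-compatible store with `relational.write`. The field names come from the Fields list above, but the connection name, schema, table, and column names are placeholders, so treat this as a sketch rather than a verified configuration. A `cassandra.write` output follows the same shape, with `keyspace` in place of `schema`.

```yaml
output:
  - uses: relational.write
    with:
      connection: target-sql   # hypothetical connection name
      schema: hr               # placeholder target schema
      table: emp_copy          # placeholder target table
      keys:
        - empno                # key column(s) identifying the row
      mapping:
        - first_name           # columns to write to the target table
        - lname
      opcode_field: op         # field holding c/d/u for this record
```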
## Deploy configuration

To deploy your jobs to the remote RDI database, run:

```bash
redis-di deploy
```
## Deploy configuration on Kubernetes
If the RDI CLI is deployed as a pod in a Kubernetes cluster, perform these steps to deploy your jobs:
- Create a ConfigMap from the YAML files in your `jobs` folder:

  ```bash
  kubectl create configmap redis-di-jobs --from-file=jobs/
  ```
- Deploy your jobs:

  ```bash
  kubectl exec -it pod/redis-di-cli -- redis-di deploy
  ```
Note: A delay occurs between creating or modifying the ConfigMap and its availability in the `redis-di-cli` pod. Wait around 30 seconds before running the `redis-di deploy` command.
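If you want to confirm that the ConfigMap holds the expected job files before deploying, a standard kubectl query works; this is an optional check, not an RDI-specific step:

```bash
kubectl get configmap redis-di-jobs -o yaml
```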
You have two options to update the ConfigMap:
- For smaller changes, you can edit the ConfigMap directly with this command:

  ```bash
  kubectl edit configmap redis-di-jobs
  ```
- For bigger changes (for example, adding another job file), edit the files in your local `jobs` folder and then run this command:

  ```bash
  kubectl create configmap redis-di-jobs --from-file=jobs/ --dry-run=client -o yaml | kubectl apply -f -
  ```
Note: You need to run `kubectl exec -it pod/redis-di-cli -- redis-di deploy` after updating the ConfigMap with either option.