Apache Beam

Apache Beam helps us implement batch and streaming data processing jobs that can run on any supported execution engine (runner).

Several TFX (TensorFlow Extended) components rely on Beam for distributed data processing. In addition, TFX can use Apache Beam to orchestrate and execute the pipeline DAG.

The Beam orchestrator uses a different BeamRunner than the one used for component data processing. With the default DirectRunner setup, the Beam orchestrator can be used for local debugging without the extra Airflow or Kubeflow dependencies, which simplifies system configuration.
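The runner used for component data processing is usually chosen through Beam pipeline options, which are passed as a list of command-line style flags. The sketch below builds such a list for the DirectRunner. The helper name `direct_runner_args` and its defaults are hypothetical; the flags themselves (`--runner`, `--direct_num_workers`, `--direct_running_mode`) are standard Beam Python options.

```python
def direct_runner_args(num_workers=1, running_mode="in_memory"):
    """Hypothetical helper: builds Beam flags for component data processing."""
    # These are the running modes the Python DirectRunner accepts.
    allowed = {"in_memory", "multi_threading", "multi_processing"}
    if running_mode not in allowed:
        raise ValueError(f"unsupported running mode: {running_mode}")
    return [
        "--runner=DirectRunner",
        # A value of 0 asks Beam to pick a worker count from the available cores.
        f"--direct_num_workers={num_workers}",
        f"--direct_running_mode={running_mode}",
    ]
```

In TFX, a list like this would typically be supplied through the `beam_pipeline_args` parameter of the pipeline definition. It affects only the data processing done inside components; the runner that the Beam orchestrator uses to execute the DAG is configured separately.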
