Job mode

https://e.gitee.com/gudusoft/projects/151613/docs/655632/file/1546243?sub_id=5928451

SQLFlow Job mode is dedicated for handling large amounts of SQL Scripts or directly connecting to the target database server for the data linenage analysis. Config parameters for the analysis should be given when creating a new SQLFlow Job and the parameters cannot be changed once submitted.

Job type

There are two job types:

  • Simple Job

  • Regular Job

Both Simple Job and Regular Job support reading large amounts of SQL files or analysis through DB directly. There are some differences between Simple Job and Regular Job.

Simple Job

  • Possible to add some configs. Once submitted, the configs cannot be changed

  • Result will be persisted in the file system as files

  • Possible to do the Left Most analysis (Left Most: given a->b->c, show a->c )

  • Data lineage result is saved in dataflow.xml file

dataflowOfAggregateFunction: direct (Configurable Parameter)
ignoreFunction:true (Configurable Parameter)
ignoreRecordSet:true (Configurable Parameter)
showConstantTable:(Configurable Parameter)
showRelationType:fdd (Configurable Parameter)
showTransform: false (Configurable Parameter)

simpleOutput: false (Not configurable)
hideColumn: false (Frontend UI doesn't support)
normalizeIdentifier:true (Not configurable)
showLinkOnly:true (Not configurable)

Configurable Parameter: The parameters which can be set before the job creation. There's no way to change such parameters once the job is created. You have to create a new job with the new parameter value if needed.

Regular Job

  • Unable to set any data lineage configs

  • Support table level lineage

  • Result will be in database

  • Support if incremental, possible to anaylze SQL scripts or database in batches

  • Possible to do the Left Most analysis (Left Most: given a->b->c, show a->c )

  • Possible to do the Upstream and Downstream analysis (given a->b->c, Upstream: a->b, Downstream: b->c)

  • Data lineage result is saved in database

default distance:(The parameter can be modified even after the job is created)
showConstantTable:(Configurable Parameter)
dataflowOfAggregateFunction: direct (Not configurable)
ignoreFunction:true (Not configurable)
ignoreRecordSet:true (Not configurable)
showRelationType:fdd (Not configurable)
showTransform: false (Not configurable)

hideColumn: false (Frontend UI doesn't support)
normalizeIdentifier:true (Not configurable)
showLinkOnly:true (Not configurable)
simpleOutput: false (Not configurable)

Configurable Parameter: The parameters which can be set before the job creation. There's no way to change such parameters once the job is created. You have to create a new job with the new parameter value if needed.

Job parameters and UI Settings parameters

Some of the above parameters cover the parameters in UI Settings section

showConstantTable has the same effect as show constant

ignoreFunction has the same effect as show function

ignoreRecordSet has the same effect as show intermediate recordset

showTransform has the same effect as show transform

dataflowOfAggregateFunction has the similar effect as dataflow of count function

Summary Result

When using SQLFlow Front UI, all data lineage will be returned if the number of relationship is less than 1,000. However, only database, schema, table, view data and the number of above DB units will be returned if the number is more than 1,000. No relationship data will be returned in the case and this is called Summary result.

User can request specific database, schema, table or view but result will be in Summary mode if the result number is more than 1,000.

There's a performance limitation in the frontend page rendering and moreover the graph will be complex to understand so we impose a restriction on the UI endpoint (/sqlflow/generation/sqlflow/graph). However, request to the /sqlflow/generation/sqlflow for SQLFLow Rest Api doesn't have such limitation.

Last updated