
Direct Dataflow

https://github.com/sqlparser/sqlflow_public/blob/master/doc/get-started/direct-dataflow.md

This article introduces some SQL elements that will generate direct dataflow.

1. Select

SELECT a.empName "eName"
FROM scott.emp a
WHERE sal > 1000

The data of the target column "eName" comes from scott.emp.empName, so we have a direct dataflow like this:

scott.emp.empName -> direct -> RS-1."eName"

The resultset RS-1 generated by the select list is a relation, which includes columns and rows.

Dataflow in XML

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<dlineage>
    <table id="2" schema="scott" name="scott.emp" alias="a" type="table" coordinate="[2,6,0],[2,17,0]">
        <column id="3" name="empName" coordinate="[1,8,0],[1,17,0]"/>
    </table>
    <resultset id="5" name="RS-1" type="select_list" coordinate="[1,8,0],[1,25,0]">
        <column id="6" name="&quot;eName&quot;" coordinate="[1,8,0],[1,25,0]"/>
    </resultset>
    <relation id="1" type="fdd" effectType="select">
        <target id="6" column="&quot;eName&quot;" parent_id="5" parent_name="RS-1" coordinate="[1,8,0],[1,25,0]"/>
        <source id="3" column="empName" parent_id="2" parent_name="scott.emp" coordinate="[1,8,0],[1,17,0]"/>
    </relation>
</dlineage>

The relation represents a dataflow from the source column with id=3 to the target column with id=6.
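As a rough illustration of how this output can be consumed, the sketch below reads a trimmed copy of the XML above with Python's standard library and prints each source-to-target pair. The function name `list_direct_dataflows` is hypothetical, the coordinate attributes are omitted for brevity, and the code assumes that a relation of type "fdd" marks a direct dataflow, as it does in the examples in this article.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML above (coordinate attributes omitted for brevity).
DLINEAGE_XML = """
<dlineage>
    <table id="2" schema="scott" name="scott.emp" alias="a" type="table">
        <column id="3" name="empName"/>
    </table>
    <resultset id="5" name="RS-1" type="select_list">
        <column id="6" name="&quot;eName&quot;"/>
    </resultset>
    <relation id="1" type="fdd" effectType="select">
        <target id="6" column="&quot;eName&quot;" parent_id="5" parent_name="RS-1"/>
        <source id="3" column="empName" parent_id="2" parent_name="scott.emp"/>
    </relation>
</dlineage>
"""


def list_direct_dataflows(xml_text):
    """Return (source, target) name pairs for every relation of type "fdd".

    Hypothetical helper: assumes "fdd" marks a direct dataflow.
    """
    root = ET.fromstring(xml_text)
    flows = []
    for relation in root.findall("relation"):
        if relation.get("type") != "fdd":
            continue
        target = relation.find("target")
        target_name = f'{target.get("parent_name")}.{target.get("column")}'
        # A relation may list several sources for one target.
        for source in relation.findall("source"):
            source_name = f'{source.get("parent_name")}.{source.get("column")}'
            flows.append((source_name, target_name))
    return flows


flows = list_direct_dataflows(DLINEAGE_XML)
for source_name, target_name in flows:
    print(f"{source_name} -> direct -> {target_name}")
```

The printed line uses the same arrow notation as the dataflow shown earlier in this section.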

[Diagram: scott.emp.empName -> direct -> RS-1."eName"]

2. Function

Functions play a key role in dataflow analysis. A function accepts columns as arguments and generates a result, which may be a scalar value or a set of values.

select round(salary) as sal from scott.emp

In the SQL above, a direct dataflow is generated from the column salary to the round function, and another from the round function to the target column sal:

scott.emp.salary -> direct -> round(salary) -> direct -> sal

Dataflow in XML

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<dlineage>
    <table id="2" schema="scott" name="scott.emp" type="table" coordinate="[1,34,0],[1,43,0]">
        <column id="3" name="salary" coordinate="[1,14,0],[1,20,0]"/>
    </table>
    <resultset id="5" name="RS-1" type="select_list" coordinate="[1,8,0],[1,28,0]">
        <column id="6" name="sal" coordinate="[1,8,0],[1,28,0]"/>
    </resultset>
    <resultset id="8" name="FUNCTION-1" type="function" coordinate="[1,8,0],[1,21,0]">
        <column id="9" name="round" coordinate="[1,8,0],[1,13,0]"/>
    </resultset>
    <relation id="1" type="fdd" effectType="select">
        <target id="6" column="sal" parent_id="5" parent_name="RS-1" coordinate="[1,8,0],[1,28,0]"/>
        <source id="9" column="round" parent_id="8" parent_name="FUNCTION-1" coordinate="[1,8,0],[1,13,0]"/>
    </relation>
    <relation id="2" type="fdd" effectType="function">
        <target id="9" column="round" parent_id="8" parent_name="FUNCTION-1" coordinate="[1,8,0],[1,13,0]"/>
        <source id="3" column="salary" parent_id="2" parent_name="scott.emp" coordinate="[1,14,0],[1,20,0]"/>
    </relation>
</dlineage>
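The two relations above form a chain that can be followed from the target column back to the table column. The sketch below is one possible way to do that, not part of SQLFlow itself: the function name `trace_to_origin` is hypothetical, the coordinate attributes are omitted, and the code assumes each target column has exactly one source, which holds for this example but not for SQL in general.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML above (coordinate attributes omitted for brevity).
DLINEAGE_XML = """
<dlineage>
    <table id="2" schema="scott" name="scott.emp" type="table">
        <column id="3" name="salary"/>
    </table>
    <resultset id="5" name="RS-1" type="select_list">
        <column id="6" name="sal"/>
    </resultset>
    <resultset id="8" name="FUNCTION-1" type="function">
        <column id="9" name="round"/>
    </resultset>
    <relation id="1" type="fdd" effectType="select">
        <target id="6" column="sal" parent_id="5" parent_name="RS-1"/>
        <source id="9" column="round" parent_id="8" parent_name="FUNCTION-1"/>
    </relation>
    <relation id="2" type="fdd" effectType="function">
        <target id="9" column="round" parent_id="8" parent_name="FUNCTION-1"/>
        <source id="3" column="salary" parent_id="2" parent_name="scott.emp"/>
    </relation>
</dlineage>
"""


def trace_to_origin(xml_text, target_column_id):
    """Walk relations backwards from a target column; return names, origin first.

    Hypothetical helper: assumes one source per target column.
    """
    root = ET.fromstring(xml_text)
    names = {}      # column id -> "parent_name.column"
    source_of = {}  # target column id -> source column id
    for relation in root.findall("relation"):
        target = relation.find("target")
        source = relation.find("source")
        for node in (target, source):
            names[node.get("id")] = f'{node.get("parent_name")}.{node.get("column")}'
        source_of[target.get("id")] = source.get("id")

    path = [target_column_id]
    seen = {target_column_id}
    while path[-1] in source_of:
        next_id = source_of[path[-1]]
        if next_id in seen:  # guard against cyclic relations
            break
        seen.add(next_id)
        path.append(next_id)
    return [names[column_id] for column_id in reversed(path)]


chain = trace_to_origin(DLINEAGE_XML, "6")
print(" -> direct -> ".join(chain))
```

Starting from the column with id=6, the walk passes through the function resultset FUNCTION-1 before reaching scott.emp.salary, which mirrors the two-step dataflow described for this query.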

[Diagram: scott.emp.salary -> direct -> round(salary) -> direct -> sal]

If you turn off the show function setting with the /if option, the result is:

3. References

XML code used in this article is generated by the DataFlowAnalyzer tools.

Diagrams used in this article are generated by the Gudu SQLFlow Cloud version.

Last updated 2 years ago