Gudu SQLFlow Product Docs
  • 1. Introduction
    • What is Gudu SQLFlow?
      • What SQLFlow can do
      • Architecture Overview
    • Getting Started
      • Sign up a new account
        • Team Management
        • Delete My Account
        • Activate by entering a coupon
      • How to use SQLFlow
      • Different modes in Gudu SQLFlow
        • Query mode
        • Job mode
      • Basic Usage
      • Convert SQL to E-R Diagram
      • Colors in SQLFlow Diagram
      • Show call relationship
    • Installation
      • Version and Users
        • Cloud and On-Premise version
        • SQLFlow before Version 6
          • For older version SQLFlow under Linux
          • For older version SQLFlow under MacOS
          • For older version SQLFlow under Windows
      • Linux
      • MacOS
      • Windows
      • Docker
      • Clickhouse Installation
        • Clickhouse For CentOs
        • Clickhouse For Ubuntu/Debian/RHEL
      • Troubleshooting
      • Upgrade
      • Third Party Components
      • Renew License File
    • UI
      • SQLText Editor
      • Schema Explorer
      • Diagram Panel
      • Settings
      • Job Management
        • Job Sources
    • Dlineage Tool
      • Overview
      • Usage
        • Analyze data linege from SQL files
        • Analyze data linege from a database
        • Resolve the ambiguous columns in SQL query
        • Map the DataFlowAnalyzer and the settings on SQLFlow UI
        • Settings
      • Dataflow.xml structure
      • FAQ
  • 2. CONCEPTS
    • Data Lineage Basics
      • Dataflow
        • Relations generated by SQLFlow
      • Direct Dataflow
      • Indirect Dataflow
      • Aggregate function and Dataflow
      • Dataflow chain
    • Data Lineage Format Reference
  • 3. API Docs
    • Prerequisites
    • Using the Rest API
    • SQLFlow Rest API reference
      • User Interface
      • Generation Interface
        • /sqlflow
        • /sqlflow/selectedgraph/table_level_lineage
        • /sqlflow/selectedgraph/image
        • /sqlflow/graph
        • /sqlflow/graph/table_level_lineage
        • /sqlflow/graph/image
        • /sqlflow/downstreamGraph
        • /sqlflow/upstreamGraph
        • /sqlflow/erdiagramSelectGraph
        • /sqlflow/leftMostSourceTableGraph
      • Job Interface
        • /submitUserJob
        • /displayUserJobSummary
        • /displayUserJobsSummary
        • /exportLineageAsJson
        • /exportFullLineageAsJson
        • /exportLineageAsGraphml
        • /submitPersistJob
        • /displayUserLatestJobTableLevelLineage
      • Export Image
      • Export CSV
        • /sqlflow/exportFullLineageAsCsv
        • /job/exportFullLineageAsCsv
    • Swagger UI
    • Export the data lineage result
    • Python
      • Basic Usage
      • Advanced Usage
    • SQL Parser API
      • checkSyntax
  • 4. SQLFlow Widget
    • Widget Get started
    • Usages
    • Widget API Reference
  • 5. Databases
    • Database Objects
      • Azure
      • DB2
  • 6. SQLFlow-ingester
    • Introduction
      • SQLFlow-Exporter
      • SQLFlow-Extractor
      • SQLFlow-Submitter
    • Get Started
      • SQL Server
    • SQLFlow-Ingester Java API Usage
    • Understand the format of exported data
      • Oracle
      • Microsoft SQL Server
      • MySQL
      • PostgreSQL
    • List of Supported dbVendors
    • Git Repo
    • Third Party Components
  • 7. Reference
    • Lineage Model
      • Json Format Lineage Model
      • XML Format Lineage Model
      • Data Lineage Elements
    • Database Model
  • 8. other
    • FAQ
      • Handling Internal Database
      • Delete Your Account
      • Table Form Data Without Intermediates
      • Not all schema exported from Oracle
      • Lineage Customization
    • Roadmap
    • SQL Samples
      • Exchange table partition
      • Generate relationship for renamed table
      • Snowflake table function lineage detection
    • Change Logs
    • SQLFlow with Oracle XML functions
    • Major Organizations Utilizing SQLFlow
Powered by GitBook
On this page
  • 1. Sample SQL
  • 2. Data lineage diagram
  • 3. JSON output of this data lineage
  • Main elements in the result
  • Summary
  1. 2. CONCEPTS

Data Lineage Basics

https://github.com/sqlparser/sqlflow_public/blob/master/doc/data-lineage-format/readme.md

PreviousFAQNextDataflow

Last updated 2 years ago

This article gives you a basic idea about the main elements in a data lineage result generated by the Gudu SQLFlow in the JSON/XML format.

For more detailed explanation about the elements used in a data lineage result, please check after reading this article.

1. Sample SQL

We use a simple Oracle SQL query to demostrate the data lineage analysis result.

INSERT INTO deptsal
            (dept_no,
             dept_name,
             salary)
SELECT d.deptno,
       d.dname,
       SUM(e.sal + Nvl(e.comm, 0)) AS sal
FROM   dept d
       left join (SELECT *
                  FROM   emp
                  WHERE  hiredate > DATE '1980-01-01') e
              ON e.deptno = d.deptno
GROUP  BY d.deptno,
          d.dname; 

2. Data lineage diagram

The data lineage generated by the Gudu SQLFlow for the above SQL.

3. JSON output of this data lineage

Main elements in the result

3.1 Database objects, JSON path: $.dbobjs

All database objects discovered during the data lineage analysis are stored in the $.dbobjs object.

Table

In the above sample SQL, there are four tables founded:

  • DEPTSAL you can use $.dbobjs[1].name to return the table name, and $.dbobjs[1].type to return the type of this object which is table in this case. you can also use expression like this to get this table:

    $.dbobjs[?(@.name=='deptsal')].name
  • DEPT

    $.dbobjs[?(@.name=='dept')].name
  • EMP

    $.dbobjs[?(@.name=='emp')].name
  • SQL_CONSTANTS This is not a real table, but a table generated by the Gudu SQLFlow to store the constant used in the SQL query.

    $.dbobjs[?(@.name=='SQL_CONSTANTS')].name

3.2 Relation, JSON path: $.relations

Relation is the atom unit of the data lineage. Relation build a link between the source and target column( column-level lineage).

Those relations are stored in the $.relations.

A relation includes the type, target, sources and other attributes.

  • type There are 2 types of relation between source and target column. a) Direct dataflow: represented by the fdd in the JSON output. b) In-direct dataflow: represented by the fdr in the JSON output. In-direct dataflow also known as impact type, which means the value of the source column doesn't affect the value of the target column, but effect the rows number of the target column. For instance, the relation between a source column in the where clause and the column in the select list is a In-direct relation(impact).

  • target This is the target column.

  • sources These are source columns where the data of the target column comes from.

Return a relation which target column is dept_no

$.relations[?(@.target.column=='dept_no')]

3.3 Connect relations to form a dataflow

Let's say you want to trace back from DEPTSAL.dept_no to DEPT.deptno in the above diagram.

  • Discover the relation 1 using the following JSON path

    $.relations[?(@.target.column=='dept_no')]
  • Discover the relation 2 using the following JSON path

    $.relations[?(@.target.column=='deptno')]

Summary

This is a brief introduction about the structure and elements in a data lineage result generated by the Gudu SQLFlow.

Please check the data lineage structure reference for more detailed information.

the data lineage format reference
Link to the JSON file.