Data Lineage Basics¶
This article gives you a basic idea about the main elements in a data lineage result generated by the Gudu SQLFlow in the JSON/XML format.
For more detailed explanation about the elements used in a data lineage result, please check the data lineage format reference after reading this article.
1. Sample SQL¶
We use a simple Oracle SQL query to demostrate the data lineage analysis result.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | |
2. Data lineage diagram¶
The data lineage generated by the Gudu SQLFlow for the above SQL.

3. JSON output of this data lineage¶
Main elements in the result¶
3.1 Database objects, JSON path: $.dbobjs¶
All database objects discovered during the data lineage analysis are stored in the $.dbobjs object.
Table
In the above sample SQL, there are four tables founded:
-
DEPTSAL you can use
$.dbobjs[1].nameto return the table name, and$.dbobjs[1].typeto return the type of this object which istablein this case. you can also use expression like this to get this table:* DEPT1$.dbobjs[?(@.name=='deptsal')].name* EMP1$.dbobjs[?(@.name=='dept')].name* SQL_CONSTANTS This is not a real table, but a table generated by the Gudu SQLFlow to store the constant used in the SQL query.1$.dbobjs[?(@.name=='emp')].name1$.dbobjs[?(@.name=='SQL_CONSTANTS')].name
3.2 Relation, JSON path: $.relations¶
Relation is the atom unit of the data lineage. Relation build a link between the source and target column( column-level lineage).
Those relations are stored in the $.relations.
A relation includes the type, target, sources and other attributes.
- type There are 2 types of relation between source and target column. a) Direct dataflow: represented by the
fddin the JSON output. b) In-direct dataflow: represented by thefdrin the JSON output. In-direct dataflow also known asimpacttype, which means the value of the source column doesn't affect the value of the target column, but effect the rows number of the target column. For instance, the relation between a source column in the where clause and the column in the select list is a In-direct relation(impact). - target This is the target column.
- sources These are source columns where the data of the target column comes from.
Return a relation which target column is dept_no
1 | |
3.3 Connect relations to form a dataflow¶

Let's say you want to trace back from DEPTSAL.dept_no to DEPT.deptno in the above diagram.
-
Discover the relation 1 using the following JSON path
* Discover the relation 2 using the following JSON path1$.relations[?(@.target.column=='dept_no')]1$.relations[?(@.target.column=='deptno')]
Summary¶
This is a brief introduction about the structure and elements in a data lineage result generated by the Gudu SQLFlow.
Please check the data lineage structure reference for more detailed information.