A Job can be created from one of the following sources:
upload file: The data lineage will be generated from your SQL meta file (a DDL file, for example). You can upload multiple SQL files by compressing them into one zip file and uploading that zip file. You can also include the DDL file or the JSON result generated by Ingester in the zip file to resolve ambiguities (Check here for more details about ambiguity in the data lineage).
Note: The uploaded file must be less than 200MB.
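As a minimal sketch of preparing such an upload (standard library only; the file names and DDL contents below are placeholders for your own scripts), you can compress several SQL files into one zip and confirm it stays under the size limit:

```python
import os
import zipfile

# Placeholder DDL files; substitute your own SQL/DDL scripts.
ddl_scripts = {
    "schema1.sql": "CREATE TABLE t1 (id INT);\n",
    "schema2.sql": "CREATE VIEW v1 AS SELECT id FROM t1;\n",
}
for name, ddl in ddl_scripts.items():
    with open(name, "w") as f:
        f.write(ddl)

# Compress all SQL files into a single archive for upload.
with zipfile.ZipFile("ddl_bundle.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for name in ddl_scripts:
        zf.write(name)

# The uploaded file must stay under the 200 MB limit.
size_mb = os.path.getsize("ddl_bundle.zip") / (1024 * 1024)
assert size_mb < 200
```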
from database: SQLFlow is able to read information directly from the database server and analyze your data lineage. Connection information is required in this mode.
upload file + database metadata: Use the above two methods together to create the Job. The database metadata will be used to resolve ambiguities in the uploaded file (Check here for more details about ambiguity in the data lineage). Queries in the database metadata will not be analyzed, because the main purpose of using database metadata in this scenario is to eliminate ambiguity.
Note: The uploaded file must be less than 200MB.
dbt: Read data lineage from your dbt project. dbt is an ETL tool for data transformation, and data lineage can be retrieved from dbt data. manifest.json and catalog.json will be required. Both files can be found under the target folder of your dbt transformation project.
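In a real project, dbt's docs generation step writes manifest.json and catalog.json into the target folder. As a sketch (standard library only; the file contents here are stand-ins created just so the snippet runs), packaging both files for upload might look like:

```python
import zipfile
from pathlib import Path

# Simulate a dbt project's target folder; in a real project these
# files are produced by dbt (e.g. by `dbt docs generate`).
target = Path("target")
target.mkdir(exist_ok=True)
(target / "manifest.json").write_text('{"nodes": {}}')
(target / "catalog.json").write_text('{"nodes": {}}')

# Package both metadata files into one zip for upload.
with zipfile.ZipFile("dbt_meta.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for name in ("manifest.json", "catalog.json"):
        zf.write(target / name, arcname=name)
```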
redshift log: Read from your Redshift log. Amazon Redshift is a cloud data warehouse. You should provide your Redshift log, and the file should be in .zip format. You may compress multiple .gz files into one .zip file; nested folder structures inside the .zip file are supported. You can also provide your database's metadata.json as a supplementary source.
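A minimal sketch of assembling that archive (standard library only; the log files below are simulated stand-ins, since real logs come from your Redshift cluster) shows that nested folders are carried into the .zip as-is:

```python
import gzip
import zipfile
from pathlib import Path

# Simulate downloaded Redshift log files in a nested folder layout.
logs = Path("redshift_logs/2024/01")
logs.mkdir(parents=True, exist_ok=True)
for i in range(2):
    with gzip.open(logs / f"useractivitylog.{i}.gz", "wt") as f:
        f.write("SELECT 1;\n")

# Compress the .gz files into one .zip; the folder nesting is preserved.
with zipfile.ZipFile("redshift_logs.zip", "w") as zf:
    for gz in Path("redshift_logs").rglob("*.gz"):
        zf.write(gz)
    # Optionally add the database metadata as a supplementary source.
    Path("metadata.json").write_text("{}")
    zf.write("metadata.json")
```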
snowflake query history: Read from your Snowflake query history. Snowflake is a SaaS platform database, and we can get the data lineage from your Snowflake query history. However, the query history keeps growing as the database is used, so it is better to fetch your query history data regularly. You can use the enableQueryHistory flag to choose whether to fetch from the query history.
enableQueryHistory: Fetch SQL queries from the query history if set to true. When enableQueryHistory is set to true, a separate interval setting controls how often SQL queries are extracted from the query history; the default is 30 minutes.
queryHistorySqlType: Specify which kinds of SQL statements should be sent to SQLFlow for further processing after the queries are fetched from the Snowflake query history. If queryHistorySqlType is specified, only SQL statements of those types will be picked up and sent to SQLFlow for further processing. This can be useful when you only need to discover lineage from specific types of SQL statements. If queryHistorySqlType is empty, all queries fetched from the query history will be sent to the SQLFlow server.
excludedHistoryDbsSchemas: A comma-separated list of databases and schemas to exclude from extraction.
duplicateQueryHistory: Whether to filter out duplicate entries in the query history.
snowflakeDefaultRole: The role used when connecting to the Snowflake database. You must define a role that has access to the SNOWFLAKE database and grant WAREHOUSE permission to this role.
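Taken together, the Snowflake options above might appear in a job configuration roughly like the following sketch. The option names come from this section, but the JSON shape and the example values (statement types, excluded schemas, role name) are illustrative assumptions; consult your SQLFlow job template for the exact format.

```json
{
  "enableQueryHistory": true,
  "queryHistorySqlType": "INSERT,UPDATE,MERGE",
  "excludedHistoryDbsSchemas": "SNOWFLAKE,INFORMATION_SCHEMA",
  "duplicateQueryHistory": true,
  "snowflakeDefaultRole": "SQLFLOW_ROLE"
}
```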