Extract
The TriplyETL Extract step is the first step in any TriplyETL pipeline. It is indicated by the red arrow in the following diagram:
```mermaid
graph LR
  sources -- 1. Extract --> record
  record -- 2. Transform --> record
  record -- 3. Assert --> ld
  ld -- 4. Enrich --> ld
  ld -- 5. Validate --> ld
  ld -- 6. Publish --> destinations
  linkStyle 0 stroke:red,stroke-width:3px;
  destinations[("D. Destinations\n(TriplyDB)")]
  ld[C. Internal Store]
  record[B. Record]
  sources[A. Data Sources]
```
In the Extract step, one or more extractors are used to create a stream of records from a data source. The basic structure of every record is the same, regardless of which extractor or which source is used; a minimal configuration sketch follows the list of extractors below.
The following extractors are currently supported:
- CSV or Comma-Separated Values
- JSON or JavaScript Object Notation
- OAI-PMH or Open Archives Initiative Protocol for Metadata Harvesting
- Postgres for PostgreSQL (query and API options)
- RDF or Resource Description Framework
- Shapefile for ESRI Shapefiles
- TSV or Tab-Separated Values
- XLSX for Microsoft Excel
- XML or Extensible Markup Language
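
The following is a minimal sketch of an Extract step that uses the CSV extractor. The import path, the `fromCsv()` extractor, the `Source.file()` helper, and the file name `example.csv` reflect common TriplyETL usage but are assumptions here and may differ for your TriplyETL version.

```ts
// Minimal sketch of an Extract step (import path and file name are assumptions).
import { Etl, fromCsv, Source } from '@triplyetl/etl/generic'

export default async function (): Promise<Etl> {
  const etl = new Etl()
  etl.use(
    // Extract: create a stream of records from a local CSV file.
    fromCsv(Source.file('example.csv')),
  )
  return etl
}
```

Swapping in a different extractor, for example `fromJson()` or `fromXlsx()`, changes only this one line; the rest of the pipeline stays the same because every extractor produces records with the same basic structure.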
Next steps
The Extract step results in a stream of records that can be processed in the subsequent steps: Transform, Assert, Enrich, Validate, and Publish.
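
For instance, the record stream can be inspected before any further processing. The sketch below assumes the `logRecord()` debug function and a local JSON source named `example.json`; both the import paths and the file name are illustrative assumptions.

```ts
// Sketch: inspect extracted records before the Transform step
// (import paths and source name are assumptions).
import { logRecord } from '@triplyetl/etl/debug'
import { Etl, fromJson, Source } from '@triplyetl/etl/generic'

export default async function (): Promise<Etl> {
  const etl = new Etl()
  etl.use(
    fromJson(Source.file('example.json')), // 1. Extract
    logRecord(),                           // print each record to the terminal
    // 2. Transform, 3. Assert, etc. would follow here.
  )
  return etl
}
```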