Extract¶

The TriplyETL Extract step is the first step in any TriplyETL pipeline. It is indicated by the red arrow in the following diagram:

graph LR sources -- 1. Extract --> record record -- 2. Transform --> record record -- 3. Assert --> ld ld -- 4. Enrich --> ld ld -- 5. Validate --> ld ld -- 6. Publish --> destinations linkStyle 0 stroke:red,stroke-width:3px; destinations[("D. Destinations\n(TriplyDB)")] ld[C. Internal Store] record[B. Record] sources[A. Data Sources]

In the Extract step, one or more extractors are used to create a stream of records from a data source. The basic structure of every record in is the same: it does not matter which extractor or which source is used.

The following extractors are currently supported:

CSV or Comma-Separated Values
JSON or JavaScript Object Notation
OAI-PMH or Open Archives Initiative Protocol for Metadata Harvesting
Postgres for PostgreSQL Query & Postgres API Options
RDF for Resource Description Format
Shapefile for ESRI Shapefiles
TSV for Tab-Separated Values
XLSX for Microsoft Excel
XML for XML Markup Language

Next steps¶

The Extract step results in a stream of records that can be processed in the following steps:

Step 2. Transform: cleans, combines, and extends data in the[record.
Step 3. Assert: uses data from the record to make linked data assertions in the internal store.