...

Armed Conflict Location and Event Data (ACLED)

  • Description: Conflict data from ACLED's API.
  • Notes: Reporting delays and corrections to past data are common because conflict data is difficult to gather. ACLED data in FDW for a given period is therefore subject to change. Multiple data points (events) can be attributed to the same period. An event has a value of 0 when a conflict occurred without fatalities; a value greater than 0 is the number of fatalities.

Food and Agriculture Organization (FAO)

  • Description: Price ingestion pipeline from FAO's web API.
  • Notes: Monthly and weekly price data are ingested into multiple Data Source Documents.

International Monetary Fund (IMF) Price Ingestion

  • Description: Price, CPI, Labor, and GDP data ingestion pipeline from the IMF's API.
  • Notes: Data is included in FDW under Secondary price index, Semi-Structured Data Series: Labor Statistics, and Semi-Structured Data Series: Economic Statistics.
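For ACLED, several events can be attributed to the same period, and a value of 0 still records a conflict. Anyone summarizing the data therefore needs to count events separately from summing fatalities. The sketch below shows that aggregation. It assumes a hypothetical, simplified `AcledEvent` record with `period` and `fatalities` fields, which is not ACLED's actual API schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class AcledEvent:
    """Hypothetical, simplified event record (not ACLED's actual API schema)."""
    period: str       # period the event is attributed to, e.g. "2023-01"
    fatalities: int   # 0 = conflict without fatalities; >0 = number of fatalities


def summarize_by_period(events: Iterable[AcledEvent]) -> Dict[str, Tuple[int, int, int]]:
    """Map each period to (event count, total fatalities, events with fatalities)."""
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    fatal: Dict[str, int] = defaultdict(int)
    for event in events:
        if event.fatalities < 0:
            raise ValueError(f"unexpected negative value: {event.fatalities}")
        counts[event.period] += 1  # a value of 0 is still a conflict event
        totals[event.period] += event.fatalities
        if event.fatalities > 0:
            fatal[event.period] += 1
    return {period: (counts[period], totals[period], fatal[period]) for period in counts}


events = [AcledEvent("2023-01", 0), AcledEvent("2023-01", 3), AcledEvent("2023-02", 0)]
print(summarize_by_period(events))
# → {'2023-01': (2, 3, 1), '2023-02': (1, 0, 0)}
```

Treating a value of 0 as "no event" would undercount conflicts. Also, ACLED corrects past data, so a summary like this reflects only the data as ingested when it was computed.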

Deprecated external data ingestion pipelines

The following data ingestion pipelines are no longer in use:

  • FARMERS, Sudan

...

Typically, there are few initial matches between the Data Series in a remote API and those in FDW.

Metadata must be set up in FDW so that incoming metadata from the remote API is recognized correctly. This includes creating new metadata items as well as aliases for existing metadata items.
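One way to picture this setup: an incoming remote value resolves either to an existing metadata item by name or through an alias. Anything that resolves to neither needs a new item or a new alias. The sketch below is hypothetical. The `MetadataLookup` class, its case- and whitespace-insensitive matching, and the sample product names are illustrative only and are not FDW's actual models or matching rules.

```python
from typing import Dict, Optional


class MetadataLookup:
    """Hypothetical lookup of metadata items by canonical name or alias."""

    def __init__(self) -> None:
        self._by_key: Dict[str, str] = {}

    @staticmethod
    def _normalize(value: str) -> str:
        # Assumed matching rule: ignore case and collapse repeated whitespace.
        return " ".join(value.split()).casefold()

    def add_item(self, name: str) -> None:
        """Register a new metadata item under its canonical name."""
        self._by_key[self._normalize(name)] = name

    def add_alias(self, alias: str, name: str) -> None:
        """Map a spelling used by the remote API to an existing item."""
        key = self._normalize(name)
        if key not in self._by_key:
            raise KeyError(f"unknown metadata item: {name}")
        self._by_key[self._normalize(alias)] = self._by_key[key]

    def resolve(self, remote_value: str) -> Optional[str]:
        """Return the canonical name, or None if the value is unrecognized."""
        return self._by_key.get(self._normalize(remote_value))


lookup = MetadataLookup()
lookup.add_item("Maize Grain (White)")
print(lookup.resolve("maize, white"))  # → None (not recognized yet)
lookup.add_alias("Maize, white", "Maize Grain (White)")
print(lookup.resolve("maize, white"))  # → Maize Grain (White)
```

Adding an alias rather than renaming the item keeps the existing FDW name stable while still recognizing the remote spelling.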

...

Each row in the spreadsheet represents one of three situations:

  1. A remote Data Series that has been matched to an FDW Data Series: This probably indicates a successful match, but it may also be an accidental match in which FDW has recognized the wrong remote Data Series.

  2. An FDW Data Series without a matching remote Data Series: This indicates a metadata mismatch. We need to find the remote Data Series that we expected to match, then identify and correct the unrecognized metadata.

  3. A remote Data Series without a matching FDW Data Series: This might indicate a metadata mismatch, or it might be a remote Data Series that we do not want to capture in FDW.
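The three situations above amount to a simple classification of each spreadsheet row. The following sketch triages rows that way. It assumes each row can be reduced to a hypothetical `MatchRow` with an optional remote series identifier and an optional FDW series identifier. The spreadsheet's real columns are not described here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Optional


class MatchStatus(Enum):
    MATCHED = "matched: confirm it is the intended remote Data Series"
    FDW_ONLY = "FDW only: find the expected remote Data Series and fix metadata"
    REMOTE_ONLY = "remote only: metadata mismatch, or a series not wanted in FDW"


@dataclass(frozen=True)
class MatchRow:
    """Hypothetical reduced form of one spreadsheet row."""
    remote_series: Optional[str]
    fdw_series: Optional[str]


def classify(row: MatchRow) -> MatchStatus:
    """Assign a row to one of the three situations."""
    if row.remote_series and row.fdw_series:
        return MatchStatus.MATCHED
    if row.fdw_series:
        return MatchStatus.FDW_ONLY
    if row.remote_series:
        return MatchStatus.REMOTE_ONLY
    raise ValueError("row has neither a remote nor an FDW Data Series")


def triage(rows: Iterable[MatchRow]) -> Dict[MatchStatus, List[MatchRow]]:
    """Group rows by situation so each group can be reviewed separately."""
    groups: Dict[MatchStatus, List[MatchRow]] = {status: [] for status in MatchStatus}
    for row in rows:
        groups[classify(row)].append(row)
    return groups


rows = [MatchRow("R1", "F1"), MatchRow(None, "F2"), MatchRow("R3", None)]
for status, members in triage(rows).items():
    print(status.name, len(members))
# → MATCHED 1
# → FDW_ONLY 1
# → REMOTE_ONLY 1
```

Even rows classified as matched still need a manual check, because a match may be accidental.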

...

  1. Research the API: Investigate the API, acquiring credentials if necessary, and document the required authentication, the available endpoints, and the available filters and formats. This step may require interaction with the developers of the remote API and some trial and error if support or documentation is not available. The output from the ticket is typically a Jupyter Notebook that demonstrates how to access the API and download the data.

  2. Develop a data ingestion pipeline for the API: Write a new Luigi pipeline within FDW that downloads the necessary data and performs the API-specific transformations required to prepare it for ingestion. The pipeline then passes the transformed data as input to the generic data normalization, validation, and ingestion tasks. The output from the ticket is a complete pipeline with associated unit tests, merged into the FDW software and released to the FDW production environment.

  3. Support for enabling the API in FDW production: Use API-specific guidance to determine the required metadata and Data Series in FDW. Typically, we have no control over the content of the remote API, so we must set up FDW to recognize the Data Series we want to capture. To help with this process, the ingestion pipeline produces an API metadata matches spreadsheet. It reports the Data Series available from the remote API and how their metadata matches the metadata available in FDW, including the Data Series defined for the relevant Data Source Document(s).
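The pipeline shape in step 2 can be sketched as a chain of plain Python functions: download, then an API-specific transformation, then generic normalization, validation, and ingestion. This is deliberately not Luigi, whose task and dependency model is richer than shown. Every function name, field name, and the record format below is a hypothetical placeholder. The download step returns canned rows instead of calling a real API, and the transformation's rule of dropping rows with an empty price is an illustrative assumption.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def download() -> List[Dict[str, str]]:
    """Hypothetical stand-in for the API call; returns raw, API-shaped rows."""
    return [
        {"mkt": " Market A ", "item": "Maize", "px": "12.5", "date": "2024-01-15"},
        {"mkt": "Market B", "item": "Maize", "px": "", "date": "2024-01-15"},
    ]


def transform(raw: List[Dict[str, str]]) -> List[Record]:
    """API-specific step (assumed): rename fields and drop rows with no price."""
    return [
        {"market": r["mkt"], "product": r["item"], "value": float(r["px"]), "period_date": r["date"]}
        for r in raw
        if r["px"].strip()
    ]


def normalize(records: List[Record]) -> List[Record]:
    """Generic step: trim surrounding whitespace in string fields."""
    return [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()} for r in records]


def validate(records: List[Record]) -> List[Record]:
    """Generic step: reject records with missing fields or negative values."""
    required = {"market", "product", "value", "period_date"}
    for r in records:
        missing = required - set(r)
        if missing:
            raise ValueError(f"missing fields {sorted(missing)} in {r}")
        if r["value"] < 0:
            raise ValueError(f"negative value in {r}")
    return records


def run_pipeline(ingest: Callable[[List[Record]], int]) -> int:
    """Chain the stages; `ingest` is injected so tests can replace the real store."""
    return ingest(validate(normalize(transform(download()))))


stored: List[Record] = []
count = run_pipeline(lambda rs: stored.extend(rs) or len(rs))
print(count)  # → 1
```

Keeping the API-specific `transform` separate from the generic stages mirrors the split described in step 2. Only the first part changes when a new API is added.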