...

Source: Armed Conflict Location and Event Data (ACLED)

Description: Conflict data from ACLED’s API.

Notes: Reporting delays and corrections to past data are common due to the challenges of gathering conflict data, so ACLED data in FDW for a given period is subject to change. Multiple data points (events) can be attributed to the same period. Events are given a value of 0 when there was a conflict event without fatalities; values greater than 0 indicate the number of fatalities (see the sketch after this table).

Source: ComTrade

Description: UN commodity price data, as reported by national customs organizations.

Source: Food and Agriculture Organization (FAO)

Description: Price ingestion pipeline from FAO’s web API.

Notes: Monthly and weekly price data are ingested into multiple Data Source Documents.

Source: International Monetary Fund (IMF) Price Ingestion

Description: Price, CPI, Labor, and GDP data ingestion pipeline from the IMF's API.

Notes: Data is included in FDW under Secondary price index, Semi-Structured Data Series: Labor Statistics, and Semi-Structured Data Series: Economic Statistics.
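
The following is a minimal sketch of how a client might pull ACLED events and apply the value convention described in the Notes above (0 for a conflict event without fatalities, larger values for the fatality count). The endpoint URL, query parameters, and response field names are assumptions about ACLED's public read API, not a description of the FDW pipeline's actual configuration.

```python
# Minimal sketch of downloading ACLED events and interpreting the fatalities
# value. The endpoint, query parameters, and field names are assumptions and
# may differ from what the FDW pipeline actually uses.
import requests

ACLED_URL = "https://api.acleddata.com/acled/read"  # assumed public read endpoint


def fetch_events(api_key: str, email: str, country: str, year: int) -> list[dict]:
    """Download one country-year of events as a list of JSON records."""
    params = {
        "key": api_key,   # assumed authentication parameters
        "email": email,
        "country": country,
        "year": year,
        "limit": 0,       # assumed to mean "no row limit"
    }
    response = requests.get(ACLED_URL, params=params, timeout=60)
    response.raise_for_status()
    return response.json().get("data", [])


def summarize(events: list[dict]) -> None:
    """Apply the value convention: 0 = conflict event with no reported
    fatalities, >0 = number of fatalities reported for that event."""
    for event in events:
        fatalities = int(event.get("fatalities", 0))
        label = "no fatalities" if fatalities == 0 else f"{fatalities} fatalities"
        print(event.get("event_date"), event.get("event_type"), label)
```

Because ACLED issues corrections retroactively, a pipeline along these lines would re-request recent periods rather than assume previously downloaded values are final.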

Deprecated external data ingestion pipelines

The following data ingestion pipelines are no longer in use:

  • FARMERS, Sudan

...

  • Chad Weekly Market Prices

  • DRC Weekly Exchange Rates

  • DRC Weekly Market Prices

  • Ethiopia Weekly Market Prices

  • Nigeria Weekly Exchange Rates

  • Nigeria Weekly Livestock Prices

  • Nigeria Weekly Market Prices

  • South Sudan Weekly Market Prices

  • Zimbabwe Weekly Exchange Rates Open Market

  • Zimbabwe Weekly Exchange Rates Supermarket

  • Zimbabwe Weekly Market Prices Open Market

  • Zimbabwe Weekly Market Prices Service Station

  • Zimbabwe Weekly Market Prices Supermarket

User permission pipeline

...

Configuring Data Series

Typically, there are few initial matches between a remote API and FDW.

Data managers must set up metadata in FDW so that the incoming metadata from the remote API is recognized correctly. This includes creating new metadata items as well as aliases for existing metadata items.

...

Each row in the spreadsheet represents one of three situations:

  1. A remote data series that has been matched to an FDW data series: This probably indicates a successful match, but it may also represent an accidental match where FDW has recognized the wrong remote data series.

  2. An FDW Data Series without a matching remote Data Series: This indicates a metadata mismatch; we need to find the remote data series that was expected to match, then identify and correct the unrecognized metadata.

  3. A remote Data Series without a matching FDW Data Series: This might indicate a metadata mismatch, or it might be a remote data series that we do not want to capture in FDW.
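
As a rough illustration of how the matches spreadsheet can be triaged into these three situations, the sketch below splits rows by whether the remote and FDW sides are populated. The column names are hypothetical; the spreadsheet produced by the ingestion pipeline may use different headers.

```python
# Rough sketch of triaging the API metadata matches spreadsheet into the three
# situations above. Column names ("remote_series", "fdw_series") are
# hypothetical; the real spreadsheet may use different headers.
import pandas as pd


def triage_matches(path: str) -> dict[str, pd.DataFrame]:
    df = pd.read_excel(path)

    # 1. Matched: both a remote and an FDW data series are present
    #    (check these for accidental matches).
    matched = df[df["remote_series"].notna() & df["fdw_series"].notna()]

    # 2. FDW-only: an FDW Data Series with no recognized remote series,
    #    i.e. a metadata mismatch to investigate and correct.
    fdw_only = df[df["remote_series"].isna() & df["fdw_series"].notna()]

    # 3. Remote-only: a remote series FDW did not match, which is either a
    #    metadata mismatch or a series we do not want to capture.
    remote_only = df[df["remote_series"].notna() & df["fdw_series"].isna()]

    return {"matched": matched, "fdw_only": fdw_only, "remote_only": remote_only}
```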

...

Support for additional APIs and websites should be requested through the Hub’s sprint process for the Data Platform. That process can be initiated through a Helpdesk ticket or the monthly Data Stakeholders meeting.

Depending on the complexity of the remote API/website and the documentation and support available for it, implementing support takes 2 to 3 months. The implementation process involves 3 steps, typically tracked through 3 separate but dependent Jira tickets implemented in consecutive sprints:

  1. Research the API: Investigate the API, including acquiring credentials if necessary, and document the required authentication, the available endpoints, and the supported filters and formats. This step may require interaction with the developers of the remote API and some trial and error if support and/or documentation is not available. The output from the ticket is typically a Jupyter Notebook that demonstrates how to access the API and download the data.

  2. Develop a data ingestion pipeline for the API: Write a new Luigi pipeline within FDW to download the necessary data, perform the API-specific transformations required to prepare the data for ingestion into FDW, and then feed the transformed data into the generic data normalization, validation, and ingestion tasks (a sketch of this pipeline shape follows the list). The output from the ticket is a complete pipeline with associated unit tests, merged into the FDW software and released to the FDW production environment.

  3. Support enabling the API in FDW production: Provide API-specific guidance to the Data Owner to help them determine the required metadata and Data Series in FDW. Typically, we have no control over the content of the remote API, so we must set up FDW appropriately to recognize the Data Series we want to capture. To help with this process, the ingestion pipeline produces an API metadata matches spreadsheet that reports the data series available from the remote API and how their metadata matches the metadata available in FDW, including the Data Series defined for the relevant Data Source Document(s).
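
The sketch below illustrates the general shape of the source-specific Luigi pipeline referred to in step 2: one task downloads raw data from the remote API and a second applies the API-specific transformations before handing off to FDW's generic normalization, validation, and ingestion tasks (not shown). The class names, parameters, and endpoint are hypothetical, not FDW's actual task definitions.

```python
# Rough sketch of the shape of a new source-specific Luigi pipeline: one task
# downloads raw data from the remote API, a second applies API-specific
# transformations, and the result is then fed into FDW's generic
# normalization, validation, and ingestion tasks (not shown). All names and
# the endpoint are hypothetical, not FDW's actual task definitions.
import json

import luigi
import requests


class DownloadRemoteData(luigi.Task):
    """Download one period of data from the remote API."""

    period = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(f"raw_{self.period}.json")

    def run(self):
        response = requests.get(
            "https://example.org/api/prices",  # hypothetical endpoint
            params={"period": self.period},
            timeout=60,
        )
        response.raise_for_status()
        with self.output().open("w") as handle:
            handle.write(response.text)


class TransformRemoteData(luigi.Task):
    """Apply API-specific transformations to match FDW's expected input."""

    period = luigi.Parameter()

    def requires(self):
        return DownloadRemoteData(period=self.period)

    def output(self):
        return luigi.LocalTarget(f"transformed_{self.period}.json")

    def run(self):
        with self.input().open("r") as handle:
            records = json.loads(handle.read())
        # Source-specific reshaping would happen here, after which the
        # transformed records become input to the generic ingestion tasks.
        with self.output().open("w") as handle:
            json.dump(records, handle)
```

Keeping the download and transformation steps as separate tasks means a failed transformation can be re-run without re-downloading, which fits Luigi's target-based dependency model.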

Support

...
