Metadata Management

Introduction

The Household Economy Analysis (HEA) Database is a key resource for assessing and understanding household economies across regions. Its creation and ongoing management provide a structured approach to storing and presenting data on the livelihood strategies and other characteristics of households within diverse livelihood zones.

One of the important opportunities presented by the HEA Database is its ability to standardize the language and classification used across the range of baseline surveys and the resulting Baseline Storage Sheets (BSS) for each country. This standardization plays a crucial role in enhancing the comparability and coherence of data, which, in turn, is instrumental in:

  • Ensuring Consistency: By adopting international standards and a uniform set of terminologies for metadata, the database ensures that data collected from different regions or studies can be compared and analyzed in a consistent manner. This consistency is fundamental for aggregating data at a national or regional level, thereby providing a comprehensive overview of livelihood strategies.

  • Facilitating Cross-Regional Analysis: Standardization of language and classification across BSS allows for the direct comparison of data across different countries and regions. This comparative analysis is essential for identifying patterns, trends, and anomalies in household economies.

  • Improving Data Quality and Reliability: Standardized data storage and classification methodologies contribute to the overall quality and reliability of the data stored in the HEA Database. This reliability is critical for stakeholders, researchers, and policymakers who depend on accurate and comparable data to make informed decisions.

  • Enhancing Accessibility: A standardized language and classification system makes the database more accessible to a broader audience, including international agencies, non-governmental organizations (NGOs), and researchers unfamiliar with local terminologies. This accessibility is pivotal for collaborative efforts and knowledge sharing on a global scale.

  • Enabling Data Integration: The use of global metadata standards, such as ISO 3166 for countries, ISO 4217 for currencies, and the Central Product Classification (CPC) v2.1 for products, allows current data for items such as market prices and crop year forecasts to be compared to the data for the reference year, easing the use of HEA data in outcome analysis.

Approach

To effectively manage metadata within the HEA Database, the following practices are implemented:

  • Data Entry and Validation: All metadata entries are subject to validation against the respective standards before inclusion in the database. This ensures accuracy and consistency of the data.

  • Data Ownership: Each metadata item has an Owner and designated Editors. The Owner is ultimately responsible for data quality for that metadata item. This approach ensures that someone has a full picture of the individual metadata items and can avoid duplicates, etc.

  • Regular Updates: Metadata standards, especially wealth characteristics, are periodically reviewed and updated to reflect changes in economic practices and data classification standards.

  • Training and Documentation: Personnel involved in metadata management are provided with training and detailed documentation on metadata standards.

  • Quality Assurance: Regular audits of the metadata are conducted to ensure adherence to standards and to identify and correct any inconsistencies.

Metadata Reconciliation

The BSS spreadsheets contain many similar items that use slightly different terminology. We want to eliminate these duplicates as we load the BSSs, to ensure that they are comparable across countries and regions.

Metadata Reconciliation Tables

To enable this we have two additional metadata items: Wealth Characteristic Label and Activity Label. These items are used to translate the items in Column A of the ‘WB’ and ‘Data’ worksheets respectively, converting them into specific deduplicated items across many different metadata types.

These two tables are used by the data ingestion pipelines to recognize the data in the BSS and convert it to a standard form. Completing the review of these metadata items is critical to recognizing the data in the BSS automatically and automating its ingestion.

These metadata items contain some standard columns:

  • label: the exact text from Column A of the “WB” or “Data” worksheet

  • langs: the primary language(s) detected for the BSS where the label is used 

  • datapoint_count_sum: The number of data points (i.e. non-blank cells) that use that label

  • unique_filename_count: The number of BSS that contain data with this label

  • min_row_number: The smallest row number in any BSS that contains this label 

  • max_row_number: The largest row number in any BSS that contains this label

  • filename_for_min_row: The name of the BSS where the label has the smallest row number

  • filename_for_max_row: The name of the BSS where the label has the largest row number

  • status: Indicates whether the label has been reviewed, the correct lookups have been assigned, and the record is ready to be used.
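
As a rough illustration of how the summary columns above relate to the underlying worksheets, the sketch below derives them from a list of label sightings. The function name summarize_labels, the tuple layout and the sample file names are assumptions made for this example; they are not taken from the HEA ingestion pipeline.

```python
def summarize_labels(sightings):
    """Aggregate label sightings into the standard summary columns.

    Each sighting is a hypothetical (filename, row_number, label,
    datapoint_count) tuple describing one Column A cell in one BSS.
    """
    summary = {}
    filenames_by_label = {}
    for filename, row_number, label, datapoint_count in sightings:
        # Create the summary row the first time a label is seen.
        entry = summary.setdefault(label, {
            "label": label,
            "datapoint_count_sum": 0,
            "unique_filename_count": 0,
            "min_row_number": row_number,
            "max_row_number": row_number,
            "filename_for_min_row": filename,
            "filename_for_max_row": filename,
        })
        entry["datapoint_count_sum"] += datapoint_count
        filenames_by_label.setdefault(label, set()).add(filename)
        entry["unique_filename_count"] = len(filenames_by_label[label])
        # Track where the label appears earliest and latest in any BSS.
        if row_number < entry["min_row_number"]:
            entry["min_row_number"] = row_number
            entry["filename_for_min_row"] = filename
        if row_number > entry["max_row_number"]:
            entry["max_row_number"] = row_number
            entry["filename_for_max_row"] = filename
    return summary


# Hypothetical sightings of one label in two BSS files.
sightings = [
    ("bss_a.xlsx", 120, "Sorghum Deyr: kg produced", 12),
    ("bss_b.xlsx", 95, "Sorghum Deyr: kg produced", 8),
    ("bss_b.xlsx", 310, "Sorghum Deyr: kg produced", 4),
]
row = summarize_labels(sightings)["Sorghum Deyr: kg produced"]
```

The two filename columns then point a reviewer to the files where the label sits highest and lowest in the worksheet, which is usually enough to see it in context.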

The Wealth Characteristic Label worksheet contains the following additional columns:

  • wealth_characteristic_id: the name of the standard WealthGroupCharacteristic that this label represents. For example, Donkey number owned matches the number owned WealthGroupCharacteristic, and Land area cultivated (hectares) matches the land area cultivated WealthGroupCharacteristic.

  • product_name: the common name of the standard Product that this label represents, if appropriate. For example, Donkey number owned matches the L02132: Donkeys Product. 

  • unit_of_measure_id: the name of the standard Unit Of Measure that this label represents. For example, Land area cultivated (hectares) matches the ha: Hectare Unit Of Measure.
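
The two examples above can be written out as lookup rows. This is a minimal sketch: the dictionary layout and the helper reconcile_wealth_label are assumptions for illustration, and the real table lives in the Reference Data spreadsheet rather than in code.

```python
# Rows keyed by the exact Column A text from the 'WB' worksheet.
WEALTH_CHARACTERISTIC_LABELS = {
    "Donkey number owned": {
        "wealth_characteristic_id": "number owned",
        "product_name": "Donkeys",  # Product L02132: Donkeys
        "unit_of_measure_id": None,  # not stated in the source text
        "status": "Complete",
    },
    "Land area cultivated (hectares)": {
        "wealth_characteristic_id": "land area cultivated",
        "product_name": None,  # no product applies to this label
        "unit_of_measure_id": "ha",  # ha: Hectare
        "status": "Complete",
    },
}


def reconcile_wealth_label(label, table=WEALTH_CHARACTERISTIC_LABELS):
    """Return the standard attributes for a label, or None if it is
    unknown or has not yet been reviewed (hypothetical helper)."""
    row = table.get(label)
    if row is None or row["status"] != "Complete":
        return None
    return row


donkeys = reconcile_wealth_label("Donkey number owned")
land = reconcile_wealth_label("Land area cultivated (hectares)")
unknown = reconcile_wealth_label("Some label that was never reviewed")
```

Returning None for unknown or unreviewed labels is a design choice for this sketch; it keeps unreviewed rows from silently feeding an automated load.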

The Activity Label worksheet contains the following additional columns:

  • is_start: Indicates whether this label marks the start of a new Livelihood Strategy. In the BSS these cells often have a light green background. For example, Sorghum Deyr: kg produced, or any other crop name followed by kg produced, indicates the start of a new Activity.

  • strategy_type: the name of the Livelihood Strategy subtype, such as MilkProduction, or OtherCashIncome.

  • attribute: the standard name for this attribute, matching the name of the field in the Livelihood Activity data model that will store the data from this row. For example, Sorghum Deyr: kg produced maps to the quantity_produced field.

  • product_name: the common name of the standard Product that this label represents, if appropriate. For example, Donkey number owned matches the L02132: Donkeys Product. 

  • unit_of_measure_id: the name of the standard Unit Of Measure that this label represents. For example, Land area cultivated (hectares) matches the ha: Hectare Unit Of Measure.

  • currency_id: the iso4217a3 code for the currency if one is specified in the label. This field is not often used.

  • season: the name or alias of the season, if one is specified in the label.

  • additional_identifier: an additional identifier required to distinguish between Livelihood Strategies. For example, there may be two Strategies for growing maize, with labels Maize rainfed: kg produced and maize irrigated: kg produced. In this case the additional identifiers will be rainfed and irrigated, the Strategy Type will be CropProduction, the Unit Of Measure will be kg: Kilogram and the product will be R01122: Maize/corn grain.
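
To show how is_start and additional_identifier work together, the sketch below walks down a list of Column A labels and opens a new activity whenever a start label is met. The ActivityLabel class, the function group_into_activities, and the "kg sold" label with its quantity_sold attribute are hypothetical and exist only for this example; the maize rows follow the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActivityLabel:
    """One row of the Activity Label table (subset of columns)."""
    label: str
    is_start: bool
    strategy_type: Optional[str] = None
    attribute: Optional[str] = None
    product_name: Optional[str] = None
    unit_of_measure_id: Optional[str] = None
    additional_identifier: Optional[str] = None


ROWS = [
    ActivityLabel("Maize rainfed: kg produced", True, "CropProduction",
                  "quantity_produced", "Maize/corn grain", "kg", "rainfed"),
    ActivityLabel("maize irrigated: kg produced", True, "CropProduction",
                  "quantity_produced", "Maize/corn grain", "kg", "irrigated"),
    # Hypothetical follow-on row that belongs to whichever activity is open.
    ActivityLabel("kg sold", False, attribute="quantity_sold"),
]
LOOKUP = {row.label: row for row in ROWS}


def group_into_activities(column_a_labels, lookup=LOOKUP):
    """Group Column A labels into activities using the is_start flag."""
    activities = []
    current = None
    for label in column_a_labels:
        row = lookup.get(label)
        if row is None:
            continue  # unrecognized labels are skipped in this sketch
        if row.is_start:
            current = {
                "strategy_type": row.strategy_type,
                "product_name": row.product_name,
                "additional_identifier": row.additional_identifier,
                "attributes": [],
            }
            activities.append(current)
        if current is not None and row.attribute:
            current["attributes"].append(row.attribute)
    return activities


activities = group_into_activities([
    "Maize rainfed: kg produced",
    "kg sold",
    "maize irrigated: kg produced",
    "kg sold",
])
```

Without the additional identifier the two maize activities would share the same strategy type and product, so nothing would tell them apart.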

Metadata Reconciliation Process

  • Find the row in the ActivityLabel or WealthCharacteristicLabel worksheet in the Reference Data spreadsheet.

  • Open the files listed in the filename_for_min_row and filename_for_max_row columns so that you can see the label in context and decide whether it is the start of a new Livelihood Activity, etc.

  • Fill in the attributes in the Reference Data spreadsheet. Most attributes are constrained by validation to a pre-approved list of values. If you need a value that is not yet available, such as a new Product, contact the Hub, who will add it to the list of available products after checking the appropriate code, etc.

  • Mark the metadata row as Complete.
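
One possible way to decide which rows to reconcile first is to list the rows not yet marked Complete, ordered by how many data points depend on them. This is only a sketch: the function review_queue, the row dictionaries and the blank status value are assumptions, and the counts shown are made up for the example.

```python
def review_queue(rows):
    """Return rows not yet marked Complete, most-used labels first."""
    pending = [row for row in rows if row.get("status") != "Complete"]
    return sorted(pending,
                  key=lambda row: row["datapoint_count_sum"],
                  reverse=True)


# Hypothetical rows; the counts are illustrative, not real data.
rows = [
    {"label": "Sorghum Deyr: kg produced", "datapoint_count_sum": 40,
     "status": "Complete"},
    {"label": "Maize rainfed: kg produced", "datapoint_count_sum": 15,
     "status": ""},
    {"label": "maize irrigated: kg produced", "datapoint_count_sum": 60,
     "status": ""},
]
queue = [row["label"] for row in review_queue(rows)]
```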
