Skip to end of banner
Go to start of banner

Crop Production Estimates

Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 10 Next »

All crop production data are legitimately called “estimates,” including final corrections and revisions of early estimates. There is no better category of crop data information. FDW’s objective is to capture and archive the best and hopefully final estimates.   

Crop production data is extracted from seasonal and annual crop assessment reports and surveys produced by Source Organizations. The data is prepared and uploaded to FDW once or twice a year per country. This page describes the process for preparing the data for upload using spreadsheet templates.

Data preparation

Most crop production estimate uploads will use an existing country-specific template with several key tabs filled out (DATA, ADMIN, CROPS, SEASONS, NOTES).

This will help determine the types of metadata you will use to complete the steps described below. Inserting a term that does not match one of those choices will produce an error upon upload.

If, after reading the description of the steps below, you are uncertain how to proceed in any of them, check with FEWS NET Data Management staff.

Extracting source data

  1. Retrieve a copy of the country’s spreadsheet template, usually named <CountryName>AgStats. You will enter the new source crop data in the tab named DATA. There may already be data in the country template, which should provide concrete examples for all the steps that follow.

  2. Most of the fields in the country template provide key data that instruct FDW how to document and archive the data. A few fields are intended to inform you, the data enterer, about how to best interpret, document, and enter the data. These are described below.

  3. Review the source document(s) containing the new crop data to determine how you will extract, process, and add the source tabular data into the upload file format shown in the DATA tab in this Excel Crop Production Data workbook. Examine the structure and format of the tabular data in the source document.

    1. The fundamental variables that you want to extract are: reporting unit name, year, season, crop name, crop production system, area planted, area harvested, yield, and quantity.

    2. The fundamental units of measure should be “hectares” for area, “metric tons per hectare” for yield, and metric tons for quantity. If not, you should convert them in an editing process.

    3. If the structure of all source tables is the same, then you will need less error-checking than if the tables vary in structure or format. If the tables vary in structure, naming convention, composition of variables, or in other important manners, you will need to consider extra quality-control steps in your extraction and editing (see step 7).

  4. If the source document is in a .pdf format, you will need an application to convert the PDF tables into spreadsheet tables. The current preferred application is ABBYY FineReader 16. It contains a full-featured suite of PDF editing tools and uses optical-character recognition (OCR) to convert the PDF content.

    1. To convert the PDF tables, open the .pdf file in FineReader, select Save As, and choose a spreadsheet as the format.

      1. Tip: If the source PDF document contains a lot of text, and/or other topics in addition to the tabular data you want, consider deleting the unwanted pages.

  5. After extracting and converting the source document tabular data, check the quality of the saved spreadsheets. If the OCR has made a number of recognition errors, or has not correctly identified or structured all the tables you want, you can work with the OCR-editing application offered by FineReader to edit the output before selecting Recognize and Save As to complete the conversion.

  6. If the source data are offered in any of a variety of other online applications that allow you to define a CSV or spreadsheet output format, prioritize that choice for your download of source data.

  7. If the source document tables vary unpredictably in the way they are structured, or how they present the fundamental variables of crop production, it will be important to define a secondary process for error-checking and quality control steps immediately after the extraction and conversion. In many cases, this will consist of visually checking each extracted table to see if it accurately duplicates the source table.

Use all available tools and alternatives available to you to avoid manually copying and pasting the data into the FDW upload spreadsheet.

Working with templates

The order of columns in a crop data upload spreadsheet is not necessarily fixed, except for DNL (Do Not Load) columns which must always be located to the right of the columns intended for upload. However, it is helpful to follow the most common pattern described below to ensure consistent and complete documentation of the crop data to be uploaded.

Source_organization

The first column is typically Source_Organization. It correlates to the entity that gathers, organizes, and publishes the crop data. It is most often a Ministry of Agriculture, statistical bureau, or a multi-entity crop assessment mission; this latter term will most often indicate a UN-sponsored Crop and Food Supply Assessment Mission (CFSAM) organized by either, or both, the Food and Agriculture Organization (FAO) and the World Food Program (WFP).

The source_organization name is comprised of the entity’s name, followed by its acronym in parentheses, a comma, and the name of the country. If possible, use a source_organization name already found in the FDW Crop domain metadata for the country involved.

Source_document

The second column is Source_Document. It describes the nature of the activity or publication in which the crop data are found (note: this differs from other FDW domains).

A common entry here would be “Official crop statistics”. Others include “Agricultural Census,” “Annual crop assessment,” “Statistical Yearbook,” or “CFSAM.” As in the first column, this descriptor is followed by a comma and the name of the country. If possible, use a source_document name that is already found in the FDW Crop domain metadata for the country involved.

Publication

The third column is often referred to as Publication. It should provide enough specific details to be able to identify the actual document the crop data were found in.

Common entries found in this column might include “2017/2018 Crop Assessment Report”, or “2018 Statistical Yearbook”, or “2017/2018 Crop and Food Supply Assessment Mission FSAM to Sudan”. Again, the name is followed by a comma and the country name.

Survey type

The fourth column is usually titled Survey_type. For crop data, the most common entry here will be “crop_final.” This indicates that the data are complete for the period described.

If a country produces multiple harvest estimates during the cropping season, or habitually produces an initial estimate followed one year later by a final estimate, a different descriptive term, usually already found in the country’s crop domain metadata, may be entered here.

FEWS NET practice is generally not to capture intermediate estimates during a season, but if a country normally publishes two estimates per harvest, and the only publication that can be found is the first, FEWS would put an entry like “1st round estimate” in this field to document the status of the data entered.

Country

In the Country column, enter the country's two-letter code as it is identified on the FDW Country Codes tab.

Zone

The Zone column is only used to provide additional information about where the reporting unit is located in the country. It is not uploaded to the FDW.

FNID

The FNID column contains FNID codes that have already been assigned to every crop-reporting and/or administrative unit in the country. They change every year in which a country’s boundaries change. These ID codes should be selected from annual boundary sets which are found in the FDW Spatial domain. Consult with Hub personnel to receive a list of FNIDs for the year represented by the data.

Administration units

The Admin1 column contains the name of the first subnational crop reporting/administration unit, i.e., the first administrative unit after the national level.

After you have entered the names as they appear in the source files, the FEWS NET GIS Data Specialist will update the file with FNIDs and the official Admin names as defined in FEWS NET administration shapefile hierarchy.

The Admin2, Admin3, and Admin4 columns contain the name of the next subnational administration unit name in the country’s reporting hierarchy. If the source data report only at a higher Admin level (e.g. Admin1), this field, and all others, will be empty. If you have names for Admin2, you will also have a parent in the Admin1 field. Enter the names as they appear in the source file and continue similarly for Admin3 and Admin4 if there are such.

For all 4 administrative unit columns, in some countries these may be a hierarchical sub-set, or set of groupings of Admin units, expressly used for reporting crop statistics. If that is the case, what is required here are the crop reporting unit names, which will have previously been coded and assigned for that purpose, and annual boundary sets for them should be found in the Spatial domain.

Year

In the Data Upload column for Year, you will enter the agricultural year of the country's new crop data. The field is restricted to four digits in the FDW, so you will enter a four-digit (e.g., 1999) year descriptor in this column.

For countries whose annual agricultural year crosses over a December 31 date, identifying which year to insert here can be complex. Looking at the SEASON tab should help you understand which year to enter here, or it may be clear from the data already in the DATA tab.

If the source document identifies the year in a different format (e.g. 2017/18), enter the source document year as it is in a DNL_Source_year column at the right of the spreadsheet.

Additional dates

You may next encounter several fields regarding start and end periods of the season and you may also find multiple columns describing the crop development stage dates (“phenology”), which are relative to the specific crop data being entered on each row. These are entered automatically upon upload and are not your responsibility.

Look for the Season column, where you will enter the season name(s) of the crop data being uploaded. The SEASON tab in the workbook provides additional information regarding how seasons are named for this country, and any season name entered for new data should correspond to one of those.

Crop names and codes

Enter CPCv2 crop codes in the column/field labeled “CPCv2” or “Crop.” These have already been defined for each crop in the FDW, and existing crop names for this country can be found in the Crop Domain metadata. CPC codes can be found in FDW under Metadata management>Common>Classified products.

Source document crop names for every crop should be retained and placed in a column to the far right of the sheet generally named “DNL_Source_Crop.”

FEWS NET practice is to upload all crop data offered in the source document, except for some “fruit tree crops” (oranges, apples, pears, etc. Coffee, cocoa, bananas, and plantains are “tree” crops whose data is always uploaded) and commercially-grown “flowers.” If unsure about what to add, ask the field analyst or FEWS NET Database Manager.

If you find that existing FDW crop names and code metadata provide less information about the type of crop being reported now in the source document (e.g. existing crop data for this country includes a name/code of  “R01701AA” for common beans, while the new data describes the crop as “Red beans”, or “Vigna beans”), ask the field analyst or FEWS NET Database Manager which name/code should be entered.

All Crop names need to pre-exist and be coded in the FDW before being uploaded.

Crop production system

In the Crop domain upload template, the column/field sometimes labelled Dominant Livelihood Zone is supposed to instead be called the Crop Production System.

Crop production system descriptors entered here would include metadata such as “rainfed,” “irrigated,” “commercial,” small-farmer,” or, very frequently, “all.” All these terms should be already present in the FDW Crop Domain metadata for the specific country you are working on, and you should enter those if they fit. If not, contact the Data Management staff who will help in defining/creating new metadata descriptors.

Each entry in this column should be followed by a “(PS)” to indicate a production system descriptor.  

Crop estimate data

In the following columns/fields, you will enter crop estimate data provided in the source document.

Source documents generally distinguish between a zero value and “no data collected/no data available/no data reported”. Try to retain those differences as you move the data into the data upload sheet (“0” for a zero-value, “NA” for no crop grown, “NC” for no data collected/reported).

Enter the area planted in hectares (ha) in the Area planted column/field. The source document may refer to the area planted by other names, e.g., Area under [crop]. If the source file provides this data in a different unit (e.g. “acres”), make the conversion, but add a DNL column (“DNL_Source_area planted”) for the source unit data for information purposes. If this is the only area data you have, determine or ask the field whether the area estimate is ”planted” or “harvested”. 

As above, enter Area harvested in hectares (ha), if area harvested data are provided.

Enter the yield in MT/ha into the Yield column. Enter the actual yield figure given in the source document, unless: a)  you have to convert it first to MT/ha, b) there is an obvious error in the yield figure, or c) no yield data are given. In the latter two cases, compute yield as “quantity” divided by “area planted”.  

Enter the quantity produced in the Quantity column/field as Metric tons (MT) only, i.e., not bags, kgs, or sacs. If the source file quantity estimate is in a different unit, make the conversion, but add a DNL column for the source quantity estimate for information purposes.

Quality assurance tips

You should try to ensure the accuracy of your capture and annotation of source data. Spot-checking and comparing results is a basic method for identifying systematic errors.

Frequent errors include duplications of already entered crop data. This may result if the source document provides revisions of the previous year data, which was entered into the FDW last year. There are a variety of methods you should use to identify duplicates, as all duplicates will produce an error upon upload into the FDW.

Save the final file with the file name CountryCode_AgStats (e.g., El Salvador file name would be SV_AgStats).

  • No labels