...

  1. Retrieve a copy of the country’s spreadsheet template, usually named <CountryName>AgStats. You will enter the new source crop data in the tab named DATA. There may already be data in the country template, which should provide concrete examples for all the steps that follow.

  2. Most of the fields in the country template provide key data that instruct FDW how to document and archive the data. A few fields are intended to inform you, the data enterer, about how best to interpret, document, and enter the data. The template fields are described below.

  3. Review the source document(s) containing the new crop data to determine how you will extract, process, and add the source tabular data into the upload file format shown in the DATA tab in this Excel Crop Production Data workbook. Examine the structure and format of the tabular data in the source document.

    1. The fundamental variables that you want to extract are: reporting unit name, year, season, crop name, crop production system, area planted, area harvested, yield, and quantity.

    2. The fundamental units of measure should be “hectares” for area, “metric tons per hectare” for yield, and “metric tons” for quantity. If the source uses other units, convert the values to these units when you edit the extracted data.

    3. If the structure of all source tables is the same, then you will need less error-checking than if the tables vary in structure or format. If the tables vary in structure, naming convention, composition of variables, or in other important ways, you will need to consider extra quality-control steps in your extraction and editing (see step 7).
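
The unit conversion described in step 3 can be scripted when a source reports many values in other units. The following is a minimal Python sketch, not part of the FDW tooling. The function names, the dictionary names, and the list of source units are hypothetical, and the quintal factor assumes a metric quintal of 100 kg, which you should confirm against the source document.

```python
# Hypothetical conversion factors into the target units named in step 3.
# Extend these tables with whatever units the source document actually uses.
AREA_TO_HECTARES = {
    "hectares": 1.0,
    "acres": 0.40468564224,      # international acre
    "square kilometers": 100.0,
}

QUANTITY_TO_METRIC_TONS = {
    "metric tons": 1.0,
    "kilograms": 0.001,
    "quintals": 0.1,             # assumes a metric quintal (100 kg)
}


def to_hectares(value, unit):
    """Convert an area value to hectares; raises KeyError for unknown units."""
    return value * AREA_TO_HECTARES[unit.strip().lower()]


def to_metric_tons(value, unit):
    """Convert a quantity value to metric tons; raises KeyError for unknown units."""
    return value * QUANTITY_TO_METRIC_TONS[unit.strip().lower()]


def to_metric_tons_per_hectare(value, quantity_unit, area_unit):
    """Convert a yield given as <quantity_unit> per <area_unit> to metric tons per hectare."""
    tons = to_metric_tons(value, quantity_unit)
    hectares_per_area_unit = to_hectares(1.0, area_unit)
    return tons / hectares_per_area_unit


if __name__ == "__main__":
    # Made-up values for illustration only.
    print(to_hectares(100, "acres"))
    print(to_metric_tons(2500, "kilograms"))
    print(to_metric_tons_per_hectare(1500, "kilograms", "hectares"))
```

An unknown unit raises a KeyError instead of passing through silently, so an unexpected unit label in a source table is caught before the values reach the DATA tab.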

  4. If the source document is in PDF format, you will need an application to convert the PDF tables into spreadsheet tables. The current preferred application is ABBYY FineReader 16. It contains a full-featured suite of PDF editing tools and uses optical character recognition (OCR) to convert the PDF content.

    1. To convert the PDF tables, open the .pdf file in FineReader, select Save As, and choose a spreadsheet as the format.

      1. Tip: If the source PDF document contains a lot of text or covers other topics in addition to the tabular data you want, consider deleting the unwanted pages before you convert it.

  5. After extracting and converting the source document tabular data, check the quality of the saved spreadsheets. If the OCR has made a number of recognition errors, or has not correctly identified or structured all the tables you want, you can work with the OCR-editing application offered by FineReader to edit the output before selecting Recognize and Save As to complete the conversion.

  6. If the source data are available from an online application that lets you choose a CSV or spreadsheet output format, use that option to download the source data.

  7. If the source document tables vary unpredictably in the way they are structured, or in how they present the fundamental variables of crop production, it will be important to define a secondary error-checking and quality-control process to run immediately after the extraction and conversion. In many cases, this will consist of visually checking each extracted table to see if it accurately duplicates the source table.
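
A scripted check can supplement, but not replace, the visual comparison in step 7. The following minimal Python sketch flags extracted rows that deserve a second look. The field names, the `check_row` and `parse_number` functions, and the 5 percent tolerance are hypothetical and are not taken from the country template. The sketch also makes two assumptions: that commas in numbers are thousands separators, and that yield should roughly equal quantity divided by area harvested. Adjust both if the source document follows other conventions.

```python
import math

# Hypothetical field names for the fundamental variables listed in step 3.
REQUIRED_FIELDS = [
    "reporting_unit", "year", "season", "crop", "production_system",
    "area_planted", "area_harvested", "yield", "quantity",
]
NUMERIC_FIELDS = ["area_planted", "area_harvested", "yield", "quantity"]


def is_blank(value):
    """True for None, empty strings, and whitespace-only strings."""
    return value is None or str(value).strip() == ""


def parse_number(text):
    """Return a float, or None if the text is not a number.

    Assumes commas and spaces are thousands separators.
    """
    cleaned = str(text).replace(",", "").replace(" ", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def check_row(row, rel_tol=0.05):
    """Return a list of issues found in one extracted row (a dict)."""
    issues = []
    for field in REQUIRED_FIELDS:
        if is_blank(row.get(field)):
            # A blank may be legitimate; it is flagged for review only.
            issues.append(f"missing {field}")

    numbers = {}
    for field in NUMERIC_FIELDS:
        raw = row.get(field)
        if is_blank(raw):
            continue
        value = parse_number(raw)
        if value is None:
            # Typical OCR damage, e.g. the letter O in place of a zero.
            issues.append(f"{field} is not numeric: {raw!r}")
        elif value < 0:
            issues.append(f"{field} is negative: {raw!r}")
        else:
            numbers[field] = value

    # Assumed relationship: yield is about quantity / area harvested.
    if {"area_harvested", "yield", "quantity"} <= numbers.keys():
        if numbers["area_harvested"] > 0:
            expected = numbers["quantity"] / numbers["area_harvested"]
            if not math.isclose(numbers["yield"], expected, rel_tol=rel_tol):
                issues.append(
                    f"yield {numbers['yield']} differs from "
                    f"quantity / area_harvested = {expected:.3f}"
                )
    return issues
```

Rows that return an empty list still need the visual comparison against the source table, because a misread digit can produce a value that is numeric and internally consistent but wrong.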

...