R packages for Agricultural Research

Finding the R packages that support your research

Map of Australia as R package hex stickers; image credit: Mitchell O’Hara-Wild

Last updated: October 14, 2022

Update: This is now a CRAN Task View. Please check there for the most up-to-date information.

Agriculture encompasses a broad range of disciplines. Many packages in base R and many contributed packages are useful to agricultural researchers, so this is not an exhaustive list of packages useful to agricultural research. Instead, it is intended to cover major packages that, in most cases, have been developed to support agricultural research and analytical needs.

Note that some of these packages are on CRAN and others are on GitHub, Bioconductor, or R-Forge.

If you think that a package is missing from this list, please let us know by filing an issue in the GitHub repository.

Packages with general applications

Agricultural & land use databases

Data from the United States Department of Agriculture’s National Agricultural Statistics Service ‘Quick Stats’ web API can be accessed with rnassqs or with tidyUSDA, which also offers some mapping capabilities. The USDA’s Cropland Data Layer (CDL) API can be accessed with CropScapeR, and cdlTools provides various utility functions for processing CDL data. The package rusda provides an interface to the four databases of the USDA-ARS Systematic Mycology and Microbiology Laboratory (SMML): Fungus-Host Distributions, Specimens, Literature, and Nomenclature. The USDA’s Agricultural Resource Management Survey (ARMS) data API can be accessed with rarms. The USDA’s Livestock Mandatory Reporting data API can be accessed with usdampr. FAOSTAT and faobulk can be used to access data from the FAOSTAT database of the Food and Agriculture Organization (FAO) of the United Nations. NASA Soil Moisture Active Passive (SMAP) data can be accessed and processed with smapr.

FedData provides access to geospatial data from the United States Soil Survey Geographic (SSURGO) database, the Global Historical Climatology Network (GHCN), the Daymet gridded estimates of daily weather parameters for North America, the International Tree Ring Data Bank, and the National Land Cover Database. SSURGO data can also be accessed and processed with XPolaris. Most USDA-NRCS soil-related databases and APIs can be accessed with soilDB. SISINTAR provides access to SiSINTA (Sistema de Información de Suelos del INTA, i.e. the INTA soil information system), a soil profile database for Argentina, along with functions for processing the data. SILO weather data from the Queensland DES longpaddock website can be accessed with cropgrowdays. febr has utilities to access and process data from the Brazilian Soil Data Repository.

PGRdup provides functions to aid the identification of probable/possible duplicates in Plant Genetic Resources (PGR) collections. rfieldclimate provides functionality and parsers to interact with the FieldClimate API.

Agricultural data sets

agridat consists of a very large collection of agricultural data sets and example analyses; the package contains a vignette detailing additional data sets and extensive resources to support agricultural analysis. agritutorial provides a collection of agricultural data sets and analyses with particular attention to crop experiments. On GitHub, the repository agroBioData houses a collection of data sets supporting agriculture and applied biology (note that this is a collection of CSV files, not a package). The soybean nested association mapping (NAM) population data set can be accessed via SoyNAM. simplePhenotypes can be used for simulating pleiotropic, linked and epistatic phenotypes. USGS county data on fertilizer sales can be accessed with ggfertilizer. The FAOSTAT data set collection on the Food and Agriculture Biomass Input–Output model (FABIO) is available through fabio and is described in more detail in Bruckner (2019). The R-Forge Subversion repository ‘cropcc’ hosts several R packages with climate change and cropping data sets. Additionally, many of the agriculture-focused packages listed in this guide include data sets to illustrate their functionality (e.g. agricolae, AgroTech, ZeBook). Annual agricultural production data from the Peruvian Integrated System of Agricultural Statistics (SIEA) covering 2004 to 2014 can be accessed with cropdatape.

General analytical packages supporting agricultural research

The packages nlraa and AgroReg provide general linear and nonlinear regression functions specifically for agricultural applications. agriCensData is a flexible package for working with censored data (e.g. time to flowering, instrumentation values below the detection limit, disease scoring). The package biotools can conduct a wide array of multivariate analyses for agronomists, including genetic covariance, optimal plot size, tests for spatial dependence, and tests for seed lot heterogeneity. grapesAgri1 houses a collection of shiny apps, GRAPES (General R-shiny based Analysis Platform Empowered by Statistics), that serve as a graphical user interface where users can upload and analyse data files. Supported analyses include linear models and ANOVA for CRD and RCBD (2-way) models, correlation analysis, exploratory data analysis and other common hypothesis tests.

ALUES implements methodology developed by the FAO and the International Rice Research Institute for evaluating land suitability for the production of different crops. AGPRIS (AGricultural PRoductivity in Space) provides functions for different spatial analyses implemented in INLA and other spatial approaches. The package KenSyn has example data sets and analytical code supporting the book De L’analyse des Réseaux Expérimentaux à la Méta-analyse (French), or From Experimental Network to Meta-analysis (English).

AgroTech provides functions for chemical application calculations, along with example data sets.

Discipline-specific packages

Agricultural economics

The CRAN Task Views for Econometrics, Empirical Finance, and Time Series provide information on packages and tools relevant to agricultural economics.

Several packages have been developed specifically for agricultural price forecasting. vmdTDNN forecasts univariate time series data using variational mode decomposition (VMD) based time delay neural network models, as described by Dragomiretskiy (2014). stlELM also forecasts univariate time series, combining seasonal-trend decomposition based on loess (STL) with an extreme learning machine, following Xiong et al. (2018). eemdTDNN does the same using ensemble empirical mode decomposition (EEMD) based time delay neural network models; for method details, see Yu et al. (2008).
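The decomposition step that packages such as stlELM build on can be tried in base R with `stl()` from the stats package. This is only a sketch: it uses the built-in monthly `co2` series as a stand-in for an agricultural price series, and it does not reproduce stlELM's extreme learning machine forecasting step.

```r
# Seasonal-trend decomposition based on loess (STL), base R 'stats' package.
# The built-in monthly 'co2' series stands in for a monthly price series here.
fit <- stl(co2, s.window = "periodic")

# One column each for the seasonal, trend and remainder components
components <- fit$time.series
head(components)

# The three components add back up to the original series
reconstructed <- rowSums(components)
max(abs(reconstructed - as.numeric(co2)))
```

Decomposition-based forecasters generally model the components separately and then recombine the forecasts; the package documentation describes how each one does this.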

Agrometeorology

The Hydrology CRAN Task View has many resources for accessing and processing weather and climate data. Meteor provides a set of functions for weather and climate data manipulation to support crop and crop disease modeling. Data from the Copernicus data set of agrometeorological indicators can be downloaded and extracted using ag5Tools. The frost package contains a compilation of empirical methods used by farmers and agronomic engineers to predict the minimum temperature and thereby detect frost events. agroclim contains functions to compute agroclimatic indices useful for zoning areas based on climatic variables and for evaluating the importance of temperature and precipitation for individual crops or for agricultural lands in general. cropgrowdays can be used for calculating growing degree days, cumulative rainfall, number of stress days, mean radiation, evapotranspiration and other variables. It can also be used to access SILO weather data from the Queensland DES longpaddock website. Climatic crop zones in Brazil can be accessed and calculated with cropZoning, using TerraClimate data sets calibrated to weather stations run by the National Meteorological Institute of Brazil. The package acdcR (Agroclimatic Data by County) provides functions to calculate widely used county-level variables for agricultural production, agroclimatic and weather analyses. LWFBrook90R provides an implementation of the soil vegetation atmosphere transport (SVAT) model LWF-BROOK90 to calculate daily evaporation (transpiration, interception, and soil evaporation) and soil water fluxes, along with soil water contents and soil water tension of a soil profile covered with vegetation. Leaf area index and soil moisture can be estimated from microwave backscattering data with the WCM package, which is based on the water cloud model (WCM).
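Growing degree days, one of the quantities cropgrowdays reports, are simple enough to sketch in base R. The version below uses the simple averaging method, a hypothetical base temperature of 10 °C and made-up daily temperatures; it is not cropgrowdays' implementation, and other methods (for example, with an upper temperature cap) give different values.

```r
# Growing degree days (GDD), simple averaging method:
# daily GDD = max(0, (Tmax + Tmin) / 2 - Tbase), summed over the period.
# The default base temperature of 10 degC is an assumption; it is crop-specific.
gdd <- function(tmax, tmin, tbase = 10) {
  stopifnot(length(tmax) == length(tmin))
  daily <- pmax(0, (tmax + tmin) / 2 - tbase)  # negative days count as zero
  sum(daily)
}

# Five hypothetical days of temperature data (degC)
tmax <- c(22, 25, 18, 12, 28)
tmin <- c(10, 12, 6, 2, 14)
gdd(tmax, tmin)  # 27.5
```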

Agronomic trials

Experimental design

The package agricolae provides extensive resources for the planning and analysis of field experiments. Designs constructed by agricolae can be visualised with agricolaeplotr. The ExperimentalDesign CRAN Task View provides additional information on experimental design for a wide variety of research problems. desplot is for plotting maps of agricultural trials laid out in grids. DiGGer was developed for rectangular field trials; its purpose is to help users determine the optimal experimental design based on the treatment structure and number of replicates. inti provides functionality for experimental design and data manipulation, with a focus on FieldBook compatibility.
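The core idea behind a randomised complete block design (RCBD) layout, which design packages such as agricolae generate with far more options, can be sketched in a few lines of base R. The function name `rcbd_layout` and the treatment labels are hypothetical; this is an illustration of the randomisation, not agricolae's interface.

```r
# A minimal RCBD layout in base R: each treatment appears exactly once per
# block, in an independently randomised order.
rcbd_layout <- function(treatments, n_blocks, seed = 42) {
  set.seed(seed)  # reproducible randomisation (this changes the global RNG state)
  blocks <- lapply(seq_len(n_blocks), function(b) {
    data.frame(block = b,
               plot = seq_along(treatments),
               treatment = sample(treatments))  # random permutation per block
  })
  do.call(rbind, blocks)
}

book <- rcbd_layout(c("A", "B", "C", "D"), n_blocks = 3)
book
# A block-by-treatment table of ones confirms that every block is complete
table(book$block, book$treatment)
```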

High throughput phenotyping (HTP)

statgenHTP is for analyzing data from HTP platform experiments, with some functions specifically designed to work with the proprietary software asreml. CropDetectR can be used to identify crop rows from image data. FWRGB can process plant images for downstream machine learning models to predict fresh biomass. pliman provides tools for image manipulation to quantify plant leaf area, disease severity, number of disease lesions, and obtain statistics of image objects such as grains, pods, pollen, leaves, and more.

Trial analysis

The package agricolae contains functions for analyzing many common designs in agricultural trials, such as split plot, lattice and Latin square designs, plus additional functions such as AMMI and AUDPC calculations. Trials utilizing an incomplete block design can be analysed using ispd. statgenSTA has functions for single trial analysis with and without spatial components. The proprietary software asreml provides an R version of its mixed model fitting functions for field trial analysis (note that this is not open source and requires an annual license). CRAN also contains an add-on package, asremlPlus, that provides several accessory functions for asreml. INLA provides tools for Bayesian inference of latent Gaussian models, including functions for modelling spatial variation, such as in field experiments or across farm locations. The gosset package provides a toolkit for a workflow to analyse experimental agricultural data, from data synthesis to model selection and visualisation. AgroR has general functions and a shiny app for analysis of common designs in agriculture: CRD, RCBD and Latin square.
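The AUDPC mentioned above is usually computed with the trapezoidal rule: AUDPC = sum over i of (y_i + y_(i+1)) / 2 × (t_(i+1) - t_i), where y is disease severity at assessment time t. A base-R sketch with hypothetical severity scores (the function name is illustrative, not agricolae's):

```r
# Area under the disease progress curve (AUDPC) by the trapezoidal rule
audpc_trapezoid <- function(severity, days) {
  stopifnot(length(severity) == length(days), !is.unsorted(days))
  n <- length(days)
  # mean severity of each interval multiplied by the interval length, summed
  sum((severity[-1] + severity[-n]) / 2 * diff(days))
}

severity <- c(1, 5, 12, 30)  # hypothetical % leaf area diseased
days     <- c(0, 7, 14, 21)  # days after the first assessment
audpc_trapezoid(severity, days)  # 227.5 (%-days)
```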

SpATS can be used to adjust for field spatial variation using P-splines. A localised method of spatial adjustment for unreplicated trials, moving grid adjustment, is implemented in mvngGrAd. ClimMobTools is an R API client for ClimMob, a citizen science platform for agronomic trials.

Animal science

usdampr provides access to the USDA’s Livestock Mandatory Reporting API. Many of the genetic packages described in this resource can also be applied to animals.

Breeding & quantitative genetics

See the R package repository Bioconductor for bioinformatics tools to support the processing of high-throughput genomic data.

lmDiallel provides service functions for analysing data sets obtained from diallel experiments, as described in Onofri (2020). plantbreeding (available on R-Forge: install.packages("plantbreeding", repos="http://R-Forge.R-project.org")) provides many convenience functions for working with populations and designs common in plant breeding, including diallels, line-by-tester designs, augmented trials, the North Carolina design, and more. st4gi and variability provide several common utility functions for genetic improvement of crops. Also, please see the subsection on “genotype-by-environment interactions” for packages integrating environmental and genomic data in an analytical framework. gpbStat provides functions for common plant breeding analyses, including line-by-tester analysis (Arunachalam 1974) and diallel analysis (Griffing 1956). heritability implements marker-based estimation of heritability when observations on genetically identical replicates are available. selection.index calculates a selection index using the Smith (1973) method.
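The classical Smith (Smith-Hazel) selection index weights each trait so that the index tracks aggregate genetic value as closely as possible. Its coefficients are b = P⁻¹Ga, where P and G are the phenotypic and genotypic variance-covariance matrices and a is a vector of economic weights. The matrices, weights and phenotypes below are hypothetical, and this base-R sketch does not use selection.index's interface.

```r
# Hypothetical two-trait example (e.g. grain yield and protein content)
P <- matrix(c(100, 20,
               20, 25), nrow = 2, byrow = TRUE)  # phenotypic (co)variances
G <- matrix(c( 40, 10,
               10, 10), nrow = 2, byrow = TRUE)  # genotypic (co)variances
a <- c(1, 2)                                      # relative economic weights

# Index coefficients b = P^{-1} G a (solve() avoids forming the inverse)
b <- solve(P, G %*% a)

# Index scores for three hypothetical lines (phenotypes as deviations from trial means)
X <- rbind(line1 = c(5, 1), line2 = c(-3, 2), line3 = c(0, -1))
scores <- X %*% b
scores
```

Lines would then be ranked on their index scores rather than on any single trait.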

AlphaSimR is an implementation of the AlphaSim algorithm in R, providing functions for stochastic modelling of processes common to breeding programs such as selection and crossing. MoBPS has a suite of functions for simulating genetic gain and economic costs in a plant breeding program. isqg provides functions for high performance quantitative genetic simulations using a bitset-based algorithm.

Linkage mapping & QTL analysis

There are two notable and long-standing packages: (1) onemap, providing MapMaker/EXP-like performance and extended functionality, and (2) qtl, providing standard functionality for QTL mapping and accessory functions for simulating crosses. ASMap is for fast linkage mapping with the ‘MSTmap’ algorithm. MapRtools is another linkage mapping package. Linkage maps can be visualized with LinkageMapView. For polyploids, the packages mappoly and polymapR can be used for linkage mapping, and the packages qtlpoly and polyqtlR can be used for QTL estimation. diaQTL is for QTL and haplotype analysis of diallel populations (diploid and autotetraploid). statgenMPP can conduct QTL mapping in multi-parent populations.
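Linkage mapping packages convert pairwise recombination fractions (r) into map distances with a mapping function. Two classical choices are Haldane's, d = -50 ln(1 - 2r) cM, which assumes no crossover interference, and Kosambi's, d = 25 ln[(1 + 2r) / (1 - 2r)] cM, which allows for some interference. A base-R sketch (the function names are illustrative):

```r
# Map functions: recombination fraction r (0 <= r < 0.5) to distance in cM
haldane_cM <- function(r) {
  stopifnot(all(r >= 0 & r < 0.5))
  -50 * log(1 - 2 * r)                    # no crossover interference
}
kosambi_cM <- function(r) {
  stopifnot(all(r >= 0 & r < 0.5))
  25 * log((1 + 2 * r) / (1 - 2 * r))     # allows for some interference
}

r <- c(0.01, 0.10, 0.20, 0.30)
data.frame(r, haldane = round(haldane_cM(r), 2), kosambi = round(kosambi_cM(r), 2))
```

For small r both functions give roughly 100r cM; they diverge as r approaches 0.5, with Haldane's always giving the larger distance.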

GWAS

Genome-wide association study analysis can be conducted with statgenGWAS. GWAS models across very large numbers of SNPs or observations can be estimated with rMVP and megaLMM. Functions for autotetraploids are provided by GWASpoly; these functions also work in diploid species. StageWise provides functions to conduct a two-stage GWAS when the underlying phenotypic data are from multiple field trials. Variable selection for ultra-high-dimensional GWAS data sets can be done with bravo, which implements a Bayesian algorithm, selection of variables with embedded screening (SVEN). statgenIBD can calculate IBD probabilities for biparental, three-way and four-way crosses. For polyploids, polyBreedR provides convenience functions to facilitate the use of genome-wide markers for breeding autotetraploid species, and its functionality also extends to diploids.

Genomic prediction

Packages supporting genetic prediction using mixed models augmented with pedigree or genetic marker data include sommer, rrBLUP, BGLR, lme4gs, lme4qtl, pedigreemm, qgtools, cpgen, QTLrel, and the licensed software asreml. Many of these packages have built-in functionality for data preparation steps including data imputation and calculation of the relationship matrices. breedR is a general purpose package for performing quantitative genetic analyses. Genome feature mixed linear models using frequentist and Bayesian approaches can be implemented with qgg. pedmod provides linear modelling functions integrating kinship for categorical outcomes.

STGS implements several genomic selection models for single traits. GSelection implements genomic selection integrating additive and non-additive models. GSMX, for multivariate genomic selection, estimates trait heritability and handles overfitting through cross-validation. TSDFGS can estimate the optimal training population size and composition for genomic selection. BGGE conducts genomic prediction for continuous variables, focused on genotype-by-environment genomic selection models following the methods of Jarquín (2014). BMTME builds genomic selection prediction models that can be expanded to multiple traits and environments using Bayesian models developed by Montesinos-López (2016, 2018a, 2018b). BWGS, “Breed Wheat Genomic Selection”, provides a pipeline of functions for conducting genomic selection in hexaploid wheat.

AGHmatrix provides extensive options for calculating pedigree and genomic (additive and dominance) relationships. The pedigree package provides functionality for ordering pedigrees, calculating and inverting the additive relationship matrix (A), and other related tasks.
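As an illustration of what these packages compute, the sketch below builds an additive genomic relationship matrix with VanRaden's first method, G = ZZ' / (2 Σ p_k(1 - p_k)), where Z holds marker genotypes coded 0/1/2 and centred by twice the allele frequency p_k. The marker matrix is hypothetical and allele frequencies are estimated from the same small sample; AGHmatrix offers this and many other estimators through its own interface.

```r
# Hypothetical marker matrix: 4 individuals (rows) x 5 SNPs (columns), coded 0/1/2
M <- matrix(c(0, 1, 2, 1, 0,
              1, 1, 2, 0, 0,
              2, 0, 1, 1, 1,
              1, 2, 0, 2, 1), nrow = 4, byrow = TRUE,
            dimnames = list(paste0("ind", 1:4), paste0("snp", 1:5)))

p <- colMeans(M) / 2                         # allele frequencies from the sample
Z <- sweep(M, 2, 2 * p)                      # centre each marker column by 2p
G <- tcrossprod(Z) / (2 * sum(p * (1 - p)))  # G = ZZ' / (2 * sum p(1 - p))
round(G, 3)
```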

Crop growth models & crop modelling

The apsimx package has functions to read, inspect, edit and run files for APSIM “Next Generation” (json) and APSIM “Classic” (xml). Files with an .apsim extension correspond to APSIM Classic; files with an .apsimx extension correspond to APSIM Next Generation. rapsimng works with APSIM Next Generation files. DSSAT provides a comprehensive R interface to the Decision Support System for Agrotechnology Transfer Cropping Systems Model (DSSAT-CSM) documented by Jones (2003). This package provides cross-platform functions to read and write input files, run DSSAT-CSM, and read output files. The modelling framework Simplace (Scientific Impact assessment and Modelling PLatform for Advanced Crop and Ecosystem management) can be accessed using the R package simplace.

Meteor provides a set of functions for weather and climate data manipulation to support crop and crop disease modeling. cropDemand can be used to estimate crop water demand in Brazilian production regions using the TerraClimate data set. Evapotranspiration can estimate potential and actual evapotranspiration using 21 different models. metrica has many convenience functions for comparing model predictions with ground truth data. ZeBook provides data sets and examples to accompany the book Working with Dynamic Crop Models.

phenorice is an R implementation of the phenorice model for remote sensing of rice crop production. phenoriceR provides helper functions for processing data from the phenorice model. Recocrop estimates environmental suitability for plants using a limiting factor approach for plant growth following Hackett (1991). Rquefts provides an implementation of the QUEFTS (Quantitative Evaluation of the Native Fertility of Tropical Soils) model (Janssen 1990). Rwofost is an implementation of the WOFOST (“World Food Studies”) crop growth model (De Wit 2019).

Entomology

hnp generates half-normal plots with simulation envelopes using different diagnostics from a range of fitted models. A few example data sets are included. The package agriCensData provides functions for dealing with censored data. In addition, the Survival CRAN Task View lists CRAN resources for working with censored data.

Food science

For packages supporting sensory studies, see the Psychometrics CRAN task view. NutrienTrackeR provides convenience functions for calculating nutrient content (macronutrients and micronutrients) of foods using food composition data from several reference databases, including: ‘USDA’ (United States), ‘CIQUAL’ (France), ‘BEDCA’ (Spain) and ‘CNF’ (Canada).

Genotype-by-environment interactions

statgenGxE has several functions for handling various analytical approaches to genotype-by-environment interactions. IBCF.MTME implements item-based collaborative filtering for multi-trait and multi-environment trials. The package gge is useful for producing GGE biplots, while bayesammi can conduct Bayesian estimation of the additive main effects and multiplicative interaction (AMMI) model. EnvRtype can be used for assembling climate data, data set preparation and environmental classification. FW implements Finlay-Wilkinson regression using a Gibbs sampler; spFW conducts spatial Finlay-Wilkinson analysis for multi-environment trials using a Bayesian hierarchical model. A wide variety of stability statistics can be calculated via agrostab, including the coefficient of homeostaticity, specific adaptive ability, weighted homeostaticity index, superiority measure, regression on the environmental index, Tai’s stability parameters, stability variance, ecovalence and other stability parameters.
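The classical Finlay-Wilkinson model regresses each genotype's performance on an environmental index, usually the environment mean minus the grand mean; a slope above 1 marks a genotype that responds more than average to better environments. FW fits this model with a Gibbs sampler, whereas the base-R sketch below is the simpler ordinary least squares version, run on simulated data with hypothetical genotype sensitivities.

```r
set.seed(1)
# Hypothetical balanced trial: 4 genotypes x 5 environments
true_env <- c(E1 = -2, E2 = -1, E3 = 0, E4 = 1, E5 = 2)  # environmental effects
true_b   <- c(G1 = 0.6, G2 = 0.9, G3 = 1.1, G4 = 1.4)    # genotype sensitivities
d <- expand.grid(genotype = names(true_b), environment = names(true_env),
                 stringsAsFactors = FALSE)
d$yield <- 10 + unname(true_b[d$genotype] * true_env[d$environment]) +
  rnorm(nrow(d), sd = 0.1)

# Environmental index: environment mean minus grand mean
env_means <- tapply(d$yield, d$environment, mean)
env_index <- setNames(as.vector(env_means) - mean(d$yield), names(env_means))
d$index <- unname(env_index[d$environment])

# One OLS regression per genotype: with balanced data the intercept is the
# genotype mean and the slope is its sensitivity (b = 1 is average stability)
fw <- t(sapply(split(d, d$genotype),
               function(g) coef(lm(yield ~ index, data = g))))
colnames(fw) <- c("mean", "slope")
round(fw, 2)
```

Because the index is built from the same data, the slopes of a balanced trial average exactly 1, which makes them easy to read as relative sensitivities.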

Plant pathology

epifitter provides functions for analysis and visualization of plant disease progress curve data. epiphy is a toolbox for analyzing plant disease epidemics; it provides a common framework for plant disease intensity data recorded over time and/or space. hagis has functions for analysis of plant pathogen pathotype survey data. Its functions calculate the distribution of susceptibilities, the distribution of complexities with statistics, pathotype frequency distributions, and diversity indices for pathotypes. Populations with mixed clonal/sexual reproductive strategies can be analyzed with poppr, which has population genetic analysis tools for hierarchical analysis of partially clonal populations. ascotraceR can simulate an Ascochyta blight infection in a chickpea field following the model developed by Diggle (2022). Stochastic disease modelling of plant pathogens incorporating spatial and genetic information can be done with landsepi. With resevol, the evolution of resistance genes under pesticide pressure can be simulated under different numbers of pests, modes of pest reproduction, resistance loci, numbers of pesticides and other facets.

Rural sociology

See the Psychometrics CRAN Task View for general sociology packages. Both the Survival CRAN Task View and the agriCensData package provide tools for working with interval and censored data.

Soil science and precision agriculture

sharpshootR contains a compendium of utility functions supporting soil survey work, including data management, summary, visualizations and conversions. For pedology, aqp provides a general toolkit for soil scientists: specialized data structures, soil profile summary, visualisation, color conversion, and more. SoilTaxonomy provides functions for parsing soil taxonomic terms. The Spatial and Spatio-Temporal CRAN Task Views provide extensive resources in spatial statistics. pedometrics has many utility functions for common analyses of soil data.

Soil water retention curves can be calculated by the soilwater package using the van Genuchten method for soil water retention and the Mualem method for hydraulic conductivity. SoilR models soil organic matter decomposition in terrestrial ecosystems with linear and nonlinear models. Soil texture triangles can be graphed using soiltexture; this package can also classify and transform soil texture data. sorcering can be used to model soil organic carbon and soil organic nitrogen and to calculate N mineralisation rates. QI can be used to calculate potassium intensity and exchangeability. soiltestcorr has functions for conducting correlation analysis between soil test values and crop yield data. SoilTesting provides functions for calculating soil mineral concentrations from analytical lab results. fertplan provides fertilizer recommendations based on soil test results (note that this package is optimized for horticultural crop production in Italy). DMMF implements the daily based Morgan-Morgan-Finney (DMMF) soil erosion model (Choi et al., 2017) for estimating surface runoff and sediment budgets from a field or a catchment on a daily basis. Estimation and prediction of parameters of soil hydraulic property models can be accomplished with spsh.
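The van Genuchten retention function referred to above is θ(h) = θr + (θs - θr) / [1 + (α|h|)^n]^m, with m = 1 - 1/n (the restriction used together with Mualem's conductivity model). A base-R sketch with hypothetical parameter values; soilwater's own functions and argument names may differ.

```r
# van Genuchten water retention curve with the Mualem restriction m = 1 - 1/n
van_genuchten <- function(h, theta_r, theta_s, alpha, n) {
  stopifnot(n > 1, theta_s > theta_r)
  m <- 1 - 1 / n
  theta_r + (theta_s - theta_r) / (1 + (alpha * abs(h))^n)^m
}

# Suction heads in cm (0 = saturation, 15000 cm is roughly the wilting point)
h <- c(0, 10, 100, 1000, 15000)
# Hypothetical parameters: theta in m3/m3, alpha in 1/cm
theta <- van_genuchten(h, theta_r = 0.05, theta_s = 0.45, alpha = 0.02, n = 1.5)
round(theta, 3)
```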

Agricultural image features can be extracted from spectral data with agrifeature. It has functions to calculate the gray level co-occurrence matrix (GLCM), RGB-based vegetation indices (RGB VI) and the normalized difference vegetation index (NDVI). Experimental units (e.g. plots) can be obtained from spectral images using rPAex. mapsRinteractive provides functions for working with soil point data in raster format. The suitability of specific soils for crop production can be analysed using soilassessment, including soil fertility classes, soil erosion models and soil salinity classification. Suitability requirements are provided for crops grouped into cereal crops, nuts, legumes, fruits, vegetables, industrial crops, and root crops. mpspline2 fits mass-preserving splines to soil attributes to make continuous down-profile estimates of attributes measured over discrete, often discontinuous depth intervals.
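NDVI, one of the indices agrifeature computes, is the normalised difference between near-infrared and red reflectance: NDVI = (NIR - Red) / (NIR + Red). Applied to two co-registered bands stored as matrices, it works pixel by pixel. The reflectance values below are hypothetical, and this is a plain base-R sketch rather than agrifeature's interface.

```r
# NDVI computed element-wise on two co-registered bands of the same dimensions
ndvi <- function(nir, red) {
  stopifnot(identical(dim(nir), dim(red)))
  (nir - red) / (nir + red)
}

# Hypothetical 2 x 2 reflectance "images" (values between 0 and 1)
nir <- matrix(c(0.50, 0.40, 0.30, 0.45), nrow = 2)
red <- matrix(c(0.08, 0.10, 0.25, 0.05), nrow = 2)
round(ndvi(nir, red), 3)
```

Dense green canopy gives values close to 1, while bare soil and senescent plots sit near 0.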

Weed science

The package drc offers versatile model fitting and after-fitting functions for dose-response curves. LW1949 implements the Litchfield and Wilcoxon (1949) dose-response analysis. PROSPER is a package for simulating weed population dynamics at the individual and population level under a range of conditions, including herbicide resistance and herbicide pressure. For ecological studies and analytical applications, the Environmetrics CRAN Task View lists existing R resources on this topic.
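Dose-response curves in weed science are often described with the four-parameter log-logistic model, f(x) = c + (d - c) / (1 + exp(b(log x - log e))), where c and d are the lower and upper limits, e is the dose giving a response halfway between them (ED50) and b is the slope. This matches the parameterisation of drc's LL.4. The sketch below instead fits the model with base R's `nls()` on simulated data, so it shows the model rather than drc's interface, and its starting values are rough guesses that may need tuning on real data.

```r
# Four-parameter log-logistic model: b = slope, c = lower limit,
# d = upper limit, e = ED50
ll4 <- function(x, b, c, d, e) c + (d - c) / (1 + exp(b * (log(x) - log(e))))

# Simulated dose-response data (hypothetical doses and responses)
set.seed(7)
dose <- rep(c(0.5, 1, 2, 4, 8, 16, 32), each = 3)
resp <- ll4(dose, b = 2, c = 0.5, d = 8, e = 4) + rnorm(length(dose), sd = 0.2)

# Nonlinear least squares fit; starting values are rough guesses
fit <- nls(resp ~ ll4(dose, b, c, d, e),
           start = list(b = 1.5, c = 0.3, d = 7.5, e = 3.5))
round(coef(fit), 2)
```

drc handles starting values automatically and adds the after-fitting tools mentioned above, which is why it is the usual choice in practice.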

  • InkaVerse: a collection of shiny apps and an R package (inti) to support field trials analyses for trials managed in FieldBook.

Relevant R-Forge Projects (not otherwise listed in this resource)


Julia Piaskowski
Statistician

My research interests include plant genetics, spatial statistics and how to implement open science and reproducible research practices routinely.
