Using Large Language Models in R Programming

Image: Karsten Wuerth

Overview

Generative AI tools, specifically Large Language Models (LLMs) such as ChatGPT, Claude, or GitHub Copilot, have become increasingly popular in the R programming world. However, the effectiveness of these tools is often questioned and can vary widely among users.

Everyone has their own opinion on how effective these tools are for R programming, based on their experiences. Some people find them incredibly helpful, while others think they are a waste of time. One’s experience with R programming, familiarity with AI tools, and the specific tasks they are trying to accomplish can greatly influence their perception of AI’s effectiveness.

Note: The training data behind these AI tools is not transparent, so it is unclear what their knowledge is based on. This raises concerns about the quality of AI-generated responses; for example, a tool might still generate answers that use deprecated packages and functions.

In this blog post, I will share my insights on using AI tools for R programming, including how to effectively write prompts, troubleshoot code, and best practices for leveraging AI assistance. The goal of this blog post is to help you with:

  • AI tools for R programming
  • Using AI for R programming
    • Generate R code
    • R code explanation
    • Troubleshooting R code
    • R code optimization
  • Best practices

AI tools for R programming

Common AI tools available for R programming include GitHub Copilot, Claude, OpenAI’s ChatGPT, and LLM add-in packages for RStudio (such as gander and ellmer). ChatGPT and Claude are LLM-based chatbots that can be used as standalone tools, while GitHub Copilot is an add-in for the RStudio IDE.

Here are links to resources on installing these tools:

  • GitHub Copilot: Installation guide

    • This is a great add-in AI tool for live code suggestions, code completion, or generating an entire function based on the context of the analysis.
  • R packages for using LLMs: These packages allow you to interact directly with LLMs from within RStudio, which can help with programming questions.
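As a rough sketch of what this can look like with ellmer: the snippet below assumes the ellmer package is installed and that an OpenAI API key is stored in the OPENAI_API_KEY environment variable. The helper name ask_r_question() and the system prompt wording are my own illustrations, not part of the package.

```r
# Hypothetical helper (not part of ellmer): sends one R question to an LLM.
# Assumes ellmer is installed and OPENAI_API_KEY is set in your environment.
ask_r_question <- function(question) {
  if (!requireNamespace("ellmer", quietly = TRUE)) {
    stop("Please install the ellmer package first.")
  }
  if (!nzchar(Sys.getenv("OPENAI_API_KEY"))) {
    stop("Set the OPENAI_API_KEY environment variable first.")
  }
  # Create a chat object with a short system prompt (illustrative wording)
  chat <- ellmer::chat_openai(
    system_prompt = "You are a concise assistant for R programming questions."
  )
  # Send the question and return the model's reply
  chat$chat(question)
}

# Example call (needs network access and a valid API key):
# ask_r_question("How do I reshape a data frame from wide to long with tidyr?")
```

The checks at the top are there so the function fails with a readable message instead of an obscure authentication error.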

Programming tasks that we can accomplish with AI tools

Generate R code

We can use these tools to generate code for specific tasks in R, including data wrangling, statistical analysis, data visualization, and more. We can also ask for help in generating specific functions or entire scripts based on our requirements. However, the clarity of the prompt can significantly affect the response of AI tools.

For example, prompts like “create a plot in R” and “create a boxplot using ggplot2” can yield quite different results. Therefore, it’s crucial to approach these tools with a clear action plan. In addition, your choice of AI tool can affect the output. I obtained quite different outputs from ChatGPT and Claude for the same prompt (‘create a plot in R’): Claude gave a detailed response with specific examples for different kinds of plots, while ChatGPT provided a more generic response using base R functions.
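To make the difference concrete, here is a minimal sketch of the kind of code the more specific prompt (“create a boxplot using ggplot2”) tends to lead to. The choice of the built-in PlantGrowth data and the axis labels are my assumptions for illustration; an actual AI response may look different.

```r
library(ggplot2)

# PlantGrowth ships with R: plant weight for a control and two treatments
p <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot() +
  labs(
    x = "Treatment group",
    y = "Dried plant weight",
    title = "Plant weight by treatment"
  )

p  # typing the object name at the console draws the plot
```

The vague prompt leaves the plot type, the package, and the data open, so the tool has to guess all three.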

R code explanation with AI tools

AI tools can be helpful in understanding R code syntax as we can obtain faster explanations for specific functions, packages, or their parameters. This can be particularly useful for new R users or when working with unfamiliar packages.

Example:

I am following an online guide for data wrangling (using the tidyverse package), and I’m confused about what the author is trying to achieve with a long block of code. I pasted the code into ChatGPT and asked for an explanation. Here’s the detailed, line-by-line explanation from ChatGPT.

Troubleshoot R code with AI assistance

We often spend hours debugging R code by searching through package documentation or Stack Overflow. While these sources remain reliable, AI assistance can provide a faster explanation of an error and a way to fix it. An error in R can be as simple as a typo in a function name, or it can be a misspecification of function arguments. The quickest way to track it down is to paste the error message from the R output into an AI tool and ask for an explanation, a fix, or an alternative approach. AI tools can help you troubleshoot R code by analyzing error messages, suggesting potential fixes, and explaining common issues. Remember, though, that these tools cannot debug every kind of error; some errors require an expert’s advice.
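One way to collect what the AI tool needs is to capture the error message in R and combine it with the failing call. The sketch below uses a deliberate typo (maen() instead of mean()); the prompt wording is only an illustration.

```r
# A deliberate typo: maen() instead of mean()
values <- c(4.2, 5.1, 6.3)

# tryCatch() returns the error message as text instead of stopping the script
err_msg <- tryCatch(
  maen(values),
  error = function(e) conditionMessage(e)
)

# Bundle the failing call, the error, and the R version into one prompt
prompt <- paste(
  "I ran this R code: maen(values)",
  paste("R returned this error:", err_msg),
  paste("R version:", R.version.string),
  "What does the error mean and how can I fix it?",
  sep = "\n"
)
cat(prompt)
```

Including the exact call and the R version gives the tool the same context a Stack Overflow reader would ask for.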

R package and function explanation

If you are new to R programming and not very familiar with R package documentation, AI tools can be an excellent resource for understanding how to use specific functions and packages.

Here’s an example prompt for using ChatGPT for R package and code explanation:

explanation of functions in dplyr package for data wrangling

This was the first response from ChatGPT.

In addition to that, AI tools can also be used to understand how to use functions/packages in R, what parameters they take, and how to interpret the results. This can be especially useful when learning new packages.

Prompt: Can you explain how to use the ggplot() function in R to create a boxplot for different treatments? Please provide an example code and explain the parameters used in the function.

Here’s an explanation of the ggplot() function from ChatGPT.

You can also paste the output of an analysis and ask the AI tool to explain and interpret it.
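A small sketch of how analysis output can be turned into text for a prompt. The one-way ANOVA on the built-in PlantGrowth data is my own example, and the prompt sentence is illustrative.

```r
# Fit a one-way ANOVA on the built-in PlantGrowth data
fit <- aov(weight ~ group, data = PlantGrowth)

# capture.output() stores the printed summary as a character vector
out <- capture.output(summary(fit))

# Put a question in front of the captured output
prompt <- paste(
  c("Please explain this R output and how to interpret it:", out),
  collapse = "\n"
)
cat(prompt)
```

Pasting the full printed table, rather than a single number from it, lets the tool explain each column.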

For advanced R programming

If you’re a seasoned R programmer and want to improve your coding efficiency, you can benefit from AI assistance in several ways, such as building reproducible workflows, automating repetitive coding tasks, or optimizing code.
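As an illustration of code optimization, asking an AI tool to rewrite a loop that grows a vector might lead to a vectorized version like the one below. Both function names are hypothetical, and the comparison at the end is the kind of check worth running before trusting any rewrite.

```r
# Loop version: grows a vector one element at a time
col_means_loop <- function(df) {
  out <- c()
  for (nm in names(df)) {
    out <- c(out, mean(df[[nm]]))
  }
  names(out) <- names(df)
  out
}

# Vectorized version: one call, no explicit loop
col_means_vec <- function(df) colMeans(df)

# Confirm that both versions agree on the numeric iris columns
numeric_iris <- iris[, 1:4]
all.equal(col_means_loop(numeric_iris), col_means_vec(numeric_iris))
```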

Best practices for AI-assisted R programming

While these tools can be helpful for R programming tasks, the suggested code and packages should be reviewed carefully, especially for accuracy and appropriateness to your context. The tool you are using may not always understand the context of your task and can provide inefficient or incorrect solutions.

To ensure that you get the most out of AI tools for R programming, it’s important to follow some best practices:

Know your AI tool

As mentioned before, we do not know much about the algorithms and knowledge base behind these tools. Their responses vary widely based on the specific AI model, its training data, and how it interprets your prompts.

Thus, it’s important to understand whether the response given by one tool is effective or not.

Here’s an example of the output from the same prompt with ChatGPT and GitHub Copilot:

Prompt: Create summary statistics for iris data

Response in ChatGPT

Output from GitHub Copilot:

    # create summary statistics for iris data
    # using the iris data set 
    # # load the iris data set
    data(iris)
    # create a summary statistics function
    summary_stats <- function(data) {
      summary_data <- data.frame(
        Mean = sapply(data[, -5], mean),
        Median = sapply(data[, -5], median),
        SD = sapply(data[, -5], sd),
        Min = sapply(data[, -5], min),
        Max = sapply(data[, -5], max)
      )
      return(summary_data)
    }
    # apply the summary statistics function to the iris data set
    summary_iris <- summary_stats(iris)
    # print the summary statistics
    print(summary_iris)

                     Mean Median        SD Min Max
    Sepal.Length 5.843333   5.80 0.8280661 4.3 7.9
    Sepal.Width  3.057333   3.00 0.4358663 2.0 4.4
    Petal.Length 3.758000   4.35 1.7652982 1.0 6.9
    Petal.Width  1.199333   1.30 0.7622377 0.1 2.5

Here, we can see that both tools provided a solution for the task we requested, but the output from GitHub Copilot is more structured and includes a step-wise approach.

Before using AI tools for R programming, familiarize yourself with the specific tool you are using, so that you understand its capabilities and limitations.

AI tools often wrap results in print() in the code they suggest. I don’t think we need it as often as AI suggests it.
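A short sketch of when print() matters, assuming you run the lines interactively in the console or RStudio. The object name is illustrative.

```r
summary_iris <- sapply(iris[, 1:4], mean)

# At the console, typing the object name is enough to display it
summary_iris

# Inside a loop nothing is displayed automatically, so print() is needed here
for (sp in levels(iris$Species)) {
  print(sp)
}
```

So the explicit print() in AI-generated code is harmless, but at the top level it can usually be dropped.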

Optimize prompts for R programming tasks

When using AI tools for R programming, write clear and specific prompts to get the best results. Here are some tips for writing effective prompts:

  • Be clear and concise: use simple language to describe your programming task.

  • Providing context, even as simple as “use the dplyr package to filter data”, can help in getting more successful suggestions.

  • If you know the desired outcome, include it in your prompt. This is not easy for new learners, because the outcome can be hard to articulate in a prompt. If you’re unsure what the correct answer looks like, keep refining the prompt until you get the results you need.

  • Another potential trick is to start a new chat. Every new prompt in a thread is analyzed in the context of the previous prompts. If you’re struggling to get the desired response, try starting over in a fresh chat.

Here’s an example of bad, good, & excellent prompts along with a response from ChatGPT:

Bad prompt: “Filter data in R” Response

Good prompt: “How to filter rows in a data frame where a column is greater than 10 in R?” Response

Excellent prompt: “I have a data frame with columns ‘Name’, ‘Age’, and ‘Score’. How can I filter for rows where Age is over 30 and Score is less than 5 using dplyr package? Please provide a code example in R.” Response
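For reference, the excellent prompt is specific enough that the expected answer can be written down in a few lines. The sketch below uses made-up data with the three columns named in the prompt; it is my own illustration, not the ChatGPT response.

```r
library(dplyr)

# Made-up data with the three columns named in the prompt
people <- data.frame(
  Name  = c("Asha", "Ben", "Carla", "Dev"),
  Age   = c(25, 42, 35, 51),
  Score = c(3, 4, 8, 2)
)

# Keep rows where Age is over 30 AND Score is below 5
older_low_score <- people %>%
  filter(Age > 30, Score < 5)

older_low_score
```

Because the prompt names the columns, the conditions, and the package, there is little room for the tool to guess wrong.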

Validate your output

It’s possible for AI tools to generate outdated R syntax or provide wrong package/function names. Always review and verify the generated code and packages against the R documentation. Here are a few common issues that can occur:

  • Simple errors: typos in function names or incorrectly formatted code (e.g., missing parentheses, commas, etc.).

  • Recommending deprecated packages/functions or using functions that are not available in the specified package.

  • Issues with incorrect data types or structures: I have faced this multiple times. The AI tool you are using cannot know your data structure (unless you upload or explain it), so it can provide code that might not work. For example, this can happen when a variable is stored as a numeric class but should be a factor or character class.
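The data type issue is easy to miss because the code runs without complaint either way. Here is a minimal sketch with hypothetical trial data (the values are invented for illustration), where treatment is stored as the numbers 1, 2, 3:

```r
# Hypothetical trial data: treatment is stored as numbers 1, 2, 3
trial <- data.frame(
  treatment = rep(1:3, each = 4),
  yield     = c(5.1, 4.8, 5.3, 5.0, 6.2, 6.0, 6.4, 6.1, 5.6, 5.9, 5.7, 5.8)
)

str(trial)  # check the class of each column before running suggested code

# Numeric treatment: lm() fits a single slope (a straight-line trend)
fit_numeric <- lm(yield ~ treatment, data = trial)

# Factor treatment: lm() estimates a separate effect for each treatment level
trial$treatment <- factor(trial$treatment)
fit_factor <- lm(yield ~ treatment, data = trial)

length(coef(fit_numeric))  # intercept + 1 slope
length(coef(fit_factor))   # intercept + 2 treatment contrasts
```

Both models fit without an error, but they answer different questions, which is why checking str() first is worth the extra line.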

Use AI tools as an assistant, not a replacement

Remember that these tools are meant to assist with R programming tasks, not to replace your own understanding of R. They can help enhance your skills and productivity, but don’t rely on them entirely for all your programming needs.

Here are a few ways to verify the packages/functions:

  • Load the suggested package and run the function[1] and compare your output to the examples provided in the R documentation or textbook examples. This will help you in confirming the expected package functions and results.

  • Run the generated code line-by-line. By executing the response one line at a time, you can inspect what each function and its arguments do. You will also be able to spot problems that R reports only as a warning rather than an error. In that scenario, your code will still execute, but the outcome might not be valid. This typically happens while working on a statistical analysis.
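Both checks can be sketched in a few lines of base R. The function name summarise_stats is made up to stand in for a function an AI tool might invent, and the raw_values vector is invented data with one badly formatted entry.

```r
# 1. Does the package really export the suggested function?
#    "summarise_stats" is a made-up name standing in for an invented function.
"sd" %in% getNamespaceExports("stats")
"summarise_stats" %in% getNamespaceExports("stats")

# 2. Collect warnings instead of letting them scroll past
raw_values <- c("1.5", "2.7", "3,1")  # "3,1" uses a comma, not a decimal point
warnings_seen <- character(0)

clean_values <- withCallingHandlers(
  as.numeric(raw_values),
  warning = function(w) {
    # Store the warning text, then let the computation continue
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

clean_values        # the third value has become NA
warnings_seen       # the warning that explains why
mean(clean_values)  # NA: the code ran, but the result is not usable
```

The second part shows the scenario described above: nothing stops, yet the final number is not valid.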

Final thoughts

AI tools can be handy for R programming tasks, from generating code to explaining functions and troubleshooting issues. However, it’s important to use these tools effectively by writing clear prompts, validating the generated code, and understanding the limitations of the AI tools.

Please note that this is not a definitive guide to using AI tools for R programming, but rather a starting point for beginners who want to explore them. Also, the landscape of AI tools continues to evolve, so tips or practices that work today might not be effective in the future.

Here are the best sources to check the R functions/packages: R documentation, CRAN, or Stack Overflow.

Important: Please make sure to follow the UI guidelines for using AI.

Harpreet Kaur
Statistician

My research interests include agronomy, soil fertility, and applied statistics.
