Introduction
R is a powerful and versatile programming language used for statistical analysis, data visualization, and machine learning. The success of a data analysis project depends on getting the data into R correctly. While there are several ways to import data into R, one of the simplest and most convenient is to copy and paste it from external sources such as spreadsheets, text editors, or websites. In this article, we will discuss three ways to import data into R using copy and paste: the read.delim() function from base R, the clipr package, and the datapasta package.
These methods let you quickly bring data from different formats and sources into R and use it in your project.
Learning Objectives
- Understand why the copy-paste method is useful for getting data into R.
- Understand how to use the read.delim function from base R to import data into R using copy and paste.
- Learn how to use the clipr package to import data into R using copy and paste.
- Explore the R datapasta package to paste data directly into R with the correct formatting.
This article was published as a part of the Data Science Blogathon.
Why Import Data into R using Copy-Paste Method?
Getting data for analysis is challenging for several reasons. Data may not be readily available, and collecting it can be time-consuming and expensive. Even when it is available, it may not be in a format that is suitable for analysis, so preprocessing may be needed to clean, organize, and transform it. Copying and pasting data into R, with base R or with packages such as clipr and datapasta, helps with part of this problem: a small table can be pulled straight from a spreadsheet, text file, or web page into R without first saving it to a file.
Importing Data into R Using read.delim() Function
The first method uses the read.delim() function from base R, so no extra packages are needed. read.delim() reads tabular data from delimited text, where a delimiter such as a tab, comma, or space separates the columns. By default, it expects tab-separated values with a header row, which is the format spreadsheets place on the clipboard when cells are copied. You copy the data from an external source, like a spreadsheet or text file, and read.delim() then reads it straight from the clipboard, so nothing needs to be pasted into the R console or script editor. Let us take the following example where we have data in an Excel sheet that we want to import into RStudio:
Select the required cells and copy them using either the Copy option or the shortcut Ctrl+C. Then, return to RStudio and run the following command to load the copied data into a data frame named “df”. The "clipboard" file name works on Windows; on macOS, use pipe("pbpaste") instead:
# Read the tab-separated cells copied from the spreadsheet into a data frame
df <- read.delim("clipboard")
After running this command, the data in the clipboard will be saved in the “df” dataframe. Let us verify the data by printing the first few rows using the “head” function:
head(df)
Output:
First few rows of dataframe
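The "clipboard" file name used above is platform-specific: it works on Windows, macOS users can read the clipboard through pipe("pbpaste"), and on Linux with an X11 session R accepts "X11_clipboard". The sketch below wraps these cases in two hypothetical helpers, clipboard_source() and read_clipboard_table(), which are our own names rather than part of base R; it assumes the pbpaste command is available on macOS and an X11 display on Linux.

```r
# Hypothetical helper: choose how to reach the system clipboard on this platform.
# Assumes pbpaste on macOS and an X11 session on Linux.
clipboard_source <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
    Windows = "clipboard",   # Windows clipboard, understood by file() and read.delim()
    Darwin  = "pbpaste",     # macOS: read through the pbpaste command
    "X11_clipboard"          # other Unix-alikes: the X11 'Copy' selection
  )
}

# Hypothetical wrapper: read copied, tab-separated cells into a data frame.
# Extra arguments (e.g. sep = " " or header = FALSE) are passed on to read.delim().
read_clipboard_table <- function(...) {
  src <- clipboard_source()
  if (src == "pbpaste") {
    read.delim(pipe("pbpaste"), ...)
  } else {
    read.delim(src, ...)
  }
}
```

After copying cells from a spreadsheet, df <- read_clipboard_table() should then behave like the read.delim("clipboard") call above on any of the three platforms.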
Note that read.delim() treats the first line of the copied selection as the header row, because its header argument defaults to TRUE; if the selection has no header, set header = FALSE. Data stored in a TXT file can also be copied and pasted into R with read.delim(). Let us take the following text file:
To import this data, we will again use read.delim(), this time setting the sep argument to a single space, since the values in the text file are separated by blanks. First, copy the required data from the text file, return to RStudio, and run the following command to load it into a new data frame named “df1”:
df1 <- read.delim("clipboard", sep = " ")
Let us again verify the data by printing the first few rows using the “head” function:
head(df1)
Output:
Printing first few rows of dataframe
Although the output of this example is similar to the earlier one, this time, we imported data from a text file instead of an Excel file.
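Because read.delim() can also take its input through the text argument (passed on to read.table()), the effect of the header and sep arguments can be tried out without copying anything. The strings below are made-up stand-ins for clipboard contents: tab-separated cells as a spreadsheet would copy them, the same cells without a header row, and a text file whose columns are padded with a varying number of spaces. For the padded case, sep = "" (which splits on any run of whitespace) is used instead of sep = " ", since a single-space separator would turn each extra space into an empty field.

```r
# Made-up stand-in for cells copied from a spreadsheet: tabs between columns
copied_cells <- "name\tage\nAnn\t30\nBob\t25"
df_tab <- read.delim(text = copied_cells)   # defaults: sep = "\t", header = TRUE

# The same cells copied without their header row
no_header <- "Ann\t30\nBob\t25"
df_nohead <- read.delim(text = no_header, header = FALSE,
                        col.names = c("name", "age"))

# Made-up text-file contents with columns padded by varying numbers of spaces;
# sep = "" treats any run of whitespace as a single separator
padded <- "name   age\nAnn    30\nBob    25"
df_padded <- read.delim(text = padded, sep = "")

str(df_tab)
```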
Importing Data into R Using the Clipr Package
Next, we will use the clipr package to import data into R with the copy-paste method. This package provides functions to read from and write to the system clipboard.
First, install the clipr package by running the following command:
install.packages("clipr")
Once installed, load the clipr library using the library() function:
library(clipr)
Now we will use the read_clip_tbl() function from the clipr package to directly get the clipboard contents from spreadsheets into data frames.
We will use the same Excel spreadsheet to explore the clipr package. Select the data in the spreadsheet and copy it, then return to RStudio and run the following command to load it into a data frame named “df2”:
df2 <- read_clip_tbl()
The above code reads the data from the clipboard and returns a data frame, which is stored in the “df2” variable. By default, read_clip_tbl() assumes tab-separated data with a header row, which is what spreadsheets put on the clipboard, so for copied spreadsheet cells you usually don’t need to specify any arguments. Printing df2 shows the imported data:
df2
Besides read_clip_tbl(), the clipr package provides several other functions. For example, the write_clip() function writes data to the clipboard. This is useful when copying data from R and pasting it into another application (e.g., Excel, a text editor, or an email).
# Copy a character vector to the clipboard so it can be pasted elsewhere
write_clip(c("Getting Data", "using", "clipr"))
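The format of the copied data depends on the type of the object: a character vector is written with one element per line, while a data frame is written as tab-separated text that can be pasted directly into a spreadsheet.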
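To see roughly what ends up on the clipboard when a data frame is copied, the sketch below builds tab-separated text with base R's write.table(), without quotes or row names, and reads it back with read.delim(). This only approximates clipr's behaviour: we assume write_clip() uses a similar tab-separated layout for data frames, so check the write_clip() help page for its exact defaults. The scores data frame is a made-up example.

```r
# Made-up example data
scores <- data.frame(name = c("Ann", "Bob"), age = c(30L, 25L))

# Tab-separated text with a header line, no quotes and no row names,
# i.e. the kind of text a spreadsheet accepts when it is pasted
tab_lines <- capture.output(
  write.table(scores, sep = "\t", quote = FALSE, row.names = FALSE)
)
print(tab_lines)

# Reading the same text back with read.delim() recovers the original data
round_trip <- read.delim(text = tab_lines)
all.equal(round_trip, scores)
```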
We can find out if the clipboard is available for use by calling the clipr_available() function.
clipr_available()
Output:
As shown above, this function returns a logical value (TRUE or FALSE) indicating whether the clipboard is currently available.
Moreover, if we want to clear the clipboard, we can use the clear_clip() function. As the name suggests, this function will erase the contents of the clipboard, ensuring that no old or unwanted data remains.
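Since reading the clipboard can fail, for example on a headless server, or on Linux when no clipboard utility such as xclip or xsel is installed for clipr to use, a script can check clipr_available() before calling read_clip_tbl(). The sketch below is one way to do this; read_copied_table() is a hypothetical helper of our own, and it also checks with requireNamespace() that clipr is installed.

```r
# Hypothetical helper: return the copied table as a data frame,
# or NULL when clipr is not installed or the clipboard cannot be used
read_copied_table <- function(...) {
  if (!requireNamespace("clipr", quietly = TRUE)) {
    message("The clipr package is not installed.")
    return(NULL)
  }
  if (!clipr::clipr_available()) {
    message("The system clipboard is not available in this session.")
    return(NULL)
  }
  clipr::read_clip_tbl(...)   # extra arguments are passed to read_clip_tbl()
}

copied <- read_copied_table()
if (!is.null(copied)) head(copied)
```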
Importing Data into R using the Datapasta Package
Datapasta is a package of RStudio add-ins and functions that allows users to copy data available in sources like Excel, Jupyter, and websites, and paste it directly into R with the correct formatting.
datapasta simplifies embedding raw data in R Markdown files, creating reproducible examples for Stack Overflow, and quickly pasting vector output from other queries into dplyr::filter().
First, we will install the datapasta package from CRAN using the following command:
install.packages("datapasta")
This package contains an RStudio Add-In that allows users to paste web tables stored in their clipboard. After installing the R datapasta package, restart RStudio in order to access the datapasta add-ins.
As you can see from the above image, datapasta provides several add-ins for pasting data. For example, let us copy a table from Wikipedia and paste it at the current cursor location.
Source: en.wikipedia.org
To paste data as a tribble(), copy the table header and data rows, then use the “Paste as tribble” add-in to insert them into the source editor. If you have assigned a keyboard shortcut to the add-in (for example, Ctrl+Shift+T), you can use that instead. Don’t forget to assign the result to an object so you can work with it further.
Pasted tribble using datapasta
The tribble_paste() function is quite flexible and can guess the separator and the data types from the clipboard, although there may be cases where it fails. Supported separators include the pipe (|), tab (\t), comma (,), and semicolon (;). In most cases, data copied from the internet or spreadsheets will be tab-delimited. The function will also try to recognize a missing header row and create a default one for the user.
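To illustrate the idea of separator guessing, the sketch below picks, from the candidate separators listed above, the first one that splits every copied line into the same number of fields (more than one), then hands the text to read.delim(). guess_sep() is a hypothetical helper using a deliberately simple heuristic; it is not datapasta's actual algorithm and, for example, is not robust to values that themselves contain the separator.

```r
# Hypothetical helper, not part of datapasta: return the first candidate that
# splits every line into the same number (> 1) of fields, or NA if none does
guess_sep <- function(lines, candidates = c("\t", "|", ",", ";")) {
  for (sep in candidates) {
    # fixed = TRUE so "|" is matched literally rather than as a regex
    n_fields <- lengths(strsplit(lines, sep, fixed = TRUE))
    if (all(n_fields > 1) && length(unique(n_fields)) == 1) {
      return(sep)
    }
  }
  NA_character_
}

# Made-up semicolon-separated lines, as they might arrive from the clipboard
copied <- c("name;age", "Ann;30", "Bob;25")
sep <- guess_sep(copied)
df_guess <- read.delim(text = copied, sep = sep)
```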
Next, we will use another datapasta add-in, “Paste as data.frame”. We will copy the same data as in the previous example and, this time, paste it as a data.frame.
Pasted dataframe using datapasta
As shown in the output above, the add-in pasted the selection as a data.frame() call. Without any manual formatting, it marked the entries of the age column as integers by appending the L suffix and quoted the entries of the first two columns as strings. We can assign the result to an object called df and print the first few rows using the head function.
head(df)
Output:
Sometimes a whole data frame is unnecessary and a simple vector is sufficient. In such cases, the “Paste as vector” add-in (for example, bound to the shortcut Shift+Cmd+V) turns a single copied row of data into a vector.
Pasted vector using datapasta
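Because datapasta's add-ins insert ordinary R code, the pasted result can be checked like any other code. The block below is written by hand in the general style of the data.frame and vector output described above; the values and the df_pasted and names_pasted names are made up for illustration, not output captured from datapasta. It shows that the L suffix produces an integer column, quoted entries produce character columns, and a pasted vector can be used directly in a filter, here with base R's subset() rather than dplyr::filter().

```r
# Hand-written illustration with made-up values, in the style of a pasted data.frame()
df_pasted <- data.frame(
  name = c("Ann", "Bob", "Cara"),
  city = c("Paris", "Oslo", "Rome"),
  age  = c(30L, 25L, 41L)   # the L suffix makes these integer literals
)
str(df_pasted)

# A pasted vector, e.g. one copied row turned into c(...)
names_pasted <- c("Ann", "Cara")

# Keep the rows whose name appears in the vector (base R analogue of dplyr::filter())
subset(df_pasted, name %in% names_pasted)
```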
Conclusion
In conclusion, this article discussed three ways to import data into R using copy and paste. We started with the read.delim() function from base R, which reads copied data directly from the clipboard. Next, we discussed the clipr package, which provides functions to read from and write to the clipboard. Finally, we discussed the datapasta package, which lets users copy data from various sources and paste it directly into R with appropriate formatting. These methods allow you to quickly import data and use it in your project.
Here are the key takeaways from this article:
- We can copy data into R from CSV, Excel, and TSV sources. However, it is important to check the imported data for errors and inconsistencies after importing it into R.
- The copy-and-paste method is very useful when the data is not available as a ready-to-import file. In such a situation, we can copy a table from a webpage, PDF document, or email and paste it directly into R using the datapasta or clipr packages. This saves time and effort compared to manually typing the data into R or converting it into a specific file format before importing it.
- Depending on the size and complexity of the data, different methods may be more appropriate. For example, read.delim may be more suitable for large datasets and importing data from tab-separated values (TSV) files, while clipr or datapasta could be better for smaller datasets.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
By Analytics Vidhya, March 27, 2023.