User Tools

Site Tools


en:data_import_examples

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision Both sides next revision
en:data_import_examples [2019/01/20 23:52]
David Zelený [Using clipboard]
en:data_import_examples [2019/01/21 00:15]
David Zelený [Import directly from *.xls and *.xlsx file]
Line 73: Line 73:
  
  
-==== Import directly from *.xls and *.xlsx file ==== +==== Import directly from Excel (*.xls or *.xlsxfile ==== 
-This option is the most complicated and dependent on the version of Excel file. For overview of R packages focused on importing ​data from Excel, consult ​e.g. [[http://​cran.r-project.org/​doc/​manuals/​R-data.html#​Reading-Excel-spreadsheets|this website]]; howeverin order to work properly, most of these libraries require installing some third party software. Perhaps the simplest way is to use the package ''​XLConnect''​, requiring installation of Java software((Note that for XLConnect and other libraries to work properly, you need to make sure that you have installed Java on your computer and it is the right version, i.e. if you are running 64-bit version of R, you need to install 64-bit version of Java (default installation file downloaded from the website contains only 32-bit versionso you perhaps need to install the 64-bit version manually from [[https://​java.com/​en/​download/​manual.jsp|here]];​ more on this [[https://​www.r-statistics.com/​2012/​08/​how-to-load-the-rjava-package-after-the-error-java_home-cannot-be-determined-from-the-registry/​|here]]).)). Install this package first, and download example ​Excel file {{https://​raw.githubusercontent.com/​zdealveindy/​anadat-r/​master/​data/​vltava.xlsx|vltava.xlsx}},​ e.g. to the root (''​c:/''​). Try to read the second working sheet from the ''​vltava.xlsx''​ file, which contains the environmental variables (Vltava env):+Importing ​data directly ​from Excel used to be quite complicated (see e.g. [[http://​cran.r-project.org/​doc/​manuals/​R-data.html#​Reading-Excel-spreadsheets|this website]])but the package ''​readxl'' ​made it much easierStillI do not suggest ​you use this optionsince it may not be replicable ​on every platform, and it may change with a newer version of Excel.
  
 +The use of the package ''​readxl''​ is pretty straightforward. There is a suite of functions for reading Excel file, like ''​read_excel'',​ ''​read_xls''​ and ''​read_xlsx''​. Important arguments are ''​path''​ and ''​sheet'',​ first for the name of the file (optinally including the path to the folder) and the second the name of the sheet which should be imported. We may try it on the Excel file {{https://​raw.githubusercontent.com/​zdealveindy/​anadat-r/​master/​data/​vltava.xlsx|vltava.xlsx}} - save it somewhere on your computer, and use:
 +<code rsplus>
 +# install.packages ('​readxl'​)
 +library (readxl)
 +veg.data.0 <- read_excel ('​c:/​path/​to/​data/​folder/​vltava.xlsx',​ sheet = '​Vltava spe')
 +</​code>​
  
 +The object ''​veg.data.0''​ created in R is not data.frame, but a ''​tibble''​ (alternative to data.frame in the ''​tidyverse''​ packages). If you don't like that (or you don't know how to use it), you can simply convert it into data.frame using ''​as.data.frame''​ function. If the first column of the data are in fact row names, they may need to be assigned as such:
 <code rsplus> <code rsplus>
-install.packages ​('​XLConnect'​+veg.data <- as.data.frame ​(veg.data.0[,​-1]
-library ​(XLConnect) +rownames ​(veg.data<- as.matrix ​(veg.data.0[,1])
-env.data <- readWorksheetFromFile(file = '​c:/​vltava.xlsx', sheet = "​Vltava env", header = TRUErownames = 1) +
 </​code>​ </​code>​
  
-The function ''​readWorksheetFromFile''​ will open the file specified in argument ''​file'' ​ and reads the sheet specified by the argument ''​sheet''​ (this should be the number or the name of the required sheet). Other arguments are similar to the function ''​read.table''​. +Note that the functions from package ​''​readxl''​ cannot read Excel files directly from internet, unlike e.g. ''​read.table''​. But there is a workaround - first, download the Excel file into R as a temporary file, and the read it using ''​readxl'' ​function:
- +
-Note that the function ​''​readWorksheetFromFile''​ cannot read Excel files directly from internet, unlike e.g. ''​read.table''​. But there is a workaround - first, download the Excel file into R, and then use the function ​''​readWorksheetFromFile'' ​on it((Thanks to Stackoverflow comments [[http://​stackoverflow.com/​questions/​24165623/​read-excel-file-into-r-with-xlconnect-package-from-url|here]] and [[http://​stackoverflow.com/​questions/​28860675/​how-to-download-an-xlsx-file-in-r-and-load-the-data-into-a-dataframe|here]])): +
 <code rsplus> <code rsplus>
-library (XLConnect) +url <- 'https://raw.githubusercontent.com/​zdealveindy/anadat-r/master/​data/​vltava.xlsx
-tmp <- tempfile (fileext = "​.xlsx"​) +destfile <- tempfile () 
-download.file (url = "http://davidzeleny.net/​anadat-r/​data-download/​vltava.xlsx", destfile ​= tmp, mode = '​wb'​) +download.file(url, destfile, mode = '​wb'​) 
-env.data <- readWorksheetFromFile(file = tmp, sheet = "Vltava ​env", header = TRUE, rownames = 1)+veg.data.0 <- readxl::​read_excel(destfile, sheet = 'Vltava ​spe') 
 +</​code>​
  
-</​code>​ +Note that in the function ''​download.file''​ it is important to specify the argument ''​mode = '​wb'''​ (on Windows, if the argument ''​mode''​ is not set up, the type of the file will be determined from the file extension; in case of *.xls and *.xlsx files R would attempt to download these files as plain text, but in fact these files need to be downloaded as binaries).
-Note that in the function ''​download.file''​ it is important to specify the argument ''​mode = '​wb'''​((On Windows, if the argument ''​mode''​ is not set up, the type of the file will be determined from the file extension; in case of "xls" ​and "xlsx", it would attempt to download these files as plain text, but in fact these files need to be downloaded as binaries.))!+
  
 +The library ''​readxl''​ is a part of the ''​tidyverse''​ packages, and as such it does not use standard ''​data.frame''​ format for data frames, but unique ''​tibble''​ (as discussed above). One feature of ''​tibble''​ is that **it does not have rownames, and rownames are therefore imported as the first column of the data frame**. This may or may not be handy for future analysis since most of the functions we will use for numerical analysis here use data in standard ''​data.frame''​ format with rownames indicating plot IDs. To convert ''​tibble''​ into standard ''​data.frame'',​ use the function ''​as.data.frame'',​ and move the first column into the rownames of the newly created data frame (see above). Alternatively,​ there are functions like ''​[[https://​www.rdocumentation.org/​packages/​tibble/​versions/​1.4.2/​topics/​rownames|column_to_rownames]]''​ in the package ''​tibble''​ which can help you with that.
  
 ===== Import from cc! (cep) format ===== ===== Import from cc! (cep) format =====
en/data_import_examples.txt · Last modified: 2019/02/03 16:21 by David Zelený