User Tools

Site Tools


en:data_import_examples

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Last revision Both sides next revision
en:data_import_examples [2019/01/20 23:52]
David Zelený [Using clipboard]
en:data_import_examples [2019/01/21 00:29]
David Zelený [Import directly from Excel (*.xls or *.xlsx) file]
Line 73: Line 73:
  
  
-==== Import directly from *.xls and *.xlsx file ==== +==== Import directly from Excel (*.xls or *.xlsxfile ==== 
-This option is the most complicated and dependent on the version of Excel file. For overview of R packages focused on importing ​data from Excel, consult ​e.g. [[http://​cran.r-project.org/​doc/​manuals/​R-data.html#​Reading-Excel-spreadsheets|this website]]; howeverin order to work properly, most of these libraries require installing some third party software. Perhaps the simplest way is to use the package ''​XLConnect''​, requiring installation of Java software((Note that for XLConnect and other libraries to work properly, you need to make sure that you have installed Java on your computer and it is the right version, i.e. if you are running 64-bit version of R, you need to install 64-bit version of Java (default installation file downloaded from the website contains only 32-bit versionso you perhaps need to install the 64-bit version manually from [[https://​java.com/​en/​download/​manual.jsp|here]];​ more on this [[https://​www.r-statistics.com/​2012/​08/​how-to-load-the-rjava-package-after-the-error-java_home-cannot-be-determined-from-the-registry/​|here]]).)). Install this package first, and download example ​Excel file {{https://​raw.githubusercontent.com/​zdealveindy/​anadat-r/​master/​data/​vltava.xlsx|vltava.xlsx}},​ e.g. to the root (''​c:/''​). Try to read the second working sheet from the ''​vltava.xlsx''​ file, which contains the environmental variables (Vltava env):+Importing ​data directly ​from Excel used to be quite complicated (see e.g. [[http://​cran.r-project.org/​doc/​manuals/​R-data.html#​Reading-Excel-spreadsheets|this website]])but the package ''​readxl'' ​made it much easierStillI do not suggest ​you use this optionsince it may not be replicable ​on every platform, and it may change with a newer version of Excel.
  
 +The use of the package ''​readxl''​ is pretty straightforward. There is a suite of functions for reading Excel file, like ''​read_excel'',​ ''​read_xls''​ and ''​read_xlsx''​. Important arguments are ''​path''​ and ''​sheet'',​ first for the name of the file (optinally including the path to the folder) and the second the name of the sheet which should be imported. We may try it on the Excel file {{https://​raw.githubusercontent.com/​zdealveindy/​anadat-r/​master/​data/​vltava.xlsx|vltava.xlsx}} - save it somewhere on your computer, and use:
 +<code rsplus>
 +# install.packages ('​readxl'​)
 +library (readxl)
 +veg.data.0 <- read_excel ('​c:/​path/​to/​data/​folder/​vltava.xlsx',​ sheet = '​Vltava spe')
 +</​code>​
  
 +The object ''​veg.data.0''​ created in R is not data.frame, but a ''​tibble''​ (alternative to data.frame in the ''​tidyverse''​ packages). If you don't like that (or you don't know how to use it), you can simply convert it into data.frame using ''​as.data.frame''​ function. If the first column of the data are in fact row names, they may need to be assigned as such:
 <code rsplus> <code rsplus>
-install.packages ​('​XLConnect'​+veg.data <- as.data.frame ​(veg.data.0[,​-1]
-library ​(XLConnect) +rownames ​(veg.data<- as.matrix ​(veg.data.0[,1]) 
-env.data <- readWorksheetFromFile(file = '​c:/​vltava.xlsx', sheet = "​Vltava env"header = TRUE, rownames = 1)+</​code>​
  
 +Note that the functions from package ''​readxl''​ cannot read Excel files directly from internet, unlike e.g. ''​read.table''​. But there is a workaround - first, download the Excel file into R as a temporary file, and the read it using ''​readxl''​ function:
 +<code rsplus>
 +url <- '​https://​raw.githubusercontent.com/​zdealveindy/​anadat-r/​master/​data/​vltava.xlsx'​
 +destfile <- tempfile ()
 +download.file(url,​ destfile, mode = '​wb'​)
 +veg.data.0 <- readxl::​read_excel(destfile,​ sheet = '​Vltava spe')
 </​code>​ </​code>​
  
-The function ''​readWorksheetFromFile'' ​will open the file specified in argument ''​file'' ​ and reads the sheet specified by the argument ''​sheet'' ​(this should be the number or the name of the required sheet). Other arguments are similar to the function ''​read.table''​.+Note that in the function ''​download.file'' ​it is important to specify ​the argument ''​mode = 'wb'''​ (on Windows, if the argument ''​mode'' ​is not set up, the type of the file will be determined from the file extension; in case of *.xls and *.xlsx files R would attempt to download these files as plain text, but in fact these files need to be downloaded as binaries).
  
-Note that the function ​''​readWorksheetFromFile'' ​cannot read Excel files directly from internetunlike e.g. ''​read.table''​. ​But there is a workaround - firstdownload ​the Excel file into Rand then use the function ''​readWorksheetFromFile'' ​on it((Thanks to Stackoverflow comments ​[[http://stackoverflow.com/questions/24165623/read-excel-file-into-r-with-xlconnect-package-from-url|here]] and [[http://​stackoverflow.com/questions/28860675/​how-to-download-an-xlsx-file-in-r-and-load-the-data-into-a-dataframe|here]])):+The library ''​readxl''​ is a part of the ''​tidyverse'' ​packagesand as such it does not use standard ​''​data.frame'' ​format for data frames, but unique ''​tibble''​ (as discussed above)One feature of ''​tibble'' ​is that **it does not have rownamesand rownames are therefore imported as the first column of the data frame**. This may or may not be handy for future analysis since most of the functions we will use for numerical analysis here use data in standard ''​data.frame''​ format with rownames indicating plot IDs. To convert ''​tibble'' ​into standard ''​data.frame''​, use the function ''​as.data.frame''​, and move the first column into the rownames of the newly created data frame (see above). Alternatively,​ there are functions like ''​[[https://www.rdocumentation.org/packages/tibble/versions/1.4.2/topics/rownames|column_to_rownames]]''​ in the package ''​tibble''​ which can help you with that.
  
-<code rsplus>​ +==== Import *.RData file ==== 
-library (XLConnect) +Binary data storing the R object can be loaded into R using function ''​load''​. Download data {{https://raw.githubusercontent.com/​zdealveindy/anadat-r/master/​data/​vltava.RData|vltava.RData}} to your computer and use:
-tmp <- tempfile (fileext ​"​.xlsx"​) +
-download.file (url "http://davidzeleny.net/​anadat-r/​data-download/vltava.xlsx", destfile = tmp, mode = '​wb'​) +
-env.data <- readWorksheetFromFile(file = tmp, sheet = "​Vltava env", header = TRUE, rownames = 1)+
  
 +<code rsplus>
 +load ('​c:/​path/​to/​data/​folder/​vltava.RData'​)
 </​code>​ </​code>​
-Note that in the function ''​download.file''​ it is important to specify the argument ''​mode = '​wb'''​((On Windows, if the argument ''​mode''​ is not set up, the type of the file will be determined from the file extension; in case of "​xls"​ and "​xlsx",​ it would attempt to download these files as plain text, but in fact these files need to be downloaded as binaries.))! 
  
 +This should create the variable ''​vltava''​ in your Global environment (it will appear among variables), which is  alist of variables (see details [[en:​data:​vltava#​all_vltava_dataset_matrices_as_a_single_r_object_list_vltava|here]]). Do not assign the result to a new variable, it will not work (if you use ''​vltava.spe <- load ('​vltava.RData'​)'',​ the object ''​vltava.spe''​ will be assigned the character string ''​vltava'',​ not the data).
 +
 +Alternatively,​ ''​load''​ function can read directly data from website, if the URL link is wrapped by function ''​url'':​
 +<code rsplus>
 +load (url ('​https://​raw.githubusercontent.com/​zdealveindy/​anadat-r/​master/​data/​vltava.RData'​))
 +</​code>​
  
 ===== Import from cc! (cep) format ===== ===== Import from cc! (cep) format =====
en/data_import_examples.txt · Last modified: 2019/02/03 16:21 by David Zelený