Importing Data into RStudio:
- Importing Data: The process of bringing external data into an R session (here, run from RStudio) for analysis or manipulation.
- Data Frame: A rectangular data structure in R that stores data in rows and columns, similar to a table in a database.
data_frame <- data.frame( Name = c("Alice", "Bob", "Carol"), Age = c(25, 30, 22) )
- CSV File: A Comma-Separated Values file that stores tabular data, with commas separating values in each row.
csv_data <- read.csv("data.csv")
- Tab-Delimited File: A file format where values are separated by tabs.
tab_data <- read.delim("data.txt", sep = "\t")
- Whitespace-Separated File: Data values are separated by whitespace, like spaces or tabs.
whitespace_data <- read.table("data.txt", header = TRUE)  # read.table() assumes no header row unless told otherwise
- Excel Files: Data stored in Microsoft Excel spreadsheet format.
library(readxl)
excel_data <- read_excel("data.xlsx")
- From Statistical Software: Importing data from statistical software like SPSS, SAS, or Stata.
library(haven)
spss_data <- haven::read_sav("data.sav")
- Scraping Web Data: Extracting data from websites.
library(rvest)
web_data <- read_html("https://example.com") %>% html_table()  # a list with one data frame per <table> on the page
- JSON Data: JavaScript Object Notation data format.
library(jsonlite)
json_data <- fromJSON("data.json")
- XML Data: Extensible Markup Language data format.
library(xml2)
xml_data <- read_xml("data.xml")
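The delimited-file readers above can be tried without any existing files. The following is a minimal sketch, not a definitive recipe: it writes the small Name/Age data frame to temporary files and reads each one back with the matching base R function. The `people` name and the temporary paths are assumptions made for this illustration.

```r
# Small example data frame (same shape as the Data Frame entry above).
people <- data.frame(
  Name = c("Alice", "Bob", "Carol"),
  Age  = c(25, 30, 22)
)

# Temporary files, so nothing in the working directory is touched.
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
txt_path <- tempfile(fileext = ".txt")

# Write the same data in three delimited formats.
write.csv(people, csv_path, row.names = FALSE)
write.table(people, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)
write.table(people, txt_path, sep = " ", row.names = FALSE, quote = FALSE)

# Read each file back with the reader that matches its delimiter.
csv_data        <- read.csv(csv_path)                    # comma-separated
tab_data        <- read.delim(tsv_path)                  # tab-separated
whitespace_data <- read.table(txt_path, header = TRUE)   # any whitespace

str(csv_data)
```

`read.csv()` and `read.delim()` treat the first row as a header by default, while `read.table()` does not, which is why `header = TRUE` is passed explicitly in the last call.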
Rectangular Data and Excel Files:
- Rectangular Data: Tabular data organized into rows and columns, suitable for analysis in R.
- Header Row: The first row in a data file that contains column names.
- Column: A vertical arrangement of data in a data frame or spreadsheet.
- Row: A horizontal arrangement of data in a data frame or spreadsheet.
- Cell: A single data entry within a row and column intersection.
- Workbook: An Excel file containing one or more spreadsheets.
- Worksheet: A single sheet within an Excel workbook.
- Named Range: A named set of cells in an Excel worksheet.
- Cell Reference: A notation that specifies the location of a cell, e.g., A1, B2.
- Formula: A mathematical expression used to calculate values in Excel.
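The terms above map directly onto data frame operations in R. The sketch below is only an illustration: the `people` data frame is an assumed example, and the optional second part relies on the sample workbook that ships with the readxl package, so it runs only if readxl is installed.

```r
# A small rectangular data set: 3 rows, 2 columns.
people <- data.frame(
  Name = c("Alice", "Bob", "Carol"),
  Age  = c(25, 30, 22)
)

header  <- names(people)       # header row: the column names
age_col <- people$Age          # one column, as a vector
bob_row <- people[2, ]         # one row, as a one-row data frame
cell    <- people[2, "Age"]    # a single cell: row 2, column "Age"

# Optional: worksheets and cell ranges in an Excel workbook.
if (requireNamespace("readxl", quietly = TRUE)) {
  workbook <- readxl::readxl_example("datasets.xlsx")  # sample file bundled with readxl
  sheets   <- readxl::excel_sheets(workbook)           # worksheet names in the workbook
  # Read only the cell range A1:C4 from the first worksheet.
  block <- readxl::read_excel(workbook, sheet = sheets[1], range = "A1:C4")
  print(sheets)
  print(block)
}
```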
Importing from Statistical Software and Web Scraping:
- SPSS Data File: Data file from IBM SPSS software.
- SAS Data Set: Data file from SAS software.
- Stata Data File: Data file from Stata software.
- URL: Uniform Resource Locator, the web address of a resource.
- Web Scraping: Extracting information from web pages.
- HTML: HyperText Markup Language, the standard markup language for web pages.
- CSS Selector: A pattern used to select HTML elements for scraping.
- XPath: A query language for selecting elements from an XML or HTML document.
- API: Application Programming Interface, a set of rules for interacting with software components.
- JSON: JavaScript Object Notation, a lightweight data interchange format.
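To make CSS selectors and XPath concrete, here is a small sketch using rvest on an inline HTML fragment instead of a live URL. The fragment, its `id` and `class` values, and the variable names are all made up for this illustration; a real page would need selectors chosen from its own markup.

```r
library(rvest)

# A tiny, hypothetical page built in memory, so no network access is needed.
page <- minimal_html('
  <h1 id="title">Staff</h1>
  <table class="staff">
    <tr><th>Name</th><th>Age</th></tr>
    <tr><td>Alice</td><td>25</td></tr>
    <tr><td>Bob</td><td>30</td></tr>
  </table>
')

# Select the heading with a CSS selector ...
title_css <- page %>% html_element("#title") %>% html_text2()

# ... and the same element with an XPath expression.
title_xpath <- page %>%
  html_element(xpath = "//h1[@id='title']") %>%
  html_text2()

# Select one table by class and convert it to a data frame.
staff <- page %>% html_element("table.staff") %>% html_table()

print(title_css)
print(staff)
```

CSS selectors are usually shorter for selection by id or class; XPath is more expressive when the selection depends on position or on the content of neighbouring elements.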
Reading and Writing from SQL Databases:
- SQL: Structured Query Language, used to manage and manipulate databases.
- Database Connection: Establishing a link between R and a database.
- DBMS: Database Management System, software for managing databases.
- ODBC: Open Database Connectivity, a standard for database access.
- Query: A request for specific data from a database.
- Table: A structured data entity within a database.
- Primary Key: A unique identifier for each record in a database table.
- Foreign Key: A field in one table that refers to the primary key in another table.
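- JOIN: A SQL operation that combines rows from two or more tables by matching values in related columns, typically a foreign key and the primary key it refers to.
- SQL Injection: An attack in which untrusted input is inserted into a query string so that the database executes it as SQL. Parameterized queries prevent it by passing input as data rather than as part of the query text.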
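The key and injection concepts above can be sketched with an in-memory SQLite database. This is an illustrative example under stated assumptions: the `customers` and `orders` schemas, the sample rows, and the `user_input` string are invented here, and the DBI and RSQLite packages are assumed to be installed.

```r
library(DBI)

# In-memory database: nothing is written to disk.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Ask SQLite to enforce foreign keys (set explicitly rather than relying on a default).
dbExecute(con, "PRAGMA foreign_keys = ON")

# customers.customer_id is a primary key; orders.customer_id is a foreign key to it.
dbExecute(con, "CREATE TABLE customers (
                  customer_id INTEGER PRIMARY KEY,
                  name        TEXT NOT NULL)")
dbExecute(con, "CREATE TABLE orders (
                  order_id    INTEGER PRIMARY KEY,
                  customer_id INTEGER NOT NULL REFERENCES customers (customer_id))")
dbExecute(con, "INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob')")
dbExecute(con, "INSERT INTO orders VALUES (10, 1), (11, 2)")

# An order that points at a non-existent customer violates the foreign key.
fk_error <- tryCatch(
  dbExecute(con, "INSERT INTO orders VALUES (12, 99)"),
  error = function(e) conditionMessage(e)
)

# Hypothetical malicious input.
user_input <- "Alice' OR '1'='1"

# Unsafe: the input is pasted into the SQL text and changes the query's logic.
unsafe_sql  <- paste0("SELECT * FROM customers WHERE name = '", user_input, "'")
unsafe_rows <- dbGetQuery(con, unsafe_sql)

# Safe: the input is bound to a placeholder and treated purely as a value.
safe_rows <- dbGetQuery(
  con,
  "SELECT * FROM customers WHERE name = ?",
  params = list(user_input)
)

dbDisconnect(con)
```

The pasted query matches every customer because the injected `OR '1'='1'` is always true, while the parameterized query matches none, since no customer is literally named `Alice' OR '1'='1`.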
Examples:
- Import CSV data:
csv_data <- read.csv("data.csv")
- Import Excel data:
library(readxl)
excel_data <- read_excel("data.xlsx")
- Import SPSS data:
library(haven)
spss_data <- haven::read_sav("data.sav")
- Scrape web data:
library(rvest)
web_data <- read_html("https://example.com") %>% html_table()
- Import JSON data:
library(jsonlite)
json_data <- fromJSON("data.json")
- Import XML data:
library(xml2)
xml_data <- read_xml("data.xml")
- Connect to a SQL database:
library(DBI)
con <- dbConnect(RSQLite::SQLite(), dbname = "mydb.sqlite")
- Query a database table:
query <- "SELECT * FROM employees WHERE department = 'HR'"
result <- dbGetQuery(con, query)
- Write data to a database:
dbWriteTable(con, "new_data", new_data_frame)  # new_data_frame: a data frame that already exists in the session
- Join data from multiple tables:
query <- "SELECT orders.order_id, customers.name
          FROM orders
          INNER JOIN customers ON orders.customer_id = customers.customer_id"
result <- dbGetQuery(con, query)
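The database examples above assume that `mydb.sqlite` and its tables already exist. As a self-contained sketch of the same write, join, and read steps, the version below uses an in-memory SQLite database. The `customers` and `orders` data frames are invented sample data, and DBI and RSQLite are assumed to be installed.

```r
library(DBI)

# In-memory database, so the example leaves no file behind.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical sample data.
customers <- data.frame(
  customer_id = c(1, 2),
  name        = c("Alice", "Bob")
)
orders <- data.frame(
  order_id    = c(101, 102, 103),
  customer_id = c(1, 1, 2)
)

# Write each data frame to its own database table.
dbWriteTable(con, "customers", customers)
dbWriteTable(con, "orders", orders)

# Join the two tables on the shared customer_id column.
result <- dbGetQuery(con, "
  SELECT orders.order_id, customers.name
  FROM orders
  INNER JOIN customers ON orders.customer_id = customers.customer_id
  ORDER BY orders.order_id
")

print(result)

# Close the connection once the work is done.
dbDisconnect(con)
```

Closing the connection with `dbDisconnect()` releases the database handle; for a file-backed database such as `mydb.sqlite`, the written tables would persist after disconnecting.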