Importing Data into RStudio

Importing Data into RStudio:

  • Importing Data: The process of bringing external data into RStudio for analysis or manipulation.
  • Data Frame: A rectangular data structure in R that stores data in rows and columns, similar to a table in a database.
    • data_frame <- data.frame(Name = c("Alice", "Bob", "Carol"), Age = c(25, 30, 22))
  • CSV File: A Comma-Separated Values file that stores tabular data, with commas separating values in each row.
    • csv_data <- read.csv("data.csv")
  • Tab-Delimited File: A file format where values are separated by tabs.
    • tab_data <- read.delim("data.txt")  # sep = "\t" and header = TRUE are already the defaults
  • Whitespace-Separated File: Data values are separated by whitespace, such as spaces or tabs.
    • whitespace_data <- read.table("data.txt", header = TRUE)  # header = TRUE when the first line holds column names
  • Excel Files: Data stored in Microsoft Excel spreadsheet format.
    • library(readxl)
      excel_data <- read_excel("data.xlsx")
  • From Statistical Software: Importing data from statistical software like SPSS, SAS, or Stata.
    • library(haven)
      spss_data <- read_sav("data.sav")       # SPSS
      sas_data <- read_sas("data.sas7bdat")   # SAS
      stata_data <- read_dta("data.dta")      # Stata
  • Scraping Web Data: Extracting data from websites.
    • library(rvest)
      web_data <- read_html("https://example.com") %>% html_table()  # a list with one data frame per <table>
  • JSON Data: JavaScript Object Notation data format.
    • library(jsonlite)
      json_data <- fromJSON("data.json")
  • XML Data: Extensible Markup Language data format.
    • library(xml2)
      xml_data <- read_xml("data.xml")
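The base R readers above can be tried without any data files of your own. This is a minimal sketch, not part of the original notes: it writes a small data frame to temporary files (the tempfile() paths stand in for real files such as data.csv or data.txt) in comma-, tab- and space-separated form and reads each one back. Note that read.table() does not assume a header row, so header = TRUE is passed explicitly.

```r
# Minimal sketch: temporary files stand in for data.csv / data.txt.
people <- data.frame(
  Name = c("Alice", "Bob", "Carol"),
  Age  = c(25, 30, 22)
)

csv_path <- tempfile(fileext = ".csv")
tab_path <- tempfile(fileext = ".txt")
ws_path  <- tempfile(fileext = ".txt")

# Comma-separated: write.csv() writes a header row; row.names = FALSE
# stops it from adding an extra index column.
write.csv(people, csv_path, row.names = FALSE)
csv_data <- read.csv(csv_path)

# Tab-delimited: read.delim() defaults to sep = "\t" and header = TRUE.
write.table(people, tab_path, sep = "\t", row.names = FALSE, quote = FALSE)
tab_data <- read.delim(tab_path)

# Whitespace-separated: read.table() splits on any run of spaces or tabs,
# but its header argument defaults to FALSE, so request the header row.
write.table(people, ws_path, sep = " ", row.names = FALSE, quote = FALSE)
whitespace_data <- read.table(ws_path, header = TRUE)

str(whitespace_data)
```

Without header = TRUE in the last call, the line "Name Age" would be read as a data row and both columns would become character.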

Rectangular Data and Excel Files:

  • Rectangular Data: Tabular data organized into rows and columns, suitable for analysis in R.
  • Header Row: The first row in a data file that contains column names.
  • Column: A vertical arrangement of data in a data frame or spreadsheet.
  • Row: A horizontal arrangement of data in a data frame or spreadsheet.
  • Cell: A single data entry at the intersection of a row and a column.
  • Workbook: An Excel file containing one or more worksheets.
  • Worksheet: A single sheet within an Excel workbook.
  • Named Range: A named set of cells in an Excel worksheet.
  • Cell Reference: A notation that specifies the location of a cell, e.g., A1, B2.
  • Formula: A mathematical expression used to calculate values in Excel.
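A sketch of how the workbook, worksheet and cell-reference terms map onto readxl arguments, assuming the readxl package is installed. It uses readxl_example("datasets.xlsx"), a sample workbook bundled with readxl, in place of data.xlsx; the chosen sheets and the range "A1:C6" are only for illustration.

```r
# Sketch using the sample workbook shipped with readxl (stands in for data.xlsx).
library(readxl)

path <- readxl_example("datasets.xlsx")

# A workbook can hold several worksheets; list their names.
sheets <- excel_sheets(path)
print(sheets)

# Read one whole worksheet by name; its first row becomes the column names.
mtcars_sheet <- read_excel(path, sheet = "mtcars")

# Read only a block of cells using an Excel-style cell reference:
# "A1:C6" is the header row plus the first five data rows of columns A to C.
iris_block <- read_excel(path, sheet = "iris", range = "A1:C6")
print(iris_block)
```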

Importing from Statistical Software and Web Scraping:

  • SPSS Data File: A data file from IBM SPSS Statistics, usually with the .sav extension.
  • SAS Data Set: A data file from SAS software, usually with the .sas7bdat extension.
  • Stata Data File: A data file from Stata, with the .dta extension.
  • URL: Uniform Resource Locator, the web address of a resource.
  • Web Scraping: Extracting information from web pages.
  • HTML: HyperText Markup Language, the standard markup language for web pages.
  • CSS Selector: A pattern used to select HTML elements for scraping.
  • XPath: A query language for selecting elements from an XML or HTML document.
  • API: Application Programming Interface, a set of rules for interacting with software components.
  • JSON: JavaScript Object Notation, a lightweight data interchange format.
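The sketch below exercises several of these terms offline, assuming the haven, rvest and jsonlite packages are installed. Instead of real SPSS or Stata exports it writes a small subset of mtcars with haven's own writers and reads it back; instead of fetching a URL with read_html() it parses an inline HTML string with rvest::minimal_html(). The table, its id "staff", and the JSON text are made up for illustration.

```r
# Offline sketch: every input is created inside the script (illustrative only).
library(haven)     # SPSS, SAS and Stata files
library(rvest)     # HTML parsing; also re-exports the %>% pipe
library(jsonlite)  # JSON parsing

# Statistical software: write with haven's writers, read back with its readers.
car_subset <- mtcars[1:5, c("mpg", "cyl", "hp")]
dta_path <- tempfile(fileext = ".dta")
sav_path <- tempfile(fileext = ".sav")
write_dta(car_subset, dta_path)
write_sav(car_subset, sav_path)
stata_data <- read_dta(dta_path)  # a tibble; row names are not kept
spss_data  <- read_sav(sav_path)

# Web scraping: parse an inline HTML string instead of downloading a page.
page <- minimal_html('
  <table id="staff">
    <tr><th>Name</th><th>Age</th></tr>
    <tr><td>Alice</td><td>25</td></tr>
    <tr><td>Bob</td><td>30</td></tr>
  </table>')

# CSS selector: all <td> cells inside the element whose id is "staff".
cells_css <- page %>% html_elements("#staff td") %>% html_text2()

# XPath: the same cells, selected with an XPath expression instead.
cells_xpath <- page %>%
  html_elements(xpath = "//table[@id='staff']//td") %>%
  html_text2()

# html_table() returns a list with one data frame per <table>.
staff <- html_table(page)[[1]]

# JSON: an array of objects, like a typical web API response, becomes a data frame.
json_data <- fromJSON('[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]')
```

For a live page, replace minimal_html(...) with read_html() on the page's URL; the selector calls stay the same.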

Reading from and Writing to SQL Databases:

  • SQL: Structured Query Language, used to manage and manipulate databases.
  • Database Connection: An open link between R and a database, typically created with DBI::dbConnect() and closed with DBI::dbDisconnect().
  • DBMS: Database Management System, software for managing databases.
  • ODBC: Open Database Connectivity, a standard for database access.
  • Query: A request for specific data from a database.
  • Table: A named collection of rows (records) and columns (fields) stored in a database.
  • Primary Key: A unique identifier for each record in a database table.
  • Foreign Key: A field in one table that refers to the primary key in another table.
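  • JOIN: A SQL operation that combines rows from two or more tables by matching related columns, typically a foreign key against a primary key.
  • SQL Injection: An attack in which untrusted input is pasted into a query string and executed as SQL; parameterized queries prevent it.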
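To make the SQL injection entry concrete, here is a sketch using DBI with the RSQLite driver and a throwaway in-memory database. The employees table and the hostile input string are invented for illustration. The first query pastes the input into the SQL text, which is the pattern to avoid; the second passes it through the params argument of dbGetQuery(), so the driver binds it as a value rather than running it as SQL.

```r
# Sketch: in-memory SQLite database; table contents are hypothetical.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(con, "employees", data.frame(
  name       = c("Alice", "Bob", "Carol"),
  department = c("HR", "IT", "HR")
))

# Untrusted input, e.g. typed by a user.
user_input <- "HR' OR '1'='1"

# UNSAFE: the quotes in the input rewrite the WHERE clause,
# so the condition becomes always true and every row comes back.
unsafe <- dbGetQuery(con, paste0(
  "SELECT name FROM employees WHERE department = '", user_input, "' ORDER BY name"
))

# SAFE: the ? placeholder is filled from params and compared as a plain string.
safe <- dbGetQuery(con,
  "SELECT name FROM employees WHERE department = ? ORDER BY name",
  params = list(user_input))

# The same parameterized query with ordinary input.
hr <- dbGetQuery(con,
  "SELECT name FROM employees WHERE department = ? ORDER BY name",
  params = list("HR"))

dbDisconnect(con)
```

Here unsafe returns all three employees, while safe returns no rows, because no department is literally named "HR' OR '1'='1".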

Examples:

  • Import CSV data:

csv_data <- read.csv("data.csv")

  • Import Excel data:

library(readxl)
excel_data <- read_excel("data.xlsx")

  • Import SPSS data:

library(haven)
spss_data <- read_sav("data.sav")

  • Scrape web data:

library(rvest)
web_data <- read_html("https://example.com") %>% html_table()

  • Import JSON data:

library(jsonlite)
json_data <- fromJSON("data.json")

  • Import XML data:

library(xml2)
xml_data <- read_xml("data.xml")

  • Connect to a SQL database:

library(DBI)
con <- dbConnect(RSQLite::SQLite(), dbname = "mydb.sqlite")

  • Query a database table:

query <- "SELECT * FROM employees WHERE department = 'HR'"
result <- dbGetQuery(con, query)

  • Write data to a database:

dbWriteTable(con, "new_data", new_data_frame)

  • Join data from multiple tables:

query <- "
  SELECT orders.order_id, customers.name
  FROM orders
  INNER JOIN customers ON orders.customer_id = customers.customer_id"
result <- dbGetQuery(con, query)
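The SQL examples above assume an existing mydb.sqlite file that already contains employees, orders and customers tables. A self-contained variant, assuming DBI and RSQLite are installed, builds small made-up orders and customers tables in an in-memory database, runs the same INNER JOIN, and closes the connection with dbDisconnect() when finished.

```r
# Sketch: in-memory database with hypothetical orders/customers data.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")

customers <- data.frame(
  customer_id = c(1L, 2L, 3L),           # primary key of customers
  name        = c("Alice", "Bob", "Carol")
)
orders <- data.frame(
  order_id    = c(101L, 102L, 103L),     # primary key of orders
  customer_id = c(1L, 1L, 3L)            # foreign key -> customers.customer_id
)

# Write both data frames to the database as tables.
dbWriteTable(con, "customers", customers)
dbWriteTable(con, "orders", orders)
tables <- dbListTables(con)

# Join each order to the name of the customer who placed it.
query <- "
  SELECT orders.order_id, customers.name
  FROM orders
  INNER JOIN customers ON orders.customer_id = customers.customer_id
  ORDER BY orders.order_id"
result <- dbGetQuery(con, query)
print(result)

# Release the connection when finished.
dbDisconnect(con)
```

Bob has no orders, so the INNER JOIN leaves him out of result; a LEFT JOIN starting from customers would keep him with a missing order_id.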
