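R Language Summary
A language for statistics
A GNU project for statistical computing and graphical output.
It is a functional language at its core, but code can also be written in an object-oriented or procedural style.
- The statistical analysis side is modeled on the S language developed at AT&T Bell Laboratories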
- Unlike S, which looks up free variables globally, R uses Lexical Scoping
- The data-processing side is influenced by Scheme.
Syntax
The basic building blocks are as follows.
- Environment … a set of (symbol, object) pairs
- Object … a thing (a value)
- Function … something that operates on objects (itself a function object)
- Symbol … a name (identifier) attached to a thing
Basic
Immutable
In R,
- assignment copies the object
- higher-order functions … the value returned from a function is a copy of the function object
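The copy behavior described above can be checked with a small sketch. The helper name `bump` is hypothetical and exists only for this illustration.

```r
# Assignment behaves like a copy: modifying y leaves x untouched.
x <- c(1, 2, 3)
y <- x
y[1] <- 100
print(x)  # 1 2 3
print(y)  # 100 2 3

# The same holds for function arguments: the caller's object is not changed.
bump <- function(v) {  # hypothetical helper
  v[1] <- v[1] + 1
  v
}
z <- bump(x)
print(x)  # still 1 2 3
print(z)  # 2 2 3
```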
Binding to Symbol
Functions can be bound to symbols (this is the functional paradigm at work).
To resolve a symbol, the interpreter
- searches the global environment, and
- searches the Namespaces (Lexical Scope).
Lexical Scoping
make.power <- function (n) {
  pow <- function (x) { x^n }
  pow
}
cube <- make.power (3)
square <- make.power (2)
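A short sketch of how the closures above behave when called. It repeats the definition so that it runs on its own, and inspects the enclosing environment to show where n is found.

```r
make.power <- function(n) {
  pow <- function(x) { x^n }
  pow
}
cube <- make.power(3)
square <- make.power(2)

cube(3)    # 27
square(3)  # 9

# n is looked up in the environment where pow was defined,
# not in the environment of the caller.
ls(environment(cube))        # "n" "pow"
get("n", environment(cube))  # 3
```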
- explicit printing
Use the print function.
x <- 1
print (x)
1
- auto-printing
The format is detected automatically and the value is printed.
x
1
msg <- "hello"
msg
hello
x <- 1:5
x
1 2 3 4 5
Functions
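Characteristics
Designed with an emphasis on interactive use from the command line.
Basics
A function is created with the function () declaration. Once created, it is held as an R object of class "function".
f <- function (<arguments>) {
  ## body: do something with the arguments
}
R functions are first-class objects.
- They can be passed as arguments to other functions.
- Functions can be nested.
- The return value is the result of the last expression evaluated in the body.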
Arguments:
Arguments are matched in the following order.
- exact name
- partial name match
- position
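The matching order can be seen in a minimal sketch. The function `describe` and its parameter names are hypothetical, chosen only to make the three matching rules visible.

```r
# Hypothetical function used only to show how arguments are matched.
describe <- function(data, na.remove = FALSE, digits = 2) {
  paste(length(data), na.remove, digits)
}

by_position <- describe(1:3, TRUE, 4)            # "3 TRUE 4"
by_name     <- describe(digits = 4, data = 1:3)  # "3 FALSE 4"
by_partial  <- describe(1:3, na = TRUE)          # "3 TRUE 2" (na matches na.remove)
mixed       <- describe(dig = 4, 1:3, TRUE)      # "3 TRUE 4"
```

Named arguments are matched first, so the remaining unnamed ones fill the leftover parameters in order.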
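Lazy Evaluation
R supports lazy evaluation.
In the example below, a is evaluated and printed. b is not evaluated until print (b) is reached, so the missing argument only raises an error at that point.
f <- function (a, b) {
  print (a)
  print (b)
}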
f (45)
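A small sketch contrasting an argument that is never evaluated with one that is forced. The function names `f` and `g` are illustrative, and tryCatch is used only so the script keeps running after the error.

```r
f <- function(a, b) {
  a^2            # b is never touched, so omitting it is harmless
}
f(2)             # 4

g <- function(a, b) {
  print(a)
  print(b)       # b is forced here; the error is raised only at this point
}
result <- tryCatch(g(45), error = function(e) conditionMessage(e))
result           # the error message about the missing argument b
```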
example
add2 <- function (x, y) {
  x + y
}
above10 <- function (x) {
  use <- x > 10
  x[use]
}
above <- function (x, n = 10) {
  use <- x > n
  x[use]
}
x <- 1:20
above (x, 12)
columnmean <- function (y, removeNA = TRUE) {
  nc <- ncol (y)
  means <- numeric (nc)
  for (i in seq_len (nc)) {
    # mean of column i, optionally ignoring NA values
    means[i] <- mean (y[, i], na.rm = removeNA)
  }
  means
}
Control Structures
The control structures resemble those of Ruby.
if
if (x > 3) {
  y <- 10
} else {
  y <- 0
}
# if is an expression; cf. the C-style ternary: y = (condition) ? foo : bar;
y <- if (x > 3) {
  10
} else {
  0
}
For loops
for (i in 1:10) {
  print (i)
}
A matrix can be looped over as follows.
x <- matrix (1:6, 2, 3)
for (i in seq_len (nrow (x))) {
  for (j in seq_len (ncol (x))) {
    print (x[i, j])
  }
}
while loops
count <- 0
while (count < 10) {
  print (count)
  count <- count + 1
}
repeat/break/next
repeat creates an infinite loop. It is used together with break (to leave the loop) and next (to skip to the next iteration).
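A minimal sketch of repeat combined with break and next. The variable names are arbitrary.

```r
i <- 0
odds <- c()
repeat {
  i <- i + 1
  if (i > 10) break      # leave the loop
  if (i %% 2 == 0) next  # skip even numbers
  odds <- c(odds, i)
}
odds  # 1 3 5 7 9
```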
Objects (Data)
Atomic Classes of Objects
R has five atomic classes of objects.
- character
- numeric (real number)
- integer
- complex
- logical (true/false)
- Integer
To write a value as an integer, append L to the number (for example 1L).
- NaN
An undefined value (not a number).
- Inf
Inf … infinity
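The following sketch shows the L suffix and the special values in practice, using only base R.

```r
class(1)        # "numeric"
class(1L)       # "integer"

1 / 0           # Inf
-1 / 0          # -Inf
0 / 0           # NaN

is.nan(0 / 0)   # TRUE
is.na(NaN)      # TRUE: NaN also counts as missing
is.nan(NA)      # FALSE: but NA is not NaN
is.finite(Inf)  # FALSE
```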
Basic Objects
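- variables
x <- 5
- vector
Create a vector with c.
a <- c (0.5, 0.6)       # numeric
b <- c (TRUE, FALSE)    # logical
c <- 0:5                # integer
d <- c ("a", "b", "c")  # character
Mixed types are accepted, but every element is implicitly coerced to one common type. A vector is therefore not a tuple; use a list to keep each element's own type.
a <- c (1, 7, "a")  # coerced to character
b <- c (TRUE, "a")  # coerced to character
x <- 0:6
class (x)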
integer
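A short sketch of implicit and explicit coercion, with a list shown for comparison. The variable names are arbitrary.

```r
class(c(1, 7, "a"))  # "character"
class(c(TRUE, "a"))  # "character"
class(c(TRUE, 2))    # "numeric" (TRUE becomes 1)

# Explicit coercion; values that cannot be converted become NA (with a warning).
nums <- suppressWarnings(as.numeric(c("1", "x")))
nums                 # 1 NA

# A list keeps the type of each element.
mixed <- list(1, "a", TRUE)
kinds <- sapply(mixed, class)
kinds                # "numeric" "character" "logical"
```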
- list
A special kind of vector. It bundles elements of different types into one object.
x <- list (1, "a", TRUE, 1 + 4i)
x
1 a TRUE 1+4i
- Matrices
A vector with a dimension attribute. Created with the matrix function.
m <- matrix (nrow = 2, ncol = 3)
m
NA NA NA
NA NA NA
m <- matrix (1:6, nrow = 2, ncol = 3)
m
1 3 5
2 4 6
- dim
Assigning with the dim function gives a vector a dimension attribute.
m <- 1:10
dim (m) <- c (2, 5)
m
1 3 5 7 9
2 4 6 8 10
- cbind-ing and rbind-ing
cbind and rbind can also build a matrix from vectors.
x <- 1:3
y <- 10:12
cbind (x, y)
1 10
2 11
3 12
rbind (x, y)
1 2 3
10 11 12
Factors
A special kind of vector used for categorical data.
Think of an integer vector in which each integer carries a label.
Comparable to an enum (enumerated type). Created with the factor function.
x <- factor (c ("yes", "no", "no", "yes", "no"), levels = c ("yes", "no"))
table (x)
yes 2
no 3
- Data Frame
A list made up of several vectors.
- Data Frame Tips Compendium - RjpWiki
- Ruby - About R data frames (data.frame) - Qiita
A special kind of list: a list of lists.
- Every list inside the list must have the same length.
- Each list inside the list is treated as a column.
- The position of each element within an inner list is treated as a row.
- Usually created by read.table () or read.csv ().
- Can be converted to a matrix with data.matrix (x).
x <- data.frame (foo = 1:4, bar = c (TRUE, TRUE, FALSE, FALSE))
x
1 TRUE
2 TRUE
3 FALSE
4 FALSE
- Getting the labels
names (data)
- Extracting rows by condition
adaltAnimalData <- animaldata[animaldata$Age.Intake >= 1, ]
- Extracting a vector from a data frame
distance <- student$distance
- names
Objects can be given names, which improves readability.
x <- 1:3
names (x) <- c ("foo", "bar", "norf")
m <- matrix (1:4, nrow = 2, ncol = 2)
dimnames (m) <- list (c ("a", "b"), c ("c", "d"))
split
Splits a data frame by category.
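A minimal sketch of split on a made-up data frame. The column names `group` and `value` are assumptions for illustration only.

```r
# Made-up data for illustration.
df <- data.frame(group = c("a", "b", "a", "b"), value = c(1, 2, 3, 4))

parts <- split(df, df$group)  # a named list of data frames, one per group
names(parts)                  # "a" "b"
parts$a$value                 # 1 3

# split also works on a plain vector; combined with sapply it gives per-group summaries.
group_means <- sapply(split(df$value, df$group), mean)
group_means                   # a = 2, b = 3
```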
Reading/Writing Data
Reading
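read.csv reads data from a CSV file.
data <- read.csv ("foo.csv")
read.table reads tabular text, and R works out a sensible way to parse it.
data <- read.table ("foo.txt")
Read only the first 100 rows.
initial <- read.table ("datatable.txt", nrows = 100)
Writing
dput and dump write objects out as text (R code that recreates them).
y <- data.frame (a = 1, b = "a")
dput (y)
# prints an R expression (a structure (...) call) that recreates y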
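A sketch of a round trip through text files, assuming temporary files are acceptable for the demonstration. dget and source are the base R counterparts that read back what dput and dump wrote.

```r
y <- data.frame(a = 1, b = "a")

# dput writes one object; dget reads it back.
dput_path <- tempfile(fileext = ".R")  # temporary file, name is arbitrary
dput(y, file = dput_path)
y2 <- dget(dput_path)
all.equal(y, y2)

# dump writes several objects by name; source restores them.
x <- 1:3
dump_path <- tempfile(fileext = ".R")
dump(c("x", "y"), file = dump_path)
rm(x, y)
source(dump_path, local = TRUE)        # x and y exist again
```

dput stores only the value, so the name is chosen when reading back, whereas dump stores the assignments themselves.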
Outside
Interfaces to the outside world.
- file
- gzfile
- bzfile
- url
A file can be opened through a connection.
con <- file ("hw1_data.csv", "r")
data <- read.csv (con)
close (con)
Data can also be fetched from a website by specifying its URL.
con <- url ("http://www.jhsph.edu", "r")
data <- read.csv (con)
close (con)
Subsetting
Extracting a subset of an object.
vector
x <- c ("a", "b", "c", "c", "d", "a")
x[1:4]
a
b
c
c
A part can also be extracted by giving a condition.
x[x > "a"]
b
c
c
d
list
x <- list (foo = 1:4, bar = 0.6)
# select by index (the result is still a list)
x[1]
# select with $ (the result is the element itself)
x$bar
Matrix
x <- matrix (1:6, 2, 3)
x
1 3 5
2 4 6
Using , with one index left empty extracts a single row or column as a vector.
x[1,]
1
3
5
Removing NA Values
Check with complete.cases.
x <- c (1, 2, NA, 4, NA, 5)
y <- c ("a", "b", NA, "d", NA, "f")
good <- complete.cases (x, y)
good
TRUE
TRUE
FALSE
TRUE
FALSE
TRUE
x[good]
1
2
4
5
Apply Functions
In R, the idiomatic approach is to use the apply family instead of for loops.
- apply processes matrix-type data
- tapply processes data group by group
- lapply and sapply process the elements of a vector or list one after another
- mapply takes one element at a time from each of several vectors or lists and processes them together.
I picture it as something like performing a matrix computation.
apply (X, MARGIN, FUN, …)
Applies a function over the MARGIN of a matrix or array and returns the result as an array or a list.
What the function is applied to is specified by MARGIN.
- MARGIN = 1: rows
- MARGIN = 2: columns
- MARGIN = c (1, 2): each element
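The three MARGIN settings can be compared on a small matrix. This is a minimal sketch using only base R.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

row_sums <- apply(m, 1, sum)  # 9 12
col_sums <- apply(m, 2, sum)  # 3 7 11
scaled   <- apply(m, c(1, 2), function(v) v * 10)  # 2 x 3 matrix, each cell times 10
```

For plain sums and means, rowSums, colSums, rowMeans and colMeans do the same job and are usually faster.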
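lapply (X, FUN, …)
Applies a function to each element of a list and returns a list of results.
x <- list (a = 1:5, b = rnorm (10))
lapply (x, mean)
Anonymous functions can be applied as well.
x <- list (a = matrix (1:4, 2, 2), b = matrix (1:6, 3, 2))
lapply (x, function (elt) elt[,1])
sapply (X, FUN, …)
Applies a function to each element of a list and returns one of the following.
- a vector with a names attribute
- a matrix with a names attribute
In other words, it returns the result of lapply in simplified form, with names attached.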
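A side-by-side sketch of lapply and sapply on the same made-up input, showing when the result becomes a vector and when a matrix.

```r
x <- list(a = 1:5, b = c(10, 20))

as_list   <- lapply(x, mean)   # list: $a is 3, $b is 15
as_vector <- sapply(x, mean)   # named numeric vector: a = 3, b = 15
as_matrix <- sapply(x, range)  # 2 x 2 matrix, one column per list element
```

When the results have different lengths, sapply cannot simplify and returns a list, just like lapply.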
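tapply (X, INDEX, FUN, …)
Applies a function group by group to a grouped variable. INDEX is a factor, or a list of factors, that splits the elements of X into groups (usually given as a character vector). The result of applying the function to each group is returned as a vector or a list.
x <- c (rnorm (10), runif (10), rnorm (10, 1))
f <- gl (3, 10)
tapply (x, f, mean)
It helps to picture a per-group summary in Excel (closer to a pivot table than to vlookup).
mapply (FUN, x, y, z, …)
A multivariate version of sapply (). Several vectors, matrices, and so on can be given as x, y, z, and the results of FUN (x, y, z, …) are returned as a vector when they can be simplified, otherwise as a list of vectors.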
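A minimal sketch of mapply with two parallel inputs, one case where the result simplifies to a vector and one where it stays a list.

```r
# Element-wise over two vectors: 1 + 4, 2 + 5, 3 + 6
sums <- mapply(function(x, y) x + y, 1:3, 4:6)
sums  # 5 7 9

# rep(1, 3), rep(2, 2), rep(3, 1): results differ in length, so a list is returned
reps <- mapply(rep, 1:3, 3:1)
reps
```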
Operations
vector
x <- 1:4; y <- 6:9
x + y
x * y
x / y
matrix
x <- matrix (1:4, 2, 2)
1 3
2 4
y <- matrix (rep (10, 4), 2, 2)
10 10
10 10
x * y
10 30
20 40
Calculations
Counts
table (adaltAnimalData$Animal.Type)
Median, mean, standard deviation
# maximum
max (maleage)
# mean
mean (animaldata$Age.Intake)
# median
median (animaldata$Age.Intake)
# standard deviation
sd (animaldata$Age.Intake)
# five-number summary
fivenum (animaldata$Age.Intake)
# rounding
# to 2 decimal places
round (data, 2)
cor: correlation coefficient
cor (bull$YearsPro, bull$BuckOuts)
- Converting categorical data to numeric data
# for a factor, this returns the integer level codes
val <- as.numeric (var)
Creating a correlation matrix
myvars <- c ('YearsPro', 'Events', 'BuckOuts')
cor (bull[, myvars])
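Tests
zscore: Z test
# z-score of the value 13 relative to catWeight
zcat <- (13 - mean (catWeight)) / sd (catWeight)
# upper-tail probability under the standard normal distribution
1 - pnorm (zcat)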
確ç
table
åå²è¡š (Contingency Tables) ãäœæãã. èŠçŽ æ°ãã«ãŠã³ããã.
gtab <- table (acl$Grammy)
N | 67 |
---|---|
Y | 49 |
prop
確çååžè¡š (marginal table) ãäœæãã.
prop.table (gtab)
N | 0.577586206896552 |
---|---|
Y | 0.422413793103448 |
gtab2 <- table (acl$Grammy, acl$Gender)
prop.table (gtab2)
0.181034482758621 | 0.396551724137931 |
---|---|
0.120689655172414 | 0.301724137931034 |
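prop.table also accepts a margin argument, which the tables above do not use. The sketch below uses made-up vectors (`grammy`, `gender`) rather than the acl data, which is not available here.

```r
# Made-up data for illustration; not the acl data set.
grammy <- c("N", "N", "Y", "Y", "N", "Y")
gender <- c("F", "M", "F", "M", "M", "M")
tab <- table(grammy, gender)

overall <- prop.table(tab)              # each cell divided by the grand total
by_row  <- prop.table(tab, margin = 1)  # each row sums to 1
by_col  <- prop.table(tab, margin = 2)  # each column sums to 1
```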
Plotting
There are three plotting libraries.
- Base: "artist's palette" model
- Lattice: Entire plot specified by one function; conditioning
- ggplot2: Mixes elements of Base and Lattice
Base
plot (x, y), hist (x)
- plot: draws a scatter plot
# plot
plot (bull$YearsPro, bull$BuckOuts, xlab = 'Years Pro', ylab = 'Buckouts', main = 'Plot of Years Buckouts')
# with
with (airquality, plot (Ozone, Wind))
- abline: adds a fitted (regression) line
abline (lm (bull$BuckOuts ~ bull$YearsPro))
- hist: histogram
hist (animaldata$Age.Intake, main = "Histogram of Intake Ages", xlab = "Age at Intake")
- boxplot: box-and-whisker plot
boxplot (Ozone ~ Month, airquality, xlab = "Month", ylab = "Ozone (ppb)")
- parameters
- `pch`: the plotting symbol (default is open circle)
- `lty`: the line type (default is solid line), can be dashed, dotted, etc.
- `lwd`: the line width, specified as an integer multiple
- `col`: the plotting color, specified as a number, string, or hex code; the `colors ()` function gives you a vector of colors by name
- `xlab`: character string for the x-axis label
- `ylab`: character string for the y-axis label
The `par ()` function is used to specify global graphics parameters that affect all plots in an R session. These parameters can be overridden when specified as arguments to specific plotting functions.
- `las`: the orientation of the axis labels on the plot
- `bg`: the background color
- `mar`: the margin size
- `oma`: the outer margin size (default is 0 for all sides)
- `mfrow`: number of plots per row, column (plots are filled row-wise)
- `mfcol`: number of plots per row, column (plots are filled column-wise)
- Tables
Use the xtable library.
Lattice
Lattice. A library that is useful for visualizing correlations.
xyplot (y ~ x | f * g, data)
- Lattice Panel Function
Useful for examining the relationship between x and y.
Many panels can be laid out side by side to see trends and find patterns.
ggplot2
An improved take on plot (). qplot ()
With qplot, the data set is analyzed automatically and a good-looking graph is produced. With plot, everything has to be specified by hand.
- qplot
Graphics File Devices
- vector formats
Well suited to line graphics.
- svg
- win.metafile
- postscript
- bitmap format
Well suited to scatter-type graphics with many points.
- png
- jpeg
- tiff
- bmp
Bookmarks
coursera
Regression analysis
Linear regression line: linFit
Use linFit (not part of base R).
linFit (mens800$Year, mens800$Record)
Exponential regression curve
Use expFit.
expFit (time, mv)
# predict the value a number of years ahead
expFitPred (time, mv, 12)
Logistic regression curve
Use logisticFit.
logisticFit (time, mv)
# predict the value a number of years ahead
logisticFitPred (time, mv, 12)
Showing the three regression fits at once
Use tripleFit.
tripleFit (time, mv)
Drawing samples
Repeat 1000 times: draw a sample of 10 values and record its mean.
xbar10 <- rep (NA, 1000)
for (i in 1:1000) {
  x <- sample (survey$name_letters, size = 10)
  xbar10[i] <- mean (x)
}
t-testing
t.test (age, mu=30)
t.test (age, mu=30, alternative = 'less')
t.test (age, mu=30, alternative = 'greater')
aggregate
Used when you want to summarize data group by group in R.
Used to compute a statistic T (X), where X = (x1, x2, x3, …)
aggregate (x, by, FUN, …)
- takes the data x,
- for each group defined by the list by,
- and summarizes it into a statistic with the function FUN
averages <- aggregate (x = list (steps = data$steps),
                       by = list (interval = data$interval),
                       FUN = mean)
aggregate (formula, data, FUN)
- takes a formula
- and a data frame (data),
- and summarizes it into a statistic with the function FUN
Reference: formula, R-Source
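The two call styles give the same per-group result. The sketch below uses a made-up data frame with the same column names as the example above (`steps`, `interval`); the values are invented.

```r
# Made-up data; only the column names follow the example above.
data <- data.frame(interval = c(0, 0, 5, 5), steps = c(1, 3, 10, 20))

# x / by interface
by_list <- aggregate(x = list(steps = data$steps),
                     by = list(interval = data$interval),
                     FUN = mean)

# formula interface: summarize steps for each value of interval
by_formula <- aggregate(steps ~ interval, data = data, FUN = mean)

by_formula
#   interval steps
# 1        0     2
# 2        5    15
```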
Data cleaning
sort
Use order / sort.list.
stateData <- stateData[order (stateData[, col]), ]
Removing invalid values
Values that are not numeric are replaced with NA.
data[, 11] <- as.numeric (data[, 11])
Removing duplicates
Use unique.
u <- unique (d)
Simulation
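Random Number
- dnorm (density)
- pnorm (cumulative distribution function)
- qnorm (quantile function)
- rnorm (random number generation)
dnorm (x, mean = 0, sd = 1, log = FALSE)
pnorm (q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm (p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
rnorm (n, mean = 0, sd = 1)
Generate 10 random values from the standard normal distribution (mean 0, standard deviation 1).
x <- rnorm (10)
Specify the mean and standard deviation.
x <- rnorm (10, 20, 2)
Calling set.seed makes the sequence reproducible. Without it, different numbers are obtained on every run.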
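A small sketch of what the seed does. The seed value 20 is arbitrary.

```r
set.seed(20)
a <- rnorm(5)

set.seed(20)      # resetting the same seed restarts the same sequence
b <- rnorm(5)

c2 <- rnorm(5)    # without resetting, the sequence simply continues

identical(a, b)   # TRUE
identical(a, c2)  # FALSE
```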
ex: Linear Models
y = b0 + b1*x + e
set.seed (20)
x <- rnorm (100)
e <- rnorm (100, 0, 2)
y <- 0.5 + 2 * x + e
Random Sampling
The sample function draws a random sample from a population.
set.seed (1)
sample (1:10, 4)
sample (letters, 4)
Debug
ess-tracebug
Use ess-tracebug.
Breakpoint commands: ess-bp-xxx
str
Displays the internal structure of an object compactly.
str (str)
summary
Displays a summary of an object's contents.
system.time
Reports how long an expression took to run.
Rprof
R's profiler.
- Rprof ()
- summaryRprof ()
👨Hadley Wickham
The man who led R to the top of data science by developing dplyr and a string of superb related libraries.
💡Revered as "Hadori-kyo" (the Hadley faith) and "Hadori-shin" (the god Hadley).
🔧tidyverse/dplyr
💡Hadori-kyo
A certain sect joined by those who aspire to become data scientists.
Tools
🔧data.table
A data frame along the lines of pandas.
CRAN
The package repository. Specifying a mirror server in Japan.
options (repos = "http://cran.md.tsukuba.ac.jp")
If this is written in ~/.Rprofile, it is loaded on every startup.
Related
tags. ProgLang DataScience