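What is inferential statistics
Inferential Statistics.
The field that treats the collected data as a subset drawn from the full data (the population) and infers the properties and tendencies of that population from the sample.
- Also called modern statistics.
- A methodology for making inferences about the population that lies behind the data in front of the observer.
- The goal is to find the properties and tendencies of a large population from a small sample.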
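Methods
There are two methods for inferring a population parameter: estimation and testing.
- Statistical estimation: a method for inferring what value a parameter takes.
- Statistical hypothesis testing: a method for deciding, in a yes/no fashion, whether a parameter equals a reference value that is meaningful in the subject-matter science.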
Subgenres
Inferential statistics is further divided into the following.
- Exploratory Data Analysis
- Predictive Data Analysis
- Causal Data Analysis
- Mechanistic Data Analysis
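Statistical estimation
Estimation. A method for inferring what value a population parameter takes.
The usual practice is to report a point estimate and to carry out interval estimation only when the result matters.
Point estimation
Interval estimation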
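A minimal Python sketch of the two kinds of estimate for a population mean. The sample values and the function name `estimate_mean` are made up for illustration, and the interval uses a normal quantile, which is an assumption that is only reasonable for larger samples (a t quantile would be the usual choice for a sample this small).

```python
import math
from statistics import NormalDist, fmean, stdev

def estimate_mean(sample, confidence=0.95):
    """Return (point_estimate, (lower, upper)) for the population mean."""
    n = len(sample)
    mean = fmean(sample)               # point estimate of the population mean
    se = stdev(sample) / math.sqrt(n)  # standard error of the sample mean
    # Normal quantile (assumption: normal approximation is acceptable).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, (mean - z * se, mean + z * se)

sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]  # hypothetical measurements
point, (low, high) = estimate_mean(sample)
print(f"point estimate: {point:.3f}")
print(f"95% interval:   [{low:.3f}, {high:.3f}]")
```

The point estimate is a single number, while the interval also conveys how uncertain that number is.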
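Statistical hypothesis testing
Hypothesis testing. Also called a statistical test or a test procedure.
An algorithm for judging, statistically and probabilistically, whether a hypothesis can be regarded as correct.
Assume the hypothesis is correct, compute the probability that the observed sample would be drawn from a population that follows it, and judge by that value. If the probability is sufficiently small (smaller than a value fixed in advance), we can judge that "the hypothesis is unlikely to hold."
In real analyses, some expectation about the characteristics of the population usually exists before the analysis is run. Hypothesis testing checks whether such a hypothesis is consistent with the result of the analysis (that is, the sample statistics).
It is the task of judging how incompatible the data are with the null hypothesis. In many cases it amounts to computing the p-value.
To show the hypothesis "there is a difference (A ≠ B)" -> show that the hypothesis "there is no difference (A = B)" is probabilistically untenable (unlikely to be true).
A method developed by Neyman and Pearson.
Test statistic
A statistic computed from the sample for use in a statistical test.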
Hypothesis testing procedure
- State the null and alternative hypotheses.
- Select the appropriate significance level and check the test assumptions.
- Analyze the data and compute the test statistic.
- Interpret the result.
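The four steps can be sketched in Python as follows. This is an illustration only: it uses a two-sided z-test with a population standard deviation assumed to be known, and the scores, `mu0`, and `sigma` are hypothetical values chosen for the example.

```python
import math
from statistics import NormalDist, fmean

# Step 1: state the hypotheses. H0: mu = 50, H1: mu != 50 (hypothetical).
mu0 = 50.0
sigma = 10.0  # population standard deviation, assumed known

# Step 2: select the significance level.
alpha = 0.05

# Step 3: compute the test statistic and its p-value from the data.
sample = [62, 55, 48, 59, 66, 53, 61, 58, 64, 57]  # made-up scores
z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 4: interpret the result.
reject_h0 = p_value < alpha
print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```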
Developing Hypothesis
Express the hypothesis statistically so that the probability of observing the sample, assuming the hypothesis is correct, can be computed. There are two kinds.
Null Hypothesis
The opposite of the hypothesis you want to demonstrate.
It states that there is no difference between the population statistic under study and the statistic computed from the sample.
Rather than setting up the positive hypothesis "the data are skewed, so the two variables are related," set up the null hypothesis "the two variables are unrelated, and the skew in the data arose by chance," then reject it on the grounds that "the probability of the skew arising by chance is negligibly low." The argument proceeds by double negation.
Alternative Hypothesis
The hypothesis you want to demonstrate.
Type 1 Error / Type 2 Error
Decision | Null Hypothesis is True | Null Hypothesis is False |
---|---|---|
Reject Null Hypothesis | Type 1 Error | Correct Decision |
Do not Reject Null Hypothesis | Correct Decision | Type 2 Error |
Testing the null hypothesis: rejected → the alternative hypothesis is taken as true; retained → the null hypothesis is taken as true.
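A small simulation can make the two error types concrete. The sketch below is an assumption-laden illustration, not part of the original note: it draws normal samples with a known standard deviation, applies a two-sided z-test, and counts how often the null hypothesis is rejected. The function names, effect size, sample size, and seed are all hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def z_test_rejects(sample, mu0, sigma, alpha):
    """Two-sided z-test; True if H0: mu = mu0 is rejected."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   trials=4000, seed=0):
    """Fraction of simulated samples for which H0 is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        z_test_rejects([rng.gauss(true_mu, sigma) for _ in range(n)],
                       mu0, sigma, alpha)
        for _ in range(trials)
    )
    return rejections / trials

# H0 is true: every rejection is a Type 1 error (rate should be near alpha).
type1_rate = rejection_rate(true_mu=0.0)
# H0 is false: every non-rejection is a Type 2 error.
type2_rate = 1 - rejection_rate(true_mu=0.5)
print(f"Type 1 error rate: {type1_rate:.3f}")
print(f"Type 2 error rate: {type2_rate:.3f}")
```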
Critical Values
The critical region is the region in which the null hypothesis can be rejected.
Critical regions are the areas under the distribution curve representing values that lead to rejecting the null hypothesis.
The critical value is the number that marks the edge of the critical region.
Critical values separate the values that support the null hypothesis from those that reject it.
The significance level that fixes the size of the critical region is most often set to 0.05 (5%).
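As a rough illustration, critical values for a standard normal test statistic can be computed from the significance level. The helper names below are hypothetical, and the standard normal distribution is an assumption; other tests use other reference distributions.

```python
from statistics import NormalDist

def critical_values(alpha=0.05, two_sided=True):
    """Critical value(s) of a standard normal test statistic."""
    std_normal = NormalDist()
    if two_sided:
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    return (std_normal.inv_cdf(1 - alpha),)

def in_critical_region(z, alpha=0.05):
    """True if z falls in the two-sided critical region."""
    lower, upper = critical_values(alpha)
    return z < lower or z > upper

lower, upper = critical_values(0.05)
print(f"two-sided critical values: {lower:.3f}, {upper:.3f}")
print(f"one-sided critical value:  {critical_values(0.05, two_sided=False)[0]:.3f}")
```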
Two-sided and one-sided tests
Two-sided test
When the claim is of the type "the means are equal," both the left and right tails of the distribution are used.
H1: θ ≠ θ0
One-sided test
When the claim is of the type "xxx does not have the larger (smaller) mean," only one tail is used (> or <).
H1: θ > θ0, or θ < θ0
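The difference between the two kinds of test shows up in how the p-value is computed from the same statistic. A sketch, assuming a standard normal test statistic; the function name and the `alternative` labels are illustrative choices, not from the note.

```python
from statistics import NormalDist

def p_value(z, alternative="two-sided"):
    """p-value of a standard normal statistic z for the given H1."""
    cdf = NormalDist().cdf
    if alternative == "two-sided":  # H1: theta != theta0, both tails
        return 2 * (1 - cdf(abs(z)))
    if alternative == "greater":    # H1: theta > theta0, right tail only
        return 1 - cdf(z)
    if alternative == "less":       # H1: theta < theta0, left tail only
        return cdf(z)
    raise ValueError(f"unknown alternative: {alternative}")

z = 1.8
for alt in ("two-sided", "greater", "less"):
    print(f"{alt:>9}: p = {p_value(z, alt):.4f}")
```

With z = 1.8 the right-tailed test would reject at the 5% level while the two-sided test would not, which is why the direction of H1 should be fixed before looking at the data.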
Parametric and nonparametric tests
"Parametric" means "based on population parameters."
Both kinds of test follow the steps below, but step 2 differs between them.
1. Set up the null hypothesis.
2. Compute the statistic T.
3. Compare T with the rejection region and accept or reject the null hypothesis.
Parametric tests
Methods that carry out a statistical test by assuming a specific distribution, such as the normal distribution, for the population.
Nonparametric tests
Methods that carry out a statistical test without assuming a specific distribution, such as the normal distribution, for the population.
R implements a good number of functions for nonparametric tests.
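To show what "no assumed distribution" looks like in practice, here is a sketch of one simple nonparametric test, the sign test for paired data. It is written in Python rather than R purely as an illustration; the choice of the sign test, the function name, and the data are assumptions for the example, not something the note prescribes.

```python
from math import comb

def sign_test(before, after):
    """Exact two-sided sign test p-value for paired data; ties are dropped."""
    diffs = [a - b for a, b in zip(after, before) if a != b]
    n = len(diffs)
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # Under H0 the number of positive differences is Binomial(n, 0.5),
    # so only the signs are used and no population distribution is assumed.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

before = [10, 12, 9, 11, 13, 10, 12, 11]   # hypothetical paired measurements
after = [12, 14, 10, 13, 15, 12, 11, 13]
p = sign_test(before, after)
print(f"sign test p-value: {p:.4f}")
```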
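Degrees of freedom
Z-Scores: Z-test
Also called the Z value or Z score. Used when the population statistics are known.
z = deviation / standard deviation
<=>
z = (score obtained - mean score) / standard deviation
Five-grade evaluation by z:
- -1.5 or below is grade 1 (7% of the whole)
- -1.5 to -0.5 is grade 2 (24% of the whole)
- -0.5 to 0.5 is grade 3 (38% of the whole)
- 0.5 to 1.5 is grade 4 (24% of the whole)
- 1.5 or above is grade 5 (7% of the whole)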
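The z formula and the five-grade cut points can be checked with a short Python sketch. The function names are hypothetical, and how the exact boundary values (for example z = -0.5) are assigned is an assumption, since the list above does not say. The percentages are recomputed from the standard normal distribution.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """z = (score obtained - mean score) / standard deviation."""
    return (x - mean) / sd

def five_grade(z):
    """Map a z-score to grades 1..5 (boundary handling is an assumption)."""
    if z <= -1.5:
        return 1
    if z < -0.5:
        return 2
    if z <= 0.5:
        return 3
    if z < 1.5:
        return 4
    return 5

# Share of a normally distributed population that lands in each grade.
cdf = NormalDist().cdf
bounds = [0.0] + [cdf(c) for c in (-1.5, -0.5, 0.5, 1.5)] + [1.0]
shares = [round(100 * (hi - lo)) for lo, hi in zip(bounds, bounds[1:])]
print(shares)
print(five_grade(z_score(80, mean=60, sd=10)))
```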
T-Scores: t-test
A general term for statistical tests that rely on the statistic following a t distribution when the null hypothesis is assumed to be correct.
A method for testing whether there is a difference between the means of two data sets X and Y.
Also called Student's t-test.
A popular test that answers the need "I cannot collect many samples, but I want to test with the small sample I have on hand."
Conditions for using the t-test
- The sample follows a normal distribution (check by drawing a graph)
One-sample t-test
t-statistic
t = ((sample mean) - (population mean under the null hypothesis)) / (standard error of the sample mean)
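A minimal sketch of the one-sample t statistic, taking the standard error as the sample standard deviation divided by the square root of the sample size. The data and `mu0` are made up. The Python standard library has no t distribution, so the sketch stops at the statistic and its degrees of freedom; the p-value would come from a t table or a statistics library.

```python
import math
from statistics import fmean, stdev

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean = mu0."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)  # standard error of the sample mean
    return (fmean(sample) - mu0) / se, n - 1

sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.4]  # hypothetical data
t, df = one_sample_t(sample, mu0=5.0)
print(f"t = {t:.3f}, df = {df}")
```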
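Two-sample t-test
A test concerning two samples.
- Paired two groups: a test on the same population. Compare the data before and after doing xxx to judge whether xxx had an effect.
  - 10 years ago and 10 years later
  - Start of term and end of term
- Independent two groups: a test on different populations.
  - Class A and Class B
  - Men and women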
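The two designs lead to different statistics even on identical numbers. The sketch below is illustrative: the function names and data are hypothetical, and the independent version uses Student's pooled variance, which assumes the two groups have equal variances.

```python
import math
from statistics import fmean, stdev, variance

def paired_t(before, after):
    """Paired design: a one-sample t-test on the per-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return fmean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

def independent_t(x, y):
    """Independent design: Student's t with pooled variance."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    return (fmean(x) - fmean(y)) / se, nx + ny - 2

first = [70, 68, 75, 80, 72]   # made-up scores
second = [74, 71, 77, 85, 75]
t_paired, df_paired = paired_t(first, second)   # same people, before and after
t_indep, df_indep = independent_t(first, second)  # two unrelated groups
print(f"paired:      t = {t_paired:.3f}, df = {df_paired}")
print(f"independent: t = {t_indep:.3f}, df = {df_indep}")
```

The paired statistic is much larger in magnitude because pairing removes the variation between subjects, so choosing the design that matches how the data were collected matters.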
F-Scores: F-test
A general term for statistical tests in which the statistic follows an F distribution if the null hypothesis is correct.
- F-test - Wikipedia
The F statistic is the ratio of the variation between groups to the variation within groups.
Sum of Squares (SS) = xx
Mean Square (MS) = SS/df
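A sketch of how these quantities combine into a one-way F statistic, assuming the usual between-group and within-group sums of squares. The function name and the three small groups are hypothetical.

```python
from statistics import fmean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way layout."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    # Sum of squares between groups and within groups.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # Mean Square = SS / df, and F is the ratio of the two mean squares.
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # made-up data
f_stat, df_b, df_w = one_way_f(groups)
print(f"F = {f_stat:.3f}, df = ({df_b}, {df_w})")
```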
Chi-Scores: chi-squared test
A method for testing categorical data.
χ^2 = Σ (observed - expected)^2 / expected
Test of Independence
The chi-squared test also serves as a measure of whether two random variables are independent, based on an observed contingency table.
Can the distribution of the observed data be regarded as nearly the same as the distribution of the theoretical values?
df = (rows - 1) x (cols - 1)
expected cell value = (row total x column total) / total sample size
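The three formulas can be put together in a short Python sketch. The function name and the 2 x 2 table are hypothetical. The result would then be compared against a chi-squared critical value for the computed degrees of freedom (3.841 at the 5% level for df = 1).

```python
def chi_square_independence(table):
    """Return (chi-squared statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected cell value = (row total x column total) / total sample size
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

observed_table = [[20, 30],
                  [30, 20]]  # made-up counts
chi2, df = chi_square_independence(observed_table)
print(f"chi-squared = {chi2:.3f}, df = {df}")
```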
ANOVA: analysis of variance
Analysis of Variance.
Judges the effects of factors and their interactions by decomposing the variation in the observed data into error variation and variation due to each factor and their interactions.
ANOVA is an appropriate statistical method when we want to compare the means of three or more populations at once.
ANOVA is a testing framework that can handle multiple situations.
Because ANOVA is closely tied to the design of experiments, it is applied to experimental data in most cases. Conclusions drawn from the analysis often refer to causal relationships.
Example conclusion: xx and xx are related.
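Post hoc tests
Post hoc comparisons: when there is no clear hypothesis about which means to compare, use ANOVA to decide what to compare and then run multiple comparisons.
Tukey's test
Used together with ANOVA to search for means that differ significantly from one another.
A multiple comparison that does not use the F statistic.
Significance testing
Devised by Fisher.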
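Statistical significance
Significance. Statistically significant.
Something that is unlikely to be due to chance in probabilistic terms and is therefore considered meaningful.
What is called "scientifically correct" often means there is statistical significance. Reportedly the same holds in psychology.
p-value
The probability, under the null hypothesis, of observing a statistic at least as extreme as the one actually computed from the data.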
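Relation to science
For a conclusion to be scientific, one must show, with an appropriate statistical method applied appropriately, that "there is a significant difference."
Therefore, to be an object of the scientific method, a subject must be one to which appropriate statistical tools can be applied.
- Setting up appropriate groupings
- Aligning or varying the experimental conditions in an appropriate way (the so-called "condition finding" problem)
- Adopting an appropriate statistical method and test statistic
- Setting the number of experimental cases needed to obtain a statistically significant difference
- Interpreting statistical correlations and differences appropriately
- Visualizing the experimental data with an appropriate visualization method
Put the other way around, statistical methods are effective for making scientifically correct claims.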
Related
- up: Statistics
- Inference