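Data Wrangling Summary
Data wrangling. A concept that combines tidying data and transforming it.
"Wrangling" originally means rounding up and taming horses. It is the skill of handling data as freely as a cowboy handles a herd.
The process of organizing and combining messy, complex datasets so that the data can be accessed and analyzed easily.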
Transformations using data frames.
Data preprocessing, data tidying, data cleaning... This note sorts out the various terms used for transformation.
Individual topics
Data preprocessing
Data transformation carried out as a preliminary step before machine learning or data mining.
Data pre-processing.
Often used with the same meaning as wrangling; a broad term that covers many concepts.
Data tidying
Arranging data obtained from external sources (untidy data) into tabular data.
That is, each column becomes a variable and each row becomes an observation.
Also called normalization.
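As a rough illustration of "each column is a variable, each row is an observation", the sketch below reshapes a wide table into tidy rows using only the standard library. The table, the column names, and the helper `to_tidy` are all hypothetical and exist only for this example.

```python
# Hypothetical wide-format table: one row per city, one column per year.
wide = [
    {"city": "Tokyo", "2020": 10, "2021": 12},
    {"city": "Osaka", "2020": 7, "2021": 9},
]


def to_tidy(rows, id_key, var_name, value_name):
    """Turn every non-id column into its own (variable, value) row."""
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue  # the id column is repeated on every output row
            tidy_rows.append(
                {id_key: row[id_key], var_name: column, value_name: value}
            )
    return tidy_rows


# After reshaping, "year" and "sales" are variables (columns)
# and each row is a single observation.
tidy = to_tidy(wide, id_key="city", var_name="year", value_name="sales")
for record in tidy:
    print(record)
```

Two wide rows with two year columns each become four tidy rows, one per (city, year) observation.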
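Tidy data
Data that has been tidied is called tidy data.
Data in which each column is a variable and each row is an observation.
It goes by several names.
- Table-format data (テーブル形式データ)
- Tabular data (Tabularデータ)
- Tabular-form data (表形式データ)
- Structured data (構造化データ)
- Normalized data (正規化データ)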
Data cleaning
Compensating for outliers and missing values.
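One possible way to handle both cases is sketched below: missing values are filled with the median of the observed values, and outliers are capped at a threshold. The readings, the median-fill rule, and the `upper` threshold are assumptions made for this example, not a recommended policy.

```python
from statistics import median

# Hypothetical sensor readings; None marks a missing value.
readings = [10.0, 11.0, None, 9.0, 250.0, 10.0]

# Fill value: the median of the observed (non-missing) readings.
observed = [x for x in readings if x is not None]
fill_value = median(observed)

# Assumed rule: anything above `upper` is treated as an outlier and capped.
upper = 50.0

cleaned = []
for x in readings:
    if x is None:
        x = fill_value  # compensate for the missing value
    cleaned.append(min(x, upper))  # cap the outlier

print(cleaned)
```

The median is used here instead of the mean because the mean would itself be pulled upward by the outlier.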
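Data transformation
Data Preprocessing. Data transformation applied to tidy data.
- Narrowing down the range of observations of interest.
- Adding new variables (columns) derived from existing variables (columns).
- Adding summary statistics.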
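The three kinds of transformation listed above can be sketched in plain Python as follows. The rows, the column names, and the BMI formula used as the derived variable are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical tidy data: one observation per row.
rows = [
    {"name": "a", "height_cm": 150, "weight_kg": 45.0},
    {"name": "b", "height_cm": 160, "weight_kg": 64.0},
    {"name": "c", "height_cm": 200, "weight_kg": 80.0},
]

# 1. Narrow down the observations of interest.
subset = [r for r in rows if r["height_cm"] >= 160]

# 2. Add a new variable (column) derived from existing ones.
with_bmi = [
    {**r, "bmi": r["weight_kg"] / (r["height_cm"] / 100) ** 2}
    for r in subset
]

# 3. Compute a summary statistic over the transformed data.
mean_bmi = mean(r["bmi"] for r in with_bmi)

print(len(subset), round(mean_bmi, 2))
```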
Data extraction
Data aggregation
What is usually called groupby or aggregate.
There are two approaches.
- Specify an aggregate function on a groupby (count, sum, and so on).
- Use an aggregate function that works as a window function.
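The difference between the two approaches is sketched below in plain Python: a grouped aggregate returns one value per group, while a window-style aggregate keeps every row and attaches the group value to it. The sales data and column names are hypothetical. In pandas the same contrast is roughly `groupby(...).sum()` versus `groupby(...).transform("sum")`.

```python
from collections import defaultdict

# Hypothetical sales records.
sales = [
    {"shop": "A", "amount": 100},
    {"shop": "B", "amount": 50},
    {"shop": "A", "amount": 300},
    {"shop": "B", "amount": 150},
]

# Approach 1: grouped aggregate, one result per group.
totals = defaultdict(int)
for row in sales:
    totals[row["shop"]] += row["amount"]

# Approach 2: window-style aggregate, every input row is kept
# and the aggregate of its group is attached as a new column.
windowed = [{**row, "shop_total": totals[row["shop"]]} for row in sales]

print(dict(totals))
for row in windowed:
    print(row)
```

The window form is convenient when the aggregate is needed next to each row, for example to compute each sale's share of its shop total.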
Unique count aggregation
Counting the number of distinct values in categorical data, with duplicates removed.
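A minimal sketch of a unique count per group, assuming hypothetical (day, user) visit records: collecting the values into a set removes the duplicates before counting.

```python
from collections import defaultdict

# Hypothetical visit log: (day, user). "alice" appears twice on "mon".
visits = [
    ("mon", "alice"),
    ("mon", "bob"),
    ("mon", "alice"),
    ("tue", "alice"),
]

# Collect the distinct users seen on each day.
users_by_day = defaultdict(set)
for day, user in visits:
    users_by_day[day].add(user)

# The unique count is the size of each set.
unique_counts = {day: len(users) for day, users in users_by_day.items()}

print(unique_counts)
```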
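Frequency distribution
Computes how often values occur in each class, where the classes can be chosen freely. It is a unique count aggregation performed after discretizing continuous data into classes.
cf. Histogram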
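A frequency distribution over equal-width classes might be computed as in the sketch below. The ages and the class width of 10 are assumptions for this example; each value is mapped to the lower bound of its class and the classes are then counted.

```python
from collections import Counter

# Hypothetical continuous-ish data.
ages = [3, 12, 18, 25, 27, 31, 39, 40]
width = 10  # assumed class width

# Discretize: map each value to the lower bound of its class,
# then count how many values fall into each class.
frequency = Counter((age // width) * width for age in ages)

for lower in sorted(frequency):
    print(f"{lower}-{lower + width - 1}: {frequency[lower]}")
```

Plotting these counts as bars would give the histogram.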
Computing totals
cumsum, cumulative distribution.
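A cumulative sum and the cumulative distribution derived from it can be sketched with `itertools.accumulate`. The per-class counts are hypothetical.

```python
from itertools import accumulate

# Hypothetical counts per class, in class order.
counts = [1, 2, 2, 2, 1]

# Cumulative sum: running total up to and including each class.
running_total = list(accumulate(counts))

# Cumulative distribution: running total as a share of the grand total.
total = running_total[-1]
cumulative_share = [c / total for c in running_total]

print(running_total)
print(cumulative_share)
```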
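Binning
Binning; also called bin splitting.
The process of converting continuous values into discrete values by cutting them at arbitrary boundary values and sorting them into categories.
There are several kinds of binning. The most common kind is based on buckets (bucket: a pail).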
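Binning with arbitrary boundary values can be sketched with the standard `bisect` module. The boundaries, the labels, and the helper `assign_bin` are hypothetical; with `bisect_right`, a value equal to a boundary falls into the bin that starts at that boundary.

```python
from bisect import bisect_right

# Assumed boundary values and one label per resulting bin.
# Three boundaries produce four bins.
boundaries = [20, 40, 65]
labels = ["under 20", "20-39", "40-64", "65+"]


def assign_bin(value):
    """Return the label of the bin that `value` falls into."""
    # bisect_right counts how many boundaries are <= value,
    # which is exactly the index of the bin.
    return labels[bisect_right(boundaries, value)]


ages = [5, 20, 39, 40, 64, 65, 90]
binned = [assign_bin(age) for age in ages]

print(list(zip(ages, binned)))
```

Unlike equal-width classes, the boundaries here do not need to be evenly spaced. For data frames, pandas offers `pd.cut` for the same purpose.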
Reference: "There are many kinds of binning, too" - 集計 in hibernation