ubuntu@HDClient:~$ cat MP0101A07.xml
:::
<項目別_Iterm>2017M10</項目別_Iterm>
<總計_Total>3.75</總計_Total>
<男_Male>3.97</男_Male>
<女_Female>3.47</女_Female>
<age_15-19>8.03</age_15-19>
<age_20-24>12.33</age_20-24>
<age_25-29>6.55</age_25-29>
<age_30-34>3.48</age_30-34>
<age_35-39>3.3</age_35-39>
<age_40-44>2.66</age_40-44>
<age_45-49>2.21</age_45-49>
<age_50-54>2.03</age_50-54>
<age_55-59>1.7</age_55-59>
<age_60-64>1.69</age_60-64>
<age_65_over>0.11</age_65_over>
<國中及以下_Junior_high_and_below>2.86</國中及以下_Junior_high_and_below>
<國小及以下_Primary_school_and_below>2.1</國小及以下_Primary_school_and_below>
<國中_Junior_high>3.26</國中_Junior_high>
<高中_職_Senior_high_and_vocational>3.7</高中_職_Senior_high_and_vocational>
<高中_Senior_high>3.84</高中_Senior_high>
<高職_vocational>3.65</高職_vocational>
<大專及以上_Junior_college_and_above>4.07</大專及以上_Junior_college_and_above>
<專科_Junior_college>2.75</專科_Junior_college>
<大學及以上_University_and_above>4.67</大學及以上_University_and_above>
</失業率>
Install the package used to transform the XML file:
ubuntu@HDClient:~$ sudo apt-get install xsltproc
ubuntu@HDClient:~$ cat unemployment.xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="no"/>
<xsl:template match="/">
<xsl:for-each select="//失業率">
<xsl:value-of select="concat(項目別_Iterm,',',總計_Total,',',男_Male,',',女_Female,',',age_15-19,',',age_20-24,',',age_25-29,',',age_30-34,',',age_35-39,',',age_40-44,',',age_45-49,'&#10;')"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
ubuntu@HDClient:~$ xsltproc unemployment.xslt MP0101A07.xml
:::
2017M08,3.89,4.1,3.63,8.72,13.18,6.71,3.5,3.32,2.77,2.34
2017M09,3.77,3.97,3.51,8.41,12.7,6.57,3.4,3.26,2.71,2.27
2017M10,3.75,3.97,3.47,8.03,12.33,6.55,3.48,3.3,2.66,2.21
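For readers without xsltproc, the projection the stylesheet performs can be sketched in plain Python with the standard-library `xml.etree`. The element names are taken from the XML listing above; the inline sample document here is a shortened, hypothetical stand-in for MP0101A07.xml (only the first four fields), since the full file structure is elided.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened stand-in for one 失業率 record from MP0101A07.xml.
xml_text = """<失業率>
<項目別_Iterm>2017M10</項目別_Iterm>
<總計_Total>3.75</總計_Total>
<男_Male>3.97</男_Male>
<女_Female>3.47</女_Female>
</失業率>"""

# The child elements the stylesheet concatenates, in order (truncated here).
fields = ["項目別_Iterm", "總計_Total", "男_Male", "女_Female"]

root = ET.fromstring(xml_text)
row = [root.findtext(f) for f in fields]

# Emit one comma-separated line per record, like the xsltproc output above.
out = io.StringIO()
csv.writer(out).writerow(row)
print(out.getvalue().strip())
```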
ubuntu@HDClient:~$ hdfs dfs -put unemployment.txt unemployment.txt
ubuntu@HDClient:~$ hdfs dfs -ls unem*.txt
-rw-r--r-- 3 ubuntu supergroup 30187 2017-12-14 08:21 unemployment.txt
ubuntu@HDClient:~$ pig
:::
Find the ten months with the lowest unemployment rate
grunt> d1 = LOAD 'unemployment.txt' USING PigStorage(',') AS (y:chararray, avg:float) ;
grunt> s1 = ORDER d1 by avg ;
grunt> head10 = LIMIT s1 10 ;
grunt> dump head10 ;
:::
ne.util.MapRedUtil - Total input paths to process : 1
(2017,)
(1981M04,0.86)
(1980M04,0.93)
(1980M01,0.95)
(1981M01,0.96)
(1981M05,1.01)
(1980M03,1.06)
(1979M04,1.09)
(1981M03,1.09)
(1980M02,1.1)
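The Pig statements above (ORDER by avg, then LIMIT 10) can be mirrored in Python. Note that Pig sorts nulls before every real value in ascending order, which is why the malformed (2017,) record surfaces first; the sketch models that by mapping None to negative infinity. The rows here are a small hypothetical subset of unemployment.txt.

```python
# Hypothetical subset of unemployment.txt rows, as (label, avg) pairs;
# None models a record whose avg field is missing, like (2017,).
rows = [
    ("2017", None),
    ("1981M04", 0.86),
    ("2009M08", 6.13),
    ("1980M04", 0.93),
    ("2009M07", 6.07),
]

# ORDER d1 by avg: Pig sorts nulls first ascending, so treat None as -inf.
# LIMIT s1 10: take the first ten rows after sorting.
lowest = sorted(rows, key=lambda r: float("-inf") if r[1] is None else r[1])[:10]
for label, avg in lowest:
    print((label, avg))
```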
Find the ten years with the highest unemployment rate
Trick: declare the schema as y:int and filter d1 by y is not null. Month labels such as 2017M08 fail the integer cast and become null, so the filter drops the monthly rows and keeps only the yearly ones.
ubuntu@HDClient:~$ cat unemployment10y.pig
d1 = LOAD 'unemployment07.txt' USING PigStorage(',') AS (y:int, avg:float) ;
b1 = filter d1 by y is not null ;
s1 = ORDER b1 by avg desc ;
head10 = LIMIT s1 10 ;
dump head10 ;
ubuntu@HDClient:~$ pig unemployment10y.pig
:::
2017-12-29 09:51:45,644 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input files to process : 1
2017-12-29 09:51:45,644 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(2009,5.85)
(2010,5.21)
(2002,5.17)
(2003,4.99)
(2001,4.57)
(2004,4.44)
(2011,4.39)
(2012,4.24)
(2013,4.18)
(2008,4.14)
2017-12-29 09:51:45,684 [main] INFO org.apache.pig.Main - Pig script completed in 25 seconds and 368 milliseconds (25368 ms)
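The casting trick in the script above, loading y as int so that month labels become null and then filtering the nulls out, can be sketched as follows. The helper name and the sample rows are hypothetical; the None-on-failed-cast behavior models how PigStorage handles a chararray that cannot be cast to int.

```python
def to_int_or_none(s):
    # Mimics loading the field as y:int in Pig: values that are not
    # valid integers (e.g. "2009M08") come through as null/None.
    try:
        return int(s)
    except ValueError:
        return None

# Hypothetical subset mixing monthly and yearly rows.
raw = [("2009M08", 6.13), ("2009", 5.85), ("2010", 5.21), ("2010M02", 5.76)]

# filter d1 by y is not null  -> keep only the yearly rows
yearly = [(to_int_or_none(y), avg) for y, avg in raw]
yearly = [r for r in yearly if r[0] is not None]

# ORDER by avg desc; LIMIT 10
top = sorted(yearly, key=lambda r: r[1], reverse=True)[:10]
print(top)
```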
Find the ten months with the highest unemployment rate (as a script file)
ubuntu@HDClient:~$ cat unemployment10y.pig
d1= LOAD 'unemployment.txt' USING PigStorage(',') AS (y:chararray, avg:float) ;
s1 = ORDER d1 by avg desc ;
head10 = LIMIT s1 10 ;
dump head10 ;
ubuntu@HDClient:~$ pig -f unemployment10y.pig
:::2017-12-29 09:22:45,591 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(2009M08,6.13)
(2009M07,6.07)
(2009M09,6.04)
(2009M10,5.96)
(2009M06,5.94)
(2009M11,5.86)
(2009,5.85)
(2009M05,5.82)
(2009M03,5.81)
(2010M02,5.76)
2017-12-29 09:22:45,623 [main] INFO org.apache.pig.Main - Pig script completed in 25 seconds and 139 milliseconds (25139 ms)