Tuesday, December 12, 2017

Hadoop Client Setup - Basic Usage of Pig

Prepare the Hadoop Client

Prerequisites:

  1. Install the Hadoop and JDK packages
  2. Configure PATH (JAVA_HOME and HADOOP_HOME)
  3. Edit core-site.xml
  4. Edit /etc/hosts so the client can resolve the Hadoop hosts
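A minimal /etc/hosts sketch for step 4 (the NameNode address shown here is an assumption; the DataNode addresses match the dfsadmin report later in this post):

```text
# /etc/hosts on the client -- nn's IP is hypothetical, adjust to your cluster
172.16.1.200  nn      # NameNode (assumed address)
172.16.1.210  dn01    # DataNode 1
172.16.1.211  dn02    # DataNode 2
```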


Unzip the JDK and Hadoop archives into the /home/ubuntu directory,
then edit the .bashrc file so both bin directories are on PATH:
ubuntu@HDClient:~$ echo $PATH
/home/ubuntu/bin:/home/ubuntu/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/ubuntu/jre1.8.0_151/bin:/home/ubuntu/hadoop-2.8.2/bin:/home/ubuntu/hadoop-2.8.2/sbin
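The .bashrc file itself isn't shown above, but the additions that produce this PATH would look roughly like the following (a sketch; the directory names are taken from the PATH output):

```shell
# ~/.bashrc additions -- directory names taken from the echoed PATH above
export JAVA_HOME=$HOME/jre1.8.0_151
export HADOOP_HOME=$HOME/hadoop-2.8.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

Re-login or run `source ~/.bashrc` for the changes to take effect.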

Verify the installation with java -version and hadoop version.

Point the client at the NameNode in core-site.xml:
ubuntu@HDClient:~$ sudo more /home/ubuntu/hadoop-2.8.2/etc/hadoop/core-site.xml
:::
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://nn:8020</value>
        </property>
</configuration>

Hadoop Client Connection Testing
ubuntu@HDClient:~$ hdfs dfsadmin -report
Configured Capacity: 51908788224 (48.34 GB)
Present Capacity: 27684233216 (25.78 GB)
DFS Remaining: 27683569664 (25.78 GB)
DFS Used: 663552 (648 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 172.16.1.210:50010 (dn01)
Hostname: dn01
Decommission Status : Normal
Configured Capacity: 25954394112 (24.17 GB)
DFS Used: 331776 (324 KB)
Non DFS Used: 12095500288 (11.26 GB)
DFS Remaining: 13841784832 (12.89 GB)
DFS Used%: 0.00%
DFS Remaining%: 53.33%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Dec 12 09:16:45 UTC 2017


Name: 172.16.1.211:50010 (dn02)
Hostname: dn02
Decommission Status : Normal
Configured Capacity: 25954394112 (24.17 GB)
DFS Used: 331776 (324 KB)
Non DFS Used: 12095500288 (11.26 GB)
DFS Remaining: 13841784832 (12.89 GB)
DFS Used%: 0.00%
DFS Remaining%: 53.33%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Dec 12 09:16:45 UTC 2017

Browse HDFS to confirm file access:
ubuntu@HDClient:~$ hdfs dfs -ls /
Found 3 items
drwxr-xr-x   - ubuntu supergroup          0 2017-12-11 03:17 /test
drwx------   - ubuntu supergroup          0 2017-12-11 10:04 /tmp
drwxr-xr-x   - ubuntu supergroup          0 2017-12-11 10:04 /user
ubuntu@HDClient:~$ hdfs dfs -ls /test
Found 3 items
-rw-r--r--   2 ubuntu supergroup      11068 2017-12-11 03:17 /test/.bash_history
-rw-r--r--   2 ubuntu supergroup        220 2017-12-11 03:17 /test/.bash_logout
-rw-r--r--   2 ubuntu supergroup       3986 2017-12-11 03:17 /test/.bashrc

Launch Pig in interactive mode (the Grunt shell):
ubuntu@HDClient:~$ pig
17/12/13 08:27:38 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/12/13 08:27:38 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/12/13 08:27:38 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2017-12-13 08:27:38,086 [main] INFO  org.apache.pig.Main - Apache Pig version 0.17.0 (r1797386) compiled Jun 02 2017, 15:41:58
2017-12-13 08:27:38,086 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/ubuntu/pig_1513153658083.log
2017-12-13 08:27:38,113 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/ubuntu/.pigbootup not found
2017-12-13 08:27:38,778 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-12-13 08:27:38,778 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://nn:8020
2017-12-13 08:27:39,416 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-b248c438-3f49-47e0-9760-7dc347d818b2
2017-12-13 08:27:39,416 [main] WARN  org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false

To clear the ATS warning above, enable the YARN Timeline service in yarn-site.xml:
ubuntu@HDClient:~$ sudo cat /opt/hadoop-2.8.2/etc/hadoop/yarn-site.xml
<property>
  <description>Indicate to clients whether Timeline service is enabled or not.
  If enabled, the TimelineClient library used by end-users will post entities
  and events to the Timeline server.</description>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>

grunt> pwd
hdfs://nn:8020/user/ubuntu

grunt> sh ls -al
:::
total 225304
drwxr-xr-x  9 ubuntu ubuntu      4096 Dec 13 07:19 .
drwxr-xr-x  3 root   root        4096 Dec  5 10:29 ..
drwxrwxr-x  2 ubuntu ubuntu      4096 Dec  6 08:20 archive
-rw-rw-r--  1 ubuntu ubuntu      1202 Dec  7 10:36 authorized_keys
-rw-------  1 ubuntu ubuntu     12878 Dec 13 07:20 .bash_history
-rw-r--r--  1 ubuntu ubuntu       220 Aug 31  2015 .bash_logout
-rw-r--r--  1 ubuntu ubuntu      4049 Dec 13 07:18 .bashrc
:::

grunt> cd hdfs:///
grunt> ls
hdfs://nn:8020/test     <dir>
hdfs://nn:8020/tmp      <dir>
hdfs://nn:8020/user     <dir>

grunt> cd test
grunt> copyFromLocal /etc/passwd .

grunt> takeInfo = LOAD 'passwd' USING PigStorage(':') AS (user:chararray, passwd:chararray, uid:int, gid:int, userinfo:chararray, home:chararray, shell:chararray) ;

grunt> dump takeInfo ;

2017-12-13 08:57:08,456 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2017-12-13 08:57:08,472 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2017-12-13 08:57:08,472 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2017-12-13 08:57:08,474 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2017-12-13 08:57:08,475 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2017-12-13 08:57:08,475 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2017-12-13 08:57:08,486 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-12-13 08:57:08,487 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2017-12-13 08:57:08,487 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-12-13 08:57:08,487 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2017-12-13 08:57:08,569 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/ubuntu/pig-0.17.0/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp-942992935/tmp216607249/pig-0.17.0-core-h2.jar
2017-12-13 08:57:08,597 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/ubuntu/pig-0.17.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-942992935/tmp-815792358/automaton-1.11-8.jar
2017-12-13 08:57:08,625 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/ubuntu/pig-0.17.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-942992935/tmp-1770634992/antlr-runtime-3.4.jar
2017-12-13 08:57:08,666 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/ubuntu/pig-0.17.0/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp-942992935/tmp1684442717/joda-time-2.9.3.jar
2017-12-13 08:57:08,667 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2017-12-13 08:57:08,668 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2017-12-13 08:57:08,669 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2017-12-13 08:57:08,669 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2017-12-13 08:57:08,685 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2017-12-13 08:57:08,687 [JobControl] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-12-13 08:57:08,706 [JobControl] WARN  org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2017-12-13 08:57:08,727 [JobControl] INFO  org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2017-12-13 08:57:08,729 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input files to process : 1
2017-12-13 08:57:08,729 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2017-12-13 08:57:08,730 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2017-12-13 08:57:08,732 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2017-12-13 08:57:08,744 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local925571862_0002
2017-12-13 08:57:08,870 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-ubuntu/mapred/local/1513155428781/pig-0.17.0-core-h2.jar <- /home/ubuntu/pig-0.17.0-core-h2.jar
2017-12-13 08:57:08,886 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://nn:8020/tmp/temp-942992935/tmp216607249/pig-0.17.0-core-h2.jar as file:/tmp/hadoop-ubuntu/mapred/local/1513155428781/pig-0.17.0-core-h2.jar
2017-12-13 08:57:08,894 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-ubuntu/mapred/local/1513155428782/automaton-1.11-8.jar <- /home/ubuntu/automaton-1.11-8.jar
2017-12-13 08:57:08,907 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://nn:8020/tmp/temp-942992935/tmp-815792358/automaton-1.11-8.jar as file:/tmp/hadoop-ubuntu/mapred/local/1513155428782/automaton-1.11-8.jar
2017-12-13 08:57:08,907 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-ubuntu/mapred/local/1513155428783/antlr-runtime-3.4.jar <- /home/ubuntu/antlr-runtime-3.4.jar
2017-12-13 08:57:08,910 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://nn:8020/tmp/temp-942992935/tmp-1770634992/antlr-runtime-3.4.jar as file:/tmp/hadoop-ubuntu/mapred/local/1513155428783/antlr-runtime-3.4.jar
2017-12-13 08:57:08,910 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-ubuntu/mapred/local/1513155428784/joda-time-2.9.3.jar <- /home/ubuntu/joda-time-2.9.3.jar
2017-12-13 08:57:08,911 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://nn:8020/tmp/temp-942992935/tmp1684442717/joda-time-2.9.3.jar as file:/tmp/hadoop-ubuntu/mapred/local/1513155428784/joda-time-2.9.3.jar
2017-12-13 08:57:08,952 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-ubuntu/mapred/local/1513155428781/pig-0.17.0-core-h2.jar
2017-12-13 08:57:08,953 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-ubuntu/mapred/local/1513155428782/automaton-1.11-8.jar
2017-12-13 08:57:08,953 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-ubuntu/mapred/local/1513155428783/antlr-runtime-3.4.jar
2017-12-13 08:57:08,953 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-ubuntu/mapred/local/1513155428784/joda-time-2.9.3.jar
2017-12-13 08:57:08,953 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/
2017-12-13 08:57:08,958 [Thread-63] INFO  org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null
2017-12-13 08:57:08,966 [Thread-63] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2017-12-13 08:57:08,967 [Thread-63] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2017-12-13 08:57:08,967 [Thread-63] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2017-12-13 08:57:08,967 [Thread-63] INFO  org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter
2017-12-13 08:57:08,971 [Thread-63] INFO  org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
2017-12-13 08:57:08,972 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local925571862_0002_m_000000_0
2017-12-13 08:57:08,985 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2017-12-13 08:57:08,986 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2017-12-13 08:57:08,988 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.Task -  Using ResourceCalculatorProcessTree : [ ]
2017-12-13 08:57:08,991 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
Total Length = 1374
Input split[0]:
   Length = 1374
   ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit
   Locations:

-----------------------

2017-12-13 08:57:08,998 [LocalJobRunner Map Task Executor #0] INFO  org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2017-12-13 08:57:08,998 [LocalJobRunner Map Task Executor #0] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed hdfs://nn:8020/test/passwd:0+1374
2017-12-13 08:57:09,004 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2017-12-13 08:57:09,004 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2017-12-13 08:57:09,015 [LocalJobRunner Map Task Executor #0] INFO  org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 699400192 to monitor. collectionUsageThreshold = 489580128, usageThreshold = 489580128
2017-12-13 08:57:09,016 [LocalJobRunner Map Task Executor #0] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2017-12-13 08:57:09,024 [LocalJobRunner Map Task Executor #0] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: takeInfo[1,11],takeInfo[-1,-1] C:  R:
2017-12-13 08:57:09,034 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2017-12-13 08:57:09,070 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local925571862_0002_m_000000_0 is done. And is in the process of committing
2017-12-13 08:57:09,074 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2017-12-13 08:57:09,076 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.Task - Task attempt_local925571862_0002_m_000000_0 is allowed to commit now
2017-12-13 08:57:09,084 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local925571862_0002_m_000000_0' to hdfs://nn:8020/tmp/temp-942992935/tmp326033959/_temporary/0/task_local925571862_0002_m_000000
2017-12-13 08:57:09,085 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.LocalJobRunner - map
2017-12-13 08:57:09,085 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local925571862_0002_m_000000_0' done.
2017-12-13 08:57:09,085 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local925571862_0002_m_000000_0
2017-12-13 08:57:09,085 [Thread-63] INFO  org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2017-12-13 08:57:09,186 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local925571862_0002
2017-12-13 08:57:09,186 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases takeInfo
2017-12-13 08:57:09,186 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: takeInfo[1,11],takeInfo[-1,-1] C:  R:
2017-12-13 08:57:09,188 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2017-12-13 08:57:09,188 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_local925571862_0002]
2017-12-13 08:57:14,194 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-12-13 08:57:14,195 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-12-13 08:57:14,196 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-12-13 08:57:14,199 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2017-12-13 08:57:14,200 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion  PigVersion  UserId  StartedAt            FinishedAt           Features
2.8.2          0.17.0      ubuntu  2017-12-13 08:57:08  2017-12-13 08:57:14  UNKNOWN

Success!

Job Stats (time in seconds):
JobId                    Maps  Reduces  MaxMapTime  MinMapTime  AvgMapTime  MedianMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  MedianReducetime  Alias     Feature   Outputs
job_local925571862_0002  1     0        n/a         n/a         n/a         n/a            0              0              0              0                 takeInfo  MAP_ONLY  hdfs://nn:8020/tmp/temp-942992935/tmp326033959,

Input(s):
Successfully read 26 records (11515989 bytes) from: "hdfs://nn:8020/test/passwd"

Output(s):
Successfully stored 26 records (11516292 bytes) in: "hdfs://nn:8020/tmp/temp-942992935/tmp326033959"

Counters:
Total records written : 26
Total bytes written : 11516292
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_local925571862_0002


2017-12-13 08:57:14,201 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-12-13 08:57:14,202 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-12-13 08:57:14,202 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-12-13 08:57:14,206 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2017-12-13 08:57:14,207 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2017-12-13 08:57:14,210 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input files to process : 1
2017-12-13 08:57:14,210 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(root,x,0,0,root,/root,/bin/bash)
(daemon,x,1,1,daemon,/usr/sbin,/usr/sbin/nologin)
(bin,x,2,2,bin,/bin,/usr/sbin/nologin)
:::
(ubuntu,x,1000,1000,,/home/ubuntu,/bin/bash)
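The LOAD/DUMP steps above can also be run non-interactively from a script file (a sketch; `passwd.pig` is a hypothetical filename, launched with `pig -f passwd.pig`, assuming /test/passwd already exists in HDFS):

```pig
-- passwd.pig: replay of the Grunt session above
takeInfo = LOAD '/test/passwd' USING PigStorage(':')
           AS (user:chararray, passwd:chararray, uid:int, gid:int,
               userinfo:chararray, home:chararray, shell:chararray);
DUMP takeInfo;
```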

grunt> group_shell = GROUP takeInfo BY shell ;
grunt> dump group_shell ;
:::
ne.util.MapRedUtil - Total input paths to process : 1
(/bin/bash,{(root,x,0,0,root,/root,/bin/bash),(ubuntu,x,1000,1000,,/home/ubuntu,/bin/bash)})
(/bin/sync,{(sync,x,4,65534,sync,/bin,/bin/sync)})
(/bin/false,{(systemd-resolve,x,102,104,systemd Resolver,,,,/run/systemd/resolve,/bin/false),(systemd-network,x,101,103,systemd Network Management,,,,/run/systemd/netif,/bin/false),(systemd-timesync,x,100,102,systemd Time Synchronization,,,,/run/systemd,/bin/false),(syslog,x,104,108,,/home/syslog,/bin/false),(_apt,x,105,65534,,/nonexistent,/bin/false),(systemd-bus-proxy,x,103,105,systemd Bus Proxy,,,,/run/systemd,/bin/false)})
(/usr/sbin/nologin,{(proxy,x,13,13,proxy,/bin,/usr/sbin/nologin),(nobody,x,65534,65534,nobody,/nonexistent,/usr/sbin/nologin),(gnats,x,41,41,Gnats Bug-Reporting System (admin),/var/lib/gnats,/usr/sbin/nologin),(irc,x,39,39,ircd,/var/run/ircd,/usr/sbin/nologin),(list,x,38,38,Mailing List Manager,/var/list,/usr/sbin/nologin),(backup,x,34,34,backup,/var/backups,/usr/sbin/nologin),(www-data,x,33,33,www-data,/var/www,/usr/sbin/nologin),(uucp,x,10,10,uucp,/var/spool/uucp,/usr/sbin/nologin),(news,x,9,9,news,/var/spool/news,/usr/sbin/nologin),(mail,x,8,8,mail,/var/mail,/usr/sbin/nologin),(lp,x,7,7,lp,/var/spool/lpd,/usr/sbin/nologin),(man,x,6,12,man,/var/cache/man,/usr/sbin/nologin),(games,x,5,60,games,/usr/games,/usr/sbin/nologin),(sys,x,3,3,sys,/dev,/usr/sbin/nologin),(bin,x,2,2,bin,/bin,/usr/sbin/nologin),(daemon,x,1,1,daemon,/usr/sbin,/usr/sbin/nologin),(sshd,x,106,65534,,/var/run/sshd,/usr/sbin/nologin)})
:::

grunt> count = foreach group_shell generate group,COUNT(takeInfo) ;
grunt> dump count  ;
:::
2017-12-13 09:06:04,223 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(/bin/bash,2)
(/bin/sync,1)
(/bin/false,6)
(/usr/sbin/nologin,17)
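As a quick local sanity check, the same GROUP BY shell / COUNT aggregation can be reproduced with awk (a sketch run against a small inline sample instead of the real /etc/passwd, so the counts here will not match the cluster output above):

```shell
# Count users per login shell -- mirrors GROUP takeInfo BY shell + COUNT
# in the Pig session above. An inline sample keeps the result deterministic.
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'ubuntu:x:1000:1000::/home/ubuntu:/bin/bash' |
awk -F: '{ n[$7]++ } END { for (s in n) print s "\t" n[s] }' | sort
# prints:
# /bin/bash         2
# /usr/sbin/nologin 1
```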





