<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>EsharEditor</title>
<subtitle>Do whatever you want.</subtitle>
<link href="/atom.xml" rel="self"/>
<link href="https://eshareditor.github.io/"/>
<updated>2016-10-12T09:40:15.523Z</updated>
<id>https://eshareditor.github.io/</id>
<author>
<name>Kyle Joe</name>
<email>[email protected]</email>
</author>
<generator uri="http://hexo.io/">Hexo</generator>
<entry>
<title>Algorithm-Complexity</title>
<link href="https://eshareditor.github.io/2016/10/12/Algorithm-Complexity/"/>
<id>https://eshareditor.github.io/2016/10/12/Algorithm-Complexity/</id>
<published>2016-10-12T09:05:19.000Z</published>
<updated>2016-10-12T09:40:15.523Z</updated>
<content type="html"><![CDATA[<p>This article covers the Big-O time and space complexity of common algorithms in computer science.</p>
<img src="/uploads/Algorithm-Complexity/Algorithm-Complexity.jpg" width="660" height="500">
<a id="more"></a>
<h2 id="图例注解"><a href="#图例注解" class="headerlink" title="Legend"></a>Legend</h2><img src="/uploads/Algorithm-Complexity/Annotation.jpg">
<h2 id="数据结构操作"><a href="#数据结构操作" class="headerlink" title="Data structure operations"></a>Data structure operations</h2><img src="/uploads/Algorithm-Complexity/Data_Structure.jpg" width="660" height="500">
<h2 id="数组排序算法"><a href="#数组排序算法" class="headerlink" title="Array sorting algorithms"></a>Array sorting algorithms</h2><img src="/uploads/Algorithm-Complexity/Array_Sort.jpg" width="660" height="500">
<h2 id="图操作"><a href="#图操作" class="headerlink" title="Graph operations"></a>Graph operations</h2><img src="/uploads/Algorithm-Complexity/Graph_Manipulation.jpg" width="660" height="500">
<h2 id="堆操作"><a href="#堆操作" class="headerlink" title="Heap operations"></a>Heap operations</h2><img src="/uploads/Algorithm-Complexity/Heap_Operation.jpg" width="660" height="500">
<h2 id="大O复杂度表"><a href="#大O复杂度表" class="headerlink" title="Big-O complexity chart"></a>Big-O complexity chart</h2><img src="/uploads/Algorithm-Complexity/O_Complexity.jpg" width="660" height="500">
<p>References:<br><a href="https://linux.cn/article-7480-1.html" title="https://linux.cn/article-7480-1.html" target="_blank" rel="external">https://linux.cn/article-7480-1.html</a><br><a href="http://bigocheatsheet.com/" title="http://bigocheatsheet.com/" target="_blank" rel="external">http://bigocheatsheet.com/</a></p>
]]></content>
<summary type="html">
<p>This article covers the Big-O time and space complexity of common algorithms in computer science.</p>
<img src="/uploads/Algorithm-Complexity/Algorithm-Complexity.jpg" width="660" height="500">
</summary>
<category term="Arithmetic" scheme="https://eshareditor.github.io/categories/Arithmetic/"/>
<category term="Arithmetic" scheme="https://eshareditor.github.io/tags/Arithmetic/"/>
</entry>
<entry>
<title>Ambari-Rest-API</title>
<link href="https://eshareditor.github.io/2016/10/11/Ambari-Rest-API/"/>
<id>https://eshareditor.github.io/2016/10/11/Ambari-Rest-API/</id>
<published>2016-10-11T07:22:40.000Z</published>
<updated>2016-10-11T07:43:53.513Z</updated>
<content type="html"><![CDATA[<p>This article documents some commonly used Ambari REST APIs.</p>
<img src="/uploads/Ambari-Rest-API/Ambari-Rest-API.png">
<a id="more"></a>
<h3 id="查看服务状态"><a href="#查看服务状态" class="headerlink" title="Check service status"></a>Check service status</h3><figure class="highlight nsis"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">List all services (note: service names are uppercase):</div><div class="line">curl -u <span class="literal">admin</span>:<span class="variable">$PASSWORD</span> -H <span class="string">"X-Requested-By: ambari"</span> -X GET http://<span class="variable">$AMBARI_HOST</span>:<span class="number">8080</span>/api/v1/clusters/<span class="variable">$CLUSTER_NAME</span>/services</div><div class="line">Get a specific service:</div><div class="line">curl -u <span class="literal">admin</span>:<span class="variable">$PASSWORD</span> -H <span class="string">"X-Requested-By: ambari"</span> -X GET http://<span class="variable">$AMBARI_HOST</span>:<span class="number">8080</span>/api/v1/clusters/<span class="variable">$CLUSTER_NAME</span>/services/<span class="variable">$SERVICE_NAME</span></div></pre></td></tr></table></figure>
<h3 id="停止服务"><a href="#停止服务" class="headerlink" title="Stop a service"></a>Stop a service</h3><figure class="highlight nsis"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">curl -u <span class="literal">admin</span>:<span class="variable">$PASSWORD</span> -i -H 'X-Requested-By: ambari' -X PUT -d '{<span class="string">"RequestInfo"</span>: {<span class="string">"context"</span> :<span class="string">"Stop Service"</span>}, <span class="string">"Body"</span>: {<span class="string">"ServiceInfo"</span>: {<span class="string">"state"</span>: <span class="string">"INSTALLED"</span>}}}' http://<span class="variable">$AMBARI_HOST</span>:<span class="number">8080</span>/api/v1/clusters/<span class="variable">$CLUSTER_NAME</span>/services/<span class="variable">$SERVICE_NAME</span></div></pre></td></tr></table></figure>
<h3 id="删除服务"><a href="#删除服务" class="headerlink" title="Delete a service"></a>Delete a service</h3><figure class="highlight awk"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">Delete a service:</div><div class="line">curl -u admin:<span class="variable">$PASSWORD</span> -i -H <span class="string">'X-Requested-By: ambari'</span> -X DELETE http:<span class="regexp">//</span><span class="variable">$AMBARI_HOST</span>:<span class="number">8080</span><span class="regexp">/api/</span>v1<span class="regexp">/clusters/</span><span class="variable">$CLUSTER_NAME</span><span class="regexp">/services/</span><span class="variable">$SERVICE_NAME</span></div><div class="line">Delete a service component on a host:</div><div class="line">curl -u admin:<span class="variable">$PASSWORD</span> -i -H <span class="string">'X-Requested-By: ambari'</span> -X DELETE http:<span class="regexp">//</span><span class="variable">$AMBARI_HOST</span>:<span class="number">8080</span><span class="regexp">/api/</span>v1<span class="regexp">/clusters/</span><span class="variable">$CLUSTER_NAME</span><span class="regexp">/hosts/</span><span class="variable">$HOST_NAME</span><span class="regexp">/host_components/</span><span class="variable">$COMPONENT_NAME</span></div></pre></td></tr></table></figure>
<h3 id="添加服务"><a href="#添加服务" class="headerlink" title="Add a service"></a>Add a service</h3><figure class="highlight nsis"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">Add a component of a service to a host:</div><div class="line">curl -u <span class="literal">admin</span>:<span class="variable">$PASSWORD</span> -H <span class="string">"X-Requested-By:ambari"</span> -X POST <span class="string">"http://<span class="variable">$AMBARI_HOST</span>:8080/api/v1/clusters/<span class="variable">$CLUSTER_NAME</span>/hosts/<span class="variable">$HOST_NAME</span>/host_components/<span class="variable">$COMPONENT_NAME</span>"</span></div></pre></td></tr></table></figure>
<h3 id="获取服务在节点信息"><a href="#获取服务在节点信息" class="headerlink" title="Get the hosts running a service component"></a>Get the hosts running a service component</h3><figure class="highlight awk"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">curl -u admin:<span class="variable">$PASSWORD</span> -H <span class="string">"X-Requested-By: ambari"</span> -X GET http:<span class="regexp">//</span><span class="variable">$AMBARI_HOST</span>:<span class="number">8080</span><span class="regexp">/api/</span>v1<span class="regexp">/clusters/</span><span class="variable">$CLUSTER_NAME</span><span class="regexp">/services/</span><span class="variable">$SERVICE_NAME</span><span class="regexp">/components/</span><span class="variable">$COMPONENT_NAME</span> <span class="number">2</span>><span class="regexp">/dev/</span>null |grep <span class="string">"host_name"</span></div></pre></td></tr></table></figure>
<h3 id="启动所有服务"><a href="#启动所有服务" class="headerlink" title="Start all services"></a>Start all services</h3><figure class="highlight awk"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">curl -u admin:<span class="variable">$PASSWORD</span> -i -H <span class="string">"X-Requested-By:ambari"</span> -X PUT -d <span class="string">'{"ServiceInfo": {"state" : "STARTED"}}'</span> http:<span class="regexp">//</span><span class="variable">$AMBARI_HOST</span>:<span class="number">8080</span><span class="regexp">/api/</span>v1<span class="regexp">/clusters/</span><span class="variable">$CLUSTER_NAME</span><span class="regexp">/services</span></div></pre></td></tr></table></figure>
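<p>The requests above all share the same URL layout; a minimal sketch (the host and cluster names below are made-up placeholders, not from this article) that assembles the service endpoint and the stop-request body so a command can be reviewed before it is run:</p>

```shell
#!/bin/sh
# Build the Ambari service endpoint and start/stop JSON body from the same
# variables used in the curl examples above, then print the resulting
# command instead of executing it (a dry run for review).
AMBARI_HOST=ambari.example.com   # hypothetical host
CLUSTER_NAME=mycluster           # hypothetical cluster
SERVICE_NAME=HDFS

service_url="http://${AMBARI_HOST}:8080/api/v1/clusters/${CLUSTER_NAME}/services/${SERVICE_NAME}"
stop_body='{"RequestInfo": {"context" :"Stop Service"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}'

echo "curl -u admin:\$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '${stop_body}' ${service_url}"
```

Replacing the state with STARTED in the same body turns the stop request into a start request, as in the "Start all services" example.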
]]></content>
<summary type="html">
<p>This article documents some commonly used Ambari REST APIs.</p>
<img src="/uploads/Ambari-Rest-API/Ambari-Rest-API.png">
</summary>
<category term="BigData" scheme="https://eshareditor.github.io/categories/BigData/"/>
<category term="Ambari" scheme="https://eshareditor.github.io/tags/Ambari/"/>
</entry>
<entry>
<title>Ambari Hue Service</title>
<link href="https://eshareditor.github.io/2016/10/11/Ambari-Hue-Service/"/>
<id>https://eshareditor.github.io/2016/10/11/Ambari-Hue-Service/</id>
<published>2016-10-11T03:19:56.000Z</published>
<updated>2016-10-11T06:39:25.071Z</updated>
<content type="html"><![CDATA[<p>I recently integrated the Hue (v3.11.0) service into Ambari (v2.4.0+) so that Hue can be conveniently operated and configured through Ambari. The source code has been pushed to GitHub: <a href="https://github.com/EsharEditor/ambari-hue-service" title="https://github.com/EsharEditor/ambari-hue-service" target="_blank" rel="external">https://github.com/EsharEditor/ambari-hue-service</a>. Basic functional testing is complete; bug fixes will continue…</p>
<img src="/uploads/Ambari-Hue-Service/Ambari-Hue-Service.png">
<a id="more"></a>
<h2 id="组件简介"><a href="#组件简介" class="headerlink" title="Component overview"></a>Component overview</h2><ul>
<li>Ambari: Apache Ambari is a web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari already supports most Hadoop components, including HDFS, MapReduce, YARN, Hive, Pig, HBase, ZooKeeper, Sqoop, and HCatalog.</li>
<li>Hue: a web UI for Hadoop that lets users analyze and process data by interacting with the cluster from a web console. Through Hue, users can work with HDFS, HBase, Hive, Oozie, ZooKeeper, Pig, and other services in the browser.</li>
</ul>
<h2 id="安装说明"><a href="#安装说明" class="headerlink" title="Installation"></a>Installation</h2><h3 id="版本说明"><a href="#版本说明" class="headerlink" title="Versions"></a>Versions</h3><p>Two releases of Ambari-Hue are available on GitHub: v1.0.0 and v2.0.0.</p>
<p>release-1.0.0</p>
<ul>
<li>Ambari: 2.1.0~2.2.2</li>
<li>Hue: 3.9.0</li>
</ul>
<p>release-2.0.0</p>
<ul>
<li>Ambari: 2.4.0+</li>
<li>Hue: 3.10.0+</li>
</ul>
<h3 id="部署Ambari-hue"><a href="#部署Ambari-hue" class="headerlink" title="Deploying Ambari-hue"></a>Deploying Ambari-hue</h3><ul>
<li><p>Clone ambari-hue into the Ambari service stack directory</p>
<figure class="highlight crystal"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">VERSION=<span class="string">`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`</span></div><div class="line">rm -rf /var/<span class="class"><span class="keyword">lib</span>/<span class="title">ambari</span>-<span class="title">server</span>/<span class="title">resources</span>/<span class="title">stacks</span>/<span class="title">HDP</span>/$<span class="title">VERSION</span>/<span class="title">services</span>/<span class="title">HUE</span> </span></div><div class="line">sudo git clone <span class="symbol">https:</span>/<span class="regexp">/github.com/</span>EsharEditor/ambari-hue-service.git /var/<span class="class"><span class="keyword">lib</span>/<span class="title">ambari</span>-<span class="title">server</span>/<span class="title">resources</span>/<span class="title">stacks</span>/<span class="title">HDP</span>/$<span class="title">VERSION</span>/<span class="title">services</span>/<span class="title">HUE</span></span></div></pre></td></tr></table></figure>
</li>
<li><p>Restart ambari-server</p>
<figure class="highlight axapta"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">service ambari-<span class="keyword">server</span> restart</div></pre></td></tr></table></figure>
</li>
<li><p>Click the ‘Add Service’ button in the ‘Actions’ drop-down menu of the Ambari web UI</p>
</li>
</ul>
<p>On bottom left -> Actions -> Add service -> check Hue server -> Next -> Next -> Change any config you like (e.g. install dir, port) -> Next -> Deploy</p>
<ul>
<li><p>The default installation directory is /usr/local/hue</p>
</li>
<li><p>After a successful installation, the Hue service is visible in the Ambari UI:</p>
<img src="/uploads/Ambari-Hue-Service/Ambari-Hue-Service.png">
</li>
</ul>
<h3 id="使用Ambari-hue"><a href="#使用Ambari-hue" class="headerlink" title="Using Ambari-hue"></a>Using Ambari-hue</h3><h4 id="配置Hue"><a href="#配置Hue" class="headerlink" title="Configuring Hue"></a>Configuring Hue</h4><ul>
<li>Compared with v1.0.0, Ambari-hue v2.0.0 adds more configuration options: the Hue Service Module section enables or disables Hue's component services, the Hue User Info section enables and configures Unix or LDAP user synchronization, and the Hue Database section configures the metadata database and other database connections. For example:<img src="/uploads/Ambari-Hue-Service/Ambari-Hue-config1.png">
<img src="/uploads/Ambari-Hue-Service/Ambari-Hue-config2.png">
<img src="/uploads/Ambari-Hue-Service/Ambari-Hue-config3.png">
<img src="/uploads/Ambari-Hue-Service/Ambari-Hue-config4.png">
</li>
</ul>
<h4 id="Service-Action"><a href="#Service-Action" class="headerlink" title="Service Action"></a>Service Action</h4><ul>
<li>UserSync: synchronize users into Hue from the Linux node where Hue is installed or from an LDAP server</li>
<li>DatabaseSync: synchronize Hue's metadata database<img src="/uploads/Ambari-Hue-Service/Ambari-Hue-action.png">
</li>
</ul>
<h3 id="移除Hue服务"><a href="#移除Hue服务" class="headerlink" title="Removing the Hue service"></a>Removing the Hue service</h3><ul>
<li><p>In the Ambari UI, use the Hue service's ‘Service Actions’ - ‘Delete Service’</p>
</li>
<li><p>Delete the Hue service via the REST API</p>
<figure class="highlight awk"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line"><span class="comment">#Stop Hue</span></div><div class="line">curl -u admin:<span class="variable">$PASSWORD</span> -i -H <span class="string">'X-Requested-By: ambari'</span> -X PUT -d <span class="string">'{"RequestInfo": {"context" :"Stop $SERVICE via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}'</span> http:<span class="regexp">//</span><span class="variable">$AMBARI_HOST</span>:<span class="number">8080</span><span class="regexp">/api/</span>v1<span class="regexp">/clusters/</span><span class="variable">$CLUSTER</span><span class="regexp">/services/</span>HUE</div><div class="line"></div><div class="line"><span class="comment">#Delete Hue</span></div><div class="line">curl -u admin:<span class="variable">$PASSWORD</span> -i -H <span class="string">'X-Requested-By: ambari'</span> -X DELETE http:<span class="regexp">//</span><span class="variable">$AMBARI_HOST</span>:<span class="number">8080</span><span class="regexp">/api/</span>v1<span class="regexp">/clusters/</span><span class="variable">$CLUSTER</span><span class="regexp">/services/</span>HUE</div></pre></td></tr></table></figure>
</li>
<li><p>Remove Hue-related files from the host</p>
<figure class="highlight crystal"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line">rm -rf /usr/local/hue*</div><div class="line">rm -rf /var/log/hue</div><div class="line">rm -rf /var/run/hue</div><div class="line">rm /usr/hdp/current/hadoop-client/<span class="class"><span class="keyword">lib</span>/<span class="title">hue</span>-<span class="title">plugins</span>-3.11.0-<span class="title">SNAPSHOT</span>.<span class="title">jar</span></span></div><div class="line">rm /usr/hdp/current/hue-server</div></pre></td></tr></table></figure>
</li>
</ul>
<p>Note: I use a locally hosted HDP repository; Hue was compiled locally, packaged as hue-3.11.0.tgz, and uploaded to the hue directory of the local HDP repository.</p>
]]></content>
<summary type="html">
<p>I recently integrated the Hue (v3.11.0) service into Ambari (v2.4.0+) so that Hue can be conveniently operated and configured through Ambari. The source code has been pushed to GitHub: <a href="https://github.com/EsharEditor/ambari-hue-service" title="https://github.com/EsharEditor/ambari-hue-service">https://github.com/EsharEditor/ambari-hue-service</a>. Basic functional testing is complete; bug fixes will continue…</p>
<img src="/uploads/Ambari-Hue-Service/Ambari-Hue-Service.png">
</summary>
<category term="BigData" scheme="https://eshareditor.github.io/categories/BigData/"/>
<category term="Ambari" scheme="https://eshareditor.github.io/tags/Ambari/"/>
<category term="Hue" scheme="https://eshareditor.github.io/tags/Hue/"/>
</entry>
<entry>
<title>Hue CLI Operation</title>
<link href="https://eshareditor.github.io/2016/09/19/Hue-CLI-Operation/"/>
<id>https://eshareditor.github.io/2016/09/19/Hue-CLI-Operation/</id>
<published>2016-09-19T09:35:54.000Z</published>
<updated>2016-09-20T10:06:30.302Z</updated>
<content type="html"><![CDATA[<p>This article introduces some Hue command-line operations, such as resetting a user's password, creating a superuser, syncing Linux users, and changing where Hue stores its metadata.<br><img src="/uploads/Hue-CLI-Operation/Hue-CLI-Operation.jpg" width="640"><br><a id="more"></a></p>
<h2 id="设置用户信息"><a href="#设置用户信息" class="headerlink" title="Managing users"></a>Managing users</h2><h3 id="命令重置用户密码"><a href="#命令重置用户密码" class="headerlink" title="Reset a password from the command line"></a>Reset a password from the command line</h3><figure class="highlight awk"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ {HUE_HOME}<span class="regexp">/build/</span>env<span class="regexp">/bin/</span>hue changepassword username</div></pre></td></tr></table></figure>
<h3 id="shell修改用户密码"><a href="#shell修改用户密码" class="headerlink" title="Change a password from the Hue shell"></a>Change a password from the Hue shell</h3><figure class="highlight ruby"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line">$ {HUE_HOME}/build/env/bin/hue shell</div><div class="line"><span class="meta">>></span>from django.contrib.auth.models import User</div><div class="line"><span class="meta">>></span>user = User.objects.get(username=<span class="string">'test'</span>)</div><div class="line"><span class="meta">>></span>user.set_password(<span class="string">'123456'</span>)</div><div class="line"><span class="meta">>></span>user.save()</div></pre></td></tr></table></figure>
<p>The Hue shell can also read a script to change passwords in batch. For example, create script.py and run:</p>
<figure class="highlight awk"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ {HUE_HOME}<span class="regexp">/build/</span>env<span class="regexp">/bin/</span>hue shell < script.py</div></pre></td></tr></table></figure>
<p>Contents of script.py:</p>
<figure class="highlight dockerfile"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">from</span> django.contrib.auth.models import <span class="keyword">User</span></div><div class="line"><span class="keyword">user</span> = <span class="keyword">User</span>.objects.get(username=<span class="string">'test'</span>)</div><div class="line"><span class="keyword">user</span>.set_password('123456a')</div><div class="line"><span class="keyword">user</span>.save()</div><div class="line"><span class="keyword">user</span> = <span class="keyword">User</span>.objects.get(username=<span class="string">'test1'</span>)</div><div class="line"><span class="keyword">user</span>.set_password('123456b')</div><div class="line"><span class="keyword">user</span>.save()</div></pre></td></tr></table></figure>
<h3 id="shell删除用户"><a href="#shell删除用户" class="headerlink" title="Delete a user from the shell"></a>Delete a user from the shell</h3><figure class="highlight ruby"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">$ {HUE_HOME}/build/env/bin/hue shell</div><div class="line"><span class="meta">>></span>from django.contrib.auth.models import User</div><div class="line"><span class="meta">>></span>user = User.objects.get(username=<span class="string">'test'</span>)</div><div class="line"><span class="meta">>></span>user.delete()</div></pre></td></tr></table></figure>
<h2 id="设置超级用户"><a href="#设置超级用户" class="headerlink" title="Creating a superuser"></a>Creating a superuser</h2><h3 id="命令方式设置"><a href="#命令方式设置" class="headerlink" title="Via the command line"></a>Via the command line</h3><figure class="highlight awk"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ {HUE_HOME}<span class="regexp">/build/</span>env<span class="regexp">/bin/</span>hue createsuperuser</div></pre></td></tr></table></figure>
<h3 id="Shell方式设置"><a href="#Shell方式设置" class="headerlink" title="Via the shell"></a>Via the shell</h3><figure class="highlight ruby"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div></pre></td><td class="code"><pre><div class="line">$ {HUE_HOME}/build/env/bin/hue shell</div><div class="line"><span class="meta">>></span>from django.contrib.auth.models import User</div><div class="line"><span class="meta">>></span>user = User.objects.get(username=<span class="string">'hdfs'</span>)</div><div class="line"><span class="meta">>></span>user.is_staff = True</div><div class="line"><span class="meta">>></span>user.is_superuser = True</div><div class="line"><span class="meta">>></span>user.set_password(<span class="string">'hdfs'</span>)</div><div class="line"><span class="meta">>></span>user.save()</div></pre></td></tr></table></figure>
<h2 id="同步系统用户"><a href="#同步系统用户" class="headerlink" title="Syncing system users"></a>Syncing system users</h2><p>This synchronizes the users that exist on the operating system into Hue. Synced users have no password, so a superuser must set one manually. Note: Hue does not sync system users automatically; run the sync again whenever a new system user is added.</p>
<figure class="highlight awk"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ {HUE_HOME}<span class="regexp">/build/</span>env<span class="regexp">/bin/</span>hue useradmin_sync_with_unix</div></pre></td></tr></table></figure>
<p>Optional flags:</p>
<figure class="highlight monkey"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line">--<span class="built_in">min</span>-uid=MIN_UID Minimum UID <span class="keyword">to</span> <span class="keyword">import</span> (Inclusive).</div><div class="line">--<span class="built_in">max</span>-uid=MAX_UID Maximum UID <span class="keyword">to</span> <span class="keyword">import</span> (Exclusive).</div><div class="line">--<span class="built_in">min</span>-gid=MIN_GID Minimum GID <span class="keyword">to</span> <span class="keyword">import</span> (Inclusive).</div><div class="line">--<span class="built_in">max</span>-gid=MAX_GID Maximum GID <span class="keyword">to</span> <span class="keyword">import</span> (Exclusive).</div><div class="line">--check-shell=CHECK_SHELL Whether <span class="literal">or</span> <span class="keyword">not</span> <span class="keyword">to</span> check that the user<span class="comment">'s shell is not /bin/false.</span></div></pre></td></tr></table></figure>
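<p>Note that --min-uid is inclusive while --max-uid is exclusive. A small sketch (not part of Hue itself; the sample passwd file and UID bounds below are made up) of which accounts such a range would select:</p>

```shell
#!/bin/sh
# Preview which local accounts a UID-bounded sync would pick up, mirroring
# the --min-uid (inclusive) and --max-uid (exclusive) semantics above.
# A sample passwd file makes the filter easy to verify.
MIN_UID=500
MAX_UID=1000

cat > /tmp/passwd.sample <<'EOF'
root:x:0:0:root:/root:/bin/bash
hue:x:501:501::/home/hue:/bin/bash
hdfs:x:502:502::/home/hdfs:/bin/bash
nobody:x:65534:65534::/:/sbin/nologin
EOF

# Field 3 of passwd is the UID; keep names whose UID is in [MIN_UID, MAX_UID).
eligible=$(awk -F: -v min="$MIN_UID" -v max="$MAX_UID" \
  '$3 >= min && $3 < max {print $1}' /tmp/passwd.sample)
echo "$eligible"
```

Running the real sync with the same bounds would be `hue useradmin_sync_with_unix --min-uid=500 --max-uid=1000`.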
<h2 id="元数据存储"><a href="#元数据存储" class="headerlink" title="Metadata storage"></a>Metadata storage</h2><h3 id="简介"><a href="#简介" class="headerlink" title="Overview"></a>Overview</h3><p>By default, Hue stores its metadata in SQLite (a lightweight relational database often used on embedded devices); the database file is $HUE_HOME/desktop/desktop.db.</p>
<p>The problem: with multiple concurrent users, Hue reports “Database is locked”, and the web UI warns: “SQLite is only recommended for small development environments with a few users”. The fix is to store Hue's metadata in another database; MySQL is used as the example here.</p>
<h3 id="更换元数据存储:"><a href="#更换元数据存储:" class="headerlink" title="Switching the metadata store"></a>Switching the metadata store</h3><p>1. Edit $HUE_HOME/desktop/conf/hue.ini and change the [desktop]->[[database]] settings as follows:</p>
<figure class="highlight nix"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div></pre></td><td class="code"><pre><div class="line">[desktop]</div><div class="line"> [[database]]</div><div class="line"> <span class="attr">engine=mysql</span></div><div class="line"> <span class="attr">host=localhost</span></div><div class="line"> <span class="attr">port=3306</span></div><div class="line"> <span class="attr">user=root</span> </div><div class="line"> <span class="attr">password=123456a?</span></div><div class="line"> <span class="attr">name=desktop</span></div><div class="line"> <span class="comment">##options={}</span></div></pre></td></tr></table></figure>
<p>2. Create a database named desktop (name=desktop) in MySQL, and grant the configured user access privileges if necessary.</p>
<p>3. Sync and migrate the database:</p>
<figure class="highlight awk"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line"><span class="variable">$HUE_HOME</span><span class="regexp">/build/</span>env<span class="regexp">/bin/</span>hue syncdb --noinput (sync the database)</div><div class="line"><span class="variable">$HUE_HOME</span><span class="regexp">/build/</span>env<span class="regexp">/bin/</span>hue migrate (migrate the data)</div></pre></td></tr></table></figure>
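<p>Step 2 leaves the exact SQL implicit; a minimal sketch of the statements (the database name desktop matches the hue.ini example above, while the dedicated user hue@localhost and its password are hypothetical and should be adjusted to your environment):</p>

```shell
#!/bin/sh
# Print the MySQL statements implied by step 2 so they can be reviewed
# before being piped into the client. The GRANT line is only needed when
# Hue connects as a dedicated, non-root user (hue@localhost is a
# hypothetical example, not from the article).
sql=$(cat <<'EOF'
CREATE DATABASE desktop DEFAULT CHARACTER SET utf8;
GRANT ALL PRIVILEGES ON desktop.* TO 'hue'@'localhost' IDENTIFIED BY 'secret';
FLUSH PRIVILEGES;
EOF
)
# Review, then run e.g.: echo "$sql" | mysql -u root -p
echo "$sql"
```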
<p>References:<br>[1]<a href="http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_hue_database.html" target="_blank" rel="external">http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_hue_database.html</a></p>
]]></content>
<summary type="html">
<p>This article introduces some Hue command-line operations, such as resetting a user's password, creating a superuser, syncing Linux users, and changing where Hue stores its metadata.<br><img src="/uploads/Hue-CLI-Operation/Hue-CLI-Operation.jpg" width="640"><br>
</summary>
<category term="BigData" scheme="https://eshareditor.github.io/categories/BigData/"/>
<category term="Hue" scheme="https://eshareditor.github.io/tags/Hue/"/>
</entry>
<entry>
<title>Installing Kerberos with Ambari and Using Its Commands</title>
<link href="https://eshareditor.github.io/2016/09/18/Kerberos-KDC-Install/"/>
<id>https://eshareditor.github.io/2016/09/18/Kerberos-KDC-Install/</id>
<published>2016-09-18T09:15:55.000Z</published>
<updated>2016-09-19T00:55:21.021Z</updated>
<content type="html"><![CDATA[<p>I recently spent some time on big-data security, mainly configuring and using security features in a big-data cluster. This article briefly introduces Kerberos authentication and, for a Hadoop cluster built with Ambari, records the installation and configuration of the Kerberos KDC along with some commonly used commands.</p>
<img src="/uploads/Kerberos-KDC-Install/Kerberos.png" width="660" height="440">
<a id="more"></a>
<h2 id="Kerberos简介"><a href="#Kerberos简介" class="headerlink" title="Kerberos overview"></a>Kerberos overview</h2><p>The name Kerberos comes from Greek mythology: it is the three-headed hound that guards the underworld. The name suits an authentication protocol because the whole authentication process involves three parties: the client, the server, and the KDC (Key Distribution Center). In a Windows domain environment, the KDC role is played by the DC (Domain Controller).<br>Kerberos is essentially ticket-based authentication. To access a server's resources, the client must first buy a ticket that the server recognizes; the server validates the ticket before admitting the client. That ticket cannot be bought directly, however: the client must first obtain a subscription warrant (the ticket-granting ticket), and both the warrant and the admission ticket are issued by the KDC, as shown below:</p>
<img src="/uploads/Kerberos-KDC-Install/Kerberos.png" width="660" height="440">
<h2 id="Ambari-Kerberos安装"><a href="#Ambari-Kerberos安装" class="headerlink" title="Ambari Kerberos installation"></a>Ambari Kerberos installation</h2><p>The environment: CentOS 6.7, Ambari 2.2.2+, and the KDC installed on the host server.bigdata.</p>
<h3 id="KDC安装"><a href="#KDC安装" class="headerlink" title="Installing the KDC"></a>Installing the KDC</h3><p>1. Install the KDC server:</p>
<figure class="highlight sql"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ yum -y <span class="keyword">install</span> krb5-<span class="keyword">server</span> krb5-libs krb5-workstation</div></pre></td></tr></table></figure>
<p>2. Edit the configuration file /etc/krb5.conf:</p>
<figure class="highlight nix"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div></pre></td><td class="code"><pre><div class="line">[libdefaults]</div><div class="line"> <span class="attr">renew_lifetime</span> = <span class="number">7</span>d</div><div class="line"> <span class="attr">forwardable</span> = <span class="literal">true</span></div><div class="line"> <span class="attr">default_realm</span> = BIGDATA</div><div class="line"> <span class="attr">ticket_lifetime</span> = <span class="number">24</span>h</div><div class="line"> <span class="attr">dns_lookup_realm</span> = <span class="literal">false</span></div><div class="line"> <span class="attr">dns_lookup_kdc</span> = <span class="literal">false</span></div><div class="line"> <span class="comment">#default_tgs_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5</span></div><div class="line"> <span class="comment">#default_tkt_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5</span></div><div class="line"> <span class="attr">forwardable</span> = <span class="literal">true</span></div><div class="line"></div><div class="line">[logging]</div><div class="line"> <span class="attr">default</span> = FILE:/var/log/krb5kdc.log</div><div class="line"> <span class="attr">admin_server</span> = FILE:/var/log/kadmind.log</div><div class="line"> <span class="attr">kdc</span> = FILE:/var/log/krb5kdc.log</div><div class="line"></div><div class="line">[realms]</div><div class="line"> <span class="attr">BIGDATA</span> = {</div><div class="line"> <span class="attr">admin_server</span> = server.bigdata</div><div class="line"> <span class="attr">kdc</span> = server.bigdata</div><div class="line"> }</div></pre></td></tr></table></figure>
<p>3. Edit the file /var/kerberos/krb5kdc/kdc.conf:</p>
<figure class="highlight nix"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div></pre></td><td class="code"><pre><div class="line">[kdcdefaults]</div><div class="line"> <span class="attr">kdc_ports</span> = <span class="number">88</span></div><div class="line"> <span class="attr">kdc_tcp_ports</span> = <span class="number">88</span></div><div class="line"></div><div class="line">[realms]</div><div class="line"> <span class="attr">BIGDATA</span> = {</div><div class="line"> <span class="comment">#master_key_type = aes256-cts</span></div><div class="line"> <span class="attr">acl_file</span> = /var/kerberos/krb5kdc/kadm5.acl</div><div class="line"> <span class="attr">dict_file</span> = /usr/share/dict/words</div><div class="line"> <span class="attr">admin_keytab</span> = /var/kerberos/krb5kdc/kadm5.keytab</div><div class="line"> <span class="attr">supported_enctypes</span> = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal</div><div class="line"> <span class="attr">max_renewable_life</span> = <span class="number">7</span>d</div><div class="line"> <span class="attr">default_principal_flags</span> = +renewable, +forwardable</div><div class="line"> }</div></pre></td></tr></table></figure>
<p>4. Create the KDC database</p>
<figure class="highlight armasm"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ kdb5_util create -r <span class="keyword">BIGDATA </span>-s</div></pre></td></tr></table></figure>
<p>5. Start the KDC services and enable them at boot</p>
<figure class="highlight livecodeserver"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">$ /etc/rc.d/init.d/krb5kdc <span class="built_in">start</span></div><div class="line">$ /etc/rc.d/init.d/kadmin <span class="built_in">start</span></div><div class="line">$ chkconfig krb5kdc <span class="keyword">on</span></div><div class="line">$ chkconfig kadmin <span class="keyword">on</span></div></pre></td></tr></table></figure>
<p>6. Create a remote administrator principal</p>
<figure class="highlight stylus"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">$ kadmin<span class="selector-class">.local</span> -<span class="selector-tag">q</span> <span class="string">"addprinc admin/admin@BIGDATA"</span></div><div class="line">$ kadmin<span class="selector-class">.local</span> -<span class="selector-tag">q</span> <span class="string">"xst -norandkey admin/admin@BIGDATA"</span></div></pre></td></tr></table></figure>
<p>7. Update the remaining configuration file<br>If the principal created above is admin/admin@BIGDATA, the contents of /var/kerberos/krb5kdc/kadm5.acl should be changed to:</p>
<figure class="highlight asciidoc"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="strong">*/admin@BIGDATA *</span></div></pre></td></tr></table></figure>
<p>8. Restart kadmin; the KDC installation is complete</p>
<figure class="highlight awk"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ <span class="regexp">/etc/</span>rc.d<span class="regexp">/init.d/</span>kadmin restart</div></pre></td></tr></table></figure>
<h3 id="安装JCE"><a href="#安装JCE" class="headerlink" title="安装JCE"></a>Installing JCE</h3><p>When building a cluster with Ambari, if the JDK path was specified manually you generally need to install JCE into that JDK; by default, Ambari's bundled JDK already ships with JCE.</p>
<p>JCE download for JDK 1.7: <a href="http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html" title="JDK1.7-JCE" target="_blank" rel="external">http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html</a><br>JCE download for JDK 1.8: <a href="http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html" title="JDK1.8-JCE" target="_blank" rel="external">http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html</a><br>Unzip it into the ${JAVA_HOME}/jre/lib/security directory</p>
<figure class="highlight crystal"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ unzip -o -j -q /opt/jce_policy-<span class="number">8</span>.zip -d /usr/jdk64/jdk1.<span class="number">8.0_60</span>/jre/<span class="class"><span class="keyword">lib</span>/<span class="title">security</span>/</span></div></pre></td></tr></table></figure>
<h3 id="Kerberos-Client安装"><a href="#Kerberos-Client安装" class="headerlink" title="Kerberos Client安装"></a>Installing the Kerberos Client</h3><p>Install the Kerberos client from the Ambari menu [Admin] -> [Kerberos] -> [Enable Kerberos], as shown below:</p>
<img src="/uploads/Kerberos-KDC-Install/Kerberos-Client-Install.jpg">
<h2 id="常用命令"><a href="#常用命令" class="headerlink" title="常用命令"></a>Common Commands</h2><p>1. About kadmin.local and kadmin</p>
<p>Which of kadmin.local and kadmin to use depends on your account and access rights:<br>kadmin.local (on the KDC machine) or kadmin (on other machines)<br>If you have root access to the KDC server but no Kerberos admin account, use kadmin.local<br>If you have no root access to the KDC server but do have a Kerberos admin account, use kadmin</p>
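<p>For example (a hedged sketch; the principal name is the one created earlier in this post), the same query can be issued either way depending on where you run it:</p>

```shell
# On the KDC host itself, as root: kadmin.local talks to the
# local database directly and needs no Kerberos admin password.
kadmin.local -q "listprincs"

# On any other machine, using the admin principal created above:
# kadmin authenticates over the network and prompts for a password.
kadmin -p admin/admin@BIGDATA -q "listprincs"
```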
<p>2. Add a principal and export its keytab</p>
<figure class="highlight stylus"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">$ kadmin<span class="selector-class">.local</span></div><div class="line">addprinc -randkey test/server.bigdata@BIGDATA</div><div class="line">xst -norandkey -k /etc/security/keytabs/test<span class="selector-class">.service</span><span class="selector-class">.keytab</span> test/server.bigdata@BIGDATA</div></pre></td></tr></table></figure>
<p>3. Get principal details</p>
<figure class="highlight autoit"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">$ kadmin.<span class="keyword">local</span></div><div class="line">getprinc test/server.bigdata<span class="symbol">@BIGDATA</span></div></pre></td></tr></table></figure>
<p>4. List all principals in the KDC</p>
<figure class="highlight stylus"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">$ kadmin<span class="selector-class">.local</span></div><div class="line">listprincs</div></pre></td></tr></table></figure>
<p>5. Delete a principal</p>
<figure class="highlight autoit"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">$ kadmin.<span class="keyword">local</span></div><div class="line">delprinc test/server.bigdata<span class="symbol">@BIGDATA</span></div></pre></td></tr></table></figure>
<p>6. Modify principal attributes</p>
<figure class="highlight autoit"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">$kadmin.<span class="keyword">local</span></div><div class="line">modprinc -maxrenewlife <span class="number">30</span>days test/server.bigdata<span class="symbol">@BIGDATA</span></div></pre></td></tr></table></figure>
<p>7. Obtain and cache tickets from a keytab</p>
<figure class="highlight groovy"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">$ klist -k -t <span class="regexp">/etc/</span>security<span class="regexp">/keytabs/</span>test.service.keytab</div><div class="line">$ kinit -k -t <span class="regexp">/etc/</span>security<span class="regexp">/keytabs/</span>test.service.keytab test/server.bigdata<span class="meta">@BIGDATA</span></div><div class="line">$ kinit -k -t <span class="regexp">/etc/</span>security<span class="regexp">/keytabs/</span>test.service.keytab -c <span class="regexp">/tmp/</span>testkeytab test/server.bigdata<span class="meta">@BIGDATA</span></div></pre></td></tr></table></figure>
<p>8. Renew a ticket</p>
<figure class="highlight elixir"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="variable">$ </span>kinit -R</div></pre></td></tr></table></figure>
<p>9. View or destroy the user's cached tickets</p>
<figure class="highlight elixir"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line"><span class="variable">$ </span>klist</div><div class="line"><span class="variable">$ </span>kdestroy</div></pre></td></tr></table></figure>
<p>10. Merge keytabs</p>
<figure class="highlight stylus"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">$ ktutil</div><div class="line">$ ktutil: rkt test<span class="selector-class">.service</span><span class="selector-class">.keytab</span></div><div class="line">$ ktutil: rkt test1<span class="selector-class">.service</span><span class="selector-class">.keytab</span></div><div class="line">$ ktutil: wkt test-test1<span class="selector-class">.service</span><span class="selector-class">.keytab</span></div></pre></td></tr></table></figure>
<p>References:<br><a href="http://gost.isi.edu/publications/kerberos-neuman-tso.html" target="_blank" rel="external">http://gost.isi.edu/publications/kerberos-neuman-tso.html</a><br><a href="http://www.cnblogs.com/artech/archive/2011/01/24/kerberos.html" target="_blank" rel="external">http://www.cnblogs.com/artech/archive/2011/01/24/kerberos.html</a></p>
<hr>
]]></content>
<summary type="html">
<p>I spent some time recently working on big-data security, mainly the security configuration and usage of a big-data cluster. This post briefly introduces Kerberos authentication and records the Kerberos KDC installation and configuration for a Hadoop cluster built with Ambari, along with some common commands.</p>
<img src="/uploads/Kerberos-KDC-Install/Kerberos.png" width="660" height="440">
</summary>
<category term="BigData" scheme="https://eshareditor.github.io/categories/BigData/"/>
<category term="Ambari" scheme="https://eshareditor.github.io/tags/Ambari/"/>
<category term="Kerberos" scheme="https://eshareditor.github.io/tags/Kerberos/"/>
</entry>
<entry>
<title>Using Hue Components</title>
<link href="https://eshareditor.github.io/2016/09/16/Hue-Component-Use/"/>
<id>https://eshareditor.github.io/2016/09/16/Hue-Component-Use/</id>
<published>2016-09-16T01:48:07.000Z</published>
<updated>2016-09-18T12:00:08.692Z</updated>
<content type="html"><![CDATA[<p>As a web UI for interacting with a Hadoop cluster, Hue greatly lowers the cost of operating certain big-data components. This post uses a few simple examples to introduce some of Hue's components, such as Oozie job scheduling, HBase BulkLoad data import, and Pig data loading.</p>
<img src="/uploads/Hue-Component-Use/Hue-Component-Use.jpg">
<a id="more"></a>
<h2 id="Oozie任务调度"><a href="#Oozie任务调度" class="headerlink" title="Oozie任务调度"></a>Oozie Job Scheduling</h2><h3 id="MapReduce任务"><a href="#MapReduce任务" class="headerlink" title="MapReduce任务"></a>MapReduce Jobs</h3><p>Note: this example uses the WordCount job from Hadoop's bundled example jar; operating user: admin</p>
<p>1. Upload Hadoop's bundled example jar to HDFS (here to the /user/admin/MapReduceJob directory)</p>
<p>2. In Hue's Workflow editor, add a MapReduce action and configure it as follows:</p>
<img src="/uploads/Hue-Component-Use/Hue-Oozie-WordCount.jpg">
<p>Properties configuration details:<br><figure class="highlight dust"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div><div class="line">52</div><div class="line">53</div><div class="line">54</div><div class="line">55</div><div class="line">56</div><div class="line">57</div><div class="line">58</div><div class="line">59</div><div class="line">60</div><div class="line">61</div><div class="line">62</div><div class="line">63</div><div class="line">64</div><div class="line">65</div><div class="line">66</div><div class="line">67</div><div class="line">68</div><div class="line">69</div><div class="line">70</div><div class="line">71</div><div class="line">72</div><div class="line">73</div><div
class="line">74</div><div class="line">75</div><div class="line">76</div></pre></td><td class="code"><pre><div class="line"><span class="xml"><span class="tag"><<span class="name">workflow-app</span> <span class="attr">name</span>=<span class="string">"Job-MR-wordcount"</span> <span class="attr">xmlns</span>=<span class="string">"uri:oozie:workflow:0.5"</span>></span></span></div><div class="line"> <span class="tag"><<span class="name">start</span> <span class="attr">to</span>=<span class="string">"mapreduce-627c"</span>/></span></div><div class="line"> <span class="tag"><<span class="name">kill</span> <span class="attr">name</span>=<span class="string">"Kill"</span>></span></div><div class="line"> <span class="tag"><<span class="name">message</span>></span>操作失败,错误消息[$<span class="template-variable">{wf:errorMessage(wf:lastErrorNode())}</span><span class="xml">]<span class="tag"></<span class="name">message</span>></span></span></div><div class="line"> <span class="tag"></<span class="name">kill</span>></span></div><div class="line"> <span class="tag"><<span class="name">action</span> <span class="attr">name</span>=<span class="string">"mapreduce-627c"</span>></span></div><div class="line"> <span class="tag"><<span class="name">map-reduce</span>></span></div><div class="line"> <span class="tag"><<span class="name">job-tracker</span>></span>$<span class="template-variable">{jobTracker}</span><span class="xml"><span class="tag"></<span class="name">job-tracker</span>></span></span></div><div class="line"> <span class="tag"><<span class="name">name-node</span>></span>$<span class="template-variable">{nameNode}</span><span class="xml"><span class="tag"></<span class="name">name-node</span>></span></span></div><div class="line"> <span class="tag"><<span class="name">prepare</span>></span></div><div class="line"> <span class="tag"><<span class="name">delete</span> <span class="attr">path</span>=<span class="string">"$</span></span><span 
class="template-variable">{nameNode}</span><span class="xml"><span class="tag"><span class="string">/user/admin/mapreduce/output-wordcount"</span>/></span></span></div><div class="line"> <span class="tag"></<span class="name">prepare</span>></span></div><div class="line"> <span class="tag"><<span class="name">configuration</span>></span></div><div class="line"> <span class="tag"><<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">name</span>></span>mapreduce.job.map.class<span class="tag"></<span class="name">name</span>></span></div><div class="line"> <span class="tag"><<span class="name">value</span>></span>org.apache.hadoop.examples.WordCount$TokenizerMapper<span class="tag"></<span class="name">value</span>></span></div><div class="line"> <span class="tag"></<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">name</span>></span>mapreduce.job.reduce.class<span class="tag"></<span class="name">name</span>></span></div><div class="line"> <span class="tag"><<span class="name">value</span>></span>org.apache.hadoop.examples.WordCount$IntSumReducer<span class="tag"></<span class="name">value</span>></span></div><div class="line"> <span class="tag"></<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">name</span>></span>mapreduce.job.combine.class<span class="tag"></<span class="name">name</span>></span></div><div class="line"> <span class="tag"><<span class="name">value</span>></span>org.apache.hadoop.examples.WordCount$IntSumReducer<span class="tag"></<span class="name">value</span>></span></div><div class="line"> <span class="tag"></<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span 
class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">name</span>></span>mapreduce.input.fileinputformat.inputdir<span class="tag"></<span class="name">name</span>></span></div><div class="line"> <span class="tag"><<span class="name">value</span>></span>/user/admin/MapReduceJob/input<span class="tag"></<span class="name">value</span>></span></div><div class="line"> <span class="tag"></<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">name</span>></span>mapreduce.output.fileoutputformat.outputdir<span class="tag"></<span class="name">name</span>></span></div><div class="line"> <span class="tag"><<span class="name">value</span>></span>/user/admin/MapReduceJob/output-wordcount<span class="tag"></<span class="name">value</span>></span></div><div class="line"> <span class="tag"></<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">name</span>></span>mapreduce.job.output.key.class<span class="tag"></<span class="name">name</span>></span></div><div class="line"> <span class="tag"><<span class="name">value</span>></span>org.apache.hadoop.io.Text<span class="tag"></<span class="name">value</span>></span></div><div class="line"> <span class="tag"></<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">name</span>></span>mapreduce.job.output.value.class<span class="tag"></<span class="name">name</span>></span></div><div class="line"> <span class="tag"><<span class="name">value</span>></span>org.apache.hadoop.io.IntWritable<span class="tag"></<span class="name">value</span>></span></div><div class="line"> <span 
class="tag"></<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">name</span>></span>mapreduce.job.maps<span class="tag"></<span class="name">name</span>></span></div><div class="line"> <span class="tag"><<span class="name">value</span>></span>6<span class="tag"></<span class="name">value</span>></span></div><div class="line"> <span class="tag"></<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">name</span>></span>mapreduce.job.reduces<span class="tag"></<span class="name">name</span>></span></div><div class="line"> <span class="tag"><<span class="name">value</span>></span>3<span class="tag"></<span class="name">value</span>></span></div><div class="line"> <span class="tag"></<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">name</span>></span>mapreduce.cluster.mapmemory.mb<span class="tag"></<span class="name">name</span>></span></div><div class="line"> <span class="tag"><<span class="name">value</span>></span>1024<span class="tag"></<span class="name">value</span>></span></div><div class="line"> <span class="tag"></<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">name</span>></span>mapreduce.cluster.reducememory.mb<span class="tag"></<span class="name">name</span>></span></div><div class="line"> <span class="tag"><<span class="name">value</span>></span>1024<span class="tag"></<span class="name">value</span>></span></div><div class="line"> <span class="tag"></<span class="name">property</span>></span></div><div 
class="line"> <span class="tag"><<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">name</span>></span>mapreduce.job.queuename<span class="tag"></<span class="name">name</span>></span></div><div class="line"> <span class="tag"><<span class="name">value</span>></span>queue1<span class="tag"></<span class="name">value</span>></span></div><div class="line"> <span class="tag"></<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">name</span>></span>mapred.mapper.new-api<span class="tag"></<span class="name">name</span>></span></div><div class="line"> <span class="tag"><<span class="name">value</span>></span>true<span class="tag"></<span class="name">value</span>></span></div><div class="line"> <span class="tag"></<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">name</span>></span>mapred.reducer.new-api<span class="tag"></<span class="name">name</span>></span></div><div class="line"> <span class="tag"><<span class="name">value</span>></span>true<span class="tag"></<span class="name">value</span>></span></div><div class="line"> <span class="tag"></<span class="name">property</span>></span></div><div class="line"> <span class="tag"></<span class="name">configuration</span>></span></div><div class="line"> <span class="tag"></<span class="name">map-reduce</span>></span></div><div class="line"> <span class="tag"><<span class="name">ok</span> <span class="attr">to</span>=<span class="string">"End"</span>/></span></div><div class="line"> <span class="tag"><<span class="name">error</span> <span class="attr">to</span>=<span class="string">"Kill"</span>/></span></div><div class="line"> <span class="tag"></<span 
class="name">action</span>></span></div><div class="line"> <span class="tag"><<span class="name">end</span> <span class="attr">name</span>=<span class="string">"End"</span>/></span></div><div class="line"><span class="tag"></<span class="name">workflow-app</span>></span></div></pre></td></tr></table></figure></p>
<p>3. On the Coordinator and Bundles pages you can configure the job further, such as setting its execution schedule or bundling multiple jobs. Click save and submit the job, then check its status on the Oozie dashboard and the Yarn job-management page.</p>
<h3 id="Spark任务"><a href="#Spark任务" class="headerlink" title="Spark任务"></a>Spark Jobs</h3><p>Note: this example uses the SparkFileCopy job provided by Hue; operating user: admin</p>
<p>1. Upload the jar to a directory on HDFS</p>
<p>2. In Hue's Workflow editor, add a Spark action and configure it as follows:</p>
<img src="/uploads/Hue-Component-Use/Hue-Oozie-SparkFileCopy.jpg">
<img src="/uploads/Hue-Component-Use/Hue-Oozie-SparkFileCopy1.jpg">
<p>3. Save and submit the job, then check its status on the Oozie dashboard and the Yarn job-management page.</p>
<h2 id="Hbase数据导入"><a href="#Hbase数据导入" class="headerlink" title="Hbase数据导入"></a>HBase Data Import</h2><h3 id="BulkLoad数据导入"><a href="#BulkLoad数据导入" class="headerlink" title="BulkLoad数据导入"></a>BulkLoad Data Import</h3><p>1. In Hue's HBase module, create the table userinfo with two column families: info and grade</p>
<p>2. Create the data file userinfo.csv in a local directory on Windows; its contents are shown below</p>
<img src="/uploads/Hue-Component-Use/Hue-Hbase-BulkLoad-Userinfo.jpg">
<p>Notes:<br>1) Every row in the imported file must carry the full set of fields; a value may be empty, but its position must be held with the corresponding separator.<br>2) BulkLoad does not support multiple separators, so make sure the columns in the first row of the data file are split by a single separator, with no stray spaces.</p>
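<p>The two notes above can be checked mechanically before uploading. A minimal sketch (plain Python, not part of Hue; the separator and sample rows are illustrative assumptions):</p>

```python
import csv
import io

def check_bulkload_file(text, sep=",", expected_fields=None):
    """Verify every row uses the single separator and has the same
    field count; empty values are fine as long as the position is held."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    if expected_fields is None:
        expected_fields = len(rows[0])
    # Collect 1-based line numbers whose field count is wrong.
    return [i + 1 for i, r in enumerate(rows) if len(r) != expected_fields]

sample = "key1,zhangsan,male,90\nkey2,lisi,,85\nkey3,wangwu,female"
print(check_bulkload_file(sample))  # row 3 is short one field
```

Running this on a file before clicking BulkLoad catches rows that would otherwise be imported with shifted columns.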
<p>3. In Hue's HBase module, open the userinfo table, click BulkLoad, and load the local data file userinfo.csv; the data is imported successfully</p>
<h2 id="Pig数据加载"><a href="#Pig数据加载" class="headerlink" title="Pig数据加载"></a>Pig Data Loading</h2><p>The Pig application in Hue lets users define Pig scripts, run them, and view job status. Submitted Pig scripts are handed to the Yarn job manager automatically via Oozie, so the Pig job's state can be seen both on the Oozie dashboard and in Yarn job management. In other words, Pig in Hue depends on Oozie.</p>
<p>1. Create the data file student.txt in an HDFS directory (here /user/admin/PigTestData)</p>
<p>student.txt数据文件内容:<br><figure class="highlight css"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div></pre></td><td class="code"><pre><div class="line"><span class="selector-tag">C01</span><span class="selector-pseudo">:N0101</span><span class="selector-pseudo">:82</span><span class="selector-pseudo">:90</span></div><div class="line"><span class="selector-tag">C01</span><span class="selector-pseudo">:N0102</span><span class="selector-pseudo">:59</span><span class="selector-pseudo">:68</span></div><div class="line"><span class="selector-tag">C01</span><span class="selector-pseudo">:N0103</span><span class="selector-pseudo">:65</span><span class="selector-pseudo">:73</span></div><div class="line"><span class="selector-tag">C02</span><span class="selector-pseudo">:N0201</span><span class="selector-pseudo">:81</span><span class="selector-pseudo">:88</span></div><div class="line"><span class="selector-tag">C02</span><span class="selector-pseudo">:N0202</span><span class="selector-pseudo">:94</span><span class="selector-pseudo">:99</span></div><div class="line"><span class="selector-tag">C02</span><span class="selector-pseudo">:N0203</span><span class="selector-pseudo">:79</span><span class="selector-pseudo">:92</span></div><div class="line"><span class="selector-tag">C03</span><span class="selector-pseudo">:N0301</span><span class="selector-pseudo">:56</span><span class="selector-pseudo">:67</span></div><div class="line"><span class="selector-tag">C03</span><span class="selector-pseudo">:N0302</span><span class="selector-pseudo">:92</span><span class="selector-pseudo">:84</span></div><div class="line"><span class="selector-tag">C03</span><span class="selector-pseudo">:N0306</span><span class="selector-pseudo">:72</span><span 
class="selector-pseudo">:49</span></div></pre></td></tr></table></figure></p>
<p>Pig脚本语句:<br><figure class="highlight cs"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">records = load <span class="string">'/user/admin/PigTestData/student.txt'</span> <span class="function"><span class="keyword">using</span> <span class="title">PigStorage</span>(<span class="params"><span class="string">','</span></span>) <span class="title">as</span>(<span class="params">classNo:chararray, studNo:chararray, score_Chinese:<span class="keyword">int</span>, score_Math:<span class="keyword">int</span></span>)</span>;</div><div class="line">dump records;</div><div class="line">store records <span class="keyword">into</span> <span class="string">'/user/admin/PigTestData/student_out'</span> <span class="function"><span class="keyword">using</span> <span class="title">PigStorage</span>(<span class="params"><span class="string">':'</span></span>)</span>;</div></pre></td></tr></table></figure></p>
<p>2. Run the Pig script above in Hue's Pig module, as shown below</p>
<img src="/uploads/Hue-Component-Use/Hue-Pig-LoadData.jpg">
]]></content>
<summary type="html">
<p>As a web UI for interacting with a Hadoop cluster, Hue greatly lowers the cost of operating certain big-data components. This post uses a few simple examples to introduce some of Hue's components, such as Oozie job scheduling, HBase BulkLoad data import, and Pig data loading.</p>
<img src="/uploads/Hue-Component-Use/Hue-Component-Use.jpg">
</summary>
<category term="BigData" scheme="https://eshareditor.github.io/categories/BigData/"/>
<category term="Hue" scheme="https://eshareditor.github.io/tags/Hue/"/>
</entry>
<entry>
<title>Hbase BulkLoad</title>
<link href="https://eshareditor.github.io/2016/09/14/Hbase-BulkLoad/"/>
<id>https://eshareditor.github.io/2016/09/14/Hbase-BulkLoad/</id>
<published>2016-09-14T08:40:23.000Z</published>
<updated>2016-09-15T09:34:43.111Z</updated>
<content type="html"><![CDATA[<p>BulkLoad is HBase's fast data-import tool: exploiting the fact that HBase stores its data on HDFS in a specific format, a MapReduce job generates files in that format directly, which are then uploaded to the proper location on HDFS, completing a fast bulk import.</p>
<img src="/uploads/Hbase-BulkLoad/Hbase-BulkLoad.jpg">
<a id="more"></a>
<h2 id="Hbase数据带入方式"><a href="#Hbase数据带入方式" class="headerlink" title="Hbase数据带入方式"></a>HBase Data Import Methods</h2><ul>
<li>HBase client API calls</li>
<li>MapReduce jobs</li>
<li>The BulkLoad tool</li>
<li>The Sqoop tool</li>
</ul>
<h2 id="BulkLoad优劣势"><a href="#BulkLoad优劣势" class="headerlink" title="BulkLoad优劣势"></a>BulkLoad Pros and Cons</h2><h3 id="优势"><a href="#优势" class="headerlink" title="优势"></a>Pros</h3><ul>
<li>If a one-off import into an HBase table involves a huge amount of data, the normal write path is not only slow but also heavy on Region resources. A much more efficient and convenient method is "bulk loading", i.e. the HFileOutputFormat class provided by HBase.</li>
<li>It exploits the fact that HBase stores its data on HDFS in a specific format: generate files in that HDFS storage format directly, then upload them to the proper location, and the bulk import of a huge data set is done. Driven by MapReduce, this is fast and convenient, and it neither consumes Region resources nor adds load.</li>
</ul>
<h3 id="劣势"><a href="#劣势" class="headerlink" title="劣势"></a>Cons</h3><ul>
<li>Only suitable for the initial data import, i.e. when the table is empty.</li>
<li>The HBase cluster and the Hadoop cluster must be the same cluster, i.e. the HDFS that HBase sits on is the one used by the HFile-generating MR job.</li>
</ul>
<h2 id="BulkLoad使用"><a href="#BulkLoad使用" class="headerlink" title="BulkLoad使用"></a>Using BulkLoad</h2><ol>
<li>Extract the data from the source (usually text files or another database) and upload it to HDFS.</li>
<li>Prepare the data with a MapReduce job: write a map function whose output key is the rowkey and whose output value is a KeyValue, Put, or Delete. The job uses HFileOutputFormat2, provided by HBase, to generate HFiles in HBase's underlying storage format. For an efficient import, HFileOutputFormat2 must be configured so that each output file fits within a single region.</li>
<li>Tell the RegionServers where the data is and import it: this step uses HBase's LoadIncrementalHFiles. Pass it the HDFS location of the files, and it will have the RegionServers load the data into the matching regions, completing the association with the HBase table.<img src="/uploads/Hbase-BulkLoad/Hbase-BulkLoad.jpg" title="Workflow diagram">
</li>
</ol>
<h3 id="命令行导入"><a href="#命令行导入" class="headerlink" title="命令行导入"></a>Command-line Import</h3><h4 id="直接导入数据"><a href="#直接导入数据" class="headerlink" title="直接导入数据"></a>Direct Data Import</h4><p>Create an empty table in HBase, e.g. userinfo, then import the data file /opt/userinfo.tsv with HBase's bundled ImportTsv tool</p>
<figure class="highlight groovy"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ hadoop jar $HBASE_HOME<span class="regexp">/lib/</span>hbase-server<span class="number">-1.1</span><span class="number">.2</span>.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,<span class="string">cf1:</span>name,<span class="string">cf1:</span>sex,<span class="string">cf2:</span><span class="class"><span class="keyword">class</span> <span class="title">userinfo</span> /<span class="title">opt</span>/<span class="title">userinfo</span>.<span class="title">tsv</span></span></div></pre></td></tr></table></figure>
<h4 id="分步导入数据"><a href="#分步导入数据" class="headerlink" title="分步导入数据"></a>Step-by-step Data Import</h4><ul>
<li>Upload userinfo.tsv to a directory on HDFS and create the userinfo table in HBase</li>
</ul>
<figure class="highlight awk"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ hadoop fs -put <span class="regexp">/opt/u</span>serinfo.tsv <span class="regexp">/data_dir/</span>source_file</div></pre></td></tr></table></figure>
<ul>
<li>Use ImportTsv's -Dimporttsv.bulk.output option to point the generated HFiles at /date_dir/bulkload_data/output</li>
</ul>
<figure class="highlight ruby"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ hadoop jar $HBASE_HOME/lib/hbase-server-<span class="number">1.1</span>.<span class="number">2</span>.jar importtsv -Dimporttsv.bulk.output=<span class="regexp">/date_dir/bulkload</span>_data/output -Dimporttsv.columns=HBASE_ROW_KEY,<span class="symbol">cf1:</span>name,<span class="symbol">cf1:</span>sex,<span class="symbol">cf2:</span><span class="class"><span class="keyword">class</span> <span class="title">userinfo</span> /<span class="title">opt</span>/<span class="title">userinfo</span>.<span class="title">tsv</span></span></div></pre></td></tr></table></figure>
<ul>
<li>Use completebulkload to finish the bulk load</li>
</ul>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ hadoop jar $HBASE_HOME/lib/hbase-server-1.1.2.jar completebulkload /data_dir/bulkload_data/output userinfo</div></pre></td></tr></table></figure>
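<p>After completebulkload finishes, the imported rows should be readable right away; a quick sanity check (a sketch, assuming the hbase shell is available):</p>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ echo "scan 'userinfo', {LIMIT =&gt; 2}" | hbase shell</div></pre></td></tr></table></figure>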
<h3 id="相关说明"><a href="#相关说明" class="headerlink" title="Notes"></a>Notes</h3><ol>
<li>Generating HFiles from the data file is the fastest of all loading approaches, but only if the HBase table is empty. If the table already contains data, importing the HFiles will trigger split operations on the table</li>
<li>For the final output, from either the Map or the Reduce side, it is recommended to emit only &lt;ImmutableBytesWritable, KeyValue&gt; pairs</li>
<li>The specified columns must match the fields of the source file exactly; any mismatch makes the import fail</li>
<li>The default field separator for command-line import is the tab character</li>
<li>The RowKey can be in any position:<br>Suppose the data has 4 columns (comma-separated)<br>1,2,3,4<br>RowKey as the first column<br>RowKey,2,3,4<br>RowKey as the third column<br>1,2,RowKey,4</li>
<li>Common options:<br>-Dimporttsv.skip.bad.lines=false –> fail when an invalid line is encountered<br>'-Dimporttsv.separator=|' or '-Dimporttsv.separator=,' –> specify the field separator<br>-Dimporttsv.timestamp=currentTimeAsLong –> import with the given timestamp<br>-Dimporttsv.mapper.class=my.Mapper –> use a user-specified Mapper class (default: org.apache.hadoop.hbase.mapreduce.TsvImporterMapper)</li>
</ol>
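<p>Putting several of the options above together, a comma-separated import might look like this (a sketch reusing the paths and table name from the earlier examples):</p>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ hadoop jar $HBASE_HOME/lib/hbase-server-1.1.2.jar importtsv '-Dimporttsv.separator=,' -Dimporttsv.skip.bad.lines=false -Dimporttsv.columns=HBASE_ROW_KEY,cf1:name,cf1:sex,cf2:class userinfo /data_dir/source_file</div></pre></td></tr></table></figure>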
<h2 id="参考链接"><a href="#参考链接" class="headerlink" title="References"></a>References</h2><p><a href="http://www.ibm.com/developerworks/cn/opensource/os-cn-data-import/index.html" target="_blank" rel="external">http://www.ibm.com/developerworks/cn/opensource/os-cn-data-import/index.html</a><br><a href="http://www.aboutyun.com/thread-11652-1-1.html" target="_blank" rel="external">http://www.aboutyun.com/thread-11652-1-1.html</a></p>
]]></content>
<summary type="html">
<p>HBase's fast data-import tool BulkLoad: since HBase stores its data on HDFS in a specific file format, a MapReduce job can generate files in that format directly and upload them to the proper location on HDFS, completing a fast bulk import of large amounts of data.</p>
<img src="/uploads/Hbase-BulkLoad/Hbase-BulkLoad.jpg">
</summary>
<category term="BigData" scheme="https://eshareditor.github.io/categories/BigData/"/>
<category term="Hbase" scheme="https://eshareditor.github.io/tags/Hbase/"/>
</entry>
<entry>
<title>Building and Installing Hue</title>
<link href="https://eshareditor.github.io/2016/09/14/Hue-%E7%BC%96%E8%AF%91%E5%AE%89%E8%A3%85/"/>
<id>https://eshareditor.github.io/2016/09/14/Hue-编译安装/</id>
<published>2016-09-14T01:46:21.000Z</published>
<updated>2016-09-18T12:00:03.379Z</updated>
<content type="html"><![CDATA[<p>Hue is a web application for interacting with Apache Hadoop, an open-source Apache Hadoop UI.<br>This post walks through building, installing, and starting Hue, and records problems that may come up along the way.</p>
<img src="/uploads/Hue-编译安装/hue-search.png">
<a id="more"></a>
<h2 id="编译、安装、启动"><a href="#编译、安装、启动" class="headerlink" title="Build, install, start"></a>Build, install, start</h2><h3 id="环境准备"><a href="#环境准备" class="headerlink" title="Environment"></a>Environment</h3><ul>
<li>OS: CentOS 6.7 (minimal install)</li>
<li>Maven 3.3.3</li>
<li>Ant 1.9.4+ </li>
<li>Python 2.6.6+</li>
</ul>
<h3 id="依赖的工具包"><a href="#依赖的工具包" class="headerlink" title="Required packages"></a>Required packages</h3><ul>
<li><p>Packages listed by the Hue documentation</p>
<blockquote>
<p>asciidoc krb5-devel libxml2-devel libxslt-devel libtidy mysql mysql-devel openldap-devel python-devel sqlite-devel make openssl-devel gcc gcc-c++ gmp-devel cyrus-sasl-plain cyrus-sasl-devel cyrus-sasl-gssapi libffi-devel</p>
</blockquote>
</li>
<li><p>Additional packages (installed along the way in response to build errors)</p>
<blockquote>
<p>python-simplejson python-setuptools rsync saslwrapper-devel pycrypto libyaml-devel libsasl2-dev libsasl2-modules-gssapi-mit libkrb5-dev libssl-devel</p>
</blockquote>
</li>
<li><p>Note: on CentOS 7, the gmp, libtidy, and mysql-devel packages cannot be installed. The build also asks for ipdb: CentOS 7 ships with Python 2.7, which does not provide pip by default, so install pip manually or use the pip bundled with Hue's Python 2.7 to install ipdb. Command: pip install ipdb</p>
</li>
</ul>
<h3 id="编译安装"><a href="#编译安装" class="headerlink" title="Build and install"></a>Build and install</h3><ul>
<li>Download</li>
</ul>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ git clone https://github.com/cloudera/hue.git</div></pre></td></tr></table></figure>
<ul>
<li>Build</li>
</ul>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">$ make apps (some packages may fail to download due to network issues; rerun until it succeeds)</div><div class="line">$ make locales (build multi-language support)</div></pre></td></tr></table></figure>
<p>Note: to build only the Simplified Chinese version, set LANGUAGE_CODE = 'zh_CN' in $HUE/desktop/core/src/desktop/settings.py and remove the other entries from LANGUAGES.</p>
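<p>The settings.py change might look like this (a sketch only; the exact shape of the LANGUAGES entries varies across Hue versions):</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">LANGUAGE_CODE = 'zh_CN'</div><div class="line">LANGUAGES = [</div><div class="line">    ('zh_CN', _('Simplified Chinese')),  # keep only this entry</div><div class="line">]</div></pre></td></tr></table></figure>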
<ul>
<li>Install</li>
</ul>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">$ make install (installs to /usr/local by default)</div><div class="line">$ make install PREFIX=/usr/hdp/2.3.4.0-3485/hue (install to a given directory)</div></pre></td></tr></table></figure>
<ul>
<li>Test</li>
</ul>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ $HUE/build/env/bin/hue test all</div></pre></td></tr></table></figure>
<h3 id="启动"><a href="#启动" class="headerlink" title="Start"></a>Start</h3><ul>
<li>Add a user and grant directory ownership</li>
</ul>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">$ adduser hue (make sure the hue user and its home directory exist)</div><div class="line">$ chown -R hue:hue /usr/local/hue</div></pre></td></tr></table></figure>
<ul>
<li>Start Hue</li>
</ul>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">$ $HUE/build/env/bin/supervisor &amp; (for production)</div><div class="line">$ $HUE/build/env/bin/hue runcpserver &amp; (for development)</div></pre></td></tr></table></figure>
<p>Web browser: <a href="http://HUE_SERVER_HOST:8000" target="_blank" rel="external">http://HUE_SERVER_HOST:8000</a>. The username and password entered on first login become the superuser credentials.</p>
<h2 id="问题剖析"><a href="#问题剖析" class="headerlink" title="Troubleshooting"></a>Troubleshooting</h2><ol>
<li><p>Building Hue 3.10.0 and later fails with: error: ffi.h: No such file or directory.<br>Analysis: the build environment is missing libssl-devel and libffi-devel; many similar errors appear when dependency packages are missing.<br>Fix: yum -y install libssl-devel libffi-devel</p>
</li>
<li><p>Hue fails during startup with: UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 166: ordinal not in range(128).<br>Analysis: an encoding problem; Python's str defaults to ASCII encoding, which conflicts with Unicode<br>Fix: add the following 3 lines to the offending Python file or page:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">import</span> sys</div><div class="line">reload(sys)</div><div class="line">sys.setdefaultencoding(<span class="string">'utf-8'</span>)</div></pre></td></tr></table></figure>
</li>
<li><p>Hue 3.9.0 and earlier bundle the LivyServer service and support Spark versions below 1.6.0;<br>Hue 3.10.0 and later no longer bundle LivyServer and support Spark 1.6.0 and above; LivyServer must be installed and deployed separately.</p>
</li>
</ol>
]]></content>
<summary type="html">
<p>Hue is a web application for interacting with Apache Hadoop, an open-source Apache Hadoop UI.<br>This post walks through building, installing, and starting Hue, and records problems that may come up along the way.</p>
<img src="/uploads/Hue-编译安装/hue-search.png">
</summary>
<category term="BigData" scheme="https://eshareditor.github.io/categories/BigData/"/>
<category term="Hue" scheme="https://eshareditor.github.io/tags/Hue/"/>
</entry>
<entry>
<title>Introduction to Hexo</title>
<link href="https://eshareditor.github.io/2016/09/12/hello-world/"/>
<id>https://eshareditor.github.io/2016/09/12/hello-world/</id>
<published>2016-09-12T03:04:07.189Z</published>
<updated>2016-09-18T11:58:18.273Z</updated>
<content type="html"><![CDATA[<p>Hexo: A fast, simple & powerful blog framework, powered by Node.js.</p>
<img src="/uploads/Hello-World/Hello-World-Hexo.jpg" width="600">
<a id="more"></a>
<p>Welcome to <a href="https://hexo.io/" target="_blank" rel="external">Hexo</a>! This is your very first post. Check <a href="https://hexo.io/docs/" target="_blank" rel="external">documentation</a> for more info. If you get any problems when using Hexo, you can find the answer in <a href="https://hexo.io/docs/troubleshooting.html" target="_blank" rel="external">troubleshooting</a> or you can ask me on <a href="https://github.com/hexojs/hexo/issues" target="_blank" rel="external">GitHub</a>.</p>
<h2 id="Quick-Start"><a href="#Quick-Start" class="headerlink" title="Quick Start"></a>Quick Start</h2><h3 id="Create-a-new-post"><a href="#Create-a-new-post" class="headerlink" title="Create a new post"></a>Create a new post</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ hexo new <span class="string">"My New Post"</span></div></pre></td></tr></table></figure>
<p>More info: <a href="https://hexo.io/docs/writing.html" target="_blank" rel="external">Writing</a></p>
<h3 id="Run-server"><a href="#Run-server" class="headerlink" title="Run server"></a>Run server</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ hexo server</div></pre></td></tr></table></figure>
<p>More info: <a href="https://hexo.io/docs/server.html" target="_blank" rel="external">Server</a></p>
<h3 id="Generate-static-files"><a href="#Generate-static-files" class="headerlink" title="Generate static files"></a>Generate static files</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ hexo generate</div></pre></td></tr></table></figure>
<p>More info: <a href="https://hexo.io/docs/generating.html" target="_blank" rel="external">Generating</a></p>
<h3 id="Deploy-to-remote-sites"><a href="#Deploy-to-remote-sites" class="headerlink" title="Deploy to remote sites"></a>Deploy to remote sites</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ hexo deploy</div></pre></td></tr></table></figure>
<p>More info: <a href="https://hexo.io/docs/deployment.html" target="_blank" rel="external">Deployment</a></p>
]]></content>
<summary type="html">
<p>Hexo: A fast, simple &amp; powerful blog framework, powered by Node.js.</p>
<img src="/uploads/Hello-World/Hello-World-Hexo.jpg" width="600">
</summary>
<category term="Web" scheme="https://eshareditor.github.io/categories/Web/"/>
<category term="Hexo" scheme="https://eshareditor.github.io/tags/Hexo/"/>
</entry>
<entry>
<title>About Me</title>
<link href="https://eshareditor.github.io/2016/09/09/About-Me/"/>
<id>https://eshareditor.github.io/2016/09/09/About-Me/</id>
<published>2016-09-09T11:00:00.000Z</published>
<updated>2016-09-22T02:45:27.397Z</updated>
<content type="html"><![CDATA[<blockquote><p>You can either travel or read, but either your body or soul must be on the way. </p>
<footer><strong>《Roman Holiday》</strong></footer></blockquote>
<img src="/uploads/About-Me/AboutMe_Travel.jpg" width="600">
<a id="more"></a>
<p>9/12/2016 22:00:00 </p>
]]></content>
<summary type="html">
<blockquote><p>You can either travel or read, but either your body or soul must be on the way. </p>
<footer><strong>《Roman Holiday》</strong></footer></blockquote>
<img src="/uploads/About-Me/AboutMe_Travel.jpg" width="600">
</summary>
<category term="Life" scheme="https://eshareditor.github.io/categories/Life/"/>
<category term="Life" scheme="https://eshareditor.github.io/tags/Life/"/>
</entry>
</feed>