The ORDER BY Operation in Hive - 白红宇

Published: 2019-06-12

Common advanced queries in Hive include GROUP BY, ORDER BY, JOIN, DISTRIBUTE BY, SORT BY, CLUSTER BY, and UNION ALL. This post looks at ORDER BY, which sorts the result set by one or more columns. The syntax is:

[java]

select col1, col2, ...

from tableName

where condition

order by col1 [asc|desc], col2 [asc|desc]

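Notes:

(1) ORDER BY can sort on several columns. The default direction is ascending, and string columns are compared lexicographically.

(2) ORDER BY is a global sort: the entire result set is ordered, not just the rows handled by each reducer.

(3) ORDER BY needs a reduce phase, and that phase runs with exactly one reducer. The reducer count cannot be raised for this step, because several reducers would each sort only their own share of the rows and could not produce a single globally ordered result.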
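To make the difference between a global sort and a per-reducer sort concrete, here is a minimal sketch that contrasts ORDER BY with SORT BY. It assumes nonstrict mode and the employees table used later in this post; the column name `name` is an assumption (only `salary` is named in the post), and the reducer count of 2 is an arbitrary value chosen for illustration.

```sql
-- Sketch only: `name` is an assumed column name, not taken from the post.

-- Global sort: Hive runs a single reducer, so the output is totally ordered.
-- Each sort column carries its own direction.
SELECT name, salary
FROM employees
ORDER BY salary DESC, name ASC;

-- Per-reducer sort: request two reducers (arbitrary value for illustration).
SET mapred.reduce.tasks=2;

-- Each reducer orders only the rows it receives, so the combined
-- output is sorted within each reducer but not across reducers.
SELECT name, salary
FROM employees
SORT BY salary DESC;
```

With a single reducer the two queries return rows in the same order; the difference only shows up once SORT BY actually runs on more than one reducer.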

ORDER BY is constrained by the following property:

[java]

set hive.mapred.mode=nonstrict; (the default in the Hive version used here)

set hive.mapred.mode=strict;

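Note: in strict mode, an ORDER BY statement must include a LIMIT clause. ORDER BY runs on a single reducer, so sorting a very large result set would take a very long time; requiring LIMIT guards against doing that by accident.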

The following example shows how ORDER BY behaves in practice.

Suppose there is an employees table with the following data:

[java]

hive> select * from employees;

lavimer 15000.0 ["li","lu","wang"]  {"k1":1.0,"k2":2.0,"k3":3.0}    {"street":"dingnan","city":"ganzhou","num":101} 2015-01-24  love

liao    18000.0 ["liu","li","huang"]    {"k4":2.0,"k5":3.0,"k6":6.0}    {"street":"dingnan","city":"ganzhou","num":102} 2015-01-24  love

zhang   19000.0 ["xiao","wen","tian"]   {"k7":7.0,"k8":8.0}    {"street":"dingnan","city":"ganzhou","num":103} 2015-01-24  love
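The post does not show the table definition, so as an aid to reading the rows above, here is one plausible declaration that would produce output of this shape. Everything in it is an assumption except the names `salary`, `street`, `city`, `num`, `date_time`, and `type`, which appear in the post; the other column names and all of the types are guesses based on the printed values.

```sql
-- Hypothetical DDL: column names other than salary, and all types, are assumed.
CREATE TABLE employees (
  name         STRING,                                   -- e.g. lavimer
  salary       FLOAT,                                    -- e.g. 15000.0
  subordinates ARRAY<STRING>,                            -- e.g. ["li","lu","wang"]
  deductions   MAP<STRING, FLOAT>,                       -- e.g. {"k1":1.0,...}
  address      STRUCT<street:STRING, city:STRING, num:INT>
)
-- The last two values in each row are partition columns, not stored fields.
PARTITIONED BY (date_time STRING, type STRING);
```

Because `date_time` and `type` are partition columns, they can be filtered in a WHERE clause like ordinary columns, which matters for the strict-mode queries later on.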

Now sort by the second column (salary) in descending order:

[java]

hive> select * from employees order by salary desc;

// MapReduce job execution

Job 0: Map: 1 Reduce: 1 Cumulative CPU: 2.62 sec HDFS Read: 415 HDFS Write: 245 SUCCESS

Total MapReduce CPU Time Spent: 2 seconds 620 msec

zhang 19000.0 ["xiao","wen","tian"] {"k7":7.0,"k8":8.0} {"street":"dingnan","city":"ganzhou","num":103} 2015-01-24 love

liao    18000.0 ["liu","li","huang"]    {"k4":2.0,"k5":3.0,"k6":6.0}    {"street":"dingnan","city":"ganzhou","num":102} 2015-01-24  love

lavimer 15000.0 ["li","lu","wang"]  {"k1":1.0,"k2":2.0,"k3":3.0}    {"street":"dingnan","city":"ganzhou","num":101} 2015-01-24  love

Time taken: 20.484 seconds

hive>

At this point the hive.mapred.mode property is set to:

[java]

hive> set hive.mapred.mode;

hive.mapred.mode=nonstrict

hive>

Now switch it to strict and run the same ORDER BY query again:

[java]

hive> set hive.mapred.mode=strict;

hive> select * from employees order by salary desc;

FAILED: Error in semantic analysis: 1:33 In strict mode, if ORDER BY is specified, LIMIT must also be specified. Error encountered near token 'salary'

hive>

Note: in strict mode the query must include a LIMIT clause.

[java]

hive> select * from employees order by salary desc limit 3;

FAILED: Error in semantic analysis: No partition predicate found for Alias "employees" Table "employees"

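Note: strict mode also restricts queries on partitioned tables. The query above fails because it has no partition predicate; the fix is to specify the partition in the query.

First, list the table's partitions: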

[java]

hive> show partitions employees;

date_time=2015-01-24/type=love

Time taken: 0.096 seconds

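Then run the ORDER BY query in strict mode. The first attempt uses a partition(...) clause inside WHERE, which Hive rejects with a parse error; the second filters on the partition columns like ordinary columns and succeeds: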

[java]

hive> select * from employees where partition(date_time='2015-01-24',type='love') order by salary desc limit 3;

FAILED: Parse Error: line 1:30 cannot recognize input near 'partition' '(' 'date_time' in expression specification

hive> select * from employees where date_time='2015-01-24' and type='love' order by salary desc limit 3;

// MapReduce job execution

Total MapReduce CPU Time Spent: 3 seconds 510 msec

zhang 19000.0 ["xiao","wen","tian"] {"k7":7.0,"k8":8.0} {"street":"dingnan","city":"ganzhou","num":103} 2015-01-24 love

liao    18000.0 ["liu","li","huang"]    {"k4":2.0,"k5":3.0,"k6":6.0}    {"street":"dingnan","city":"ganzhou","num":102} 2015-01-24  love

lavimer 15000.0 ["li","lu","wang"]  {"k1":1.0,"k2":2.0,"k3":3.0}    {"street":"dingnan","city":"ganzhou","num":101} 2015-01-24  love

Time taken: 19.861 seconds

hive>

Reposted from: https://www.cnblogs.com/yjd_hycf_space/p/6801388.html
