Hadoop basics: system operation and maintenance
Hive 10 GB data test
http://blog.fens.me/hadoop-hive-10g/
Hadoop distributed cluster
http://blog.csdn.net/zpf336/article/details/53262379
How to get non-GROUP BY columns in Hive
http://www.crazyant.net/1600.html
select sid,collect_set(class_id) from table2 group by sid;
select sid,collect_set(class_id)[0] from table2 group by sid;
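A minimal sketch of how this behaves, assuming a hypothetical table2(sid int, class_id string) with a few rows of test data:
hive -e "create table if not exists table2(sid int, class_id string);"
hive -e "insert into table table2 values (1,'a'),(1,'b'),(2,'c');"
# collect_set gathers the non-grouped column into an array per sid; [0] then picks a single value
hive -e "select sid, collect_set(class_id) from table2 group by sid;"
# expected output (element order inside the set is not guaranteed):
# 1  ["a","b"]
# 2  ["c"]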
http://www.linuxidc.com/Linux/2017-06/144600.htm
Collecting Nginx logs into HDFS with Flume
Download apache-flume-1.7.0-bin.tar.gz, extract it with tar -zxvf, and add the following to /etc/profile:
export FLUME_HOME=/opt/apache-flume-1.7.0-bin
export PATH=$PATH:$FLUME_HOME/bin
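After sourcing the profile, flume-ng should be on the PATH; a quick check (assuming java is already available):
source /etc/profile
flume-ng version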
Edit two files under $FLUME_HOME/conf/. In flume-env.sh, add JAVA_HOME:
JAVA_HOME=/opt/jdk1.8.0_121
Most importantly, edit flume-conf.properties:
# Configure the agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sources.r1.deserializer.outputCharset = UTF-8
# Log file to monitor (tailed by the exec source)
a1.sources.r1.command = tail -F /usr/local/nginx/log/access.log
# Configure the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.path = hdfs://master:9000/flume/events/%Y-%m
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H
a1.sinks.k1.hdfs.fileSuffix = .log
a1.sinks.k1.hdfs.minBlockReplicas = 1
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollInterval = 86400
a1.sinks.k1.hdfs.rollSize = 1000000
a1.sinks.k1.hdfs.rollCount = 10000
a1.sinks.k1.hdfs.idleTimeout = 0
# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
The configuration above defines the source, channel and sink that collect records from the Nginx log into HDFS. Start the agent with:
flume-ng agent -n a1 -c conf -f $FLUME_HOME/conf/flume-conf.properties
If no errors are reported, the installation and configuration succeeded: new records appended to the Nginx log are collected by Flume and stored in HDFS.
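A quick way to verify that events are landing in HDFS, using the sink path configured above (the month directory depends on the current date; completed files are named <prefix>.<counter>.log, and files still being written carry a .tmp suffix):
hdfs dfs -ls /flume/events/$(date +%Y-%m)/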
http://www.cnblogs.com/cnmenglang/p/6543927.html
[root@10-10-67-217 apache-flume-1.7.0-bin]# pwd
/data/bigdata/apache-flume-1.7.0-bin
[root@10-10-67-217 apache-flume-1.7.0-bin]# flume-ng agent --conf-file conf/hdfs.properties -c conf --name LogAgent -Dflume.root.logger=DEBUG,console
mysql>grant all privileges on *.* to hive@"%" identified by "hive" with grant option;
mysql>flush privileges;
Several ways to save Hive query results
http://www.cnblogs.com/harvey888/p/6279503.html?utm_source=itdadao&utm_medium=referral
https://stackoverflow.com/questions/29138498/sqoop-import-without-primary-key-in-rdbms
sqoop import --connect jdbc:mysql://localhost/test_db --username root --password **** --table user --target-dir /user/root/user_data --columns "first_name, last_name, created_date" -m 1
sqoop import --connect jdbc:mysql://localhost/test_db --username root --password **** --table user --target-dir /user/root/user_data --columns "first_name, last_name, created_date" --split-by created_date
HDFS to MySQL
hive> INSERT OVERWRITE DIRECTORY '/sqoop/1.csv' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE SELECT * FROM userinfo;
[root@web_test hadoop-2.7.3]# ./bin/hdfs dfs -cat /sqoop/1.csv/000000_0
[root@web_test bin]# ./sqoop export --connect jdbc:mysql://localhost:3306/hive --username root --table userinfo -m 1 --export-dir /sqoop/1.csv
set hive.cli.print.header=true;
MySQL to Hive
https://stackoverflow.com/questions/26853164/sqoop-from-mysql-where-the-data-contains-carriage-returns
--hive-drop-import-delims
[root@web_test bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/hive --username root --table userinfo --hive-import --hive-overwrite \
--create-hive-table --hive-table userinfo --target-dir /sqoop/tree1 -m 1 --hive-drop-import-delims
[root@web_test bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/hive --username root --table userinfo --hive-import --hive-overwrite --create-hive-table --hive-table userinfo --target-dir /sqoop/tree3 -m 1
[root@web_test hadoop-2.7.3]# ./bin/hdfs dfs -ls /user/hive/warehouse/
17/07/21 23:43:13 INFO output.FileOutputCommitter: Saved output of task 'attempt_local269851262_0001_m_000000_0' to hdfs://localhost:9000/sqoop/tree3/_temporary/0/task_local269851262_0001_m_000000
Sqoop installation
http://1csh1.github.io/2016/04/19/Sqoop%E5%AE%89%E8%A3%85%E4%B8%8E%E9%85%8D%E7%BD%AE/
Hive commands with examples
https://www.edureka.co/blog/hive-commands-with-examples
export JAVA_HOME=/home/test/jdk1.8.0_121
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/home/test/hadoop-2.7.3
export HIVE_HOME=/home/test/apache-hive-2.1.1-bin
export PATH=$HIVE_HOME/bin:$PATH
hadoop dfsadmin -safemode leave
Hadoop with a non-default SSH port (not 22)
[root@10-10-67-217 hadoop]# cat hadoop-env.sh
export HADOOP_SSH_OPTS="-p 52222"
Install Java first (see the Java installation steps in the Elasticsearch notes) and set the environment variables.
http://192.168.0.109:50070/dfshealth.html#tab-overview
Make sure Java and Hadoop are under the same directory.
I am using the Cloudera quickstart VM 5.3 for the first time on my virtual machine. The task is to create a simple text file on my local PC, move it to HDFS, and display the contents of the file, all using HDFS commands. I have created a directory using a command that looks exactly like:
[cloudera@quickstart ~]$ hdfs dfs -mkdir skk411.
The folder got created, but I am not able to locate where exactly it got created. I used the search tool and also manually searched all the folders present. But the directory does exist, because when I use the same command again it says "File exists". Can someone please help me find the path?
HDFS is a separate filesystem, so its directories and files do not map directly to directories and files on your local filesystem. HDFS does store its data on the local filesystem but in its own format.
If you want to see what you have in your home directory in HDFS you can run hdfs dfs -ls.
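On the Cloudera quickstart VM, the directory created above should sit under the HDFS home directory of the cloudera user, so something like the following should show it (paths are the VM defaults, an assumption rather than something stated in the question):
hdfs dfs -ls /user/cloudera
hdfs dfs -ls /user/cloudera/skk411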
See the official Hadoop documentation:
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation
http://www.powerxing.com/install-hadoop-in-centos/
./bin/hadoop version
vim etc/hadoop/hadoop-env.sh
vim /root/.bashrc
source /root/.bashrc
vim hadoop-config.sh
mkdir input
cp etc/hadoop/*.xml input
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
cd /home/test/
export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
/home/test/hadoop-2.7.3/bin/hadoop version
cd /home/test/
cd hadoop-2.7.3
export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
./bin/hadoop version
export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
./bin/hadoop version
cd hadoop-2.7.3
cd output/
cd ../etc/hadoop/
cp core-site.xml{,.bak}
[root@10-10-67-217 hadoop]# cat core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/data/bigdata/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
cp hdfs-site.xml{,.bak}
[root@10-10-67-217 hadoop]# cat hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/data/bigdata/hadoop-root/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/data/bigdata/hadoop-root/dfs/data</value>
</property>
</configuration>
ssh-keygen
ssh-copy-id root@localhost
ssh localhost
bin/hdfs namenode -format
./sbin/start-dfs.sh
./bin/hdfs dfs -mkdir /user
dd if=/dev/zero of=test.img count=1 bs=100M
./bin/hdfs dfs -mkdir example
./bin/hdfs dfs -ls
./bin/hdfs dfs -put test.img example
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep example output 'dfs[a-z.]+'
./bin/hdfs dfs -du
./bin/hdfs dfs -get output example
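After running start-dfs.sh above, jps is a quick way to confirm the HDFS daemons are up on this single-node setup:
jps
# expect NameNode, DataNode and SecondaryNameNode in the listing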
MySQL and Hadoop
https://www.percona.com/blog/2013/07/11/mysql-and-hadoop/
https://blog.cloudera.com/blog/2014/04/using-apache-hadoop-and-impala-with-mysql-for-data-analysis/
https://zh.hortonworks.com/tutorial/how-to-process-data-with-apache-hive/
https://stackoverflow.com/questions/18129581/how-do-i-output-the-results-of-a-hiveql-query-to-csv
Hive to HDFS
hive> INSERT OVERWRITE DIRECTORY '/path/to/output/dir' SELECT * FROM userinfo;
[root@web_test hadoop-2.7.3]# ./bin/hdfs dfs -ls /path/to/output/dir/000000_0
-rwxr-xr-x 1 root supergroup 24 2017-07-22 00:06 /path/to/output/dir/000000_0
[root@web_test hadoop-2.7.3]# ./bin/hdfs dfs -cat /path/to/output/dir/000000_0
1test
2siyu
3victor
Hive to CSV
[root@virtualmachine bin]# hive -e 'select * from userinfo' > /home/test/1.csv
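Note that hive -e writes tab-separated output by default, so the file above is not really comma-separated; a common workaround (from the Stack Overflow thread linked above) is to translate the tabs:
hive -e 'select * from userinfo' | sed 's/[\t]/,/g' > /home/test/1.csv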
[root@10-10-67-217 apache-hive-2.1.1-bin]# hive -e "set hive.support.sql11.reserved.keywords=false;create table kunbang_updatechecker_run20170710(id int,timestamp int,softname string,softid string,version string,ip string,uid string,clienttime int,qid string,installymd int,os string,diskid string,macid string,cpuid string,uuid string,package_md5 string,kunbang_software string,is_show int,is_checked int,is_downloaded int,is_installed int,show_position int,kunbang_source int,is_installed_before int,is_installed_after int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;"
hive> INSERT OVERWRITE LOCAL DIRECTORY '/home/test/temp.csv' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select * from userinfo;
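INSERT OVERWRITE LOCAL DIRECTORY creates /home/test/temp.csv as a directory rather than a single file; the rows land in files named 000000_0 and so on inside it:
cat /home/test/temp.csv/000000_0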
HBase
http://1csh1.github.io/2016/04/02/HBase%E5%8D%95%E6%9C%BA%E6%A8%A1%E5%BC%8F%E9%85%8D%E7%BD%AE/
http://hbase.apache.org/book.html#quickstart
Importing from a relational database into Hive with Sqoop
https://segmentfault.com/a/1190000002532293
Hive indexes
http://www.cnblogs.com/zlslch/p/6105294.html
Hive
https://chu888chu888.gitbooks.io/hadoopstudy/content/Content/8/chapter0802.html
hive> create table kunbang(id int,`timestamp` int,softname string,softid string,version string,ip string,uid string,clienttime int,qid string,installymd int,os string,diskid string,macid string,cpuid string,uuid string,package_md5 string,kunbang_software string,is_show int,is_checked int,is_downloaded int,is_installed int,show_position int,kunbang_source int,is_installed_before int,is_installed_after int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
hive> LOAD DATA LOCAL INPATH '/data/bigdata/kunbang20170708.csv' OVERWRITE INTO TABLE kunbang;
hive> set hive.support.sql11.reserved.keywords=false;create table kunbang_updatechecker_run(id int,`timestamp` int,softname string,softid string,version string,ip string,uid string,clienttime int,qid string,installymd int,os string,diskid string,macid string,cpuid string,uuid string,package_md5 string,kunbang_software string,is_show int,is_checked int,is_downloaded int,is_installed int,show_position int,kunbang_source int,is_installed_before int,is_installed_after int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
hive> create table kunbang_updatechecker_run_task(id int,`timestamp` int,softname string,softid string,version string,ip string,uid string,clienttime int,qid string,installymd int,os string,diskid string,macid string,cpuid string,uuid string,package_md5 string,kunbang_software string,is_show int,is_checked int,is_downloaded int,is_installed int,show_position int,kunbang_source int,is_installed_before int,is_installed_after int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
hive> LOAD DATA LOCAL INPATH '/data/bigdata/kunbang_updatechecker_run_task20170708.csv' OVERWRITE INTO TABLE kunbang_updatechecker_run_task;
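A quick sanity check after the load, counting the imported rows:
hive -e "select count(*) from kunbang_updatechecker_run_task;"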
https://chu888chu888.gitbooks.io/hadoopstudy/content/Content/8/chapter0807.html
http://yanliu.org/2015/08/13/Hadoop%E9%9B%86%E7%BE%A4%E4%B9%8BHive%E5%AE%89%E8%A3%85%E9%85%8D%E7%BD%AE/
http://www.cr173.com/soft/45333.html
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.1.178:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
Here, javax.jdo.option.ConnectionURL is the JDBC connection string Hive uses to reach the metastore database; javax.jdo.option.ConnectionDriverName is the driver class name; javax.jdo.option.ConnectionUserName is the database username; and javax.jdo.option.ConnectionPassword is the database password.
<property>
<name>hive.exec.local.scratchdir</name>
<value>/home/hadoop/iotmp</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/home/hadoop/iotmp</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
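The metastore also needs the MySQL JDBC driver on Hive's classpath before schematool will work; this step is not shown above, but it usually amounts to copying the connector jar into $HIVE_HOME/lib (the exact jar name below is an assumption, use whichever mysql-connector version you have):
cp mysql-connector-java-5.1.40-bin.jar $HIVE_HOME/lib/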
hadoop@hadoopmaster:/usr/local/hive/bin$ ./schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://192.168.1.166:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.mysql.sql
Initialization script completed
schemaTool completed
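Once schematool completes, the metastore schema exists in MySQL and the Hive CLI should start cleanly; a quick check:
hive -e "show databases;"
# should list at least the default database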