hadoop and hive

Hadoop basics tutorial: system operation and maintenance

Hive 10G data test

http://blog.fens.me/hadoop-hive-10g/

Hadoop distributed cluster

http://blog.csdn.net/zpf336/article/details/53262379

How to select non-GROUP BY columns in Hive

http://www.crazyant.net/1600.html

select sid,collect_set(class_id) from table2 group by sid;

select sid,collect_set(class_id)[0] from table2 group by sid;
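
collect_set() gathers all class_id values for each sid into an array, and indexing with [0] picks one element, which is how Hive lets you read a column that is neither in the GROUP BY nor aggregated. A minimal sketch against a hypothetical table2(sid, class_id, class_name), where class_name is only an illustrative extra column:

-- hypothetical rows: (1,'a','math'), (1,'b','art'), (2,'c','math')
-- collect_set gathers the values per sid; [0] then picks an arbitrary element of the set
select sid,
       collect_set(class_id)[0]   as any_class_id,
       collect_set(class_name)[0] as any_class_name
from table2
group by sid;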

http://www.linuxidc.com/Linux/2017-06/144600.htm

Collecting Nginx logs into HDFS with Flume

Download apache-flume-1.7.0-bin.tar.gz and extract it with

tar -zxvf apache-flume-1.7.0-bin.tar.gz

then add the following settings to /etc/profile:

export FLUME_HOME=/opt/apache-flume-1.7.0-bin
export PATH=$PATH:$FLUME_HOME/bin

Modify two files under $FLUME_HOME/conf/. In flume-env.sh, add JAVA_HOME:

JAVA_HOME=/opt/jdk1.8.0_121

Most importantly, edit the flume-conf.properties file:

# Configure the agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Configure the source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sources.r1.deserializer.outputCharset = UTF-8

# The Nginx access log to monitor (tailed by the exec source)
a1.sources.r1.command = tail -F /usr/local/nginx/log/access.log

# Configure the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.path = hdfs://master:9000/flume/events/%Y-%m
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H
a1.sinks.k1.hdfs.fileSuffix = .log
a1.sinks.k1.hdfs.minBlockReplicas = 1
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollInterval = 86400
a1.sinks.k1.hdfs.rollSize = 1000000
a1.sinks.k1.hdfs.rollCount = 10000
a1.sinks.k1.hdfs.idleTimeout = 0

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

The configuration above defines the source, channel, and sink that ship records from the Nginx access log into HDFS. Start the agent with:

flume-ng agent -n a1 -c conf -f $FLUME_HOME/conf/flume-conf.properties

If no errors are reported, the installation and configuration succeeded: every record newly appended to the Nginx log is collected by Flume and stored in HDFS.
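
To confirm that events are actually landing in HDFS, list the sink path configured above (a minimal check; 2017-07 is just an example of what the %Y-%m escape expands to, and files still being written carry Flume's default .tmp in-use suffix until they roll):

hdfs dfs -ls /flume/events/2017-07
hdfs dfs -cat '/flume/events/2017-07/*' | head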

http://www.cnblogs.com/cnmenglang/p/6543927.html

[root@10-10-67-217 apache-flume-1.7.0-bin]# pwd
/data/bigdata/apache-flume-1.7.0-bin
[root@10-10-67-217 apache-flume-1.7.0-bin]# flume-ng agent --conf-file  conf/hdfs.properties -c conf --name LogAgent -Dflume.root.logger=DEBUG,console

mysql>grant all privileges on *.* to hive@"%" identified by "hive" with grant option;

mysql>flush privileges;
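
These statements create (or update) the MySQL account later used as the Hive metastore user. A quick way to confirm the grant took effect, assuming MySQL runs on the local host and the password from the grant above:

mysql -u hive -phive -e "show grants for 'hive'@'%';"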

Several ways to save Hive query results

http://www.cnblogs.com/harvey888/p/6279503.html?utm_source=itdadao&utm_medium=referral

https://stackoverflow.com/questions/29138498/sqoop-import-without-primary-key-in-rdbms
sqoop import \
    --connect jdbc:mysql://localhost/test_db \
    --username root \
    --password **** \
    --table user \
    --target-dir /user/root/user_data \
    --columns "first_name, last_name, created_date"
    -m 1

sqoop import \
    --connect jdbc:mysql://localhost/test_db \
    --username root \
    --password **** \
    --table user \
    --target-dir /user/root/user_data \
    --columns "first_name, last_name, created_date"
    --split-by created_date
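
The two imports above handle a table without a primary key: the first forces a single mapper with -m 1, the second supplies an explicit --split-by column. To sanity-check the result in HDFS (part-m-* is Sqoop's default output file naming):

hdfs dfs -ls /user/root/user_data
hdfs dfs -cat /user/root/user_data/part-m-00000 | head
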
hdfs to mysql
hive> INSERT OVERWRITE DIRECTORY '/sqoop/1.csv'  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE  SELECT * FROM userinfo;
[root@web_test hadoop-2.7.3]# ./bin/hdfs dfs -cat /sqoop/1.csv/000000_0
[root@web_test bin]# ./sqoop export --connect jdbc:mysql://localhost:3306/hive --username root --table userinfo -m 1 --export-dir /sqoop/1.csv
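
After the export, the row count in MySQL should match the Hive table; a minimal check, assuming the database and credentials from the command above:

mysql -u root -p hive -e "select count(*) from userinfo;"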

set hive.cli.print.header=true;

mysql to hive

https://stackoverflow.com/questions/26853164/sqoop-from-mysql-where-the-data-contains-carriage-returns

--hive-drop-import-delims strips \n, \r and \01 characters from string fields during the import, so embedded newlines and carriage returns do not break Hive's line-oriented text storage:

[root@web_test bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/hive --username root --table userinfo --hive-import --hive-overwrite --create-hive-table --hive-table userinfo --target-dir /sqoop/tree1 -m 1 --hive-drop-import-delims

[root@web_test bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/hive --username root --table userinfo --hive-import --hive-overwrite --create-hive-table --hive-table userinfo --target-dir /sqoop/tree3 -m 1

[root@web_test hadoop-2.7.3]# ./bin/hdfs dfs -ls /user/hive/warehouse/

17/07/21 23:43:13 INFO output.FileOutputCommitter: Saved output of task 'attempt_local269851262_0001_m_000000_0' to hdfs://localhost:9000/sqoop/tree3/_temporary/0/task_local269851262_0001_m_000000

Sqoop installation

http://1csh1.github.io/2016/04/19/Sqoop%E5%AE%89%E8%A3%85%E4%B8%8E%E9%85%8D%E7%BD%AE/

hive commands with example

https://www.edureka.co/blog/hive-commands-with-examples

export JAVA_HOME=/home/test/jdk1.8.0_121
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/home/test/hadoop-2.7.3
export HIVE_HOME=/home/test/apache-hive-2.1.1-bin
export PATH=$HIVE_HOME/bin:$PATH

hadoop dfsadmin -safemode leave 

Running Hadoop with SSH on a non-default port (not 22)

[root@10-10-67-217 hadoop]# cat hadoop-env.sh

export HADOOP_SSH_OPTS="-p 52222"
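
The start/stop scripts (start-dfs.sh and friends) reach the other nodes over ssh, and HADOOP_SSH_OPTS is passed straight to those ssh invocations. A quick check that the nodes can still be reached on the non-default port (assuming sshd listens on 52222 as configured above):

ssh -p 52222 root@localhost "echo ok"
./sbin/start-dfs.sh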

Install Java first (see the Java installation notes for Elasticsearch) and set the environment variables.

http://192.168.0.109:50070/dfshealth.html#tab-overview

Make sure Java and Hadoop are installed under the same directory.

I am using Cloudera quickstart vm 5.3 for the first time on my virtual machine. The task is to create a simple text file on my local PC and move it to HDFS, then display the contents of the file, all using HDFS commands. I have created a directory using the command that looks exactly like:

[cloudera@quickstart ~]$ hdfs dfs -mkdir skk411.

The folder got created but I am not able to locate where exactly it got created. I used both, search tool and manually searched all the folders present. But the directory does exist because when I use the same command again, it says "File exists". Someone please help me to find the path.

HDFS is a separate filesystem, so its directories and files do not map directly to directories and files on your local filesystem. HDFS does store its data on the local filesystem but in its own format.

If you want to see what you have in your home directory in HDFS you can run hdfs dfs -ls.
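
A relative path such as skk411 is created under the current user's HDFS home directory, /user/<username> (on the quickstart VM that is /user/cloudera), which is why searching the local filesystem never finds it. Assuming the quickstart defaults:

hdfs dfs -ls /user/cloudera
hdfs dfs -ls -d /user/cloudera/skk411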

See the official Hadoop documentation:

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation

http://www.powerxing.com/install-hadoop-in-centos/

  251  ./bin/hadoop version
  256  vim etc/hadoop/hadoop-env.sh
  258  vim /root/.bashrc
  259  source /root/.bashrc
  265  vim hadoop-config.sh
  268  mkdir input
  269  cp etc/hadoop/*.xml input
  270  ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
  285  export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
  286  ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
  300  cd /home/test/
  330  export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
  331  /home/test/hadoop-2.7.3/bin/hadoop version
  332  cd /home/test/
  334  cd hadoop-2.7.3
  335  export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
  336  ./bin/hadoop version
  337  export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
  338  ./bin/hadoop version
  359  cd hadoop-2.7.3
  367  cd output/
  374  cd ../etc/hadoop/
  376  cp core-site.xml{,.bak}

[root@10-10-67-217 hadoop]# cat core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/data/bigdata/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

  378  cp hdfs-site.xml{,.bak}

[root@10-10-67-217 hadoop]# cat hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/data/bigdata/hadoop-root/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/data/bigdata/hadoop-root/dfs/data</value>
    </property>
</configuration>
  383  ssh-keygen
  384  ssh-copy-id root@localhost
  385  ssh localhost
  386  bin/hdfs namenode -format
  390  ./sbin/start-dfs.sh
  391  ./bin/hdfs dfs -mkdir /user

  397  dd if=/dev/zero of=test.img count=1 bs=100M
  433  ./bin/hdfs dfs -mkdir example
  434  ./bin/hdfs dfs -ls
  435  ./bin/hdfs dfs -put test.img example
  436  ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep example output 'dfs[a-z.]+'
  440  ./bin/hdfs dfs -du
  446  ./bin/hdfs dfs -get output example
[root@localhost hadoop-2.7.3]#

mysql hadoop

https://www.percona.com/blog/2013/07/11/mysql-and-hadoop/

https://blog.cloudera.com/blog/2014/04/using-apache-hadoop-and-impala-with-mysql-for-data-analysis/

https://zh.hortonworks.com/tutorial/how-to-process-data-with-apache-hive/

https://stackoverflow.com/questions/18129581/how-do-i-output-the-results-of-a-hiveql-query-to-csv

hive to hdfs

hive> INSERT OVERWRITE DIRECTORY '/path/to/output/dir' SELECT * FROM userinfo;

[root@web_test hadoop-2.7.3]# ./bin/hdfs dfs -ls /path/to/output/dir/000000_0
-rwxr-xr-x   1 root supergroup         24 2017-07-22 00:06 /path/to/output/dir/000000_0
[root@web_test hadoop-2.7.3]# ./bin/hdfs dfs -cat /path/to/output/dir/000000_0
1test
2siyu
3victor

(The id and name columns look glued together because INSERT OVERWRITE DIRECTORY without ROW FORMAT uses Hive's default \001 (^A) field separator, which is non-printing; the ROW FORMAT DELIMITED variants elsewhere in these notes produce readable delimiters.)

hive to csv

[root@virtualmachine bin]# hive -e 'select * from userinfo' > /home/test/1.csv

[root@10-10-67-217 apache-hive-2.1.1-bin]# hive -e "set hive.support.sql11.reserved.keywords=false;create table kunbang_updatechecker_run20170710(id int,timestamp int,softname string,softid string,version string,ip string,uid string,clienttime int,qid string,installymd int,os string,diskid string,macid string,cpuid string,uuid string,package_md5 string,kunbang_software string,is_show int,is_checked int,is_downloaded int,is_installed int,show_position int,kunbang_source int,is_installed_before int,is_installed_after int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;"

[root@virtualmachine bin]# 
hive> INSERT OVERWRITE LOCAL DIRECTORY '/home/test/temp.csv' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select * from userinfo;
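
INSERT OVERWRITE LOCAL DIRECTORY writes a directory of files (000000_0, 000001_0, ...), not a single file, so the pieces usually need to be concatenated into one CSV afterwards; a minimal sketch assuming the path above:

cat /home/test/temp.csv/* > /home/test/userinfo.csv
head /home/test/userinfo.csv
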
hbase
http://1csh1.github.io/2016/04/02/HBase%E5%8D%95%E6%9C%BA%E6%A8%A1%E5%BC%8F%E9%85%8D%E7%BD%AE/

http://hbase.apache.org/book.html#quickstart

Importing from a relational database into Hive with Sqoop

https://segmentfault.com/a/1190000002532293

Hive indexes

http://www.cnblogs.com/zlslch/p/6105294.html

hive

https://chu888chu888.gitbooks.io/hadoopstudy/content/Content/8/chapter0802.html

hive> create table kunbang(id int,`timestamp` int,softname string,softid string,version string,ip string,uid string,clienttime int,qid string,installymd int,os string,diskid string,macid string,cpuid string,uuid string,package_md5 string,kunbang_software string,is_show int,is_checked int,is_downloaded int,is_installed int,show_position int,kunbang_source int,is_installed_before int,is_installed_after int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

hive> LOAD DATA LOCAL INPATH '/data/bigdata/kunbang20170708.csv' OVERWRITE INTO TABLE kunbang_table;

hive> set hive.support.sql11.reserved.keywords=false;create table kunbang_updatechecker_run(id int,`timestamp` int,softname string,softid string,version string,ip string,uid string,clienttime int,qid string,installymd int,os string,diskid string,macid string,cpuid string,uuid string,package_md5 string,kunbang_software string,is_show int,is_checked int,is_downloaded int,is_installed int,show_position int,kunbang_source int,is_installed_before int,is_installed_after int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

hive> create table kunbang_updatechecker_run_task(id int,`timestamp` int,softname string,softid string,version string,ip string,uid string,clienttime int,qid string,installymd int,os string,diskid string,macid string,cpuid string,uuid string,package_md5 string,kunbang_software string,is_show int,is_checked int,is_downloaded int,is_installed int,show_position int,kunbang_source int,is_installed_before int,is_installed_after int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

hive> LOAD DATA LOCAL INPATH '/data/bigdata/kunbang_updatechecker_run_task20170708.csv' OVERWRITE INTO TABLE kunbang_updatechecker_run_task;
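
A quick check that the load worked (table name taken from the statement above):

hive> select count(*) from kunbang_updatechecker_run_task;
hive> select * from kunbang_updatechecker_run_task limit 3;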

https://chu888chu888.gitbooks.io/hadoopstudy/content/Content/8/chapter0807.html

http://yanliu.org/2015/08/13/Hadoop%E9%9B%86%E7%BE%A4%E4%B9%8BHive%E5%AE%89%E8%A3%85%E9%85%8D%E7%BD%AE/

http://www.cr173.com/soft/45333.html

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://192.168.1.178:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>

Where:
javax.jdo.option.ConnectionURL is the JDBC connection string Hive uses to reach the metastore database;
javax.jdo.option.ConnectionDriverName is the class name of the JDBC driver;
javax.jdo.option.ConnectionUserName is the database username;
javax.jdo.option.ConnectionPassword is the database password.
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/home/hadoop/iotmp</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/home/hadoop/iotmp</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
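
Two prerequisites that usually apply before initializing the metastore (the connector version and ownership below are assumptions, not from the original notes): the MySQL JDBC connector jar has to be on Hive's classpath, typically by copying it into $HIVE_HOME/lib, and the local scratch directory configured above must exist and be writable:

cp mysql-connector-java-5.1.40-bin.jar $HIVE_HOME/lib/
mkdir -p /home/hadoop/iotmp
chown hadoop:hadoop /home/hadoop/iotmp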

hadoop@hadoopmaster:/usr/local/hive/lib$ ./schematool -initSchema -dbType mysql 

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://192.168.1.166:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.mysql.sql
Initialization script completed
schemaTool completed
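
If initialization succeeded, the metastore schema tables (DBS, TBLS, VERSION, ...) now exist in the MySQL hive database; a quick check, assuming the connection settings shown above:

mysql -u hive -phive -h 192.168.1.166 -e "use hive; show tables;"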
