Hadoop basics: system operation and maintenance
Hive 10 GB data test
http://blog.fens.me/hadoop-hive-10g/
Hadoop distributed cluster
http://blog.csdn.net/zpf336/article/details/53262379
How to get non-GROUP BY columns in Hive
http://www.crazyant.net/1600.html
select sid,collect_set(class_id) from table2 group by sid;
select sid,collect_set(class_id)[0] from table2 group by sid;
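A minimal sketch of how this behaves, assuming a hypothetical table2(sid int, class_id string) with a few rows of test data:
hive -e "create table if not exists table2(sid int, class_id string);"
hive -e "insert into table table2 values (1,'a'),(1,'b'),(2,'c');"
# collect_set gathers the non-grouped column into an array per sid; [0] then picks a single value
hive -e "select sid, collect_set(class_id) from table2 group by sid;"
# expected output (element order inside the set is not guaranteed):
# 1  ["a","b"]
# 2  ["c"]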
http://www.linuxidc.com/Linux/2017-06/144600.htm
Collecting Nginx logs into HDFS with Flume
Download apache-flume-1.7.0-bin.tar.gz, extract it with tar -zxvf, and add the following to /etc/profile:
export FLUME_HOME=/opt/apache-flume-1.7.0-bin
export PATH=$PATH:$FLUME_HOME/bin
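After sourcing the profile, flume-ng should be on the PATH; a quick check (assuming java is already available):
source /etc/profile
flume-ng version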
Edit two files under $FLUME_HOME/conf/. In flume-env.sh, add JAVA_HOME:
JAVA_HOME=/opt/jdk1.8.0_121
Most importantly, edit flume-conf.properties:
# Configure the agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sources.r1.deserializer.outputCharset = UTF-8
# Log file to monitor (tailed by the exec source)
a1.sources.r1.command = tail -F /usr/local/nginx/log/access.log
# Configure the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.path = hdfs://master:9000/flume/events/%Y-%m
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H
a1.sinks.k1.hdfs.fileSuffix = .log
a1.sinks.k1.hdfs.minBlockReplicas = 1
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollInterval = 86400
a1.sinks.k1.hdfs.rollSize = 1000000
a1.sinks.k1.hdfs.rollCount = 10000
a1.sinks.k1.hdfs.idleTimeout = 0
# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
The configuration above defines the source, channel and sink that collect records from the Nginx log into HDFS. Start the agent with:
flume-ng agent -n a1 -c conf -f $FLUME_HOME/conf/flume-conf.properties
If no errors are reported, the installation and configuration succeeded: new records appended to the Nginx log are collected by Flume and stored in HDFS.
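A quick way to verify that events are landing in HDFS, using the sink path configured above (the month directory depends on the current date; completed files are named <prefix>.<counter>.log, and files still being written carry a .tmp suffix):
hdfs dfs -ls /flume/events/$(date +%Y-%m)/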
http://www.cnblogs.com/cnmenglang/p/6543927.html
[root@10-10-67-217 apache-flume-1.7.0-bin]# pwd
/data/bigdata/apache-flume-1.7.0-bin
[root@10-10-67-217 apache-flume-1.7.0-bin]# flume-ng agent --conf-file conf/hdfs.properties -c conf --name LogAgent -Dflume.root.logger=DEBUG,console
mysql>grant all privileges on *.* to hive@"%" identified by "hive" with grant option;
mysql>flush privileges;
Several ways to save Hive query results
http://www.cnblogs.com/harvey888/p/6279503.html?utm_source=itdadao&utm_medium=referral
https://stackoverflow.com/questions/29138498/sqoop-import-without-primary-key-in-rdbms
sqoop import --connect jdbc:mysql://localhost/test_db --username root --password **** --table user --target-dir /user/root/user_data --columns "first_name, last_name, created_date" -m 1
sqoop import --connect jdbc:mysql://localhost/test_db --username root --password **** --table user --target-dir /user/root/user_data --columns "first_name, last_name, created_date" --split-by created_date
HDFS to MySQL
hive> INSERT OVERWRITE DIRECTORY '/sqoop/1.csv' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE SELECT * FROM userinfo;
[root@web_test hadoop-2.7.3]# ./bin/hdfs dfs -cat /sqoop/1.csv/000000_0
[root@web_test bin]# ./sqoop export --connect jdbc:mysql://localhost:3306/hive --username root --table userinfo -m 1 --export-dir /sqoop/1.csv
set hive.cli.print.header=true;
MySQL to Hive
https://stackoverflow.com/questions/26853164/sqoop-from-mysql-where-the-data-contains-carriage-returns
--hive-drop-import-delims
[root@web_test bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/hive --username root --table userinfo --hive-import --hive-overwrite \
--create-hive-table --hive-table userinfo --target-dir /sqoop/tree1 -m 1 --hive-drop-import-delims
[root@web_test bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/hive --username root --table userinfo --hive-import --hive-overwrite --create-hive-table --hive-table userinfo --target-dir /sqoop/tree3 -m 1
[root@web_test hadoop-2.7.3]# ./bin/hdfs dfs -ls /user/hive/warehouse/
17/07/21 23:43:13 INFO output.FileOutputCommitter: Saved output of task 'attempt_local269851262_0001_m_000000_0' to hdfs://localhost:9000/sqoop/tree3/_temporary/0/task_local269851262_0001_m_000000
Sqoop installation
http://1csh1.github.io/2016/04/19/Sqoop%E5%AE%89%E8%A3%85%E4%B8%8E%E9%85%8D%E7%BD%AE/
Hive commands with examples
https://www.edureka.co/blog/hive-commands-with-examples
export JAVA_HOME=/home/test/jdk1.8.0_121
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/home/test/hadoop-2.7.3
export HIVE_HOME=/home/test/apache-hive-2.1.1-bin
export PATH=$HIVE_HOME/bin:$PATH
hadoop dfsadmin -safemode leave
Hadoop with a non-default SSH port (not 22)
[root@10-10-67-217 hadoop]# cat hadoop-env.sh
export HADOOP_SSH_OPTS="-p 52222"
Install Java first (see the Java installation steps in the Elasticsearch notes) and set the environment variables.
http://192.168.0.109:50070/dfshealth.html#tab-overview
Make sure Java and Hadoop are under the same directory.
I am using the Cloudera quickstart VM 5.3 for the first time on my virtual machine. The task is to create a simple text file on my local PC, move it to HDFS, and display the contents of the file, all using HDFS commands. I have created a directory using a command that looks exactly like:
[cloudera@quickstart ~]$ hdfs dfs -mkdir skk411.
The folder got created, but I am not able to locate where exactly it got created. I used the search tool and also manually searched all the folders present. But the directory does exist, because when I use the same command again it says "File exists". Can someone please help me find the path?
HDFS is a separate filesystem, so its directories and files do not map directly to directories and files on your local filesystem. HDFS does store its data on the local filesystem but in its own format.
If you want to see what you have in your home directory in HDFS you can run hdfs dfs -ls.
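On the Cloudera quickstart VM, the directory created above should sit under the HDFS home directory of the cloudera user, so something like the following should show it (paths are the VM defaults, an assumption rather than something stated in the question):
hdfs dfs -ls /user/cloudera
hdfs dfs -ls /user/cloudera/skk411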
See the official Hadoop documentation:
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation
http://www.powerxing.com/install-hadoop-in-centos/
./bin/hadoop version
vim etc/hadoop/hadoop-env.sh
vim /root/.bashrc
source /root/.bashrc
vim hadoop-config.sh
mkdir input
cp etc/hadoop/*.xml input
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
cd /home/test/
export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
/home/test/hadoop-2.7.3/bin/hadoop version
cd /home/test/
cd hadoop-2.7.3
export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
./bin/hadoop version
export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
./bin/hadoop version
cd hadoop-2.7.3
cd output/
cd ../etc/hadoop/
cp core-site.xml{,.bak}
[root@10-10-67-217 hadoop]# cat core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/data/bigdata/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
cp hdfs-site.xml{,.bak}
[root@10-10-67-217 hadoop]# cat hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/data/bigdata/hadoop-root/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/data/bigdata/hadoop-root/dfs/data</value>
</property>
</configuration>
ssh-keygen
ssh-copy-id root@localhost
ssh localhost
bin/hdfs namenode -format
./sbin/start-dfs.sh
./bin/hdfs dfs -mkdir /user
dd if=/dev/zero of=test.img count=1 bs=100M
./bin/hdfs dfs -mkdir example
./bin/hdfs dfs -ls
./bin/hdfs dfs -put test.img example
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep example output 'dfs[a-z.]+'
./bin/hdfs dfs -du
./bin/hdfs dfs -get output example
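After running start-dfs.sh above, jps is a quick way to confirm the HDFS daemons are up on this single-node setup:
jps
# expect NameNode, DataNode and SecondaryNameNode in the listing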
MySQL and Hadoop
https://www.percona.com/blog/2013/07/11/mysql-and-hadoop/
https://blog.cloudera.com/blog/2014/04/using-apache-hadoop-and-impala-with-mysql-for-data-analysis/
https://zh.hortonworks.com/tutorial/how-to-process-data-with-apache-hive/
https://stackoverflow.com/questions/18129581/how-do-i-output-the-results-of-a-hiveql-query-to-csv
Hive to HDFS
hive> INSERT OVERWRITE DIRECTORY '/path/to/output/dir' SELECT * FROM userinfo;
[root@web_test hadoop-2.7.3]# ./bin/hdfs dfs -ls /path/to/output/dir/000000_0
-rwxr-xr-x 1 root supergroup 24 2017-07-22 00:06 /path/to/output/dir/000000_0
[root@web_test hadoop-2.7.3]# ./bin/hdfs dfs -cat /path/to/output/dir/000000_0
1test
2siyu
3victor
Hive to CSV
[root@virtualmachine bin]# hive -e 'select * from userinfo' > /home/test/1.csv
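Note that hive -e writes tab-separated output by default, so the file above is not really comma-separated; a common workaround (from the Stack Overflow thread linked above) is to translate the tabs:
hive -e 'select * from userinfo' | sed 's/[\t]/,/g' > /home/test/1.csv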
[root@10-10-67-217 apache-hive-2.1.1-bin]# hive -e "set hive.support.sql11.reserved.keywords=false;create table kunbang_updatechecker_run20170710(id int,timestamp int,softname string,softid string,version string,ip string,uid string,clienttime int,qid string,installymd int,os string,diskid string,macid string,cpuid string,uuid string,package_md5 string,kunbang_software string,is_show int,is_checked int,is_downloaded int,is_installed int,show_position int,kunbang_source int,is_installed_before int,is_installed_after int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;"
hive> INSERT OVERWRITE LOCAL DIRECTORY '/home/test/temp.csv' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select * from userinfo;
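INSERT OVERWRITE LOCAL DIRECTORY creates /home/test/temp.csv as a directory rather than a single file; the rows land in files named 000000_0 and so on inside it:
cat /home/test/temp.csv/000000_0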
HBase
http://1csh1.github.io/2016/04/02/HBase%E5%8D%95%E6%9C%BA%E6%A8%A1%E5%BC%8F%E9%85%8D%E7%BD%AE/
http://hbase.apache.org/book.html#quickstart
Importing from a relational database into Hive with Sqoop
https://segmentfault.com/a/1190000002532293
Hive indexes
http://www.cnblogs.com/zlslch/p/6105294.html
Hive
https://chu888chu888.gitbooks.io/hadoopstudy/content/Content/8/chapter0802.html
hive> create table kunbang(id int,`timestamp` int,softname string,softid string,version string,ip string,uid string,clienttime int,qid string,installymd int,os string,diskid string,macid string,cpuid string,uuid string,package_md5 string,kunbang_software string,is_show int,is_checked int,is_downloaded int,is_installed int,show_position int,kunbang_source int,is_installed_before int,is_installed_after int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
hive> LOAD DATA LOCAL INPATH '/data/bigdata/kunbang20170708.csv' OVERWRITE INTO TABLE kunbang;
hive> set hive.support.sql11.reserved.keywords=false;create table kunbang_updatechecker_run(id int,`timestamp` int,softname string,softid string,version string,ip string,uid string,clienttime int,qid string,installymd int,os string,diskid string,macid string,cpuid string,uuid string,package_md5 string,kunbang_software string,is_show int,is_checked int,is_downloaded int,is_installed int,show_position int,kunbang_source int,is_installed_before int,is_installed_after int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
hive> create table kunbang_updatechecker_run_task(id int,`timestamp` int,softname string,softid string,version string,ip string,uid string,clienttime int,qid string,installymd int,os string,diskid string,macid string,cpuid string,uuid string,package_md5 string,kunbang_software string,is_show int,is_checked int,is_downloaded int,is_installed int,show_position int,kunbang_source int,is_installed_before int,is_installed_after int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
hive> LOAD DATA LOCAL INPATH '/data/bigdata/kunbang_updatechecker_run_task20170708.csv' OVERWRITE INTO TABLE kunbang_updatechecker_run_task;
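A quick sanity check after the load, counting the imported rows:
hive -e "select count(*) from kunbang_updatechecker_run_task;"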
https://chu888chu888.gitbooks.io/hadoopstudy/content/Content/8/chapter0807.html
http://yanliu.org/2015/08/13/Hadoop%E9%9B%86%E7%BE%A4%E4%B9%8BHive%E5%AE%89%E8%A3%85%E9%85%8D%E7%BD%AE/
http://www.cr173.com/soft/45333.html
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.1.178:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
Here, javax.jdo.option.ConnectionURL is the JDBC connection string Hive uses to reach the metastore database; javax.jdo.option.ConnectionDriverName is the driver class name; javax.jdo.option.ConnectionUserName is the database username; and javax.jdo.option.ConnectionPassword is the database password.
<property>
<name>hive.exec.local.scratchdir</name>
<value>/home/hadoop/iotmp</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/home/hadoop/iotmp</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
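The metastore also needs the MySQL JDBC driver on Hive's classpath before schematool will work; this step is not shown above, but it usually amounts to copying the connector jar into $HIVE_HOME/lib (the exact jar name below is an assumption, use whichever mysql-connector version you have):
cp mysql-connector-java-5.1.40-bin.jar $HIVE_HOME/lib/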
hadoop@hadoopmaster:/usr/local/hive/bin$ ./schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://192.168.1.166:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.mysql.sql
Initialization script completed
schemaTool completed
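Once schematool completes, the metastore schema exists in MySQL and the Hive CLI should start cleanly; a quick check:
hive -e "show databases;"
# should list at least the default database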