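1. Version issues

Starting with version 0.6.1, Zeppelin is built against Spark 2.x and Scala 2.11 by default. In my own testing, Zeppelin 0.6.2 is not compatible with Spark 1.6.x, and the Spark interpreters fail to run correctly. To run it against an older Spark release you have to build Zeppelin from source, passing the Spark and Hadoop versions as build parameters; see http://zeppelin.apache.org/docs/snapshot/install/build.html . Zeppelin 0.6.0, by contrast, runs with Spark 1.6.x and earlier.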

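2. Phoenix-thin connection issue

Update, January 3, 2017

The Phoenix for Spark 2.x integration patch is now available, so a table can be loaded directly as a DataFrame instead of going through a JDBC connection, which is more efficient. For the Phoenix for Spark 2.x issue, see https://issues.apache.org/jira/browse/PHOENIX-3333 ; for usage, see the article "Configuring Spark to Connect to Phoenix".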
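
As a rough sketch of what loading a table as a DataFrame looks like, the snippet below collects the settings for the phoenix-spark data source in one place. The ZooKeeper address `localhost:2181` is an assumption made for illustration, and `PhoenixSparkOptions` is a hypothetical helper name; the table name is the one used later in this article. The read itself only works with a patched phoenix-spark jar on the interpreter classpath, so it is shown as a comment.

```scala
object PhoenixSparkOptions {
  // Data source name registered by the phoenix-spark module.
  val Format: String = "org.apache.phoenix.spark"

  // Options read by the phoenix-spark data source:
  // the Phoenix table name and the ZooKeeper quorum of the HBase cluster.
  def options(table: String, zkUrl: String): Map[String, String] =
    Map("table" -> table, "zkUrl" -> zkUrl)
}

// In a Zeppelin %spark paragraph, where `spark` is predefined
// (the zkUrl value is an assumed local ZooKeeper address):
// val df = spark.read
//   .format(PhoenixSparkOptions.Format)
//   .options(PhoenixSparkOptions.options("bigjoy.imos", "localhost:2181"))
//   .load()
```

Keeping the options in a small helper makes it easy to switch tables or clusters without touching the read call.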


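Zeppelin has supported Phoenix connections since version 0.6.0. Phoenix is configured in the jdbc interpreter by default; for the setup steps, see https://zeppelin.apache.org/docs/0.6.2/interpreter/jdbc.html#phoenix . Be sure to add the artifact under Dependencies. If downloading from the remote Maven repository is too slow, you can enter the local path of phoenix-<version>-thin-client.jar instead, or copy the jar into ZEPPELIN_HOME/interpreter/jdbc.

With a phoenix-thin connection, however, the following error is reported: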

No suitable driver found for http://localhost:8765

For the cause, see https://github.com/apache/zeppelin/pull/1442 . I have compiled a patched zeppelin-jdbc-0.6.2.jar; replace the file of the same name under ZEPPELIN_HOME/interpreter/jdbc with it.

File download: zeppelin-jdbc-0.6.2.jar
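
The error text suggests that the URL reaching java.sql.DriverManager had lost its `jdbc:phoenix:thin:url=` prefix, leaving only the HTTP endpoint. The sketch below reproduces that message with nothing but the JDK, which may help when checking whether a given URL string is the problem. `ThinUrlCheck` and `connectError` are hypothetical names used only for this illustration.

```scala
import java.sql.{DriverManager, SQLException}

object ThinUrlCheck {
  // Try to open a connection. Returns the SQLException message on failure,
  // or None when a connection was opened (it is closed again immediately).
  // Exceptions other than SQLException are not caught and will propagate.
  def connectError(url: String): Option[String] =
    try {
      DriverManager.getConnection(url).close()
      None
    } catch {
      case e: SQLException => Some(e.getMessage)
    }

  def main(args: Array[String]): Unit = {
    // A bare HTTP endpoint is not a JDBC URL, so no registered driver accepts it.
    println(connectError("http://localhost:8765"))
    // The full thin-client URL used in this article. The outcome depends on
    // whether the thin client jar is on the classpath and a Query Server is listening.
    println(connectError("jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF"))
  }
}
```

If the first call reports the same "No suitable driver found" message as Zeppelin does, the interpreter is most likely passing a truncated URL rather than missing the driver jar.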

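3. Loading JDBC data with Scala in Zeppelin

Update, January 3, 2017

After a long break I set this up again and found that the org.apache.hadoop.tracing.SpanReceiverHost.get(xxx) error is caused by a mismatch between the Hadoop version shipped with Zeppelin and the one Spark was built against. Replacing the corresponding files under $ZEPPELIN_HOME/lib with hadoop-annotations-2.7.3.jar, hadoop-auth-2.7.3.jar, and hadoop-common-2.7.3.jar from $SPARK_HOME/jars fixes it. For details, see "Error handling when using Spark 2.x with Zeppelin 0.6.2".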
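
To confirm which jar a Hadoop class is actually loaded from, before or after swapping the files, a small check along these lines can be run. This is a minimal sketch using only standard JVM reflection; `ClasspathCheck` and `jarOf` are hypothetical names, and the two class names are taken from the stack trace in this article.

```scala
import scala.util.Try

object ClasspathCheck {
  // Location (usually a jar URL) a class was loaded from.
  // Returns None if the class is missing or has no code source
  // (for example, classes loaded by the bootstrap class loader).
  def jarOf(className: String): Option[String] =
    Try(Class.forName(className, false, Thread.currentThread.getContextClassLoader)).toOption
      .flatMap(c => Option(c.getProtectionDomain.getCodeSource))
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)

  def main(args: Array[String]): Unit = {
    Seq(
      "org.apache.hadoop.tracing.SpanReceiverHost",
      "org.apache.hadoop.hdfs.DFSClient"
    ).foreach { name =>
      println(s"$name -> ${jarOf(name).getOrElse("not found or no code source")}")
    }
  }
}
```

In a Zeppelin paragraph, calling `ClasspathCheck.jarOf(...)` for both classes shows whether they come from jars of the same Hadoop version; a NoSuchMethodError like the one in this section typically appears when they do not.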


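I initially used Spark 2.0.1. Reading data from the database over JDBC with the code below kept failing. The first error, about xxx.hive.ql.xxx, goes away once zeppelin.spark.useHiveContext is set to false in the interpreter settings. If org.apache.hadoop.tracing.SpanReceiverHost.get(xxx) still fails after that, try upgrading to Spark 2.0.2; I found this by accident when running Spark 2.0.2 on my laptop.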

// Read a Phoenix table over JDBC through the thin client (Phoenix Query Server)
val jdbcDf = spark.read
  .format("jdbc")
  .option("driver", "org.apache.phoenix.queryserver.client.Driver")
  .option("url", "jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF")
  .option("dbtable", "bigjoy.imos")
  .load()


java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:189)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50)
at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
... 47 elided
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
... 69 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.NoSuchMethodError: org.apache.hadoop.tracing.SpanReceiverHost.get(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;)Lorg/apache/hadoop/tracing/SpanReceiverHost;
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
... 75 more
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.tracing.SpanReceiverHost.get(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;)Lorg/apache/hadoop/tracing/SpanReceiverHost;
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:634)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:354)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:140)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:146)
at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:600)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
... 80 more

References

[1]: Building Zeppelin from source
[2]: Zeppelin Phoenix interpreter configuration
[3]: ZEPPELIN-1459: Zeppelin JDBC URL properties mangled
[4]: Error handling when using Spark 2.x with Zeppelin 0.6.2