Problem description:
The HBase table contains data, yet the number of rows returned through the HBase client is smaller than the actual row count in the table, and the client reports no error at all.
Bug 1:
We fetch the data through a coprocessor. A quick word on what a coprocessor does: it processes the fetched data on the server side (inside the region server) before returning it to the client, which cuts down the amount of data transferred and speeds up queries.
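For orientation, the RegionObserver hook postScannerOpen() is what makes this possible: it lets a coprocessor swap in its own scanner whenever a client opens a scan. Below is a minimal sketch against the HBase 1.x coprocessor API; it is not the project's AggrRegionObserver, just the shape of one:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.RegionScanner;

public class SketchRegionObserver extends BaseRegionObserver {
  @Override
  public RegionScanner postScannerOpen(ObserverContext<RegionCoprocessorEnvironment> ctx,
      Scan scan, RegionScanner scanner) throws IOException {
    // A real aggregating observer (like the AggrRegionObserver in the
    // stack trace below) would wrap `scanner` in its own RegionScanner
    // that folds rows into aggregates inside the region server, so only
    // the small aggregated result travels back to the client. Returning
    // the scanner unchanged keeps this sketch compilable.
    return scanner;
  }
}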
The client showed no error, so I went looking on the server side and found this in the region server log:
2019-03-01 14:46:04,924 ERROR [B.defaultRpcServer.handler=59,queue=5,port=16020] observer.AggrRegionObserver: tracker Coprocessor Error
java.lang.RuntimeException: tracker coprocess memory usage goes beyond cap, (40 + 4194304) * 50 > 209715200. Abord coprocessor.
    at com.tracker.coprocessor.observer.aggregate.handler.TopNHandler.checkMemoryUsage(TopNHandler.java:103)
    at com.tracker.coprocessor.observer.aggregate.AggrRegionScanner.buildAggrCache(AggrRegionScanner.java:61)
    at com.tracker.coprocessor.observer.aggregate.AggrRegionScanner.<init>(AggrRegionScanner.java:37)
    at com.tracker.coprocessor.observer.AggrRegionObserver.doPostScannerObserver(AggrRegionObserver.java:70)
    at com.tracker.coprocessor.observer.AggrRegionObserver.postScannerOpen(AggrRegionObserver.java:37)
    at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1334)
    at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
    at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1749)
    at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1712)
    at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1329)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2434)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33648)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2196)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
    at java.lang.Thread.run(Thread.java:748)
The key part is the RuntimeException message: the coprocessor's estimated memory usage, (40 + 4194304) * 50 = 209717200 bytes, just exceeds the 209715200-byte (200 MB) cap, so the coprocessor aborts.
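We don't have the TopNHandler source here, but the message makes the check itself clear: estimate per-entry memory as a fixed overhead plus the value size, multiply by the entry count, and abort once the estimate crosses a hard cap. A hypothetical reconstruction (every name and the exact formula are inferred from the log line, not taken from the real code):

public class MemoryCapSketch {
  // Reconstruction of the check behind the log line
  //   (40 + 4194304) * 50 > 209715200
  // i.e. (perEntryOverhead + valueSize) * entryCount > memoryCap.
  private static final long MEMORY_CAP = 200L * 1024 * 1024; // 209715200 bytes
  private static final long PER_ENTRY_OVERHEAD = 40;         // bookkeeping bytes per entry

  static void checkMemoryUsage(long valueSize, long entryCount) {
    long estimate = (PER_ENTRY_OVERHEAD + valueSize) * entryCount;
    if (estimate > MEMORY_CAP) {
      throw new RuntimeException("coprocessor memory usage goes beyond cap, ("
          + PER_ENTRY_OVERHEAD + " + " + valueSize + ") * " + entryCount
          + " > " + MEMORY_CAP + ". Aborting coprocessor.");
    }
  }

  public static void main(String[] args) {
    checkMemoryUsage(4 * 1024 * 1024, 50); // reproduces the failing case: throws
  }
}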
When I reduced the amount of data fetched per request, another problem surfaced, namely Bug 2.
Bug 2:
Tracing through the code showed that the amount of data coming back from the Scan itself was already wrong: only a single row could be fetched. It then turned out that commenting out the scan.setBatch() call made the data come back normally.
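This matches what setBatch() is documented to do (see the source below): once a batch limit is set, a single row may come back split across several consecutive Result objects, so client code that counts one Result as one row will undercount. A minimal sketch of counting rows correctly under batching, assuming a hypothetical table demo_table:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchCountDemo {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("demo_table"))) {
      Scan scan = new Scan();
      scan.setBatch(2); // at most 2 cells per Result => rows arrive in chunks
      int rows = 0;
      byte[] lastRow = null;
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) {
          // Under batching, consecutive Results may share a row key;
          // count a row only when the key changes.
          if (lastRow == null || !Bytes.equals(lastRow, r.getRow())) {
            rows++;
            lastRow = r.getRow();
          }
        }
      }
      System.out.println("distinct rows: " + rows);
    }
  }
}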
Take a look at the Scan source:
/**
 * Set the maximum number of values to return for each call to next().
 * Callers should be aware that invoking this method with any value
 * is equivalent to calling {@link #setAllowPartialResults(boolean)}
 * with a value of {@code true}; partial results may be returned if
 * this method is called. Use {@link #setMaxResultSize(long)}} to
 * limit the size of a Scan's Results instead.
 *
 * @param batch the maximum number of values
 */
public Scan setBatch(int batch) {
  if (this.hasFilter() && this.filter.hasFilterRow()) {
    throw new IncompatibleFilterException(
        "Cannot set batch on a scan using a filter" +
        " that returns true for filter.hasFilterRow");
  }
  this.batch = batch;
  return this;
}
What the code says: a row-level filter and batching cannot be used together, because they conflict. If the scan already carries a filter whose hasFilterRow() returns true, setBatch() throws an IncompatibleFilterException.
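To see the conflict fire, set a filter that advertises row-level filtering before calling setBatch(); PageFilter is a stock filter whose hasFilterRow() returns true. A small sketch:

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.IncompatibleFilterException;
import org.apache.hadoop.hbase.filter.PageFilter;

public class BatchConflictDemo {
  public static void main(String[] args) {
    Scan scan = new Scan();
    scan.setFilter(new PageFilter(10)); // PageFilter.hasFilterRow() == true
    try {
      scan.setBatch(100); // incompatible with a row-level filter
    } catch (IncompatibleFilterException e) {
      System.out.println("refused: " + e.getMessage());
    }
  }
}

Note that the check lives only inside setBatch(): a filter that does row-level work but whose hasFilterRow() still returns false sails straight through, which is exactly the trap the wrapper below falls into.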
You might wonder why no such exception ever surfaced. The reason is that our HBase client adds its own wrapper layer on top; the relevant code:
protected Scan constructScanByRowRange(String startRowKey, String endRowKey,
    QueryExtInfo queryExtInfo, boolean isAggr, Class clsType) {
  // construct the Scan
  Scan scan = constructScan(queryExtInfo, isAggr, clsType);
  scan.setStartRow(Bytes.toBytes(startRowKey));
  scan.setStopRow(Bytes.toBytes(endRowKey));
  // construct the filter
  if (queryExtInfo != null && queryExtInfo.isFilterSet())
    scan.setFilter(queryExtInfo.getFilterList());
  if (queryExtInfo != null && queryExtInfo.getScanCacheSize() != null)
    scan.setCaching(queryExtInfo.getScanCacheSize());
  else
    scan.setCaching(scanCachingSize);
  if (!(scan.hasFilter() && scan.getFilter().hasFilterRow())) {
    scan.setBatch(batchRead);
  }
  return scan;
}
All of the filtering behavior is implemented in the QueryExtInfo class, but the row filter it produces never updates the state this guard inspects: scan.hasFilter() && scan.getFilter().hasFilterRow() evaluates to false, so setBatch() is still called.
And that is where the bug was silently swallowed.
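To make the failure mode concrete, here is a minimal sketch of that trap (MyRowFilter is hypothetical, standing in for whatever filter QueryExtInfo builds): a filter that overrides filterRow() but not hasFilterRow() inherits FilterBase's default of false, so the wrapper's guard sees no row filter and calls setBatch() anyway.

import java.io.IOException;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.filter.FilterList;

public class SilentConflictDemo {

  // Hypothetical row filter: it makes a row-level decision but never
  // overrides hasFilterRow(), so FilterBase's default of false leaks out.
  static class MyRowFilter extends FilterBase {
    @Override
    public boolean filterRow() throws IOException {
      return false; // imagine a real row-level decision here
    }
    // Missing: @Override public boolean hasFilterRow() { return true; }
  }

  public static void main(String[] args) {
    Scan scan = new Scan();
    scan.setFilter(new FilterList(new MyRowFilter()));

    // The wrapper's guard: hasFilterRow() reports false, so the
    // row-filter-vs-batch conflict is never detected.
    if (!(scan.hasFilter() && scan.getFilter().hasFilterRow())) {
      scan.setBatch(100); // no IncompatibleFilterException is thrown
    }
    System.out.println("batch set to " + scan.getBatch()); // 100
  }
}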