Troubleshooting Java Applications in Production, Part 1: High CPU Usage

Date: 2022-02-22 02:58:03

First, identify the PID of the offending process:

top

In the top output, find the process consuming the most CPU; in this case it is process 15495.
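
If you prefer a non-interactive view, something like the following (GNU ps; flags may differ on other platforms) also lists processes sorted by CPU usage:

ps -eo pid,pcpu,comm --sort=-pcpu | head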


Next, identify the TID of the busiest thread:

ps -mp 15495 -o THREAD,tid,time

In the output, find the thread consuming the most CPU; in this case it is thread 18448.
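
As an alternative (assuming a procps top that supports the -H flag), per-thread CPU usage can also be viewed directly:

top -H -p 15495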


Convert the thread ID to hexadecimal, since jstack reports thread IDs as hexadecimal nid values:

printf "%x\n" 18448

4810
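
As a quick sanity check, bash arithmetic can convert the hex value back to decimal:

echo $((0x4810))

18448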


Print the stack trace of that thread:

jstack 15495 | grep 4810 -A 30
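
The last two steps can also be combined into one line (a sketch, using the PID and TID found above; jstack prints the thread ID as the hexadecimal nid field, so matching on nid is slightly more precise than a bare grep):

jstack 15495 | grep -A 30 "nid=0x$(printf '%x' 18448)"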


For example, suppose the stack looks like this:

  1. "regionserver60020-smallCompactions-1438827962552" daemon prio=10 tid=0x00007f4ce1903800 nid=0xe2a72 runnable [0x00007f443b8f6000]  
  2.    java.lang.Thread.State: RUNNABLE  
  3.         at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)  
  4.         at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:315)  
  5.         - locked <0x00007f450c42d820> (a com.hadoop.compression.lzo.LzoDecompressor)  
  6.         at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)  
  7.         at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)  
  8.         at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)  
  9.         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)  
  10.         - locked <0x00007f4494423b70> (a java.io.BufferedInputStream)  
  11.         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)  
  12.         at org.apache.hadoop.hbase.io.compress.Compression.decompress(Compression.java:439)  
  13.         at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultDecodingContext.prepareDecoding(HFileBlockDefaultDecodingContext.java:91)  
  14.         at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1522)  
  15.         at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1314)  
  16.         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:358)  
  17.         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.readNextDataBlock(HFileReaderV2.java:610)  
  18.         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:724)  
  19.         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:136)  
  20.         at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108)  
  21.         at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:507)  
  22.         at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:217)  
  23.         at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:76)  
  24.         at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:109)  
  25.         at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1106)  
  26.         at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1482)  
  27.         at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:475)  
  28.         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)  
  29.         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)  
  30.         at java.lang.Thread.run(Thread.java:745)  

In this case the problem arose while reading and writing HFiles (an LZO decompression step inside an HBase compaction), which caused multiple such RUNNABLE compaction threads to be started.
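
To gauge how many of these compaction threads exist, a rough count over the same thread dump works (matching on the thread-name prefix seen above):

jstack 15495 | grep -c "smallCompactions"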


When an application's CPU usage is very high, then unless it genuinely is a compute-intensive workload, the most common cause is an infinite (busy) loop.
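
For reference, a minimal, contrived Java program that reproduces this busy-loop pattern: the spinning thread never sleeps or blocks, so top shows it pinning one core and jstack shows it as RUNNABLE inside the loop.

// BusyLoop.java - hypothetical demo of a CPU-pinning busy loop
public class BusyLoop {
    public static void main(String[] args) {
        Thread spinner = new Thread(() -> {
            long counter = 0;
            while (true) {     // never yields, sleeps, or blocks
                counter++;     // trivial work keeps one core at ~100%
            }
        }, "busy-spinner");
        spinner.start();       // non-daemon thread keeps the JVM alive
    }
}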