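JVM Memory Crash Case 1
1. Problem Discovery
Today XXX reported that running hdfs dfs -mv on the interface machine XXX (150 mv operations executed concurrently) produced a large number of files named like hs_err_pid10147.log (10147 is the Java process ID). This file is the JVM's fatal error log; it is written when the process crashes outright.
2. Troubleshooting
Check the logs
The key lines in hs_err_pid10147.log are shown below. There are two possible causes: either memory is insufficient, or Java cannot create a new thread (that is, a system-level limit on the number of processes or threads has been hit).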
# There is insufficient memory for the Java Runtime Environment to continue.
# Cannot create GC thread. Out of system resources.
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
# Out of Memory Error (gcTaskThread.cpp:48), pid=14974, tid=0x00007fcbeaf75700
#
# JRE Version: (8.0_161-b12) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.161-b12 mixed mode linux-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
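With many hs_err files to go through, the check described above can be scripted. The following is a minimal Python sketch, not part of the original investigation: the function name summarize_hs_err and the keys of the returned dictionary are made up for illustration, and the sample text is just the header quoted above.

```python
import re


def summarize_hs_err(text):
    """Extract the failure reason and pid/tid from an hs_err_pid*.log header."""
    summary = {"thread_creation_failed": False, "site": None, "pid": None, "tid": None}
    for raw in text.splitlines():
        line = raw.lstrip("# ").strip()  # drop the leading '#' and spaces
        # "Cannot create GC thread" points at a thread limit rather than RAM
        if line.startswith("Cannot create") and "thread" in line:
            summary["thread_creation_failed"] = True
        m = re.search(
            r"Out of Memory Error \((.+?)\), pid=(\d+), tid=(0x[0-9a-fA-F]+)", line
        )
        if m:
            summary["site"] = m.group(1)
            summary["pid"] = int(m.group(2))
            summary["tid"] = m.group(3)
    return summary


sample = """\
# There is insufficient memory for the Java Runtime Environment to continue.
# Cannot create GC thread. Out of system resources.
# Out of Memory Error (gcTaskThread.cpp:48), pid=14974, tid=0x00007fcbeaf75700
"""
print(summarize_hs_err(sample))
```

Running the same function over every hs_err file in a directory would show whether all crashes share the thread-creation signature.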
The crashed process is indeed the Java process started by hdfs mv:
VM Arguments:
jvm_args: -Dproc_dfs -Xmx1000m -Dhadoop.log.dir=/opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,NullAppender
java_command: org.apache.hadoop.fs.FsShell -mv hdfs://10.244.12.214:8020/serv/smartsteps/raw/events/locationevent/2019/05/15/011/000150_0.gz hdfs://10.244.12.214:8020/serv/smartsteps/raw/events/locationevent/2019/05/15/011/./CD051201010101ASMARTSTEPS1805151000170000151.011.gz
At that time the machine still had plenty of free memory:
OS:CentOS Linux release 7.2.1511 (Core)
uname:Linux 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64
libc:glibc 2.17 NPTL 2.17
rlimit: STACK 8192k, CORE 0k, NPROC 4096, NOFILE 65535, AS infinity
load average:46.84 11.88 4.02
/proc/meminfo:
MemTotal: 131464872 kB
MemFree: 107818664 kB
MemAvailable: 125719212 kB
Buffers: 2220 kB
Cached: 17496468 kB
SwapCached: 0 kB
Active: 10411796 kB
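The two facts that matter in this section of the log are the rlimit line and the ratio of MemAvailable to MemTotal. The following Python sketch shows one way to read them; the helper names parse_rlimits and parse_meminfo are hypothetical, and the inputs are copied from the excerpt above.

```python
def parse_rlimits(line):
    """Turn 'rlimit: STACK 8192k, CORE 0k, ...' into {'STACK': '8192k', ...}."""
    _, _, body = line.partition(":")
    limits = {}
    for item in body.split(","):
        parts = item.split()
        if len(parts) == 2:
            limits[parts[0]] = parts[1]
    return limits


def parse_meminfo(text):
    """Map /proc/meminfo style lines ('Key: 123 kB') to integers in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


rlimit_line = "rlimit: STACK 8192k, CORE 0k, NPROC 4096, NOFILE 65535, AS infinity"
meminfo_sample = """\
MemTotal: 131464872 kB
MemFree: 107818664 kB
MemAvailable: 125719212 kB
"""

limits = parse_rlimits(rlimit_line)
mem = parse_meminfo(meminfo_sample)
available_ratio = mem["MemAvailable"] / mem["MemTotal"]
print("NPROC limit seen by the crashed JVM:", limits["NPROC"])
print(f"available memory: {available_ratio:.1%} of total")
```

With well over 90% of RAM still available, physical memory is an unlikely cause, which leaves the process/thread limit as the suspect.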
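Looking at the crash information of one of the Java processes: since memory is sufficient, we infer that the cause is the second possibility, that is, the "Cannot create GC thread. Out of system resources." line: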
# There is insufficient memory for the Java Runtime Environment to continue.
# Cannot create GC thread. Out of system resources.
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
The program log shows that threads could not be created, which confirms that the problem is the second possibility.
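To see how close a user is to the limit, the number of existing threads can be compared with RLIMIT_NPROC, which on Linux counts threads per user. The sketch below is an approximation under stated assumptions: it attributes a process to a user by the owner of its /proc/<pid> directory (the effective UID, whereas the kernel accounts by real UID), and the function names count_tasks_for_uid and nproc_headroom are invented for this note.

```python
import os
import resource


def count_tasks_for_uid(uid, proc_root="/proc"):
    """Count threads (tasks) owned by uid by walking <proc_root>/<pid>/task."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        pid_dir = os.path.join(proc_root, entry)
        try:
            if os.stat(pid_dir).st_uid != uid:
                continue
            total += len(os.listdir(os.path.join(pid_dir, "task")))
        except OSError:
            continue  # the process exited or is not readable
    return total


def nproc_headroom(uid, proc_root="/proc"):
    """Return (threads in use, threads left before the soft limit or None)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    used = count_tasks_for_uid(uid, proc_root)
    if soft == resource.RLIM_INFINITY:
        return used, None
    return used, soft - used


if __name__ == "__main__" and os.path.isdir("/proc"):
    used, left = nproc_headroom(os.getuid())
    print(f"threads in use: {used}, headroom before soft limit: {left}")
```

Each JVM starts many threads of its own (GC, compiler, and so on), so 150 concurrent hdfs clients can consume a low per-user limit quickly even though each one does little work.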
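3. Solution
The maximum number of processes or threads that the user is allowed to start had been configured as 1024. After it was raised to 65535, rerunning the 150 mv operations completed normally.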
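The note does not say where the limit was changed, so the following is only a sketch of a per-process alternative: a launcher script can raise its own soft limit (never above the hard limit) before starting the hdfs commands, and child processes inherit it. The names pick_soft_limit and raise_nproc_soft_limit are hypothetical; 65535 is the value used above.

```python
import resource


def pick_soft_limit(target, hard):
    """Soft limit to request: target, capped at the hard limit if it is finite."""
    if hard == resource.RLIM_INFINITY:
        return target
    return min(target, hard)


def raise_nproc_soft_limit(target=65535):
    """Raise this process's RLIMIT_NPROC soft limit toward target.

    Only raises, never lowers. Child processes started afterwards
    inherit the new limit. Returns the (soft, hard) pair now in effect.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    wanted = pick_soft_limit(target, hard)
    if soft != resource.RLIM_INFINITY and soft < wanted:
        resource.setrlimit(resource.RLIMIT_NPROC, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NPROC)


# With a hard limit of 4096, a request for 65535 is capped at the hard limit.
print(pick_soft_limit(65535, 4096))
```

An unprivileged process cannot exceed the hard limit, so if the hard limit itself is too low, the user's configured limit still has to be changed by an administrator, as was done here.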