We’re developing a custom (non-Immerse) front end to our MapD Core back end. We’re currently running 3.3.1 on a simple CPU-only server with 64 GB of RAM.
While testing the front end, I noticed that the charts would periodically stop redrawing, and then everything would start up again after a couple of minutes. The mapd logs showed that the service had been restarted, but there were no error messages. When I then looked in the OS log messages, I saw a bunch of segfault errors for mapd_server.
For example (in /var/spool/log/messages):
Dec 4 09:10:02 hadoop2 kernel: mapd_server: segfault at 11f ip 0000000000e55c64 sp 00007f20d13dc0e0 error 4 in mapd_server[400000+3314000]
Dec 4 09:10:02 hadoop2 abrt-hook-ccpp: Process 25034 (mapd_server) of user 1001 killed by SIGSEGV - dumping core
Dec 4 09:11:28 hadoop2 abrt-hook-ccpp: Failed to create core_backtrace: waitpid failed: No child processes
Dec 4 09:11:28 hadoop2 abrt-server: Package 'mapd' isn't signed with proper key
Dec 4 09:11:28 hadoop2 abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2017-12-04-09:10:02-25034' exited with 1
Dec 4 09:11:28 hadoop2 abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2017-12-04-09:10:02-25034'
Dec 4 09:11:30 hadoop2 systemd: mapd_server.service: main process exited, code=killed, status=11/SEGV
Dec 4 09:11:30 hadoop2 systemd: Unit mapd_server.service entered failed state.
Dec 4 09:11:30 hadoop2 systemd: mapd_server.service failed.
Dec 4 09:11:30 hadoop2 systemd: mapd_server.service holdoff time over, scheduling restart.
Dec 4 13:57:30 hadoop2 kernel: mapd_server: segfault at 4b3d3af8 ip 00007f5ba11424dc sp 00007f5b9f4a79f8 error 4 in libc-2.17.so[7f5ba10c2000+1b8000]
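In case it helps anyone reading these, here is a rough sketch of how the two kernel segfault lines can be decoded. The helper name `decode_segfault` and the field names are my own, not anything from MapD, and the error-code bits are interpreted using the standard x86 page-fault layout (bit 0 = page present, bit 1 = write, bit 2 = user mode), so treat it as an illustration only:

```python
import re

# Matches the kernel's "segfault at ... ip ... sp ... error N in obj[base+size]" line.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    """Return the decoded fields of a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "object": m.group("obj"),
        "fault_addr": int(m.group("addr"), 16),
        # Offset of the faulting instruction inside the mapped object.
        "offset_in_object": ip - base,
        # x86 page-fault error code bits (assumed layout, see note above).
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }

first = decode_segfault(
    "Dec 4 09:10:02 hadoop2 kernel: mapd_server: segfault at 11f ip 0000000000e55c64 "
    "sp 00007f20d13dc0e0 error 4 in mapd_server[400000+3314000]"
)
second = decode_segfault(
    "Dec 4 13:57:30 hadoop2 kernel: mapd_server: segfault at 4b3d3af8 ip 00007f5ba11424dc "
    "sp 00007f5b9f4a79f8 error 4 in libc-2.17.so[7f5ba10c2000+1b8000]"
)
print(first["object"], hex(first["offset_in_object"]))
print(second["object"], hex(second["offset_in_object"]))
```

Read this way, "error 4" in both lines would be a user-mode read of an unmapped address. The first fault is inside mapd_server itself and the second inside libc-2.17.so. The offsets could in principle be passed to a symbolizer if a build with symbols were available, which I don't have.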
Unfortunately, I was not able to look at the core dump, and I think it’s because of the abrt-hook-ccpp error message.
We need to get to the bottom of this, as it is preventing us from deploying on a “real” GPU server. Any and all help is greatly appreciated!