Mapd_server keeps crashing/restarting


#1

We’re developing a custom (non-Immerse) front end for our MapD Core back end. We’re currently running version 3.3.1 on a simple CPU-only server with 64 GB of RAM.

While testing the front end, I noticed that the charts would periodically stop redrawing, and then everything would start working again after a couple of minutes. The mapd logs showed that the service had been restarted, but they contained no error messages. When I looked at the OS log messages, I saw a number of segfault errors for mapd_server.

For example (in /var/spool/log/messages):

Dec  4 09:10:02 hadoop2 kernel: mapd_server[22458]: segfault at 11f ip 0000000000e55c64 sp 00007f20d13dc0e0 error 4 in mapd_server[400000+3314000]
Dec  4 09:10:02 hadoop2 abrt-hook-ccpp: Process 25034 (mapd_server) of user 1001 killed by SIGSEGV - dumping core

Dec  4 09:11:28 hadoop2 abrt-hook-ccpp: Failed to create core_backtrace: waitpid failed: No child processes
Dec  4 09:11:28 hadoop2 abrt-server: Package 'mapd' isn't signed with proper key
Dec  4 09:11:28 hadoop2 abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2017-12-04-09:10:02-25034' exited with 1
Dec  4 09:11:28 hadoop2 abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2017-12-04-09:10:02-25034'
Dec  4 09:11:30 hadoop2 systemd: mapd_server.service: main process exited, code=killed, status=11/SEGV
Dec  4 09:11:30 hadoop2 systemd: Unit mapd_server.service entered failed state.
Dec  4 09:11:30 hadoop2 systemd: mapd_server.service failed.
Dec  4 09:11:30 hadoop2 systemd: mapd_server.service holdoff time over, scheduling restart.

Or

Dec 4 13:57:30 hadoop2 kernel: mapd_server[97709]: segfault at 4b3d3af8 ip 00007f5ba11424dc sp 00007f5b9f4a79f8 error 4 in libc-2.17.so[7f5ba10c2000+1b8000]
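For anyone reading kernel lines like the two above, here is a minimal Python sketch that pulls the fields out of a segfault message and decodes the error number. It assumes an x86 machine, where the low three bits of the page-fault error code mean present/not-present, read/write, and kernel/user mode. The function names (parse_segfault, decode_error) are made up for illustration and are not part of mapd or abrt.

```python
import re

# Matches kernel segfault lines of the form shown in the logs above.
# The trailing " in object[base+size]" part is optional.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r"(?: in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def parse_segfault(line):
    """Return a dict describing a kernel segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = {
        "process": m["proc"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "ip": int(m["ip"], 16),
        "error": decode_error(int(m["err"])),
        "object": m["obj"],
        "offset": None,
    }
    if m["base"] is not None:
        # Offset of the faulting instruction from the start of the mapping.
        info["offset"] = info["ip"] - int(m["base"], 16)
    return info


if __name__ == "__main__":
    line = ("Dec 4 13:57:30 hadoop2 kernel: mapd_server[97709]: segfault at "
            "4b3d3af8 ip 00007f5ba11424dc sp 00007f5b9f4a79f8 error 4 in "
            "libc-2.17.so[7f5ba10c2000+1b8000]")
    info = parse_segfault(line)
    print(info["object"], hex(info["offset"]), info["error"])
```

Under these assumptions, "error 4" in both lines decodes to a user-mode read of a page that is not mapped. The fault address 11f in the first line is very low, which would be consistent with a read through a null or near-null pointer, though the log alone does not prove that. The computed offset is mainly useful for shared libraries such as libc; for an executable mapped at a fixed base, symbol tools typically want the raw ip value instead.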

Unfortunately, I was not able to look at the core dump, and I think that’s because of the abrt-hook-ccpp error message.

We need to get to the bottom of this, as it is preventing us from deploying on a “real” GPU server. Any and all help is greatly appreciated!


#2

I figured out how to get to the core dump. Here’s the information.

[root@hadoop2 ~]# abrt-cli list
id 7dc0465ed5a57df954b7a8bf197e65a9deb31c0a
reason:         mapd_server killed by SIGSEGV
time:           Thu 07 Dec 2017 09:05:52 AM CST
cmdline:        /opt/mapd/bin/mapd_server --config /hadoop/d1/mapd/mapd.conf
package:        mapd-3.3.1_20171108_32e7bcc-1
uid:            1001 (mapd)
count:          1
Directory:      /var/spool/abrt/ccpp-2017-12-07-09:05:52-93712
Run 'abrt-cli report /var/spool/abrt/ccpp-2017-12-07-09:05:52-93712' for creating a case in Red Hat Customer Portal

Please reply to this thread and I’ll send you the core dump files for further debugging if you’d like.


#3

Hello,

We are very excited that you decided to use mapd as your back end, and we will do everything we can to help you resolve this problem. It would help if you could provide us with the core dump. If possible, it would also be very helpful to have an approximate schema for the table and the size of the data.

Thank you.


#4

Thank you Vraj. I sent you a private message.