Discussions


Cluster not working after upgrade

  • 1.  Cluster not working after upgrade

    Posted 08-22-2019 04:06
    Edited by Georgia Harlow 08-22-2019 04:24
    First, I'm a newbie to mapd, so I'm certain that I've missed a step or haven't configured my environment correctly; any help I can get with this would be most appreciated! Luckily I'm working on a test environment, so it's a good place to learn. The reason I believe my cluster is not working is that, even though all my services start successfully, when I run "\validate cluster" it says "Not running a cluster nothing to validate." I get this when connecting to the omnisci database as the admin user on either the aggregator port or a leaf port (same result).
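    For reference, here's roughly how I'm checking (the default admin credentials are an assumption; the ports are from my configs below):

    # connect with omnisql to the aggregator port (6274) or a leaf port (16274)
    ./bin/omnisql omnisci -u admin -p HyperInteractive --port 6274
    omnisql> \validate cluster
    Not running a cluster nothing to validate.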

    Here's what I did... I was attempting to upgrade our (functioning) test environment from 4.5 to 4.7.1. I reviewed the online docs on how to do this with a tarball and followed the steps below (rough commands are sketched after the list).

    1. Stopped the services on each node. (I later found out there was an auto-restart set up for the service, but I did finally get it completely stopped.)
    2. Untarred the file in the installs directory and recreated the symbolic link to point to the newly created folder.
    3. Ran install_omnisci_systemd.sh on each node.
    4. Attempted to start each node. The services all start, with a statement about when the license will expire.
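
    Roughly, the commands I ran on each node (the 4.7.1 folder name is an example; "latest" is the symlink my configs point at, and the service name is the one created by install_omnisci_systemd.sh):

    # stop the running service, unpack the new tarball, repoint the symlink
    sudo systemctl stop omnisci_server
    cd /share/data/omnisci/installs
    tar xvf omnisci-ee-4.7.1-Linux-x86_64.tar.gz
    ln -sfn omnisci-ee-4.7.1-Linux-x86_64 latest
    # re-run the systemd installer from the new install, then restart
    cd latest/systemd && sudo ./install_omnisci_systemd.sh
    sudo systemctl start omnisci_server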

    First, I was getting a driver error, so I updated the GPU drivers. Then I got an error that multiple servers were using the same data directory, which is when I realized that the old services I'd stopped had auto-restarted; I got those properly killed. While troubleshooting that error, I noticed that for some reason all of the omnisci.conf files had been updated to remove the string-servers entries, so the cluster wasn't working. I stopped the services and edited the omnisci.conf files on each server to restore the settings they had prior to the upgrade, but at this point things had gone south with our test database, so I tried running initdb with "-f" to recreate the database (sketched below). I started with the aggregator, then ran initdb for each of the leaf nodes. It still doesn't seem to work as a clustered environment.
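    The initdb runs looked like this (aggregator first, then each leaf; the paths are from my configs below):

    # recreate the catalogs, forcing overwrite of the existing data dirs
    ./bin/initdb -f /share/data/omnisci/testnode1/agg/data
    ./bin/initdb -f /share/data/omnisci/testnode1/data
    # ...and likewise for testnode2 through testnode5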

    Here's my configuration information...

    Config Summary: 5 nodes. Node 1 has the aggregator and a leaf, Node 2 has the String Dictionary server and a leaf, and Nodes 3 - 5 are just leaf nodes.


    ##omnisci.conf - leaf 1

    port = 16274
    http-port = 16278
    calcite-port = 16279
    data = "/share/data/omnisci/testnode1/data"
    null-div-by-zero = true
    render-mem-bytes = 1000000000
    start-gpu = 1
    num-gpus = 1
    string-servers = "/share/data/omnisci/cluster.conf"

    [web]
    port = 16273
    frontend = "/share/data/omnisci/installs/latest/frontend"

    ##omnisci.conf - leaf 2 - 5

    port = 16274
    http-port = 16278
    calcite-port = 16279
    data = "/share/data/omnisci/testnode<node_number>/data"
    null-div-by-zero = true
    render-mem-bytes = 1000000000
    string-servers = "/share/data/omnisci/cluster.conf"

    [web]
    port = 16273
    frontend = "/share/data/omnisci/installs/latest/frontend"


    ##omnisci-agg.conf

    port = 6274
    http-port = 6278
    calcite-port = 6279
    data = "/share/data/omnisci/testnode1/agg/data"
    null-div-by-zero = true
    start-gpu = 0
    num-gpus = 1
    enable-debug-timer = true
    read-only = false
    cluster = "/share/data/omnisci/cluster.conf"

    [web]
    port = 6273
    frontend = "/share/data/omnisci/installs/latest/frontend"


    ##omnisci-sds.conf (FYI, I noticed there's one of these sds conf files in every node folder, but I'm only starting the service on node 2)

    port = 6277
    path = "/share/data/omnisci/testnode2/sds"
    cluster = "/share/data/omnisci/cluster.conf"

    ##cluster.conf
    [
      {
        "host": "testnode1",
        "port": 16274,
        "role": "dbleaf"
      },
      {
        "host": "testnode2",
        "port": 16274,
        "role": "dbleaf"
      },
      {
        "host": "testnode3",
        "port": 16274,
        "role": "dbleaf"
      },
      {
        "host": "testnode4",
        "port": 16274,
        "role": "dbleaf"
      },
      {
        "host": "testnode5",
        "port": 16274,
        "role": "dbleaf"
      },
      {
        "host": "testnode2",
        "port": 6277,
        "role": "string"
      }
    ]

    Thanks in advance for any advice to help me figure this out!

    G
    #Core
    #General


  • 2.  RE: Cluster not working after upgrade

    Posted 08-22-2019 06:14
    Hi @Georgia Harlow,


    The configuration looks right, so just a couple of questions:
    Have you used the --skip-geo switch when you ran initdb?
    Are the big tables sharded?
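
    For context, in a distributed setup the large fact tables are usually created with a shard key so their data is split across the leaves; something like this (table and column names are just examples):

    -- a sharded table; shard_count is typically sized to the number of leaves
    CREATE TABLE flights (
      flight_id BIGINT,
      carrier TEXT ENCODING DICT(32),
      SHARD KEY (flight_id))
    WITH (shard_count = 5);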



  • 3.  RE: Cluster not working after upgrade

    Posted 08-22-2019 07:29
    Hi, thanks for your reply.  Yes, I used --skip-geo and --data /share/data/omnisci/testnode1/agg/data for the aggregator (and just the leaf "data" directory for the leaf nodes).  I tried creating a table, but it only shows up on the first node, or whichever node I create it on.  When I query any of the other nodes using omnisql, it's not there.  I tried creating a new user and database, and again, they only show up on the node I created them on.
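
    Here's roughly the test I ran (the table name is just an example):

    -- connected with omnisql to the aggregator on port 6274
    omnisql> CREATE TABLE my_test (id INTEGER);
    omnisql> \t
    my_test
    -- then connected with omnisql to a leaf on port 16274
    omnisql> \t
    (my_test is not listed)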

    Thanks!

    G