What we expect more from OmniSciDB

  • 1.  What we expect more from OmniSciDB

    Posted 14 days ago

    Hi Omnisci Team,

    I have been doing a lot of testing with the open-source OmniSci DB on telecom data, so I thought I would share my summarised feedback.

    Before starting, let me clarify one thing: I feel that the OmniSci database is revolutionary in its design and could, in the near future, change a lot about the way we think about database structure and performance.

    However, in the rest of this post I will concentrate on the issues and limitations I have observed. My belief is that if we want to move from good to great, we should take comments in a positive way. My testing is mostly on CPU, with very little on a GPU card. We faced an immediate challenge with GPU memory, as the test database is more than 1000 GB and the available GPU memory is only 8 GB. Even buying a V100 card would not help us much.

    Observation 1

    • Inconsistent performance – OmniSci Database is fully dependent on available CPU and memory. Assume the available system memory is 100 and current utilisation is only 10%. In that case the query response will be immediate. However, the same query will not return a response if the system has already consumed 90% of the available memory for other queries.
    • I understand that OmniSci caches the columns used by earlier queries in memory, and that for a new query it needs to release space in memory. However, one odd query can change the performance metrics.

    Suggestion

    • For ad hoc queries, OmniSci should provide an option so that we system administrators can decide which tables and columns need to be cached completely in memory, with everything else served from HDD/SSD.
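
To make the suggestion concrete, here is a minimal sketch of a column cache in which an administrator pins some columns and everything else is evicted least-recently-used first. It is only a toy model of the idea, not how OmniSci manages memory; the class `ColumnCache`, its methods, and the sizes are all hypothetical.

```python
from collections import OrderedDict


class ColumnCache:
    """Toy column cache: LRU eviction, with admin-pinned columns exempt.

    Hypothetical illustration only; not OmniSci's actual buffer manager.
    """

    def __init__(self, capacity_bytes, pinned=()):
        self.capacity = capacity_bytes
        self.pinned = set(pinned)      # columns the administrator keeps in memory
        self.entries = OrderedDict()   # column name -> size, least recently used first
        self.used = 0

    def touch(self, column, size):
        """Record that a query uses `column`.

        Returns True on a cache hit, False if the column had to be loaded.
        """
        if column in self.entries:
            self.entries.move_to_end(column)  # mark as most recently used
            return True
        self._make_room(size)
        self.entries[column] = size
        self.used += size
        return False

    def _make_room(self, size):
        # Evict unpinned columns, oldest first, until the new column fits.
        for victim in list(self.entries):
            if self.used + size <= self.capacity:
                break
            if victim in self.pinned:
                continue
            self.used -= self.entries.pop(victim)
        if self.used + size > self.capacity:
            raise MemoryError("pinned columns leave no room for this column")
```

With a pinned column, one odd query can push out other cached columns but never the pinned one, so queries on the pinned column keep a predictable response time.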

    Observation 2

    • Running simultaneous queries is problematic – Most of the time during testing I observed that OmniSci Database processes queries sequentially. If one heavy query is running (for example, select MobileNumber, usage from usage_Table group by MobileNumber limit 10, executed on 5 billion records), then any small query (such as select usage from usage_table where mobileNumber=aaaa) will get stuck.

    Suggestion

    • In other databases we have seen an option to allocate a chunk of memory to a specific query. In that case, if one query takes a long time, it doesn't impact the overall system.
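
As a rough illustration of that kind of per-query allocation, the sketch below caps how much memory any single query may reserve, so a heavy query cannot take the whole pool. The names `QueryMemoryPool`, `admit` and `release` are hypothetical and do not correspond to any OmniSci API; it also assumes a capped query can still run within its grant (for example by processing data in batches).

```python
class QueryMemoryPool:
    """Toy admission control: each query reserves at most `per_query_cap` bytes.

    Hypothetical illustration only; not an OmniSci feature or API.
    """

    def __init__(self, total_bytes, per_query_cap):
        self.total = total_bytes
        self.per_query_cap = per_query_cap
        self.grants = {}  # query id -> bytes currently reserved

    @property
    def free(self):
        return self.total - sum(self.grants.values())

    def admit(self, query_id, requested_bytes):
        """Reserve memory for a query; return the grant, or 0 if it must wait."""
        grant = min(requested_bytes, self.per_query_cap)
        if grant > self.free:
            return 0
        self.grants[query_id] = grant
        return grant

    def release(self, query_id):
        """Give a finished query's reservation back to the pool."""
        self.grants.pop(query_id, None)


pool = QueryMemoryPool(total_bytes=100, per_query_cap=60)
big = pool.admit("group_by_5_billion_rows", 500)  # capped at 60
small = pool.admit("single_number_lookup", 10)    # still fits alongside it
```

Because the large GROUP BY is limited to its cap, the small lookup is admitted immediately instead of waiting behind it.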

    Observation 3

    • No partitioned table concept – With telecom data, the data is always segmented in the following way: hot data, warm data and cold data.
    • Today's and yesterday's data is hot data, as the marketing, analytics and operations teams will direct most of their queries at it.
    • Week-old data is warm data, as many queries still require a week of data for analysis.
    • Month-old data is cold data, kept for legal compliance and billing dispute resolution.
    • It is not correct to assume that all data is the same and requires the same performance treatment.
    • It is not a logical design to scale GPU, CPU and memory to the overall data volume. The system may have 100 TB of storage, but 80% of queries will run on data from yesterday or the last week, while 20% still hit old data.

    Suggestion

    • OmniSci Database should provide a table partitioning feature so that the administrator can decide which data to load into GPU or memory, and which to serve directly from HDD/SSD. This way the administrator can meet the teams' performance expectations.
    • OmniSci should provide an option to drop a table partition. Dropping a partition should also clean up at the operating system level. This way the administrator can manage the system more effectively.

    Observation 4

    • No logical backup – OmniSci does not provide an option for taking a table or schema backup. It is impossible to generate a CSV file when you have 1 billion records in a table; this would take months to complete.

    Suggestion

    • A backup utility for table partition, table, or full schema backup, plus incremental backup.

    Observation 5

    • Metadata information – Compared with other databases, OmniSciDB does not provide any system metadata, such as table size, how much additional memory is required to run a query on column X, how many queries were executed in the last hour, which queries consume the most resources, or whether the system needs additional memory/CPU.

    Suggestion

    • Look at other databases; I feel tools like the following would help the administrator:
      • Oracle AWR, Snapshot
      • Vertica Query Plans, Analyzing workloads, Monitoring
      • SQL server performance Dashboard

    Observation 6

    • Does not work with external data stores

    Suggestion

    • Attach other data stores and access their data using an OmniSci external table

     
    Regards,
    sumit


    #Core


  • 2.  RE: What we expect more from OmniSciDB

    Posted 9 days ago

    Hi Sumit, I'm responsible for Product Management here at OmniSci, and this has given us a lot to read and respond to. Thank you so much for your message! You clearly care about our platform a lot, and seem to have put a fair bit of effort into compiling both these observations and what you'd like to see.

    While we'll respond to your observations in turn, I'm wondering if you're up for a 1:1 chat with our engineering team about this list. @Aaron Williams, who runs our community team, or @Randy Zwitch, who leads our dev advocate efforts, can arrange this session with you when convenient.

    Thanks for your patience; I'll respond with more detailed answers shortly.

    Venkat


  • 3.  RE: What we expect more from OmniSciDB

    Posted 8 days ago
    Just sent an email; I will set up a time for us to talk. Thanks!

    ------------------------------
    Aaron Williams
    VP Global Community and Tech Partnerships
    OmniSci
    San Francisco CA
    http://www.omnisci.com
    ------------------------------