How could I implement paging on a sorted dataset?


#1

Hi, all.
I’ve just reached a point in my exploration of MapD where I have some basic visualisations in my application, and all sorts of new questions have started to pop up.
At the moment I’m stuck with a query that looks like it should work but doesn’t.
In the application I have a dynamically created data grid that, for the sake of simplicity, shows one selected dimension and one selected measure, for example “Item name” as the dimension and “Sales amount” as the measure. As the number of active items can be substantial (23K in this sample project, but it could reach 100K in bigger projects), I have to implement paging.
I do it with queries like this:

SELECT 
  itemName as "НоменклатураНаименование",
  (-sum(case when documentType = 'Sales' AND periodType = 'Current' then measure2 else 0 end))/1000 as "СуммаПродажиТыс"    
  FROM main_data
    JOIN calendar ON main_data.dateKey = calendar.dateKey
    JOIN invent_table ON main_data.itemKey = invent_table.itemKey
      WHERE documentType = 'Sales' AND periodType = 'Current'
  GROUP BY itemName
  LIMIT 1000
  OFFSET 5000

and everything works well.

But if I add sorting on top of that query, like this:

SELECT 
  itemName as "НоменклатураНаименование",
  (-sum(case when documentType = 'Sales' AND periodType = 'Current' then measure2 else 0 end))/1000 as "СуммаПродажиТыс"    
  FROM main_data
    JOIN calendar ON main_data.dateKey = calendar.dateKey
    JOIN invent_table ON main_data.itemKey = invent_table.itemKey
      WHERE documentType = 'Sales' AND periodType = 'Current'
  GROUP BY itemName
  ORDER BY СуммаПродажиТыс DESC
  LIMIT 1000
  OFFSET 5000

I get an exception like: Exception: Query couldn't keep the entire working set of columns in GPU memory

The strange thing is that sorting without paging works just fine; a query like this returns results quickly:

SELECT 
	itemName as "НоменклатураНаименование",
	(-sum(case when documentType = 'Sales' AND periodType = 'Current' then measure2 else 0 end))/1000 as "СуммаПродажиТыс"
	 FROM main_data JOIN calendar ON main_data.dateKey = calendar.dateKey 
	   JOIN invent_table ON main_data.itemKey = invent_table.itemKey 
	        WHERE documentType = 'Sales' AND periodType = 'Current' 
	            GROUP BY itemName 
    	        ORDER BY СуммаПродажиТыс DESC

Intuitively, returning some part of a sorted recordset should not be impossible? Is there some trick I could use to get this done?


#2

I’ve made a simplified example that demonstrates the same behaviour.

In this case I have two tables: main_data with ~70M rows and invent_table with ~23K rows.
The schemas for these tables are:

CREATE TABLE main_data (
    dateKey SMALLINT,
    documentType TEXT ENCODING DICT,
    itemKey SMALLINT,
    locationKey SMALLINT,
    measure1 DECIMAL(10,2),
    measure2 DECIMAL(10,2),
    measure3 DECIMAL(10,2),
    measure4 DECIMAL(10,2),
    measure5 DECIMAL(10,2),
    liquidity TEXT ENCODING DICT(32));
CREATE TABLE invent_table (
    itemKey SMALLINT,
    itemCounter SMALLINT,
    itemCode TEXT ENCODING DICT(32),
    itemName TEXT ENCODING DICT(32),
    vendorCode TEXT ENCODING DICT(32),
    itemDirection TEXT ENCODING DICT(32),
    itemCategory TEXT ENCODING DICT(32),
    itemGroups TEXT ENCODING DICT(32),
    itemSubgroup TEXT ENCODING DICT(32),
    itemUnit TEXT ENCODING DICT(32),
    itemBrand TEXT ENCODING DICT(32),
    categoryManager TEXT ENCODING DICT(32),
    RatingProductPOCompanyTO SMALLINT,
    SalesABCc TEXT ENCODING DICT(8),
    MarkupABCc TEXT ENCODING DICT(8),
    XYZ TEXT ENCODING DICT(32),
    XyzProductCVofMonth INTEGER);

I have an Ubuntu server in the cloud with 32GB of memory and an NVIDIA GTX 1060 GPU with 6GB of memory.

Given these circumstances, I can run a query that joins these two tables, groups by item name, calculates the sales amount and sorts by sales amount in descending order, with good performance:

SELECT
 itemName,
 (-sum(case when documentType = 'Sales' then measure2 else 0 end))*30 as SalesAmount
     FROM main_data
        JOIN invent_table ON main_data.itemKey = invent_table.itemKey
        WHERE documentType = 'Sales'
            GROUP BY itemName
            ORDER BY SalesAmount desc

But as soon as I add LIMIT and OFFSET clauses for pagination (or even just LIMIT 1000), I get an error message saying the sorting would be too slow, and an empty result set.
So a query like this, for example, does not run:

SELECT
 itemName,
 (-sum(case when documentType = 'Sales' then measure2 else 0 end))*30 as SalesAmount
     FROM main_data
        JOIN invent_table ON main_data.itemKey = invent_table.itemKey
        WHERE documentType = 'Sales'
            GROUP BY itemName
            ORDER BY SalesAmount desc
            LIMIT 1000
            OFFSET 2000;

Why is this so? If the server can return the whole dataset in sorted order, it should surely be possible for it to return just a subset of that dataset to the browser?
How can I fix this situation?


#3

I just found an interesting thing: the Immerse query editor has no problem executing the sorted query without LIMIT and OFFSET clauses. But it returns just 1000 rows, so it must use some method to limit the result set. So I went to the browser developer console and looked up the command that Immerse sends to the server. As it happens, the Immerse query editor uses a parameter of the execute_sql method to limit the result set size, and that works well on my sorted queries. I tried to do the same thing with my connector, and it works well for me too. That would close the case to my complete satisfaction, but unfortunately there is no similar parameter of the execute_sql Thrift method that would implement an OFFSET clause, and without it I cannot implement paging. So I’m back to my original position.
I’ve tried a hybrid solution: use the external limit parameter from the connector plus an OFFSET clause embedded in the SQL, but that did not do the trick either. It looks like the OFFSET clause simply does not work for that query.
So, for example, a query with a no-op OFFSET clause returns all 23K rows:

SELECT
 itemName,
 (-sum(case when documentType = 'Sales' then measure2 else 0 end))*30 as SalesAmount
     FROM main_data
        JOIN invent_table ON main_data.itemKey = invent_table.itemKey
        WHERE documentType = 'Sales'
            GROUP BY itemName
            ORDER BY SalesAmount desc
            OFFSET 0;

And a query with OFFSET 1 quietly returns zero rows, without any errors:

SELECT
 itemName,
 (-sum(case when documentType = 'Sales' then measure2 else 0 end))*30 as SalesAmount
     FROM main_data
        JOIN invent_table ON main_data.itemKey = invent_table.itemKey
        WHERE documentType = 'Sales'
            GROUP BY itemName
            ORDER BY SalesAmount desc
            OFFSET 1;

What can I do to help solve this problem? I can provide my dataset and any other help with diagnostics.
Any advice on a possible workaround?


#4

I will give a response as a DBA: I wouldn’t like to get a query like that on the databases I manage, because every time you launch the query the DB has to do the full work (join, group by and sorting) regardless of how many rows you are going to fetch. Additionally, if you re-execute the query you would get inconsistent results if the tables used by the query have been modified in the meantime.

I would prefer a temp table created with the full resultset of the query, then paged with OFFSET and LIMIT and dropped at the end of the session (if the software isn’t able to share it between multiple sessions), or some other caching mechanism on the application server side.
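
For example, a minimal sketch of the idea with your tables (the cache table name is just a placeholder, and I’m assuming CTAS is available on your version):

-- Materialize the full aggregated resultset once per session
CREATE TABLE sales_by_item_cache AS
SELECT
  itemName,
  (-sum(case when documentType = 'Sales' then measure2 else 0 end))*30 as SalesAmount
FROM main_data
  JOIN invent_table ON main_data.itemKey = invent_table.itemKey
WHERE documentType = 'Sales'
GROUP BY itemName;

-- Page from the small cached table instead of re-running the join and aggregation
SELECT itemName, SalesAmount
FROM sales_by_item_cache
ORDER BY SalesAmount DESC
LIMIT 1000
OFFSET 2000;

-- Drop it at the end of the session
DROP TABLE sales_by_item_cache;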


#5

Thank you for your answer, but I’m a little puzzled by it.
Let me clarify first that I do not regard MapD here as a general-purpose database. In my case I am considering it as the backend for a very specialized BI solution. As such, the database would be in a read-only state during office hours, with nightly data reload tasks. Considering these constraints, in my opinion there is no specific reason why a query with identical parameters should return different result sets.
Regarding the temp table technique: is there any support for such session-based temp tables in MapD? I have not seen any references to it. Anyway, my impression was that even for in-CPU-memory analytical databases it is cheaper to do the join and aggregation anew than to build a materialized view for each possible resultset. And in the case of a GPU-based database, that balance probably leans even further towards dynamic execution?


#6

I know MapD is very fast, and I guess you are passionate about this project almost as much as I am, if not more; when I try a new piece of software, especially a database, my first question is not what it can do but how fast it is :slight_smile:

But I prefer not to waste resources even on a very fast system, because in the long run it’s better for wall clock time and overall system performance.

Anyway, I tried a query like yours; with a result cardinality of 350 rows the query works, but with a significant performance hit (see below).

So with this query:

select city,(-sum(case state when 'TX' then arrdelay else 0 end))*10 from flights join airports on origin=iata group by city order by 2

I get the following response times:

without LIMIT: 40ms, CPU usage spike 3%
LIMIT 100 OFFSET 0: 227ms, CPU usage spike 35%
LIMIT 100 OFFSET 100: 400ms, CPU usage spike 57%
LIMIT 100 OFFSET 200: 586ms, CPU usage spike 103%
LIMIT 100 OFFSET 300: 770ms, CPU usage spike 150%

Without ORDER BY, the performance isn’t affected and the CPU isn’t used.

As you can see, using a combination of ORDER BY, LIMIT and OFFSET causes the software to take a different code path, pushing some of the work onto the CPU side; so with a higher cardinality like yours it presumably gets much worse.


#7

Thank you for this detailed investigation.
As I see it, these results are consistent with mine on a dataset of probably larger size and higher cardinality. Basically, the server does something strange, and possibly wrong, with queries that include ORDER BY, LIMIT and OFFSET clauses.
By the way, only ORDER BY on an expression column actually has this effect. When I change the query and sort on a dimension field (the same field that is specified in the GROUP BY clause), the query works as expected:

 SELECT
     itemName,
     (-sum(case when documentType = 'Sales' then measure2 else 0 end))*30 as SalesAmount
         FROM main_data
            JOIN invent_table ON main_data.itemKey = invent_table.itemKey
            WHERE documentType = 'Sales'
                GROUP BY itemName
                ORDER BY itemName desc
                LIMIT 1000
                OFFSET 7000;

The results you present look to me like the symptom of some bug. Logically there is no reason why a query with LIMIT 100 OFFSET 0 should consume significantly more CPU, GPU or memory than the same query without LIMIT and OFFSET clauses. In fact, we can see in the Immerse web interface on the “SQL Editor” page that the same logical query with an externally set limit of 1000 rows does not take any performance hit. And that’s understandable: once we have the resultset in its totality, taking a slice of it should not be a resource-intensive operation, and there is no logical difference between getting the first or the last page of the resultset.

Regarding a possible workaround: I have thought about it, but frankly I cannot devise a general scheme; the only partial idea I have is sketched below. Some sort of cache infrastructure at the application server layer is a possible solution, but it is very difficult to implement properly, and a badly implemented cache with big resultsets could hurt performance very quickly. Moreover, I have the (maybe wrong) impression that the MapD model is not very compatible with resultset caches. A cache is not a silver bullet: it tends to use the same resources as the database itself, namely memory, CPU and the data bus. We have seen such bottlenecks pop up in traditional in-memory analytical systems.
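
The partial idea is keyset-style paging on the grouped key instead of OFFSET; this is just a sketch (grouping by itemKey and the placeholder value 1250 are my own assumptions for illustration), and it only gives pages in key order, not in SalesAmount order, so it does not really cover my case:

 -- Hypothetical keyset-style page: remember the largest itemKey returned by the
 -- previous page (1250 here is just a placeholder) and fetch the next 1000 keys above it
 SELECT
     invent_table.itemKey,
     itemName,
     (-sum(case when documentType = 'Sales' then measure2 else 0 end))*30 as SalesAmount
         FROM main_data
            JOIN invent_table ON main_data.itemKey = invent_table.itemKey
            WHERE documentType = 'Sales' AND invent_table.itemKey > 1250
                GROUP BY invent_table.itemKey, itemName
                ORDER BY invent_table.itemKey
                LIMIT 1000;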


#8

Yep, the only workaround for now is to cache on the client side, because querying a table created with CTAS shows the same behaviour; the only explanation is that the software is taking another code path.
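
Roughly what I tried on the flights dataset from my earlier test (the cache table name is just a placeholder):

-- Pre-aggregate once with CTAS
create table city_delay_cache as
select city,(-sum(case state when 'TX' then arrdelay else 0 end))*10 as delay_amount
from flights join airports on origin=iata group by city;

-- Then page the small table; ORDER BY with LIMIT and OFFSET on it shows the same behaviour
select city, delay_amount from city_delay_cache order by delay_amount limit 100 offset 100;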

Anyway, I tried the same query with version mapd-ce-3.1.01-20170622-69314c5-Linux-x86_64-render (a July release) and the query with ORDER BY, LIMIT and OFFSET responds as expected:

LIMIT 100 OFFSET 0: 40ms, CPU usage spike 3%
LIMIT 100 OFFSET 100: 40ms, CPU usage spike 3%
LIMIT 100 OFFSET 200: 40ms, CPU usage spike 4%
LIMIT 100 OFFSET 300: 42ms, CPU usage spike 3%

I tried that release because at the start of September something was changed in sorting to accommodate TOP_N queries, so I think something went wrong there.

Maybe @Darwin can help us sort out this issue.

I also tried mapd-ce-3.2.0-20170817-8ab3117-Linux-x86_64-render and the query is fast.

I have been getting the faulty behaviour since the mapd-ce-3.2.2-20170909-373c12a-Linux-x86_64-render release.

So as a workaround I can suggest downgrading to mapd-ce-3.2.0-20170817-8ab3117-Linux-x86_64-render if you don’t want to cache client side; it’s definitely a bug or a regression.


#9

Thank you very much for your help, @aznable. It is pretty reassuring to know for certain that the current behaviour is indeed a deviation from the norm. Unfortunately I cannot take your advice and downgrade, because changes that landed in version 3.3.0 are critically important to us. (We have a classical star schema, and it basically started working for us only after the v3.3 upgrade; see the thread “MapD CE crashes on relatively small dataset with a star schema (resolved in v 3.3.0)”.)

Regarding caches: as a possible workaround, I had actually never thought about a client-side cache. I thought about a cache in the application server process, assuming the application server runs on the same computer (or at least in the same datacenter) as the database server; currently that is the case for our projects. A client-side cache has one massive downside: we would have to transfer a big chunk of data to the client in the first place. When I tried that for a modest dataset of 23K rows with just two columns, the server-side query time was about one second, while the network transfer time could be something like ten seconds. And that is before mentioning the various per-tab memory limits that browsers impose. So I do not think a resultset cache on the client side would be a viable solution.


#10

Hi,

There are several things going on here, and as you have identified, I think at least some of them are regressions in behaviour. I am working to uncover the root causes.

Ultimately @vadimtsushko what you are asking for should work, if that is any consolation for now.

Whilst I try to get to the bottom of it, there is one thing you could try: turn off the static watchdog on your server for now. To do this, set --enable-watchdog=false on server start-up. No guarantees, but it will remove some of the barriers you are hitting.

regards


#11

Thank you very much for the answer, @dwayneberry.
It was critically important for me to know that this scenario, which is crucial for us, should indeed work, and that the current issues will be addressed in the future. For now we have plenty of work trying out and testing different parts of our application with the MapD backend.


#12

From the database’s perspective the application server is a client, so when I said client-side caching I meant the application server, not the browser.


#13

Understood. Thank you for the clarification.


#14

Hi @dwayneberry
Is there any news on this problem?
Should I create an issue on GitHub with a reference to this thread? Are there any other actions on my part that could help?


#15

Hi @vadimtsushko,

Did you try turning off the static watchdog as @dwayneberry suggested?


#16

Hi,

I have been able to reproduce the bug and have raised an internal issue; I should probably move it to the mapd-core issues to make it more visible externally. It will be prioritized and worked on as time allows.

As mentioned earlier, if you use --enable-watchdog=false on your server start-up, the problem you are seeing should go away.

regards


#17

@darwin @dwayneberry
Sorry, I did not try --enable-watchdog=false immediately. For some reason I assumed it would just make it possible to use additional CPU memory and thereby hurt the overall performance of the system I’m testing.
Now that I’ve tried disabling the watchdog, it looks like the query time no longer depends on the OFFSET parameter, so I’ll use that switch for now.
Sorry for the misunderstanding, and thank you very much.