Select field, count(*) from table group by field, The result is confuse



What version are you using and what platform are you running on.

Is it possible to get a copy of the dataset, this is occurring on?



I use mapd 3.0. The data has something about my customer. I can’t copy it to you. sorry


Hi @llz, what hardware (particularly number and type of GPUs) are you running on?



Where did the version of the software come from (AWS, build or CE download)

What precise hardware platform are you running on



oeasy@oeasy:~/mapd-3.0.0-render/bin$ nvidia-smi
Mon Jun 5 13:04:24 2017
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| 0 GeForce GTX 108… Off | 0000:02:00.0 Off | N/A |
| 23% 33C P8 9W / 250W | 2810MiB / 11172MiB | 0% Default |
| 1 GeForce GTX 108… Off | 0000:03:00.0 Off | N/A |
| 23% 38C P8 10W / 250W | 2730MiB / 11172MiB | 0% Default |
| 2 GeForce GTX 108… Off | 0000:82:00.0 Off | N/A |
| 23% 31C P8 8W / 250W | 2730MiB / 11172MiB | 0% Default |
| 3 GeForce GTX 108… Off | 0000:83:00.0 Off | N/A |
| 23% 32C P8 9W / 250W | 2730MiB / 11172MiB | 0% Default |

| Processes: GPU Memory |
| GPU PID Type Process name Usage |
| 0 805 C+G …e/oeasy/mapd-3.0.0-render/bin/mapd_server 2808MiB |
| 1 805 C+G …e/oeasy/mapd-3.0.0-render/bin/mapd_server 2728MiB |
| 2 805 C+G …e/oeasy/mapd-3.0.0-render/bin/mapd_server 2728MiB |
| 3 805 C+G …e/oeasy/mapd-3.0.0-render/bin/mapd_server 2728MiB |
The version of software, I download from your website. mapd-ce-latest-Linux-x86_64-render.tar


I have try this sql on other table, the table has only 4000 records. The result is right.



Could you share the schema and some obsfucated data.

What is the OS you are running on? uname -a?



Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenia

platform TEXT,
uid TEXT,
pid TEXT,
ref_a TEXT,
ref_plat TEXT,
os TEXT,
dt TEXT,
rate TEXT,
sz TEXT,
lg TEXT,
timer TEXT,
url TEXT,
start_time BIGINT,
level_time BIGINT,
ip TEXT,
vid TEXT,
sid TEXT,
u_create_date TEXT,
country TEXT,
area TEXT,
region TEXT,
city TEXT,
county TEXT,
isp TEXT,
s_type TEXT,
s_word TEXT,
version TEXT,
short_timer TEXT,
street TEXT,
community TEXT,
gps_iterms TEXT,
volume INT,
unit_id INT,
door_code TEXT,
signal TEXT,
network TEXT,
open_door INT,
open_status TEXT,
response TEXT,
data_source TEXT,
create_date TEXT



What is the RAM in this box? What CPU/s?

How did you install your nvidia driver, I see it is a newer than then default one I believe for ubuntu LTS from nvidia?

trying to solve the puzzle :slight_smile:



Can you tell us what the min and max are for the unit_id column? Also are there any nulls in that column?


Hi liz

Would it be possible to change your schema so that unit_id is declare as a TEXT field rather than INT.

Then reload and give the query another try?



My computer is 8 Core and RAM is 64G. unit_id has nulls in that column, min=0, max=971102.


mapdql> select uid, start_time from f_flow where platform = ‘door’ order by start_time desc limit 10;
Exception: Query would require a scan without a limit on table(s): f_flow
Can someone help me , what that mean?


Thanks, You advice is right, When i change the schema of unit_id as a TEXT field, It works well.


Btw @llz we’ve replicated the issue you described and should have a fix soon. Thank you for patiently walking us through it!

Encounter an exception: Query would require a scan without a limit on table(s)


Good to know, thanks.

We are looking in the bug you ran into with the INT field group by. Will let you know what we find.

As far as your second question goes around the Exception: Query would require a scan without a limit on table(s): f_flow. You have run into our watchdog. This is our current way to protect our users from themselves.
We have heuristics in our code that identify query patterns that we think will take an unreasonable amount of time to run, and rather than letting them run we put out notification like this one.

In your case here you have an option to disable watchdog (by adding parameter enable-watchdog=false on server startup), as it is not like you have billions of record to churn through ,and allow this query to run.

We also have a dynamic watchdog system, which we are transitioning to, where rather than failing due to a heuristic, it will let a query run. Then, only if the query takes too long, will the dynamic watchdog time the query out.

On a separate note the core reason for the watchdog heuristic triggering for this particular pattern is being worked on at the moment so hopefully will not need to be guarded in the near future.



I get, Thanks for your work.


When you say we can set enable-watchdog=false, where? I’m sorry because this is probably really apparent to everyone else.

The way I start the mapd server is systemctl start mapd_server and of course systemctl doesn’t recognize the parameter --enable-watchdog=false.



Parameters like enable-watchdog=false should be set in your mapd.conf file.

More details are available here


What's the use of scan_limit ?

Excellent. That worked. I had read and searched the linked document before, but it didn’t include enable-watchdog as a parameter. I should have tried it anyways. Thank you again.