Select field, count(*) from table group by field, The result is confuse


I use sql to select my tables, by the result is confuse me.
The table’s count of row is 128571365
Then select condition unit_id = 2238, the count is 369458. It is ok
But I select by group unit_id. The result confuse me .

where i has mistake? the unit_id = 2238 , count is 8590227878, it is bigger then the count of table, why?



Yes, those results don’t look right based on your details.

Would it be possible to try these same SQL queries from the mapdql command line tool, and include all the output here?



I have try this sql use mapdql, The result is same.


Could you please confirm the query is actually

Select unit_id, count(*) unit_count from f_flow where platform = 'door' group by unit_id limit 10;

That there is a * in the count(*)



mapdql> select count() unit_count from f_flow;
mapdql> select unit_id, count() unit_count from f_flow where platform = ‘door’ group by unit_id limit 10;
mapdql> select count(
) unit_count from f_flow where platform = ‘door’ and unit_id = 2238;



What version are you using and what platform are you running on.

Is it possible to get a copy of the dataset, this is occurring on?



I use mapd 3.0. The data has something about my customer. I can’t copy it to you. sorry


Hi @llz, what hardware (particularly number and type of GPUs) are you running on?



Where did the version of the software come from (AWS, build or CE download)

What precise hardware platform are you running on



oeasy@oeasy:~/mapd-3.0.0-render/bin$ nvidia-smi
Mon Jun 5 13:04:24 2017
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| 0 GeForce GTX 108… Off | 0000:02:00.0 Off | N/A |
| 23% 33C P8 9W / 250W | 2810MiB / 11172MiB | 0% Default |
| 1 GeForce GTX 108… Off | 0000:03:00.0 Off | N/A |
| 23% 38C P8 10W / 250W | 2730MiB / 11172MiB | 0% Default |
| 2 GeForce GTX 108… Off | 0000:82:00.0 Off | N/A |
| 23% 31C P8 8W / 250W | 2730MiB / 11172MiB | 0% Default |
| 3 GeForce GTX 108… Off | 0000:83:00.0 Off | N/A |
| 23% 32C P8 9W / 250W | 2730MiB / 11172MiB | 0% Default |

| Processes: GPU Memory |
| GPU PID Type Process name Usage |
| 0 805 C+G …e/oeasy/mapd-3.0.0-render/bin/mapd_server 2808MiB |
| 1 805 C+G …e/oeasy/mapd-3.0.0-render/bin/mapd_server 2728MiB |
| 2 805 C+G …e/oeasy/mapd-3.0.0-render/bin/mapd_server 2728MiB |
| 3 805 C+G …e/oeasy/mapd-3.0.0-render/bin/mapd_server 2728MiB |
The version of software, I download from your website. mapd-ce-latest-Linux-x86_64-render.tar


I have try this sql on other table, the table has only 4000 records. The result is right.



Could you share the schema and some obsfucated data.

What is the OS you are running on? uname -a?



Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenia

platform TEXT,
uid TEXT,
pid TEXT,
ref_a TEXT,
ref_plat TEXT,
os TEXT,
dt TEXT,
rate TEXT,
sz TEXT,
lg TEXT,
timer TEXT,
url TEXT,
start_time BIGINT,
level_time BIGINT,
ip TEXT,
vid TEXT,
sid TEXT,
u_create_date TEXT,
country TEXT,
area TEXT,
region TEXT,
city TEXT,
county TEXT,
isp TEXT,
s_type TEXT,
s_word TEXT,
version TEXT,
short_timer TEXT,
street TEXT,
community TEXT,
gps_iterms TEXT,
volume INT,
unit_id INT,
door_code TEXT,
signal TEXT,
network TEXT,
open_door INT,
open_status TEXT,
response TEXT,
data_source TEXT,
create_date TEXT



What is the RAM in this box? What CPU/s?

How did you install your nvidia driver, I see it is a newer than then default one I believe for ubuntu LTS from nvidia?

trying to solve the puzzle :slight_smile:



Can you tell us what the min and max are for the unit_id column? Also are there any nulls in that column?


Hi liz

Would it be possible to change your schema so that unit_id is declare as a TEXT field rather than INT.

Then reload and give the query another try?



My computer is 8 Core and RAM is 64G. unit_id has nulls in that column, min=0, max=971102.


mapdql> select uid, start_time from f_flow where platform = ‘door’ order by start_time desc limit 10;
Exception: Query would require a scan without a limit on table(s): f_flow
Can someone help me , what that mean?


Thanks, You advice is right, When i change the schema of unit_id as a TEXT field, It works well.


Btw @llz we’ve replicated the issue you described and should have a fix soon. Thank you for patiently walking us through it!

Encounter an exception: Query would require a scan without a limit on table(s)