04-14-2021 02:42 PM
The data table should have a separate column for Time Stamp.
What you said is good for query by date, but still an issue for other queries.
04-14-2021 03:39 PM
Cardinal rule of relational databases - data should be fully decomposed at the table level. Any time I see a text field holding a datum which should be properly stored in one of the direct datatypes provided - an integer, a float, a boolean, or in the present case, a point in time - I cringe at the improper design decision made. Doubly so if the text field encodes more than one item of data.
Likely in your present situation, Meng, the server must do a table scan of every record and string-process that field to match it to the LIKE clause, regardless of the fact that an index is maintained on that field (since you said it was also the primary key).
I hold Dr. Powell in great respect since he's been active in these forums (and LAVA, too) for years, but I'll disagree here - once the "date" is converted to text, be it MMDDYYYY or YYYYMMDD, it's just text, and unlikely to be appreciably faster to scan either way. Any datetime field (there are several types), on the other hand, is a fully ordered domain and can be range-searched very efficiently.
By the way, there is also no real penalty for defining a primary key as a composite key - meaning the uniqueness of records is through some combination of fields taken together. I'll never understand why folks either a) use an autoindex, which means there's no hope that the rest of the "real" fields won't contain duplication, or b) meld separate fields into a text mashup.
</end soapbox mode>
Dave
04-15-2021 12:46 AM
You can use the Lab SQL toolkit for database operations.
04-15-2021 03:27 AM
Because YYYYMMDD is time ordered, it can benefit from the same range searchability, and so would have stopped the full-table scan that is the primary problem here. Even better would be true timestamp column with a datetime data type, as you say.
BTW, I'm pretty sure the OPs problem is solvable with a new time-sorted index or lookup table.
04-16-2021 03:04 PM - edited 04-16-2021 03:13 PM
@MengHuiHanTang wrote:
We have a data table that is growing every day, roughly 40M records as of today.
it causes some of queries to run slower and slower.
One of the query is as follows
select distinct ScanID
from tblClassifierResults
where ClassifierID LIKE 'CNTR%04052021%'
'ClassifierID' is the primary key of the table, so there is an automatic index for it.
The index may be usable for the CNTR-part of your string, but from % and out your index is useless. And if all or many ClassifierID start with CNTR, then the index will not be used at all.
Generally, indexes on text data can only be used with text which is matching start of the string. Indexes can't be used to search for part of text (03052021) located in the middle of the stored text.
If your where clause was " ClassifierID LIKE 'CNTR04052021%' ", it may be your database server could use the index.
Some databases has fulltext search, but i do not think your ClassifierID is suitable for that.
04-16-2021 04:18 PM
Thanks for the explanation.
The database was created 10+ years ago, so not anticipated so much data saved. Lesson learned.
04-17-2021 03:32 AM
@DavidBoyd wrote:
By the way, there is also no real penalty for defining a primary key as a composite key - meaning the uniqueness of records is through some combination of fields taken together. I'll never understand why folks either a) use an autoindex, which means there's no hope that the rest of the "real" fields won't contain duplication,
I do completely agree with your other statements, but i have some comments and questions for this one even if it may be off-topic
First, even if you have a specific ID column as primary key there is nothing which prevent having a unique index on the columns which defines the record uniquely.
A primary key is often used as a foreign key in other tables. If your primary key consist of three columns which is used as foreign keys in ten tables you need to have a copy of these three columns in ten tables. Is this practical or have i misunderstood anything?
For instance, would you use first name, last name, address as primary key in a members list? And if the member changes address you need to update all other tables (even if is made easier by letting the database do this automatically.
Did you really mean autoindex, or should it be autonumber?
04-19-2021 09:49 AM
RolfO,
Yes, thanks, I was really referring to the autonumber field and misspoke. I agree with you that it is completely reasonable to make a design decision in which an autonumber is the primary key, as long as other field(s) can be defined as having a unique index, which meets the requirement that records be fundamentally unique. Unfortunately I've run into too many examples where the autonumber is used as the knee-jerk answer to establishing the "must have a PK" with no real thought as to record uniqueness - at this point the database table is little more than a spreadsheet.
I have plenty of parent-child (one-to-many) table structures in which the PK is comprised of multiple fields AND those fields appear in the child tables as FK. I don't see this as a problem or a performance issue - I've always understood that SQL Server is efficiently modeling this in storage, and I generally permit cascading updates and cascading deletes in these relationships. In my line of work, I'm rarely, if ever, deleting records or modifying their PK fields anyway.
Because of this, I think autonumber is frequently a crutch that is improperly applied.
Just my thoughts/opinions here.
Dave