Indexes
Indexes
Because indexes are implemented as tables in Cloud Spanner, you’ll encounter the same issues with the indexed columns as you did with the table keys: An index on a column with poorly distributed values (such as a timestamp) will lead to the creation of a hotspot, even if the underlying table is using well-distributed keys.
Example of index that will hotspot and how to address it
CREATE TABLE Events (
UserId String(MAX),
Timestamp TIMESTAMP,
EventData)
PRIMARY KEY (UserId, Timestamp DESC);
CREATE INDEX EventsByTimestamp ON Events (Timestamp DESC);
- Add a synthetic shard key
TimestampShardId = CRC32(Timestamp) % 100
CREATE TABLE Events (
UserId String(MAX),
Timestamp TIMESTAMP,
TimestampShardId INT64,
EventData)
PRIMARY KEY (UserId, Timestamp DESC);
CREATE INDEX EventsByTimestamp ON Events (TimestampShardId,Timestamp);
Queries have to be modified now to factor in the shard IDs otherwise index won’t be used and it will be a table scan
Select * from Events@{FORCE_INDEX=EventsByTimestamp}
WHERE
TimestampShardId BETWEEN 0 AND 99
AND Timestamp > @lower_bound
AND Timestamp < @upper_bound;
NULL_FILTERED index
- Ignore NULL columns from an index
CREATE TABLE Employees (
CompanyUUID INT64,
EmployeeUUID INT64,
FullName STRING(MAX)
...
) PRIMARY KEY (CompanyUUID,EmployeeUUID)
CREATE INDEX EmployeesById
ON Employees (EmployeeUUID)
STORING (FullName);
Storing INDEX
- A form of Covering Index allowing for index only lookups
CREATE INDEX EmployeesById
ON Employees (EmployeeUUID)
STORING (FullName);
Force INDEX use
Select *
from [email protected]{FORCE_INDEX=EmployeesById}
Where EmployeeUUID=xxx;