CREATE TABLE IF NOT EXISTS cassandratutorial.manager (
id INT,
departmentId INT,
countryCode VARCHAR,
firstName VARCHAR,
lastName VARCHAR,
age int,
PRIMARY KEY(countryCode, departmentId, id)
) WITH CLUSTERING ORDER BY (departmentId DESC, id ASC);
Insert data to manager table, by executing below statements.
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (1, 1, 'IN', 'Krishna', 'Gurram', 35);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (1, 2, 'IN', 'Krishna', 'Gurram', 35);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (1, 3, 'AU', 'Ram', 'Ponnam', 39);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (2, 1, 'US', 'Sushmita', 'Sen', 41);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (3, 2, 'IN', 'Sunil', 'Nanda', 29);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (4, 3, 'AU', 'Prapuran', 'dam', 36);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (5, 1, 'US', 'Prathuk', 'Kumar', 54);
cqlsh> SELECT * FROM cassandratutorial.manager;
countrycode | departmentid | id | age | firstname | lastname
IN | 2 | 1 | 35 | Krishna | Gurram
IN | 2 | 3 | 29 | Sunil | Nanda
IN | 1 | 1 | 35 | Krishna | Gurram
AU | 3 | 1 | 39 | Ram | Ponnam
AU | 3 | 4 | 36 | Prapuran | dam
US | 1 | 2 | 41 | Sushmita | Sen
US | 1 | 5 | 54 | Prathuk | Kumar
(7 rows)
PRIMARY KEY(countryCode, departmentId, id)
Since I specified countryCode, departmentId, id as primary key, I can query with below combinations.
a. countryCode
b. countryCode AND departmentId
c. countryCode AND departmentId AND id
cqlsh> SELECT * FROM cassandratutorial.manager WHERE countryCode='US';
countrycode | departmentid | id | age | firstname | lastname
US | 1 | 2 | 41 | Sushmita | Sen
US | 1 | 5 | 54 | Prathuk | Kumar
(2 rows)
cqlsh> SELECT * FROM cassandratutorial.manager WHERE countryCode='US' AND departmentId=1;
countrycode | departmentid | id | age | firstname | lastname
US | 1 | 2 | 41 | Sushmita | Sen
US | 1 | 5 | 54 | Prathuk | Kumar
(2 rows)
cqlsh> SELECT * FROM cassandratutorial.manager WHERE countryCode='US' AND departmentId=1 AND id=2;
countrycode | departmentid | id | age | firstname | lastname
US | 1 | 2 | 41 | Sushmita | Sen
(1 rows)
SELECT * FROM cassandratutorial.manager WHERE departmentId=1;
When you try to execute above query, you will end up in below error.
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
cqlsh> SELECT * FROM cassandratutorial.manager WHERE departmentId=1;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
Even though ‘departmentId’ is part of the primary key, Cassandra is unable to query in this field. Just read the error message, it is saying that, to execute this query Cassandra require additional filtering of the data and may not be able to execute the query in efficient way.
To execute this query add ‘ALLOW FILTERING’ clause at the end of query.
cqlsh> SELECT * FROM cassandratutorial.manager WHERE departmentId=1 ALLOW FILTERING;
countrycode | departmentid | id | age | firstname | lastname
IN | 1 | 1 | 35 | Krishna | Gurram
US | 1 | 2 | 41 | Sushmita | Sen
US | 1 | 5 | 54 | Prathuk | Kumar
(3 rows)
You can use ‘ALLOW FILTERING’ clause to query on columns that are not part of PRIMARY KEY.
cqlsh> SELECT * FROM cassandratutorial.manager WHERE age=35 ALLOW FILTERING;
countrycode | departmentid | id | age | firstname | lastname
IN | 2 | 1 | 35 | Krishna | Gurram
IN | 1 | 1 | 35 | Krishna | Gurram
(2 rows)
cqlsh> SELECT * FROM cassandratutorial.manager WHERE firstName='Ram' ALLOW FILTERING;
countrycode | departmentid | id | age | firstname | lastname
AU | 3 | 1 | 39 | Ram | Ponnam
(1 rows)
To execute these kind of queries(Queries on columns that are not part of primary key), Cassandra should invest lot of computing power. To improve the performance, better to create secondary index on a columns that you want to query on.
Syntax to create secondary index
<create-index-stmt> ::= CREATE ( CUSTOM )? INDEX ( IF NOT EXISTS )? ( <indexname> )?
ON <tablename> '(' <index-identifier> ')'
( USING <string> ( WITH OPTIONS = <map-literal> )? )?
<index-identifier> ::= <identifier>
| keys( <identifier> )
Example
CREATE INDEX firstNameIndex ON cassandratutorial.manager(firstName);
Above statement create an index on firstName field. Once you created the secondary index, you can run the query without ‘ALLOW FILTERING’ clause.
cqlsh> CREATE INDEX firstNameIndex ON cassandratutorial.manager(firstName);
cqlsh>
cqlsh> SELECT * FROM cassandratutorial.manager WHERE firstName='Ram';
countrycode | departmentid | id | age | firstname | lastname
AU | 3 | 1 | 39 | Ram | Ponnam
(1 rows)