In Cassandra, you can able to query on columns that are part of primary key or columns that has secondary index.
Let’s create a KEYSPACE cassandratutorial.
CREATE KEYSPACE cassandratutorial WITH REPLICATION =
{
'class' : 'SimpleStrategy',
'replication_factor' : 1
};
Let’s create a table manager in cassandratutorial KEYSPACE.
CREATE TABLE IF NOT EXISTS cassandratutorial.manager (
id INT,
departmentId INT,
countryCode VARCHAR,
firstName VARCHAR,
lastName VARCHAR,
age int,
PRIMARY KEY(countryCode, departmentId, id)
) WITH CLUSTERING ORDER BY (departmentId DESC, id ASC);
Insert data to manager table, by executing below statements.
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (1, 1, 'IN', 'Krishna', 'Gurram', 35);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (1, 2, 'IN', 'Krishna', 'Gurram', 35);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (1, 3, 'AU', 'Ram', 'Ponnam', 39);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (2, 1, 'US', 'Sushmita', 'Sen', 41);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (3, 2, 'IN', 'Sunil', 'Nanda', 29);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (4, 3, 'AU', 'Prapuran', 'dam', 36);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (5, 1, 'US', 'Prathuk', 'Kumar', 54);
Get all the records
cqlsh> SELECT * FROM cassandratutorial.manager;
countrycode | departmentid | id | age | firstname | lastname
-------------+--------------+----+-----+-----------+----------
IN | 2 | 1 | 35 | Krishna | Gurram
IN | 2 | 3 | 29 | Sunil | Nanda
IN | 1 | 1 | 35 | Krishna | Gurram
AU | 3 | 1 | 39 | Ram | Ponnam
AU | 3 | 4 | 36 | Prapuran | dam
US | 1 | 2 | 41 | Sushmita | Sen
US | 1 | 5 | 54 | Prathuk | Kumar
(7 rows)
PRIMARY KEY(countryCode, departmentId, id)
Since I specified countryCode, departmentId, id as primary key, I can query with below combinations.
a. countryCode
b. countryCode AND departmentId
c. countryCode AND departmentId AND id
cqlsh> SELECT * FROM cassandratutorial.manager WHERE countryCode='US';
countrycode | departmentid | id | age | firstname | lastname
-------------+--------------+----+-----+-----------+----------
US | 1 | 2 | 41 | Sushmita | Sen
US | 1 | 5 | 54 | Prathuk | Kumar
(2 rows)
cqlsh> SELECT * FROM cassandratutorial.manager WHERE countryCode='US' AND departmentId=1;
countrycode | departmentid | id | age | firstname | lastname
-------------+--------------+----+-----+-----------+----------
US | 1 | 2 | 41 | Sushmita | Sen
US | 1 | 5 | 54 | Prathuk | Kumar
(2 rows)
cqlsh> SELECT * FROM cassandratutorial.manager WHERE countryCode='US' AND departmentId=1 AND id=2;
countrycode | departmentid | id | age | firstname | lastname
-------------+--------------+----+-----+-----------+----------
US | 1 | 2 | 41 | Sushmita | Sen
(1 rows)
SELECT * FROM cassandratutorial.manager WHERE departmentId=1;
When you try to execute above query, you will end up in below error.
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
cqlsh> SELECT * FROM cassandratutorial.manager WHERE departmentId=1;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
Even though ‘departmentId’ is part of the primary key, Cassandra is unable to query on this field. Just read the error message, it is saying that, to execute this query Cassandra require additional filtering of the data and may not be able to execute the query in efficient way.
To execute this query add ‘ALLOW FILTERING’ clause at the end of query.
cqlsh> SELECT * FROM cassandratutorial.manager WHERE departmentId=1 ALLOW FILTERING;
countrycode | departmentid | id | age | firstname | lastname
-------------+--------------+----+-----+-----------+----------
IN | 1 | 1 | 35 | Krishna | Gurram
US | 1 | 2 | 41 | Sushmita | Sen
US | 1 | 5 | 54 | Prathuk | Kumar
(3 rows)
You can also use ‘ALLOW FILTERING’ clause to query on columns that are not part of PRIMARY KEY.
cqlsh> SELECT * FROM cassandratutorial.manager WHERE age=35 ALLOW FILTERING;
countrycode | departmentid | id | age | firstname | lastname
-------------+--------------+----+-----+-----------+----------
IN | 2 | 1 | 35 | Krishna | Gurram
IN | 1 | 1 | 35 | Krishna | Gurram
(2 rows)
cqlsh> SELECT * FROM cassandratutorial.manager WHERE firstName='Ram' ALLOW FILTERING;
countrycode | departmentid | id | age | firstname | lastname
-------------+--------------+----+-----+-----------+----------
AU | 3 | 1 | 39 | Ram | Ponnam
(1 rows)
To execute these kind of queries(Queries on columns that are not part of primary key), Cassandra should invest lot of computing power. To improve the performance, better to create secondary index on a columns that you want to query on.
Syntax to create secondary index
<create-index-stmt> ::= CREATE ( CUSTOM )? INDEX ( IF NOT EXISTS )? ( <indexname> )?
ON <tablename> '(' <index-identifier> ')'
( USING <string> ( WITH OPTIONS = <map-literal> )? )?
<index-identifier> ::= <identifier>
| keys( <identifier> )
Example
CREATE INDEX firstNameIndex ON cassandratutorial.manager(firstName);
Above statement create an index on firstName field. Once you created the secondary index, you can run the query without ‘ALLOW FILTERING’ clause.
cqlsh> CREATE INDEX firstNameIndex ON cassandratutorial.manager(firstName);
cqlsh>
cqlsh> SELECT * FROM cassandratutorial.manager WHERE firstName='Ram';
countrycode | departmentid | id | age | firstname | lastname
-------------+--------------+----+-----+-----------+----------
AU | 3 | 1 | 39 | Ram | Ponnam
(1 rows)
No comments:
Post a Comment