Programming for beginners: cassandra

Showing posts with label cassandra. Show all posts

Thursday, 5 March 2020

Cassandra: Drop user defined type

‘DROP TYPE’ keyword is used to drop user defined type.

Syntax:

<drop-type-stmt> ::= DROP TYPE ( IF EXISTS )? <typename>

Example

DROP TYPE address;

cqlsh> CREATE TYPE cassandratutorial.address (
   ... street_name text,
   ...     street_number int,
   ...     city text,
   ...     state text,
   ...     country text,
   ...     zip int
   ... );
cqlsh> 
cqlsh> DESCRIBE TYPE cassandratutorial.address;

CREATE TYPE cassandratutorial.address (
    street_name text,
    street_number int,
    city text,
    state text,
    country text,
    zip int
);
cqlsh> 
cqlsh> DROP TYPE cassandratutorial.address;
cqlsh> 
cqlsh> DESCRIBE cassandratutorial.address;

'address' not found in keyspace 'cassandratutorial'

Previous Next Home

Cassandra: Alter user defined type

You can alter user define type using ‘ALTER TYPE’ statement.

Syntax

<alter-type-stmt> ::= ALTER TYPE <typename> <instruction>

<instruction> ::= ADD <field-name> <type>
                | RENAME <field-name> TO <field-name> ( AND <field-name> TO <field-name> )*

Example

ALTER TYPE address ADD country text

ALTER TYPE address RENAME zip TO zipcode AND street_name TO street

CQL Console Output

cqlsh> CREATE KEYSPACE cassandratutorial WITH REPLICATION = 
   ... { 
   ...   'class' : 'SimpleStrategy', 
   ...   'replication_factor' : 1 
   ... };
cqlsh> 
cqlsh> CREATE TYPE cassandratutorial.address (
   ... street_name text,
   ...     street_number int,
   ...     city text,
   ...     state text,
   ...     country text,
   ...     zip int
   ... );
cqlsh> 
cqlsh> DESCRIBE TYPE cassandratutorial.address;

CREATE TYPE cassandratutorial.address (
    street_name text,
    street_number int,
    city text,
    state text,
    country text,
    zip int
);
cqlsh> 
cqlsh> ALTER TYPE cassandratutorial.address ADD continent text;
cqlsh> 
cqlsh> DESCRIBE TYPE cassandratutorial.address;

CREATE TYPE cassandratutorial.address (
    street_name text,
    street_number int,
    city text,
    state text,
    country text,
    zip int,
    continent text
);

Previous Next Home

Cassandra: Working with user defined types

In this tutorial series, I am going to explain how to work with user defined types.

You are going to learn,

a. How to create user defined type

b. How to use user defined type

c. How to alter user defined type

d. How to drop user defined type

How to create user defined type?

Using 'CREATE TYPE' statement, you can create user defined types.

Syntax

<create-type-stmt> ::= CREATE TYPE ( IF NOT EXISTS )? <typename>
                         '(' <field-definition> ( ',' <field-definition> )* ')'

<typename> ::= ( <keyspace-name> '.' )? <identifier>

<field-definition> ::= <identifier> <type>

Example

CREATE TYPE cassandratutorial.address (
 street_name text,
    street_number int,
    city text,
    state text,
    country text,
    zip int
)

CQL Script

CREATE KEYSPACE cassandratutorial WITH REPLICATION = 
{ 
  'class' : 'SimpleStrategy', 
  'replication_factor' : 1 
};

CREATE TYPE cassandratutorial.address (
 street_name text,
    street_number int,
    city text,
    state text,
    country text,
    zip int
);

CREATE TABLE IF NOT EXISTS cassandratutorial.employee (
  id INT PRIMARY KEY, 
  firstName VARCHAR,
  lastName VARCHAR,
  addressInfo address,
  age int
);

INSERT INTO cassandratutorial.employee JSON '{"id" : 1, "firstName" : "Krishna", "lastName" : "Gurram", "age" : 30, "addressInfo" : {"street_name" : "Marthalli", "street_number" : 123, "city" : "Bangalore", "state" : "Karnataka", "country" : "India", "zip" : 560037}}';

Cqlsh console output

cqlsh> CREATE KEYSPACE cassandratutorial WITH REPLICATION = 
   ... { 
   ...   'class' : 'SimpleStrategy', 
   ...   'replication_factor' : 1 
   ... };
cqlsh> 
cqlsh> CREATE TYPE cassandratutorial.address (
   ... street_name text,
   ...     street_number int,
   ...     city text,
   ...     state text,
   ...     country text,
   ...     zip int
   ... );
cqlsh> 
cqlsh> CREATE TABLE IF NOT EXISTS cassandratutorial.employee (
   ...   id INT PRIMARY KEY, 
   ...   firstName VARCHAR,
   ...   lastName VARCHAR,
   ...   addressInfo address,
   ...   age int
   ... );
cqlsh> 
cqlsh> INSERT INTO cassandratutorial.employee JSON '{"id" : 1, "firstName" : "Krishna", "lastName" : "Gurram", "age" : 30, "addressInfo" : {"street_name" : "Marthalli", "street_number" : 123, "city" : "Bangalore", "state" : "Karnataka", "country" : "India", "zip" : 560037}}';
cqlsh> 
cqlsh> SELECT * FROM cassandratutorial.employee;

 id | addressinfo                                                                                                          | age | firstname | lastname
----+----------------------------------------------------------------------------------------------------------------------+-----+-----------+----------
  1 | {street_name: 'Marthalli', street_number: 123, city: 'Bangalore', state: 'Karnataka', country: 'India', zip: 560037} |  30 |   Krishna |   Gurram

(1 rows)

Previous Next Home

Cassandra: Drop Secondary Index

The DROP INDEX statement is used to drop an existing secondary index

Syntax

<drop-index-stmt> ::= DROP INDEX ( IF EXISTS )? ( <keyspace> '.' )? <identifier>

Example

DROP INDEX userIndex;

DROP INDEX userkeyspace.address_index;

Previous Next Home

Cassandra: Secondary Index

In Cassandra, you can able to query on columns that are part of primary key or columns that has secondary index.

Let’s create a KEYSPACE cassandratutorial.

CREATE KEYSPACE cassandratutorial WITH REPLICATION = 
{ 
  'class' : 'SimpleStrategy', 
  'replication_factor' : 1 
};

Let’s create a table manager in cassandratutorial KEYSPACE.

CREATE TABLE IF NOT EXISTS cassandratutorial.manager (
  id INT,
  departmentId INT,
  countryCode VARCHAR,
  firstName VARCHAR,
  lastName VARCHAR,
  age int,
  PRIMARY KEY(countryCode, departmentId, id)
) WITH CLUSTERING ORDER BY (departmentId DESC, id ASC);

Insert data to manager table, by executing below statements.

INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (1, 1, 'IN', 'Krishna', 'Gurram', 35);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (1, 2, 'IN', 'Krishna', 'Gurram', 35);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (1, 3, 'AU', 'Ram', 'Ponnam', 39);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (2, 1, 'US', 'Sushmita', 'Sen', 41);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (3, 2, 'IN', 'Sunil', 'Nanda', 29);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (4, 3, 'AU', 'Prapuran', 'dam', 36);
INSERT INTO cassandratutorial.manager (id, departmentId, countryCode, firstName, lastName, age) VALUES (5, 1, 'US', 'Prathuk', 'Kumar', 54);

cqlsh> SELECT * FROM cassandratutorial.manager;

 countrycode | departmentid | id | age | firstname | lastname
-------------+--------------+----+-----+-----------+----------
          IN |            2 |  1 |  35 |   Krishna |   Gurram
          IN |            2 |  3 |  29 |     Sunil |    Nanda
          IN |            1 |  1 |  35 |   Krishna |   Gurram
          AU |            3 |  1 |  39 |       Ram |   Ponnam
          AU |            3 |  4 |  36 |  Prapuran |      dam
          US |            1 |  2 |  41 |  Sushmita |      Sen
          US |            1 |  5 |  54 |   Prathuk |    Kumar

(7 rows)

PRIMARY KEY(countryCode, departmentId, id)

Since I specified countryCode, departmentId, id as primary key, I can query with below combinations.

a. countryCode

b. countryCode AND departmentId

c. countryCode AND departmentId AND id

cqlsh> SELECT * FROM cassandratutorial.manager WHERE countryCode='US';

 countrycode | departmentid | id | age | firstname | lastname
-------------+--------------+----+-----+-----------+----------
          US |            1 |  2 |  41 |  Sushmita |      Sen
          US |            1 |  5 |  54 |   Prathuk |    Kumar

(2 rows)
cqlsh> SELECT * FROM cassandratutorial.manager WHERE countryCode='US' AND departmentId=1;

 countrycode | departmentid | id | age | firstname | lastname
-------------+--------------+----+-----+-----------+----------
          US |            1 |  2 |  41 |  Sushmita |      Sen
          US |            1 |  5 |  54 |   Prathuk |    Kumar

(2 rows)
cqlsh> SELECT * FROM cassandratutorial.manager WHERE countryCode='US' AND departmentId=1 AND id=2;

 countrycode | departmentid | id | age | firstname | lastname
-------------+--------------+----+-----+-----------+----------
          US |            1 |  2 |  41 |  Sushmita |      Sen

(1 rows)

SELECT * FROM cassandratutorial.manager WHERE departmentId=1;

When you try to execute above query, you will end up in below error.

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

cqlsh> SELECT * FROM cassandratutorial.manager WHERE departmentId=1;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

Even though ‘departmentId’ is part of the primary key, Cassandra is unable to query in this field. Just read the error message, it is saying that, to execute this query Cassandra require additional filtering of the data and may not be able to execute the query in efficient way.

To execute this query add ‘ALLOW FILTERING’ clause at the end of query.

cqlsh> SELECT * FROM cassandratutorial.manager WHERE departmentId=1 ALLOW FILTERING;

 countrycode | departmentid | id | age | firstname | lastname
-------------+--------------+----+-----+-----------+----------
          IN |            1 |  1 |  35 |   Krishna |   Gurram
          US |            1 |  2 |  41 |  Sushmita |      Sen
          US |            1 |  5 |  54 |   Prathuk |    Kumar

(3 rows)

You can use ‘ALLOW FILTERING’ clause to query on columns that are not part of PRIMARY KEY.

cqlsh> SELECT * FROM cassandratutorial.manager WHERE age=35 ALLOW FILTERING;

 countrycode | departmentid | id | age | firstname | lastname
-------------+--------------+----+-----+-----------+----------
          IN |            2 |  1 |  35 |   Krishna |   Gurram
          IN |            1 |  1 |  35 |   Krishna |   Gurram

(2 rows)
cqlsh> SELECT * FROM cassandratutorial.manager WHERE firstName='Ram' ALLOW FILTERING;

 countrycode | departmentid | id | age | firstname | lastname
-------------+--------------+----+-----+-----------+----------
          AU |            3 |  1 |  39 |       Ram |   Ponnam

(1 rows)

To execute these kind of queries(Queries on columns that are not part of primary key), Cassandra should invest lot of computing power. To improve the performance, better to create secondary index on a columns that you want to query on.

Syntax to create secondary index

<create-index-stmt> ::= CREATE ( CUSTOM )? INDEX ( IF NOT EXISTS )? ( <indexname> )?
                            ON <tablename> '(' <index-identifier> ')'
                            ( USING <string> ( WITH OPTIONS = <map-literal> )? )?

<index-identifier> ::= <identifier>
                     | keys( <identifier> )

Example

CREATE INDEX firstNameIndex ON cassandratutorial.manager(firstName);

Above statement create an index on firstName field. Once you created the secondary index, you can run the query without ‘ALLOW FILTERING’ clause.

cqlsh> CREATE INDEX firstNameIndex ON cassandratutorial.manager(firstName);
cqlsh> 
cqlsh> SELECT * FROM cassandratutorial.manager WHERE firstName='Ram';

 countrycode | departmentid | id | age | firstname | lastname
-------------+--------------+----+-----+-----------+----------
          AU |            3 |  1 |  39 |       Ram |   Ponnam

(1 rows)

Previous Next Home

Cassandra: Get the version of Cassandra installed in my system

Approach 1: Execute the command ‘nodetool -h localhost version’.

$ nodetool -h localhost version
ReleaseVersion: 3.11.4

Approach 2: Execute the command cqlsh, you can see the version information in console.

$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>

Approach 3: Execute the command ‘SHOW version’ from cqlsh prompt, you can get the Cassandra version details.

cqlsh> SHOW version;
[cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4]

Previous Next Home

Tuesday, 3 March 2020

Cassandra: How the data is written?

In this post, I am going to explain how Cassandra writes data to commit log, memtable and SStable.

When data is written to Cassandra node, it stores the data in an in-memory table called ‘memtable’ and parallelly it writes the same data to the commit log on the disk.

Why to write data to commit log?

Since if you write data to only memtable, then if a node is down because of power failure, all the data will be lost. To make sure of data durability, Cassandra write data to commit log.

Flush the data from memtable?

To make the things faster, Cassandra writes the data from memtable to SSTable on the disk. Flushed sstable files are immutable and no changes can be done. SStables will be merged once it reaches some threshold to reduce read overhead

Why are we writing to commit log, instead of directly writing to SSTable?

SSTable stores the data in sorted order of the rows, whereas commit log stores the data in the order it is processed by Cassandra. So it is always efficient to append the data than placing it in sorting order.

Apart from this, commit log is optimized for writing, writing to commit log is faster than SSTable. Cassandra internally keeps track on what data is written to SStable and truncate commit log according to this.

Previous Next Home

Cassandra: Clear cqlsh console

Execute the command ‘clear’ to clear cqlsh console.

Previous Next Home

Cassandra: tuple types

Tuples are ordered list of attributes with fixed structure. At the time of writing this, a tuple can able to support 32768 fields.

Syntax

<tuple-type> ::= tuple '<' <type> (',' <type>)* '>'

Example

CREATE TABLE IF NOT EXISTS cassandratutorial.employee (

id INT PRIMARY KEY,

name VARCHAR,

address tuple<VARCHAR, VARCHAR, VARCHAR>

);

Insert data into a tuple

You can insert data into a tuple, by placing data in-between ().

INSERT INTO cassandratutorial.employee (id, name, address) VALUES (1, 'Krishna', ('Marthali', 'Bangalore', 'India'));

cqlsh> CREATE TABLE IF NOT EXISTS cassandratutorial.employee (
   ...   id INT PRIMARY KEY, 
   ...   name VARCHAR,
   ...   address tuple<VARCHAR, VARCHAR, VARCHAR>
   ... );
cqlsh> 
cqlsh> INSERT INTO cassandratutorial.employee (id, name, address) VALUES (1, 'Krishna', ('Marthali', 'Bangalore', 'India'));
cqlsh> 
cqlsh> SELECT * FROM cassandratutorial.employee ;

 id | address                            | name
----+------------------------------------+---------
  1 | ('Marthali', 'Bangalore', 'India') | Krishna

(1 rows)

Can I create nested tuples?

Yes

Example

CREATE TABLE nested (k int PRIMARY KEY, t tuple <int, tuple<double, text>>);

Previous Next Home

Cassandra: Working with maps

Map is an object that maps keys to values. A map cannot contain duplicate keys: Each key can map to at most one value.

Roll Number	Name
123	Krishna
124	Ram
125	Anil
126	Kiran

In the above table, Roll number is the key, which maps to student name. I.e, Roll Number 123 maps to the student name Krishna, 126 maps to the student name Kiran.

Map in Cassandra always sorted by the keys.

How to create column of type map?

Using ‘map’ keyword, you can create a column of type map.

CREATE TABLE IF NOT EXISTS cassandratutorial.organization (

id INT PRIMARY KEY,

organization_name VARCHAR,

employee_names MAP<INT, VARCHAR>

);

How to insert data to a map?

Using JSON kind of syntax, you can insert data to a map.

INSERT INTO cassandratutorial.organization (id, organization_name, employee_names) VALUES (1, 'ABC Corporation', {1 : 'Krishna', 5 : 'Ram', 3 : 'Chamu', 4 : 'Sowmya'});

cqlsh> CREATE TABLE IF NOT EXISTS cassandratutorial.organization (
   ...   id INT PRIMARY KEY, 
   ...   organization_name VARCHAR,
   ...   employee_names MAP<INT, VARCHAR>
   ... );
cqlsh> 
cqlsh> INSERT INTO cassandratutorial.organization (id, organization_name, employee_names) VALUES (1, 'ABC Corporation', {1 : 'Krishna', 5 : 'Ram', 3 : 'Chamu', 4 : 'Sowmya'});
cqlsh> 
cqlsh> SELECT * FROM cassandratutorial.organization;

 id | employee_names                                    | organization_name
----+---------------------------------------------------+-------------------
  1 | {1: 'Krishna', 3: 'Chamu', 4: 'Sowmya', 5: 'Ram'} |   ABC Corporation

(1 rows)

Updating key-value pair

Using indexing notation, you can update key-value pair in the map.

UPDATE cassandratutorial.organization SET employee_names[1] = 'Hari Krishna Gurram' WHERE id = 1;

cqlsh> UPDATE cassandratutorial.organization SET employee_names[1] = 'Hari Krishna Gurram' WHERE id = 1;
cqlsh> 
cqlsh> SELECT * FROM cassandratutorial.organization;

 id | employee_names                                                | organization_name
----+---------------------------------------------------------------+-------------------
  1 | {1: 'Hari Krishna Gurram', 3: 'Chamu', 4: 'Sowmya', 5: 'Ram'} |   ABC Corporation

(1 rows)

Adding new key-value pair to the map

Using + operator, you can add new key-value pair to the map.

UPDATE cassandratutorial.organization SET employee_names = employee_names + { 2 : 'Lahari' } WHERE id = 1;

cqlsh> UPDATE cassandratutorial.organization SET employee_names = employee_names +  { 2 : 'Lahari' } WHERE id = 1;
cqlsh> 
cqlsh> SELECT * FROM cassandratutorial.organization;

 id | employee_names                                                             | organization_name
----+----------------------------------------------------------------------------+-------------------
  1 | {1: 'Hari Krishna Gurram', 2: 'Lahari', 3: 'Chamu', 4: 'Sowmya', 5: 'Ram'} |   ABC Corporation

(1 rows)

Deleting a key-value pair

Using DELETE query, you can delete a key-value pair from map.

DELETE employee_names[2] FROM cassandratutorial.organization WHERE id = 1;

cqlsh> DELETE employee_names[2] FROM cassandratutorial.organization WHERE id = 1;
cqlsh> 
cqlsh> SELECT * FROM cassandratutorial.organization;

 id | employee_names                                                | organization_name
----+---------------------------------------------------------------+-------------------
  1 | {1: 'Hari Krishna Gurram', 3: 'Chamu', 4: 'Sowmya', 5: 'Ram'} |   ABC Corporation

(1 rows)

Setting TTL (time to live) property to a key-value pair

You can set TTL in both INSERT and UPDATE statements.

UPDATE cassandratutorial.organization USING TTL 100 SET employee_names[1] = 'Sailaja' WHERE id = 1;

cqlsh> UPDATE cassandratutorial.organization USING TTL 100 SET employee_names[1] = 'Sailaja' WHERE id = 1;
cqlsh> 
cqlsh> SELECT * FROM cassandratutorial.organization;

 id | employee_names                                    | organization_name
----+---------------------------------------------------+-------------------
  1 | {1: 'Sailaja', 3: 'Chamu', 4: 'Sowmya', 5: 'Ram'} |   ABC Corporation

(1 rows)

In the above example I set 100 seconds time for ‘employee_names[1]’.

Previous Next Home

Cassandra: Working with set

Set is a collection of elements, which don't permits duplicate elements.

How to create a column of type set?

Use ‘set’ keyword to create a column of type set.

CREATE TABLE IF NOT EXISTS cassandratutorial.person (
  id INT PRIMARY KEY, 
  firstName VARCHAR,
  lastName VARCHAR,
  hobbies SET<VARCHAR>
);

How to insert elements to set?

By placing the elements in {}, you can insert elements to a set.

INSERT INTO cassandratutorial.person (id, firstName, lastName, hobbies) VALUES (1, 'Hari', 'Krishna', {'football', 'cricket', 'trekking'});

cqlsh> CREATE TABLE IF NOT EXISTS cassandratutorial.person (
   ...   id INT PRIMARY KEY, 
   ...   firstName VARCHAR,
   ...   lastName VARCHAR,
   ...   hobbies SET<VARCHAR>
   ... );
cqlsh> 
cqlsh> INSERT INTO cassandratutorial.person  (id, firstName, lastName, hobbies) VALUES (1, 'Hari', 'Krishna', {'football', 'cricket', 'trekking'});
cqlsh> 
cqlsh> SELECT * FROM cassandratutorial.person;

 id | firstname | hobbies                             | lastname
----+-----------+-------------------------------------+----------
  1 |      Hari | {'cricket', 'football', 'trekking'} |  Krishna

(1 rows)

Add elements to a set

Using + operator, you can add elements to a set.

UPDATE cassandratutorial.person SET hobbies = hobbies + {'tennis', 'blogging'} WHERE id = 1;

cqlsh> UPDATE cassandratutorial.person SET hobbies = hobbies + {'tennis', 'blogging'} WHERE id = 1;
cqlsh> 
cqlsh> SELECT * FROM cassandratutorial.person;

 id | firstname | hobbies                                                   | lastname
----+-----------+-----------------------------------------------------------+----------
  1 |      Hari | {'blogging', 'cricket', 'football', 'tennis', 'trekking'} |  Krishna

(1 rows)

Removing elements from a set

Using – operator, you can remove elements from the set.

cqlsh> UPDATE cassandratutorial.person SET hobbies = hobbies - {'tennis', 'blogging'} WHERE id = 1;
cqlsh> 
cqlsh> SELECT * FROM cassandratutorial.person;

 id | firstname | hobbies                             | lastname
----+-----------+-------------------------------------+----------
  1 |      Hari | {'cricket', 'football', 'trekking'} |  Krishna

(1 rows)

Previous Next Home