Tuesday, 16 February 2016

Julia: length Vs endof functions

length(s) returns the number of characters in string s, while endof(s) returns the last valid index of s, i.e. the byte index at which its last character starts.

What is the exact difference?  
You can see the difference when a string contains Unicode characters whose encoding needs more than one byte. Julia supports all Unicode characters, and they can be written in string literals with the escapes \u (followed by up to four hexadecimal digits) or \U (followed by up to eight).


For example,
julia> s = "\u2900 x \U2903 y"
"⤀ x ⤃ y"
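The short sketch below, which assumes Julia 1.x, shows the two escape forms side by side. Because \u reads at most four hexadecimal digits, a code point beyond U+FFFF, such as the emoji U+1F600 used here as an extra illustration, has to be written with \U. The variable names are only illustrative.

```julia
# \u takes up to 4 hex digits, \U up to 8; for U+2900 both spell the same character.
a = "\u2900"
b = "\U2900"
println(a == b)                   # true

# Code points above U+FFFF need the \U form.
smiley = '\U1F600'
println(Int(smiley) == 0x1F600)   # true

# With \u, only the first four hex digits are consumed: this is '\u1F60' followed by '0'.
println(length("\u1F600"))        # 2
```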


Julia string literals are stored in UTF-8. UTF-8 is a variable-length encoding built on 8-bit code units: it encodes every Unicode character in one to four bytes, and code points with lower numerical values use fewer bytes. ASCII characters take a single byte, whereas a character such as ⤀ (U+2900) takes three.
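To make the one-to-four-byte rule concrete, here is a minimal sketch (Julia 1.x assumed) that compares the byte count predicted from the UTF-8 code point ranges with the number of bytes Julia actually stores for each character. expected_utf8_len and actual_utf8_len are hypothetical helper names introduced only for this illustration.

```julia
# Expected UTF-8 length derived from the code point ranges of the UTF-8 encoding.
function expected_utf8_len(cp::Integer)
    cp < 0x80    && return 1   # U+0000..U+007F (ASCII)
    cp < 0x800   && return 2   # U+0080..U+07FF
    cp < 0x10000 && return 3   # U+0800..U+FFFF
    return 4                   # U+10000..U+10FFFF
end

# Actual number of bytes Julia uses when the character is stored in a String.
actual_utf8_len(c::Char) = sizeof(string(c))

# 'x', 'é', '⤀' and an emoji: one example from each byte-length range.
for c in ['x', '\u00e9', '\u2900', '\U1F600']
    println(repr(c), " U+", string(UInt32(c), base = 16), ": ",
            actual_utf8_len(c), " byte(s), expected ", expected_utf8_len(UInt32(c)))
end
```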
julia> s = "\u2900x\u2903y"
"⤀x⤃y"

julia> s[1]
'⤀'

julia> s[2]
ERROR: UnicodeError: invalid character index
 in next at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib
 in getindex at strings/basic.jl:37

julia> s[3]
ERROR: UnicodeError: invalid character index
 in next at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib
 in getindex at strings/basic.jl:37

julia> s[4]
'x'


In the snippet above, s[1] refers to the character '⤀', which takes three bytes to encode. Bytes 2 and 3 are the remaining bytes of that same character, so s[2] and s[3] are invalid indexes, and the next character, 'x', starts at index 4.
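One way to find the valid indexes programmatically is sketched below. It assumes Julia 1.x, where isvalid(s, i) and nextind are available and indexing into the middle of a character raises a StringIndexError instead of the UnicodeError shown in the 0.4.1 session above.

```julia
s = "\u2900x\u2903y"

# isvalid(s, i) is true only for byte indexes where a character starts.
valid = [i for i in 1:ncodeunits(s) if isvalid(s, i)]
println(valid)                   # [1, 4, 5, 8]

# nextind skips the continuation bytes of the current character.
println(nextind(s, 1))           # 4

# Indexing a continuation byte throws; in Julia 1.x the error is a StringIndexError.
try
    s[2]
catch err
    println(err isa Base.StringIndexError)   # true
end
```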
julia> s = "\u2900x\u2903y"
"⤀x⤃y"

julia> length(s)
4

julia> endof(s)
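8

length(s) counts the four characters, whereas endof(s) returns 8: ⤀ occupies bytes 1 to 3, x byte 4 and ⤃ bytes 5 to 7, so the last character, y, starts at byte 8.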
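The same comparison can be sketched in Julia 1.x, where endof was deprecated in 0.7 and renamed lastindex. The second string, t, is an extra illustrative case not taken from the session above: it shows that lastindex is the start of the last character rather than the byte length, and that the two only coincide for s because its last character, y, is a single byte.

```julia
s = "\u2900x\u2903y"

# length counts characters; lastindex (formerly endof) is where the last
# character starts; ncodeunits is the total number of UTF-8 bytes.
println((length(s), lastindex(s), ncodeunits(s)))   # (4, 8, 8)

# When the last character is multi-byte, lastindex and the byte count differ.
t = "x\u2900"
println((length(t), lastindex(t), ncodeunits(t)))   # (2, 2, 4)

# Walking the valid indexes pairs each byte index with its character.
for i in eachindex(s)
    println(i, " => ", s[i])
end
```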





