Ternary Search Trie (TST) is an advanced data structure to store simple key-value pairs that lends itself to really fast searches. TST typicall works well when:
- Search results are significantly less in number compared to total size of the data store.
- Keys are of String type
- For example, TST is a very effective solution when searching 1 phone record from a million or when searching 10 phone records based on a prefix match (say first 6 digits of the phone number)
TST does become slow when result set is huge since the response time is directly dependent on number of tree nodes matching the search filter.
More on TST – http://en.wikipedia.org/wiki/Ternary_search_tree
In this blog post, I have a response time comparison between TST and HashMap. Source code used for this comparison is available at – https://sourceforge.net/p/noreo/code/HEAD/tree/quickscan/
How to setup the code
- Import the code as Eclipse project
- Since we need at least million records to test this code effectively, you can download the same from
- Modify the csv file to include only 7 columns as mentioned in com.ivy.datastructures.simpledb.domain.Geoname
- Modify input file path in com.ivy.datastructures.simpledb.utils.TestFileHelper
Executing the test
To execute the test, simply run the 2 Junits in TestFileHelper. Both TST and HashMap versions will be executed on the same dataset and for the same search filter. Results of the different search patterns are actually interesting.
Search response times
With a relatively good computer at my disposal, the above tests with various inputs (resulting low to high number of matches) gives some interesting insights into the data store response time.
As you can clearly see, TST performance was significantly better than HashMap till the search filter resulted in a smaller set of matches. As the number of matching records grew, HashMap outperformed TST.
For a very good number of cases where you would want to search key-value based data, TST gives a better performance. Also, the fact that is supports prefix based searches makes it an even more efficient data structure.