Scalable Architecture, Storage and Visualization Approaches for Time Series Analysis Systems - Ed Duarte

Journal Paper

In order to adapt to the recent phenomenon of exponential growth of time series data sets in both academic and commercial environments, and with the goal of deriving valuable knowledge from this data, a multitude of analysis software tools have been developed to allow groups of collaborating researchers to find and annotate meaningful behavioral patterns. However, these tools commonly lack appropriate mechanisms to handle massive time series data sets of high cardinality, as well as suitable visual encodings for annotated data.

In this paper we conduct a comparative study of architectural, persistence and visualization methods that can enable these analysis tools to scale with a continuously-growing data set and handle intense workloads of concurrent traffic. We implement these approaches within a web platform, integrated with authentication, versioning, and locking mechanisms that prevent overlapping contributions or unsanctioned changes. Additionally, we measure the performance of a set of databases when writing and reading varying amounts of series data points, as well as the performance of different architectural models at scale.

Publication

This research paper was published in the Communications in Computer and Information Science book series, volume 1255, pages 59–82. The full text can be purchased on Springer.

To cite this paper, you may use the following BibTeX record:

@InProceedings{10.1007/978-3-030-54595-6_4,
author="Duarte, Eduardo and Gomes, Diogo and Campos, David and Aguiar, Rui L.",
editor="Hammoudi, Slimane and Quix, Christoph and Bernardino, Jorge",
title="Scalable Architecture, Storage and Visualization Approaches for Time Series Analysis Systems",
booktitle="Data Management Technologies and Applications",
year="2020",
publisher="Springer International Publishing",
address="Cham",
pages="59--82",
isbn="978-3-030-54595-6"
}
