Figure 1 - uploaded by Muqiao Yang
Customer behavior tracking and mining.

Source publication
Preprint
Real-world graphs often contain spatio-temporal information and evolve over time. Compared with static graphs, spatio-temporal graphs have very different characteristics, presenting more significant challenges in data volume, data velocity, and query processing. In this paper, we describe three representative applications to understand the features...

Contexts in source publication

Context 1
... track users' browsing behaviors to achieve personalized recommendations. In a customer behavior tracking and mining application, people (customers) and locations can be modeled as graph vertices, while an edge linking a person vertex to a location vertex represents the event that the person visits the location at a certain time, as shown in Fig. 1. This forms a spatio-temporal graph. People visiting similar locations at similar timestamps often have similar personal interests. In other words, it is desirable to discover groups of people vertices that have similar edge structures in the spatio-temporal graphs. Application 2: Clone-Plate Car Detection. A clone-plate car displays a ...
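The modeling described in Context 1 can be illustrated with a minimal sketch. It is not the paper's method: the function names, the one-hour time bucket, and the use of Jaccard similarity are all assumptions chosen for illustration. Each edge is a hypothetical `(person, location, timestamp)` tuple, and two people count as similar when they share many (location, time bucket) visits.

```python
from collections import defaultdict
from itertools import combinations


def visit_signatures(edges, bucket_seconds=3600):
    """Map each person to the set of (location, time bucket) pairs they visited.

    `edges` is an iterable of (person, location, timestamp) tuples, i.e. the
    person-to-location edges of the spatio-temporal graph. The bucket width
    is an assumed parameter, not taken from the source.
    """
    signatures = defaultdict(set)
    for person, location, timestamp in edges:
        signatures[person].add((location, timestamp // bucket_seconds))
    return dict(signatures)


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(edges, threshold=0.5, bucket_seconds=3600):
    """Return (person, person, score) for pairs with similar edge structures."""
    signatures = visit_signatures(edges, bucket_seconds)
    pairs = []
    for p, q in combinations(sorted(signatures), 2):
        score = jaccard(signatures[p], signatures[q])
        if score >= threshold:
            pairs.append((p, q, score))
    return pairs


# Illustrative data only: timestamps are seconds from an arbitrary origin.
example_edges = [
    ("alice", "cafe", 100),
    ("alice", "gym", 4000),
    ("bob", "cafe", 200),
    ("bob", "gym", 4100),
    ("carol", "mall", 9000),
]
print(similar_pairs(example_edges))
```

The pairwise comparison is quadratic in the number of people, so it only suits small examples; it is meant to show the data model, not a scalable mining algorithm.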
Context 2
... is because PAST utilizes columnar layout for all edges in a sub-partition, and compresses the columnar edge properties. JanusGraph consumes more space than Cassandra because it stores each edge twice, at both the incoming vertex and the outgoing vertex. PAST and Cassandra take less space to store one replica than Greenplum because of compression. Fig. 10 and Fig. 11 compare the query performance for all systems, while varying the query time range and the threshold values, i.e. TH_time and TH_dist. The Y-axis is execution time on a logarithmic scale. We do not run Q3 and Q4 on JanusGraph as it mainly focuses on simple traversal queries and employs Spark for complex queries. ...
Context 3
... utilizes columnar layout for all edges in a sub-partition, and compresses the columnar edge properties. JanusGraph consumes more space than Cassandra because it stores each edge twice, at both the incoming vertex and the outgoing vertex. PAST and Cassandra take less space to store one replica than Greenplum because of compression. Fig. 10 and Fig. 11 compare the query performance for all systems, while varying the query time range and the threshold values, i.e. TH_time and TH_dist. The Y-axis is execution time on a logarithmic scale. We do not run Q3 and Q4 on JanusGraph as it mainly focuses on simple traversal queries and employs Spark for complex queries. Therefore, Q3 and Q4 ...
Context 4
... are several missing points in Fig. 10(c) and (d). The experiments corresponding to the missing points ran for more than one day without completing. Cassandra+Spark and ST-Hadoop+Spark are overwhelmed by shuffling for Q3 and Q4 with large query time ranges. For Q4, most systems compute the velocity of edges of an object in sorted time order. In contrast, in Greenplum, the SQL ...
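The step of computing the velocity of an object's edges in sorted time order can be sketched as follows. This is only an illustration under stated assumptions: the excerpt does not define Q4 in full, so the planar `(x, y)` coordinates, the Euclidean distance, and the function name `consecutive_speeds` are all hypothetical.

```python
import math
from collections import defaultdict


def consecutive_speeds(observations):
    """Compute the speed between consecutive observations of each object.

    `observations` is an iterable of (object_id, x, y, timestamp) tuples
    (an assumed layout). Observations are grouped per object and sorted by
    timestamp before speeds are derived.
    """
    by_object = defaultdict(list)
    for object_id, x, y, timestamp in observations:
        by_object[object_id].append((timestamp, x, y))

    speeds = {}
    for object_id, points in by_object.items():
        points.sort()  # sorted time order, as described in the text
        segments = []
        for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
            dt = t1 - t0
            if dt <= 0:
                continue  # skip simultaneous observations to avoid dividing by zero
            distance = math.hypot(x1 - x0, y1 - y0)
            segments.append((t0, t1, distance / dt))
        speeds[object_id] = segments
    return speeds
```

Grouping by object before sorting keeps each sort local to one object's edges, which is the part that a per-object, time-ordered storage layout would make cheap.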
Context 5
... Fig. 10 and Fig. 11, we see that PAST achieves 1-4 orders of magnitude better performance compared with the four existing solutions. The partition and query processing schemes in PAST can effectively reduce the amount of data accessed from the underlying storage and the data communication cost. The main bottleneck of Cassandra+Spark is the ...
Context 6
... Fig. 10 and Fig. 11, we see that PAST achieves 1-4 orders of magnitude better performance compared with the four existing solutions. The partition and query processing schemes in PAST can effectively reduce the amount of data accessed from the underlying storage and the data communication cost. The main bottleneck of Cassandra+Spark is the disk I/Os for ...
