What is Bound Join?
There are two binding methods in SPARQL. The intermediate results returned by a sub-query sent to one dataset are substituted for the matching variables in the sub-query that will be sent to the next dataset.
In the nested loop join method, the intermediate results are substituted one at a time, so each result causes its own HTTP request. Every HTTP request sent to the endpoint incurs network latency, so the total latency grows with the number of intermediate results.
In the bound join method, all intermediate results are substituted as a batch: they are bound to the next sub-query at once. This means one HTTP request for each sub-query sent to a dataset, which reduces the network cost (latency).
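The difference in request counts can be illustrated with a small sketch. It is not part of ARQ; the class name, method names and the `batchSize` parameter are hypothetical and only model the counting argument above (one request per result versus one request per batch).

```java
// Illustrative sketch only: models how many HTTP requests each join method needs.
final class RequestCountSketch {

    // Nested loop join: one HTTP request per intermediate result.
    static int nestedLoopRequests(int intermediateResults) {
        return intermediateResults;
    }

    // Bound join: one HTTP request per batch of intermediate results.
    // batchSize is a hypothetical parameter; when it is at least as large as
    // the result count, all results travel in a single request.
    static int boundJoinRequests(int intermediateResults, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Ceiling division without floating point.
        return (intermediateResults + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        System.out.println("nested loop: " + nestedLoopRequests(1000));
        System.out.println("bound join, one batch: " + boundJoinRequests(1000, 1000));
        System.out.println("bound join, batches of 100: " + boundJoinRequests(1000, 100));
    }
}
```

With 1000 intermediate results, the nested loop model needs 1000 requests, while the bound join model needs one request when everything fits into a single batch.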
Extending ARQ With Bound Join
ARQ is a Java-based SPARQL engine. It uses the nested loop join method to bind intermediate results. We extended ARQ to support the bound join method.
We implemented several bound join mechanisms. One of them is the UNION bound join method. It binds results using UNION blocks, but RDF servers (Virtuoso, 4Store) generally restrict the UNION block size. For queries that produce large intermediate results, the intermediate results must be split according to a UNION size. This also adds network latency, but it is still a better method than nested loop join.
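A minimal sketch of how such UNION sub-queries might be built is shown here. It is plain string manipulation, not the actual Bound ARQ code; the class, the method names, the `unionSize` parameter and the example IRIs are all hypothetical. It also ignores how results are mapped back to the input bindings, which a real implementation has to handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: builds UNION-style bound sub-queries as strings.
final class UnionBoundJoinSketch {

    // Builds one query with one UNION branch per bound value.
    // Assumes the variable name is not a prefix of another variable in the pattern.
    static String unionQuery(String pattern, String variable, List<String> values) {
        StringBuilder sb = new StringBuilder("SELECT * WHERE { ");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(" UNION ");
            }
            // Substitute the bound value for the variable in this branch.
            sb.append("{ ").append(pattern.replace(variable, values.get(i))).append(" }");
        }
        return sb.append(" }").toString();
    }

    // Splits the values into chunks of at most unionSize and builds one query per chunk.
    static List<String> unionQueries(String pattern, String variable,
                                     List<String> values, int unionSize) {
        if (unionSize <= 0) {
            throw new IllegalArgumentException("unionSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int from = 0; from < values.size(); from += unionSize) {
            int to = Math.min(from + unionSize, values.size());
            queries.add(unionQuery(pattern, variable, values.subList(from, to)));
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
                "<http://example.org/d1>", "<http://example.org/d2>", "<http://example.org/d3>");
        // Three values with a UNION size of 2 yield two sub-queries (two HTTP requests).
        for (String q : unionQueries("?drug <http://example.org/p> ?name", "?drug", values, 2)) {
            System.out.println(q);
        }
    }
}
```

Each generated query corresponds to one HTTP request, so a smaller UNION size means more requests for the same set of intermediate results.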
Another bound join method is the FILTER-IN implementation. It binds intermediate results into an IN-TYPE-FILTER block. This method generally runs fast, but the SPARQL compilers of RDF servers (Virtuoso, 4Store) may not be optimized for this FILTER type, so queries that generate large intermediate results may not execute quickly.
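As an illustration, a sub-query with an IN-type filter could be assembled roughly as follows. This is a sketch using the SPARQL 1.1 `IN` operator, not the actual Bound ARQ code; the class name, method name and example IRIs are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: builds a FILTER ... IN (...) bound sub-query as a string.
final class FilterInBoundJoinSketch {

    // Keeps the variable in the pattern and restricts it to the bound values.
    static String filterInQuery(String pattern, String variable, List<String> values) {
        return "SELECT * WHERE { " + pattern
                + " FILTER (" + variable + " IN (" + String.join(", ", values) + ")) }";
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("<http://example.org/d1>", "<http://example.org/d2>");
        System.out.println(filterInQuery("?drug <http://example.org/p> ?name", "?drug", values));
    }
}
```

Because the variable stays in the pattern, each result row still carries the value it was matched with.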
Our last bound join implementation is the FILTER-OR method. Intermediate results are bound into an OR-TYPE-FILTER block. This is the fastest method we have implemented. Virtuoso also restricts the OR-TYPE-FILTER block size, so this method is not useful on a Virtuoso RDF server. 4Store, however, is optimized for OR-TYPE-FILTER queries, so SPARQL queries run fastest when 4Store is used. Using the FILTER-OR bound join method is strongly recommended.
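An OR-type filter expresses the same restriction as a chain of equality tests joined with `||`. The sketch below shows one way such a sub-query string might be built; it is not the actual Bound ARQ code, and the class name, method name and example IRIs are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only: builds a FILTER (a || b || ...) bound sub-query as a string.
final class FilterOrBoundJoinSketch {

    // Keeps the variable in the pattern and adds one equality test per bound value.
    static String filterOrQuery(String pattern, String variable, List<String> values) {
        if (values.isEmpty()) {
            // An empty FILTER () would not be valid SPARQL.
            throw new IllegalArgumentException("at least one value is required");
        }
        String condition = values.stream()
                .map(v -> variable + " = " + v)
                .collect(Collectors.joining(" || "));
        return "SELECT * WHERE { " + pattern + " FILTER (" + condition + ") }";
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("<http://example.org/d1>", "<http://example.org/d2>");
        System.out.println(filterOrQuery("?drug <http://example.org/p> ?name", "?drug", values));
    }
}
```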
How To Use Bound ARQ?
1-) To use the UNION bound join mechanism, the union bound join factory must be registered before executing the query:
To restrict the UNION size of a sub-query, the union size parameter and a union count value should be added to the query execution context:
QueryExecution execution = QueryExecutionFactory.create(QueryFactory.create(federatedQuery),ModelFactory.createDefaultModel());
2-) To use the FILTER bound join mechanism, the filter bound join factory must be registered before executing the query:
a-) To use the FILTER-IN method, the filter type parameter with the value "in-filter" should be added to the query execution context:
b-) To use the FILTER-OR method, the filter type parameter with the value "or-filter" should be added to the query execution context: