Document Oriented Query Partitioning Technique Implementation #319

pfu3tz

Includes extended support for MongoDB, Cosmos and ArangoDB.

Adds support for MongoDB version 4.4. With this commit, random collections are created with optional validation options that can enforce the schema on inserts. Additionally, the execution is logged as mongo shell code.

Random document insertion is now supported, with the option to either follow the schema validation or set random types. An option has also been added to randomly set an inserted value to null. Validation has to be turned off for these options, though. Additionally, random indexes are created over random columns, either ascending or descending. When multiple indexes are added for a single collection, a composite index is created out of the other ones.

The project stage is similar to SELECT in SQL, where we can specify which columns are returned. The lookup stage is similar to the LEFT OUTER JOIN of SQL. For this, we have to specifically create new random columns that serve as the join columns.
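As a rough illustration of the two stages, the sketch below builds an aggregation pipeline as plain Python dictionaries. `$lookup` and `$project` are MongoDB's stage operators; the helper `build_pipeline` and the collection and field names (`c0`, `c1`, `join0`) are hypothetical and not taken from this PR.

```python
def build_pipeline(projected, join_collection, local_field, foreign_field):
    """Return a pipeline that joins first and then projects (hypothetical helper)."""
    lookup = {
        "$lookup": {
            "from": join_collection,        # collection to join with
            "localField": local_field,      # join column in the current collection
            "foreignField": foreign_field,  # join column in the other collection
            "as": "join_" + join_collection,
        }
    }
    # 1 means "include this field", comparable to a SELECT column list.
    project = {"$project": {name: 1 for name in projected}}
    return [lookup, project]


pipeline = build_pipeline(["c0", "join_c1"], "c1", "join0", "join0")
```

Documents without a match in the joined collection are still returned, with an empty array in the `as` field, which is what makes the stage behave like a left outer join.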

The query AST has multiple key differences from the SQL version. The core is the MongoDBSelect class, which holds the projection and lookup lists. The filter is then a tree similar to the SQL version. Executing and logging differ because the former uses API calls and the latter MongoDB shell commands, so there are two visitors: the ToQueryVisitor and the ToLogVisitor.
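The idea of walking one filter tree in two ways can be sketched as follows. This is not the code of ToQueryVisitor or ToLogVisitor; the node classes, the function names and the use of `find` in the logged command are assumptions made only for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Comparison:
    """Leaf node: <field> <op> <constant>."""
    field: str
    op: str          # MongoDB operator name such as "$gt"
    value: object


@dataclass
class And:
    """Inner node combining two sub-filters."""
    left: object
    right: object


def to_query(node):
    """Build the filter document that would be handed to the driver API."""
    if isinstance(node, Comparison):
        return {node.field: {node.op: node.value}}
    return {"$and": [to_query(node.left), to_query(node.right)]}


def to_log(node, collection="c0"):
    """Render the same filter as a mongo shell command string."""
    return "db.%s.find(%s)" % (collection, json.dumps(to_query(node)))


tree = And(Comparison("a", "$gt", 1), Comparison("b", "$eq", "x"))
```

Keeping both renderings derived from the same tree means the logged command should reproduce what was actually executed.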

Computed functions allow arithmetic operations to be performed in queries. They are part of the projection pipeline stage, where a new field is added that holds the result. This update adds support for random computed fields with functions such as add, multiply, pow, sqrt and more.
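A minimal sketch of how such a random computed field could be generated is shown below. `$add`, `$multiply`, `$pow` and `$sqrt` are MongoDB aggregation operators; the generator function, the field names and the probabilities are assumptions, not the PR's implementation.

```python
import random

BINARY = ["$add", "$multiply", "$pow"]
UNARY = ["$sqrt"]


def random_computed(rng, fields, depth):
    """Return a random expression tree over field references and constants."""
    if depth == 0:
        if rng.random() < 0.5:
            return "$" + rng.choice(fields)   # field reference
        return rng.randint(-10, 10)           # integer constant
    if rng.random() < 0.3:
        return {rng.choice(UNARY): random_computed(rng, fields, depth - 1)}
    left = random_computed(rng, fields, depth - 1)
    right = random_computed(rng, fields, depth - 1)
    return {rng.choice(BINARY): [left, right]}


# The computed field is added as a new entry of the project stage.
stage = {"$project": {"computed0": random_computed(random.Random(0), ["c0", "c1"], 2)}}
```

Because leaves are chosen without regard to type or sign, a generated tree can fail at execution time, for example with a negative argument to `$sqrt`.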

…on in MongoDB. Similar to a binary comparison, a regular expression is a new leaf node that can be added to the tree. The pattern is generated with the existing random string generator.

… regular expression. Changes to the MongoDBComparatorHelper allow expected errors to be ignored if they are thrown during query execution. This is mainly necessary for the ill-formed regular expressions that the random string generator produces.

Because the randomly generated computed field tree has random types as leaf nodes, some exceptions are expected and ignored, such as a wrong type, a non-positive value for sqrt and similar issues.

The options are flags that can be set in a string such as "im". The expression generator now generates random valid options for the regular expression operator in MongoDB.
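One way to produce such an options string is to pick a random subset of the flags. The sketch below assumes the four flags `i`, `m`, `x` and `s` accepted by MongoDB's `$options`; the function names and the 50% probability are hypothetical.

```python
import random

VALID_FLAGS = "imxs"


def random_regex_options(rng):
    """Pick each flag independently, so no flag can appear twice."""
    return "".join(flag for flag in VALID_FLAGS if rng.random() < 0.5)


def regex_filter(field, pattern, options):
    """Build a filter document using the $regex operator."""
    return {field: {"$regex": pattern, "$options": options}}


example = regex_filter("c0", "^a.*", random_regex_options(random.Random(42)))
```

The empty string is a possible result and simply means that no flag is set.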

… documents in MongoDB. This variation executes the pipeline query once with a count as the last pipeline stage and compares its output to the size of the result set returned without the count stage.
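The comparison itself can be sketched as below. `$count` is MongoDB's stage operator; `run_pipeline` stands for whatever executes a pipeline and returns a list of documents, and `fake_run` is a hypothetical in-memory stand-in used only to make the sketch runnable.

```python
def count_matches(run_pipeline, pipeline):
    """Compare the $count result with the number of returned documents."""
    rows = run_pipeline(pipeline)
    counted = run_pipeline(pipeline + [{"$count": "count"}])
    # $count emits no document at all when its input is empty.
    reported = counted[0]["count"] if counted else 0
    return reported == len(rows)


def fake_run(pipeline):
    """Stand-in executor over a fixed two-document collection."""
    docs = [{"a": 1}, {"a": 2}]
    if pipeline and "$count" in pipeline[-1]:
        return [{pipeline[-1]["$count"]: len(docs)}]
    return docs


ok = count_matches(fake_run, [])
```

A mismatch between the two numbers would point to a bug in either the count stage or the result retrieval.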

Until now, not was simulated by nor(id exists, bool_expression), which led us to believe that the underlying structure works fine. After the rework, the not is evaluated and every logical operation is inverted by the new NegateVisitor. At the lowest level, type problems remain.
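Pushing a negation down through the tree can be sketched with De Morgan's laws and a table of inverse comparison operators. This is an assumption about the general approach, not the code of the NegateVisitor, and it deliberately ignores null, missing and mixed-type values, where a comparison and its inverse are not exact complements in MongoDB.

```python
INVERSE = {
    "$gt": "$lte", "$lte": "$gt",
    "$lt": "$gte", "$gte": "$lt",
    "$eq": "$ne", "$ne": "$eq",
}


def negate(node):
    """Return the negation of a filter built from $and, $or and comparisons."""
    if "$and" in node:
        return {"$or": [negate(child) for child in node["$and"]]}
    if "$or" in node:
        return {"$and": [negate(child) for child in node["$or"]]}
    # Leaf of the form {field: {op: value}}.
    field, condition = next(iter(node.items()))
    op, value = next(iter(condition.items()))
    return {field: {INVERSE[op]: value}}


tree = {"$and": [{"a": {"$gt": 1}}, {"b": {"$eq": 2}}]}
negated = negate(tree)
```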

This change reveals some issues with how we form queries for MongoDB. The core of it seems to be that neither the greater-or-equal nor the smaller-than comparison includes null values, and it is rather hard to define a query that finds the documents whose value is null. The query that only projects includes all documents.
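The effect can be modeled in plain Python without a database. The sketch below assumes, as described above, that a comparison against a number matches neither null nor missing values; the function and the sample documents are hypothetical.

```python
def partition(docs, field, threshold):
    """Split documents into >=, < and null/missing partitions."""
    ge = [d for d in docs if d.get(field) is not None and d[field] >= threshold]
    lt = [d for d in docs if d.get(field) is not None and d[field] < threshold]
    null = [d for d in docs if d.get(field) is None]
    return ge, lt, null


docs = [{"a": 1}, {"a": 5}, {"a": None}, {}]
ge, lt, null = partition(docs, "a", 3)
```

Here the two comparison partitions cover only two of the four documents, so a third partition for null and missing values is needed before the union matches the unfiltered result.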

…serting data. For four ArangoDB data types (integer, double, string and boolean), this commit supports creating collections and tables, keeping track of the schema, and inserting randomized data.

For ArangoDB, queries are now randomly generated that support binary comparisons, binary logical operations such as or/and, and the unary prefix not. Everything is also logged, and the results are checked with the new ComparatorHelper.

A new generator and a new query have been added to support the new functionality.

Similar to MongoDB, variables for computed values are created in ArangoDB and calculated with the special keyword LET.
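A query of this shape could be assembled as in the sketch below. `FOR`, `LET`, `FILTER` and `RETURN` are AQL keywords; the helper function and the collection, variable and attribute names are hypothetical.

```python
def aql_with_computed(collection, variables, filter_expr=None):
    """Build an AQL query string with one LET per computed variable."""
    lines = ["FOR doc IN " + collection]
    for name, expr in variables.items():
        lines.append("  LET %s = %s" % (name, expr))
    if filter_expr:
        lines.append("  FILTER " + filter_expr)
    lines.append("  RETURN doc")
    return "\n".join(lines)


query = aql_with_computed("c0", {"v0": "doc.c0 + doc.c1"}, "v0 > 3")
```

Since the variable is defined before the FILTER line, it can also be used inside the filter expression.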

This oracle generates a random query, executes it and, if the result set is not empty, chooses a document at random and removes it from the collection. The query is executed again to check that the document was really removed. At the end, a new document is generated and inserted so that the dataset does not decrease in size.
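The steps of the oracle can be sketched over an in-memory list standing in for a collection. Everything here is an assumption for illustration: the function name, the use of a Python predicate instead of a generated query, and the sample data.

```python
import random


def removal_check(collection, predicate, new_document, rng):
    """Return True if a removed document no longer shows up in the query result."""
    result = [d for d in collection if predicate(d)]
    if not result:
        return True                      # nothing to remove, nothing to check
    victim = rng.choice(result)
    # Remove exactly the chosen document (by identity, not by equality).
    index = next(i for i, d in enumerate(collection) if d is victim)
    del collection[index]
    # Run the query again and look for the removed document.
    still_there = any(d is victim for d in collection if predicate(d))
    # Insert a fresh document so the dataset keeps its size.
    collection.append(new_document)
    return not still_there


coll = [{"a": 1}, {"a": 2}, {"a": 3}]
passed = removal_check(coll, lambda d: d["a"] > 1, {"a": 9}, random.Random(1))
```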

When using Cosmos, make sure to set the configuration string in CosmosProvider.

mrigger

Thanks a lot for the PR and the clean code! I only have some minor nitpicks. Perhaps you can go over them and see which ones are worth addressing (also considering the risk of introducing an untested change).

Would you also be able to add a check for our CI (see https://github.com/sqlancer/sqlancer/blob/master/.github/workflows/main.yml)?

See Pull request sqlancer#319 on github.com/sqlancer for more information.

mrigger

LGTM. Thanks a lot!

pfu3tz added 23 commits March 22, 2021 23:03

Add regular expressions as part of the randomized expression generati… …

147ce91

…on in MongoDB. Similar to a binary comparison, a regular expression is a new leaf node that can be added to the tree. The pattern is generated with the existing random string generator.

Add expected errors for computed fields in project stage MongoDB …

bd525f6

Because the randomly generated computed field tree has random types as leaf nodes, some exceptions are expected and ignored, such as a wrong type, a non-positive value for sqrt and similar issues.

Add logging of computed fields in MongoDB

6aa5385

Add randomized regex options for the regular expression node in MongoDB …

6020a07

The options are flags that can be set in a string such as "im". The expression generator now generates random valid options for the regular expression operator in MongoDB.

Add a variation where the count is compared to the number of returned… …

5897c86

… documents in MongoDB. This variation executes the pipeline query once with a count as the last pipeline stage and compares its output to the size of the result set returned without the count stage.

Fix MongoDB unsuccessful and successful query counts

498b9de

Complete log implementation for inserts and collection creation

66919d4

Support adding indexes when inserting data for ArangoDB …

acf3d34

A new generator and a new query have been added to support the new functionality.

Add support for computed functions for ArangoDB …

242c245

Similar to MongoDB, variables for computed values are created in ArangoDB and calculated with the special keyword LET.

Enable computation values in filter expressions for ArangoDB

6de066e

Make query generation null safe with option in MongoDB

97f5e0b

Add Cosmos as a database configuration …

b2afeef

When using Cosmos, make sure to set the configuration string in CosmosProvider.

mrigger reviewed Mar 24, 2021

Clean up and improve code for pull request …

e0b272d

See Pull request sqlancer#319 on github.com/sqlancer for more information.

mrigger merged commit 1be16f0 into sqlancer:master Mar 30, 2021

pfu3tz deleted the DBConnection branch March 30, 2021 09:39

mrigger mentioned this pull request Jan 11, 2022

Test or Remove the Document Stores (MongoDB, ArangoDB, and Cosmos) #442

Closed

mrigger merged 24 commits into sqlancer:master from pfu3tz:DBConnection

pfu3tz commented Mar 22, 2021

mrigger commented Mar 30, 2021