DELTA - A tool for database refactoring
Lisa Leitloff, Marius Schultchen, Florian Selent,
Sascha Sternheim, Patrick Stünkel
Bericht Nr.: 02017/02
Imprint
Research reports of FHDW Hannover – publications from the research and development activities of FHDW Hannover.
Publisher: The professors of FHDW Hannover
Fachhochschule für die Wirtschaft Hannover
Freundallee 15
30173 Hannover
Contact: techrep@fhdw.de
ISSN 1863-7043
DELTA - A tool for database refactoring
Lisa Leitloff, Marius Schultchen, Florian Selent
Sascha Sternheim, Patrick Stünkel
March, 2017
Abstract
It is not the program code that makes a running IT system indispensable; it is the data being managed. In contrast to code refactoring, improving the relational schema design of operational databases containing lots of data is a rare occurrence in many businesses.
This is mainly because developing migration code between different database schemas is a hassle: First of all, migration code requires especially careful development and testing in order to prevent information loss in the target database. Thus, in most cases extra code must be developed for a fallback solution. Secondly, migration code is used only once and becomes obsolete immediately after its successful execution.
However, there are certain concepts, like splitting and merging tables, that can be considered abstract evolution patterns. Analyzing these recurring patterns and providing a general solution for the corresponding data migration is part of this document, as is explaining the implementation of a tool that supports the developer in realizing changes based on these patterns.
1 Introduction
The idea of refactoring relational table schemas [1] is not revolutionary. With the book “Refactoring Databases” by Scott W. Ambler and Pramod J. Sadalage, published in 2006, the first comprehensive catalogue of database refactorings appeared (see [AS06]).
The foreword of that book motivates the development of a tool for making database refactoring easier:
After my refactoring book appeared I was delighted to find sophisticated
tools appear that automated many refactoring tasks. I hope the same thing
happens with databases, and we begin to see vendors offer tools that make
continual migrations of schema and data easier for everyone.
-- Foreword in [AS06] by Martin Fowler
[1] This report proposes improvements of relational tables. The term “schema” therefore refers to the logical definition of tables; it does not mean database schemas as defined, e.g., per user in Oracle, which include all database artifacts.
First of all, we have to define the term “database refactoring”. We will use the following definition by Scott W. Ambler in the further course of this report.

Definition 1 (Database refactoring). “A database refactoring is a simple change to a database schema that improves its design while retaining both its behavioural and informational semantics.” [AS06, page 14]
Because applications depend on the database schema, refactoring needs a transition period in which both the refactored new schema and the old schema are accessible. New or modified applications may already use the new schema, whereas the old schema is still used by existing applications, which must not stop working because of the refactoring. From Ambler and Sadalage's description of the need for transition periods, the following requirement is derived:
Requirement 1 (Database evolution needs transition periods). No database is a stand-alone application. It is used by applications which display and edit the contained information. Adopting a new database schema in every database-calling application costs time and shall not (or cannot) be done everywhere at the same time.
Agile development requires frequent releases that change only parts of the software and therefore requires transition periods in which the database is accessible through the old as well as through the refactored schema.
Ambler and Sadalage propose implementing this transition time by managing two separate schemas, including two separate copies of the data, which are replicated on access.
In agile development the database schema changes often. Keeping several copies of the data and replicating every update to all copies is not feasible due to the resulting performance problems: having, for example, two concurrent database schemas doubles transaction time and halves transaction throughput. Therefore this paper proposes a solution that keeps a single instance of the data, always conforming to the newest schema, while providing backward compatibility with the old schema by creating updatable views [2] (see section 2). Writing these updatable views manually is a hard task, so generating them automatically is the preferred way of dealing with it.
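As an illustration, consider the simplest case, a renamed table. The following is a minimal sketch (the table names CUSTOMER and CLIENT are hypothetical) of how the old schema can be restored as a view; a single-table view like this is inherently updatable in most database management systems, so no additional trigger code is needed:

    -- Hypothetical rename refactoring: table CUSTOMER becomes CLIENT.
    ALTER TABLE CUSTOMER RENAME TO CLIENT;

    -- A view under the old name keeps legacy applications working during
    -- the transition period; being a plain single-table projection, it is
    -- inherently updatable.
    CREATE VIEW CUSTOMER AS SELECT * FROM CLIENT;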
Requirement 2 (Migration code must be generated). Nobody wants to write migration code just to use it once. Instead, database evolution shall be guided by patterns that allow automatic generation of migration and backward-compatibility code.
Tools that automate database refactoring are tested against their specified refactoring actions and thus provide stable and correct operations that do not need to be tested each time they are applied. Developers, however, may decide that the newest schema is not satisfactory and want to put the old schema in place again. To support this type of decision, a tool implementing database refactorings must be able to undo all database changes.
[2] The views will either be inherently updatable or made updatable through the implementation of custom instead-of triggers.
Requirement 3 (Migration needs to be undoable). In the database evolution process there can be problems or decisions that require falling back to the old schema. Thus an “undo” feature is required. Being able to undo every change requires a detailed log of what has been done on the one hand, and a compensating refactoring for every refactoring on the other hand.
Availability of software systems is critical for more and more companies. Current database migrations require a full shutdown of the database to implement structural changes and move the data to the new schema. Moving large amounts of data takes a long time. This problem must be solved:
Requirement 4 (System downtimes must be avoided). Migration procedures that use the database exclusively imply system downtimes. Long downtimes, however, must be avoided at all costs: they have a negative impact on productivity and sales, especially in globally operating companies.
First ideas on how to migrate data concurrently with normal database operation are discussed in section 4.
1.1 Existing database refactoring theory
Besides Ambler and Sadalage's theory of database refactorings, a few other authors have given attention to this topic. Domingues, Kon and Ferreira discuss in [DKF11] that refactorings using a synchronous replication mechanism between databases with replication triggers, as described by Ambler and Sadalage, “present several difficulties”, e.g. the need to avoid cyclic triggers and increased transaction response times. Instead, they focus on asynchronous approaches to replicating refactorings between different databases containing different schema versions. With this approach response times can be improved, but the replication conflicts that arise must still be resolved.
An article on the workflow of database refactoring [DACS14] analyzes how to compose the steps necessary for a refactoring. The authors discuss the example of introducing a surrogate key and which steps are necessary for implementing it. Their focus is on performance: the refactoring yields shorter transaction times once it has been applied.
A performance-driven approach is Squash, a tool for designing, analysing and refactoring relational databases [BSSW07]. Taking foreign key relations and row counts into account, the tool allows tuning the database schema with respect to performance.
The PRISM workbench described by Curino, Moon and Zaniolo [CMZ08] provides a language of schema modification operators. When a series of modification operators is applied, PRISM analyses their effect on existing queries and automatically migrates data to the new schema.
The paper [PdAJS12] discusses the evolution of databases using Petri nets. A “big change” is composed of many small changes: the big change is represented as a Petri net in which the small changes are the transitions. The dependencies of small changes can be reflected by the flow relation of the net, and starting points for parallelising small tasks can be determined in this structure. This approach suggests reflecting all refactorings (e.g. those described by Ambler and Sadalage) in Petri nets in order to compose them on a greater scale.
Skoulis, Vassiliadis and Zarras [SVZ15] analyzed several open source projects with respect to their database evolution practices. Their data shows that database refactoring is necessary in projects and becomes more complicated in highly available applications.
More generally, Ambler proposes a test-driven database approach and notes the need for refactoring tools to simplify database operations [Amb07]. With regression tests, these operations shall be tested automatically. Furthermore, Ambler motivates the development of a tool that supports marking database elements (such as columns or tables) as deprecated, since such deprecated elements commonly occur in the database evolution process.
1.2 Existing tools
There is – for the time being – no dedicated tool that we know of which is designed solely for refactoring database schemas. However, there are several tools that cover certain aspects of our definition of database refactoring.
SQL Prompt 7 by Red Gate [3] is an add-in for SQL Server Management Studio and Visual Studio that allows refactoring database schemas and rewriting application code. It supports simple renamings as well as a complex “Split Table” feature. Its documentation shows an example of refactoring a person table to support multiple email addresses. The tool analyses dependencies of refactorings and rewrites tables, triggers and views according to the database changes [4]. SQL Prompt 7 is not extensible and supports Microsoft SQL Server only.

Another commercial tool that supports Microsoft SQL Server only is ApexSQL [5]. It supports renaming database columns by rewriting scripts and stored procedures to the new name [6]. To realize that, ApexSQL shows, for each column, the stored procedures that depend on it. Furthermore, it allows replacing natural keys with surrogate keys. More complex refactorings that are supported are “Split Table” and “Replacing one-to-many relationships with association tables”.
Other tools that promise to make database migration easier, like Liquibase [7] or Flyway [8], focus on the management of the migration scripts, i.e. they provide version control and automatic execution of the scripts in different database environments. Flyway only supports native SQL scripts. Liquibase supports native SQL too, but also provides an abstraction of elementary database commands using an XML syntax. Furthermore, Liquibase offers built-in refactoring functions for renaming columns, merging two columns using a delimiter, and creating lookup tables, which contain the distinct values of an existing column.

[3] http://documentation.red-gate.com/display/SP7/Refactoring+your+code
[4] https://www.simple-talk.com/sql/sql-tools/sql-refactor-and-the-table-splitting-headache/
[5] http://www.apexsql.com/sql_tools_refactor.aspx
[6] http://solutioncenter.apexsql.com/rename-column-without-breaking-sql-database/
[7] www.liquibase.org
[8] http://flywaydb.org/
2 Refactoring
Building on the understanding of database refactorings from section 1, we now investigate the structure and realization of these refactorings. In their seminal book “Refactoring Databases” [AS06], Scott W. Ambler and Pramod J. Sadalage made two major contributions on this topic:

• Firstly, they proposed a workflow for implementing database refactorings in practice.
• Secondly, they developed a thorough catalogue of database refactorings.
2.1 Refactoring Process
The process of database refactoring involves several phases. An ideally small set of database changes, i.e. the refactorings, is first applied in a development environment. Afterwards it is tested in integration sandboxes before it is finally deployed to production. This process is iterative and incremental, as it stems from the ideas of agile software development [Amb03].
After a set of changes has been applied, a transition phase starts. In this phase the old and the new schema coexist, so that business applications which still rely on the old schema stay operational (compare Requirement 1). The old database schema becomes deprecated, and application developers have to adapt their applications to the new schema. After the transition phase the old schema is removed from the database and only the new schema is valid.

It is important to recognize that this transition phase is only necessary if we consider a multi-application database environment [AS06, p. 18f.]. In a single-application database environment the refactoring of the database and the application can be done simultaneously. In a multi-application scenario this is in most cases not feasible because of practical and organizational limitations, e.g. the different applications belong to different departments or traditionally follow longer release cycles. In this paper we will always consider this complex scenario, such that a transition period is needed.
For the transition phase, Ambler and Sadalage compare three possible methods to preserve the behavioural and informational semantics [AS06, p. 64f.]: [9]
Trigger: In this case, the old and the new schema run in parallel in different databases or in different namespaces of the same database, and triggers replicate data back and forth.

Views: In this case, updatable views represent the old tables and provide backward-compatible access to the new tables.

Batch Jobs: In this case, the database is updated on a regular basis by a batch job, which replicates the data from the old to the new schema and vice versa.

[9] Note that there is also a fourth possibility: if the database is encapsulated with a stored-procedure interface, the refactoring could simply adjust the implementation of the respective stored procedures. But as stored procedures are not used in general, we will not investigate this option further.
Triggers and batch jobs both produce duplicated data because the tables of the old and the new schema physically coexist. As a result, changes in one schema have to be replicated to the other and vice versa. This situation is comparable to replication in a distributed database [GHOS96]. The trigger approach is comparable to synchronous replication, inheriting its strengths and weaknesses: the triggers keep the data up-to-date but may also cause deadlocks, a performance bottleneck or even trigger cycles. The batch job approach is comparable to asynchronous replication, also inheriting its strengths and weaknesses: the batch job partially absorbs the performance impact of the replication by moving the update to off-peak periods, but the delayed updates can cause referential integrity problems or even update conflicts. The view approach does not introduce any duplicated data. The structure of the old schema is virtually calculated on demand; therefore views also provide up-to-date data. The drawback of this approach is that it adds additional complexity, and updatable views are not always supported by database management systems, especially when the underlying tables have to be selected by a join. Ambler and Sadalage suggest using the trigger approach because of their positive experience with this method [AS06, p. 63].
In contrast to Ambler and Sadalage, we consider views the best method to restore the old schema and preserve compatibility, because they avoid the problems of replication and duplicated data. To work around the limited support for updatable views, we will provide instead-of triggers on every view to realize insert, update and delete statements. These triggers and the view itself will be generated by the corresponding refactorings (see Requirement 2 in section 1). Figure 1 depicts the described refactoring process: an existing database schema is altered through a refactoring, the table contents are adjusted to the new structure, and views with instead-of triggers are installed for the transition phase.

Figure 1: Database refactoring architecture
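To illustrate what such generated code could look like, the following is a hedged sketch (Oracle-style SQL; the table, view and trigger names are hypothetical) restoring the old structure of a table whose column city has been moved to a separate table. Because the view joins two tables, it is not inherently updatable, so an instead-of trigger routes updates to the new structure:

    -- Backward-compatibility view presenting the old structure of PERSON.
    CREATE OR REPLACE VIEW PERSON_OLD AS
      SELECT p.ID, p.NAME, a.CITY
      FROM   PERSON p
      JOIN   ADDRESS a ON a.ID = p.ID;

    -- Instead-of trigger making the join view updatable.
    CREATE OR REPLACE TRIGGER PERSON_OLD_UPD
      INSTEAD OF UPDATE ON PERSON_OLD
      FOR EACH ROW
    BEGIN
      UPDATE PERSON  SET NAME = :NEW.NAME WHERE ID = :NEW.ID;
      UPDATE ADDRESS SET CITY = :NEW.CITY WHERE ID = :NEW.ID;
    END;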
2.2 Refactoring Catalogue
After discussing the process of database refactoring in general, we now examine the refactoring catalogue suggested by Ambler and Sadalage. When studying this catalogue, we observe that a lot of the refactorings do not conform to the requirements of definition 1, because they add new functionality or break existing functionality, e.g. Add Constraint or Drop Constraint. We will therefore define our own refactoring catalogue, based on the work of Ambler and Sadalage. It only inherits those refactorings which correspond to the requirements from section 1. Furthermore, we will assume a relational schema which has been designed as a persistence layer for an object-oriented model. Therefore several refactorings are left out because they are not applicable in this object-oriented scenario. In Appendix A we investigate which refactorings are not inherited into our catalogue and explain why they are omitted. The following enumeration gives an overview of the refactorings described by Ambler and Sadalage which we adopt:
Introduce Calculated Column: Adds a new column to an existing table. The content of this column is based on calculations involving the existing data. A possible use case is performance tuning by providing pre-calculated values which are derived from other data. [AS06, p. 81f.]

Merge Columns: Merges two columns within one table into one by combining them with a merge strategy. A possible motivation might be that two columns store almost identical information. [AS06, p. 92f.]

Merge Tables: Merges two tables into a single table by joining their columns and potentially merging records. A possible use case might be that their usage by the applications has become the same. [AS06, p. 96f.]

Move Column: Migrates a column with all its values to another table. This refactoring might be applied for normalization or denormalization purposes. [AS06, p. 103f.]

Rename Column: Changes the name of an existing column. This might have technical, functional or organizational reasons, e.g. to avoid name clashes, because the meaning of the column has changed, or because the naming conventions have changed. [AS06, p. 109f.]

Rename Table: Changes the name of an existing table. Reasons are similar to Rename Column. [AS06, p. 111f.]

Split Column: Splits an existing column into several columns. The data is split accordingly. A reason might be that more fine-grained data is needed. [AS06, p. 140f.]

Split Table: Splits an existing table by spreading its columns into different tables. Reasons might be normalization purposes or performance improvement. [AS06, p. 145f.]
2.3 Refactoring Starting Point
As mentioned before, we assume that our database is designed as the persistence layer
for an object-oriented-model consisting of interrelated entities. An entity is mapped to a
table while attributes of an entity will be become columns in the corresponding table.
Additionally every table receives a numeric column
ID
, which acts as the primary key
for this table and is filled with surrogate keys to distinguish between different records.
The relations between entities are expressed by foreign key constraints between the
corresponding tables. If an entity points to another entity, the respective table receives
an additional numeric column, which has a foreign key constraint pointing on the
primary key column of the target table. This will represent a directed many-to-one
relation. By adding a
UNIQUE
- or
NOT-NULL
-constraint on this column, the multiplicity of
the source-end of this relation can be varied to become at-most-one or at-least-one. If
both constraints are combined, the multiplicity becomes exactly-one on the source end.
Bidirectional or set-valued relations are realized as associative tables.
Note that a database schema created in this way meets the requirements of the second normal form [Cod71]:

• As every attribute is mapped to a column, and as we assume that the domain of an attribute consists only of atomic values, the schema satisfies the requirements of the first normal form.
• By adding a surrogate key to each entity table we automatically satisfy the requirements of the second normal form, because all attributes depend on the only candidate key, i.e. the surrogate ID.
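For illustration, the assumed starting point could look like the following sketch (Oracle-style DDL; all table and column names are hypothetical):

    -- Entity table with a numeric surrogate-key column ID.
    CREATE TABLE PERSON (
      ID   NUMBER PRIMARY KEY,
      NAME VARCHAR2(100)
    );

    -- A directed many-to-one relation: CONTRACT points to PERSON via a
    -- numeric foreign key column. Adding UNIQUE and/or NOT NULL on this
    -- column varies the multiplicity as described in the text.
    CREATE TABLE CONTRACT (
      ID        NUMBER PRIMARY KEY,
      PERSON_ID NUMBER REFERENCES PERSON(ID)
    );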
Figure 2: Table diagram
Figure 2 illustrates the syntax we will use hereafter to illustrate refactorings in general and in practical use cases. The notation is based on the Information Engineering notation introduced in [Mar90]. It depicts the schema level, i.e. tables, their columns and their relations, as well as the multiplicities of the relations, which are expressed in the so-called “crow's foot” notation. Optionally we add a new graphical element: instance tables. These tables are used to depict the structure of the records on the value level of the database. Value tables are connected to their type table with an instanceOf arrow.
2.4 Refactoring Framework
We now introduce our own catalogue of refactorings, which is influenced by the work of
Ambler and Sadalage. In general our catalogue inherits the refactorings mentioned in
section 2.2. It additionally adds further preconditions to some of the refactorings and
even introduces new ones, which are inspired by or resemble more atomic variants of
the refactorings in [AS06]. Every refactoring in this catalogue satisfies the requirements
of section 1:
• The refactorings neither add new functionality nor remove functionality from the existing schema. They only change the structure of the data inside the schema.
• Every refactoring is reversible by its inverse. [10]

[10] Additionally, some refactorings can create auxiliary structures. These structures do not add new information, as they consist of derived data. If such a refactoring is reverted, the auxiliary structures can be deleted without any negative consequences.
2.4.1 Rename Table or Column
The simplest refactoring is renaming a table or a column. It was also mentioned by Ambler and Sadalage. Because renaming is an isomorphic mapping of the respective structure, neither functional requirements nor records are affected. The only technical precondition enforces that the new name is unique in the respective namespace.
2.4.2 Introduce Calculated Column
This refactoring introduces a new column containing derived data, i.e. for every record in the table a given function provides a value for the new column. This refactoring is mentioned in [AS06] and is very abstract, since almost anything can be provided as a function. We consider the following calculation methods appropriate for a wide array of use cases:
• a constant value,
• a copied value from another column,
• the current value of a sequencer, or
• the result of an arbitrarily defined stored function in the database.

Figure 3 depicts the functionality of this refactoring. Note that this refactoring has no inverse. But as the calculated column contains derived data, it can simply be dropped without loss of information. [11]

[11] If the chosen calculation function is not stateless, e.g. when using a sequencer, a later restoration of a previously dropped calculated column will not result in the same values as before. This problem is out of the scope of this report.
Figure 3: Introduce calculated column refactoring
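A hedged sketch of this refactoring in SQL (the table CONTRACT and the stored function CALC_TOTAL are hypothetical names):

    -- Add the new column and fill it with derived values, here taken
    -- from an arbitrarily defined stored function.
    ALTER TABLE CONTRACT ADD (TOTAL NUMBER);
    UPDATE CONTRACT SET TOTAL = CALC_TOTAL(ID);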
2.4.3 Merge and Split Columns
Two nullable columns of a table may be merged if the columns are of the same type [12] and, at value level, at most one of the two columns is filled in each row. A discriminator column is created to keep track of the origin of each value. Conversely, a column can be split into two nullable columns, where at most one of the two columns is filled in every row of the resulting table. The split function is determined by a discriminator column. Therefore both refactorings are inverse to each other. These refactorings are refinements of the respective refactorings introduced by Ambler and Sadalage, who, however, give no specification of the split or merge function. We restrict the refactoring to be applicable only if both columns represent a partition of a dataset. The refactoring is depicted in figure 4.

[12] Same data types and same constraints.

Figure 4: Merge and split column refactoring
2.4.4 Spin-off Empty Table
Every table is able to spin off an empty table containing no data except for its primary key. This refactoring has two variants, namely “spin off” and “spin off inverse”, which differ in the direction of the generated foreign key that connects the existing table (parent) with the spun-off table (child); compare figure 5. In both cases the generated foreign key relation is 1-to-1. This refactoring also has no inverse, but as the spin-off table contains no real information, it can safely be deleted. [13] This refactoring is not directly introduced in [AS06]; it is, however, a step in their composite refactoring Split Table.

[13] One might argue for a new refactoring that merges tables consisting solely of a single id-column with other tables; we do not consider such a refactoring necessary at the moment.
Figure 5: Spin off empty table refactoring
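A minimal sketch of one of the two variants (hypothetical names; here the spin-off table carries the 1-to-1 foreign key to the parent):

    -- Create the empty spin-off table; it contains nothing but the
    -- primary keys of the existing parent records.
    CREATE TABLE ADDRESS (
      ID NUMBER PRIMARY KEY REFERENCES PERSON(ID)
    );
    INSERT INTO ADDRESS (ID) SELECT ID FROM PERSON;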
2.4.5 Move Column
A column can be moved along or against a foreign key. The relation has to be 1-to-1 in both cases: otherwise data can be lost if the multiplicity is less than one, or we can experience merge problems if it is greater than one. This refactoring is its own inverse. It appears in [AS06]; we added the precondition of an existing one-to-one relation between the affected tables for the aforementioned reasons. Furthermore, in our version only one column is moved per refactoring step, to reduce complexity. Due to this restriction, it is not possible to move multi-column keys. Note that this refactoring, in combination with Spin-off Empty Table, realizes the variants of Merge and Split Table in [AS06].
Figure 6: Moving column refactoring schema
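A possible realization of moving a column along a 1-to-1 foreign key, sketched with hypothetical names in Oracle-style SQL:

    -- Move column CITY from PERSON to ADDRESS (related 1-to-1 via ID).
    ALTER TABLE ADDRESS ADD (CITY VARCHAR2(100));
    UPDATE ADDRESS a
       SET CITY = (SELECT p.CITY FROM PERSON p WHERE p.ID = a.ID);
    ALTER TABLE PERSON DROP COLUMN CITY;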
2.4.6 Merge and Split Table
Tables may also be merged and split horizontally. The merge of two tables is only possible if all columns of the tables have the same type and the values in the primary key columns are pairwise disjoint. A discriminator is created to keep track of the origin. Conversely, a table can be split by a discriminator column into two similar tables containing distinct records. Therefore both refactorings are inverse to each other. Figure 7 illustrates this refactoring. This refactoring is not mentioned in [AS06]. Admittedly, there exist refactorings with the same name there, but they only involve a vertical split.
Figure 7: Merge and split table refactoring
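A sketch of the horizontal merge (hypothetical names; both source tables are assumed to have the same structure and pairwise disjoint primary keys):

    -- Merge two structurally identical tables; the discriminator column
    -- ORIGIN keeps track of where each record came from.
    CREATE TABLE INSURANCE (
      ID     NUMBER PRIMARY KEY,
      PRICE  NUMBER,
      ORIGIN VARCHAR2(20)
    );
    INSERT INTO INSURANCE (ID, PRICE, ORIGIN)
      SELECT ID, PRICE, 'CAR'       FROM CAR_INSURANCE
      UNION ALL
      SELECT ID, PRICE, 'LIABILITY' FROM LIABILITY_INSURANCE;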
2.4.7 Transcode Foreign Key
Any foreign key can be switched to point to another column of the target table that has a unique constraint and a not-null constraint. The old primary key then becomes a regular column with a unique and a not-null constraint. On the value level, the contents of the referencing column are transcoded. [14] This refactoring is inverse to itself. The refactoring is depicted in figure 8. Again, this refactoring is not mentioned in [AS06]. We introduced it to be able to alter relations between tables as well.

[14] To restore a view on the old table, mapping tables are created to keep track of the transcoding process.
Figure 8: Transcode foreign key refactoring
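A hedged sketch of the value-level transcoding (hypothetical names; the column CUSTOMER_NO is assumed to be unique and not null):

    -- Switch CONTRACT.PERSON_ID from referencing PERSON.ID to
    -- referencing PERSON.CUSTOMER_NO, transcoding the stored values.
    ALTER TABLE CONTRACT DROP CONSTRAINT FK_CONTRACT_PERSON;
    UPDATE CONTRACT c
       SET PERSON_ID = (SELECT p.CUSTOMER_NO FROM PERSON p
                        WHERE p.ID = c.PERSON_ID);
    ALTER TABLE CONTRACT ADD CONSTRAINT FK_CONTRACT_PERSON
      FOREIGN KEY (PERSON_ID) REFERENCES PERSON (CUSTOMER_NO);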
2.4.8 Compose Foreign Keys
If there is a chain of foreign key constraints between several tables, the foreign keys can be composed. [15] This enables us to change foreign key constraints which span multiple tables. The effect of the refactoring is depicted in figure 9. The inverse of this refactoring is a decomposition, as depicted in the figure. Like Transcode Foreign Key, this refactoring is not mentioned in [AS06].

[15] This composition is analogous to the composition of functions known from mathematics.
Figure 9: Compose foreign keys refactoring
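One possible reading of the composition, sketched with hypothetical tables A, B and C, where A.B_ID references B.ID and B.C_ID references C.ID:

    -- Compose the chain A -> B -> C into a direct foreign key A -> C.
    ALTER TABLE A ADD (C_ID NUMBER);
    UPDATE A SET C_ID = (SELECT b.C_ID FROM B b WHERE b.ID = A.B_ID);
    ALTER TABLE A ADD CONSTRAINT FK_A_C
      FOREIGN KEY (C_ID) REFERENCES C (ID);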
3 Case Study
In section 2.4, we introduced a catalogue of general refactorings, which we claim to be sufficient for sophisticated, semantics-preserving database transformations. We will illustrate this claim with a practical example. Consider the database schema in figure 10, which is used as the data model for an insurance contract management application of a small insurance company that sells car insurance.
Figure 10: Practical example: basic model
Figure 10 shows the basic model, which will serve as the foundation for several refactorings. The model consists of the three tables Person, Contract and CarInsurance. It supports several insurance contracts for the same person (policy holder). The table Contract acts as a junction table between the person and the insurance product. This simple model lacks a sophisticated abstraction level but might be sufficient for a small application. Persons and their postal addresses are stored in the same table. Therefore, this model does not support persons having multiple addresses, which is quite common today. At this point we will use two refactorings:

• SpinOffTable (see sec. 2.4.4)
• MoveColumn (see sec. 2.4.5)

They will help us move the address of a person into a separate Address table and enable database support for persons with multiple addresses. First we use SpinOffTable to create an empty Address table with a foreign key pointing to the Person table. The mapping between these two tables is one-to-one; thus there is an address record for every record in the Person table. The one-to-one relationship allows us to move the columns street, city and zipCode into the newly created Address table by applying MoveColumn three times.
Figure 11: Practical example: database schema after first refactoring
Figure 11 shows the database schema after the first refactoring. Affected columns and tables are marked in red. Legacy applications can still access the previous schema version through the view [16] and can also modify the person data through instead-of triggers; compare section 2.1. As time goes by and the legacy applications are adjusted to the new schema, the one-to-one referential constraint between the two tables can be dropped, so that the data model is finally capable of supporting persons with multiple addresses.

[16] In real databases the name of the view for the old model and the name of the new model's table have to be distinct, or they have to reside in different databases or schemas. For the sake of simplicity both elements carry the same name here.
For the next step of the incremental development of our database schema, let's assume that our insurance company has acquired another insurance company, which sells liability insurance. Therefore, the databases of the two companies have to be consolidated. For the sake of simplicity, let's assume the database schema of the acquired insurance company is the same as in figure 10, but instead of a CarInsurance a contract points to a LiabilityInsurance.
Before we start consolidating the two databases, we observe that our current schema does not explicitly model a second partner of each contract. To introduce the second partner role we must be able to distinguish between natural persons, like our customers, and legal persons, like insurance companies. With the help of:

• SplitTable (see section 2.4.6)

we are able to split the Person table into a NaturalPerson table and a LegalPerson table. At this point we extend the functionality of SplitTable by assuming that the original table can optionally be preserved physically. [17]

The decision whether a record belongs to the legal- or the natural-person table is taken from the result of a stored function that checks whether the gender column is null or not. The newly created tables have a foreign key referencing the origin table, which is identical to their id column. This allows us to use MoveColumn again with a push-down semantic (compare [Fow99, p. 329]) to move the columns which are specific to natural persons into the according table. Note that SplitTable allows us to transform a database schema which uses the pattern one-table-per-hierarchy [18] into a schema which uses table-per-type [19]. The resulting situation is depicted in figure 12; a minimal sketch of the split follows below.

[17] This means that the original structure is not preserved by virtual calculation, i.e. a view, but by leaving the old table physically untouched. In many cases this seems to be a bad idea because it creates redundant data. In inheritance scenarios, however, this redundant information might be explicitly needed.
[18] Commonly referred to as Single Table Inheritance [Fow02, p. 278].
[19] Commonly referred to as Class Table Inheritance [Fow02, p. 285].

Figure 12: Practical example: database schema after second refactoring
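A simplified sketch of this split (hypothetical column set; the gender column acts as the discriminator via the stored function mentioned above, and is subsequently pushed down into NaturalPerson only via MoveColumn):

    -- Route PERSON records into the two new tables.
    INSERT INTO NATURAL_PERSON (ID, NAME, GENDER)
      SELECT ID, NAME, GENDER FROM PERSON WHERE GENDER IS NOT NULL;
    INSERT INTO LEGAL_PERSON (ID, NAME, GENDER)
      SELECT ID, NAME, GENDER FROM PERSON WHERE GENDER IS NULL;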
Before the next refactoring step can start, we will have another transition period in which the new applications and the legacy applications coexist. This transition period can be used to insert two new records into the Person/LegalPerson table, which will represent our two insurance companies. As soon as the legacy applications are finally abandoned or updated to the new schema, we are going to refactor our database again to introduce multi-tenancy support. For this purpose we will use:

• AddCalculatedColumn (see section 2.4.2)

We will use AddCalculatedColumn with a constant value (the id of the legal person which represents our car insurance company) to add a foreign key column for the second contract partner to the Contract table. Afterwards we can establish a foreign key from Contract towards the Person table. The new situation is depicted in figure 13.
Figure 13: Practical example: database schema after third refactoring
Now that our schema is capable of supporting multiple tenants, we can start to integrate the database of the acquired insurance company. As mentioned earlier, we assume that the acquired company used almost the same schema. So we will once again use AddCalculatedColumn to add the column liabilityInsuranceId, which will represent a foreign key to the new LiabilityInsurance table. We will assign the new column the constant value null. Finally, all the records of the acquired company are imported into our database. The new situation is depicted in figure 14.
The two tables CarInsurance and LiabilityInsurance realize the abstract entity Insurance with the pattern table-per-concrete-type [20]. When accessing the insurances through the contract table, this seems to be a design flaw, (1) because both tables have some columns in common and (2) because the access via Contract needs an outer join over both tables, since a contract can refer either to a car insurance or to a liability insurance.

[20] Commonly referred to as Concrete Table Inheritance [Fow02, p. 293].
Figure 14: Practical example: database schema after merging a second data stock
So in a final refactoring step, we transform this situation into an implementation of the insurance entity by the pattern class-table-inheritance, using the following refactorings:

• SpinOffEmptyTableInverse (see section 2.4.4)
• TranscodeForeignKey (see section 2.4.7)
• MergeColumn (see section 2.4.3)
• MergeTable (see section 2.4.6)

We will decompose the refactoring from concrete-table-inheritance to class-table-inheritance into four steps. This process is depicted in figure 15.
• First we use SpinOffEmptyTableInverse to spin an empty table off CarInsurance and off LiabilityInsurance. These tables represent the abstract particles. The foreign key column which connects these tables is therefore called super. Note that we can specify the sequence which is used to provide ids for the entries in these tables; therefore the ids of the entries in both tables are disjoint when we use the same sequence. Additionally, we use MoveColumn to move the price column into the abstraction.

• Now that we know that the ids of both abstract particles are disjoint and both tables share the same structure, we can use MergeTable to merge both abstract particles into a new Insurance table. The result is a union of all entries in CarInsuranceA and LiabilityInsuranceA.

• In the next step we use TranscodeForeignKey to change the targets of carInsuranceId as well as liabilityInsuranceId, via the super foreign key, to the id of the new Insurance table.

• The final step applies the MergeColumn refactoring. The columns carInsuranceId and liabilityInsuranceId are disjoint by design, which means that if carInsuranceId is set, then liabilityInsuranceId is null and vice versa. Also, both columns have a foreign key to the same column. Thus all preconditions for MergeColumn are satisfied, and both columns can be merged into one insuranceId column.
Figure 15: Steps to transform concrete-table-inheritance to class-table-inheritance.
This example demonstrates that the catalogue of refactorings introduced in section 2.4 provides all necessary means to migrate back and forth between different patterns for mapping object-oriented systems to relational databases; compare [Fow02, pp. 216-306]. The refactoring framework also provides support for legacy applications during transition periods through the use of views and instead-of triggers. In general, the practical example demonstrates that the refactoring catalogue is capable of:

• decomposition of entity tables to prepare the transition from one-to-one relationships to one-to-many relationships [21],
• transformation of a single-table-inheritance entity into a class-table-inheritance entity,
• introducing multi-tenancy support through new, arbitrarily calculated columns,
• introducing new primary keys in order to be able to merge records of different tables which might not have had unique ids so far,
• transformation of a concrete-table-inheritance entity into a class-table-inheritance entity.

[21] Note that if you extract a new table from the extracted entity again, you can transform one-to-many relationships into many-to-many relationships. The extracted table will serve as a junction table.
4 Migration Strategies
Migrating data is a necessary task when applying database refactorings to an existing database, because existing data has to be preserved. Requirement 4, to reduce downtimes of productive systems, shall be respected when defining a strategy for migrating the schema as well as the data in the system.
4.1 Offline Migration [A]
The first migration strategy proposes a dedicated time period during which the database is offline and the migration is applied exclusively. In a first step the schema modifications are executed on the database. [22] In a second step all necessary migration scripts are applied in order to move the data into the new schema. During migration the database is offline and cannot process transactions from other users.

[22] This paper separates schema modifications (DDL) from data migration (DML), since this is necessary in Oracle database systems. DBMSs that support DDL statements in transactions may combine these steps.
4.2 Stepwise Transactional Migration [B]
In strategy [A], one single migration period applies all data migration actions in a single unit. Migration strategy [B] tries to decrease the time period during which the database is inaccessible to other applications: in a first step the database schema is changed according to the refactoring; after this step the full refactoring is broken up into several small transactions that each migrate only part of the data. Such a decomposition of the complete migration into small parts, e.g. migrating rows one by one, requires a database view that combines migrated and original data sets.
The idea is to migrate a row, if it has not been migrated already, when it is accessed by the system, using database triggers. Because some rows are accessed very rarely, a concurrent batch job shall additionally run small transactions that migrate unaccessed database rows. Each migration transaction can either copy or move existing data from the old schema into the new one.

In a stepwise copy migration [B1] a new model is created, which contains no data in the beginning. A new view presents the new database schema and aggregates data from the new model (for already migrated rows) as well as from the old data. Furthermore, a trigger on this new view migrates each row accessed for update into the new model by copying it. On the old model, a trigger replicates each database change to the new model. For users accessing the database, the old as well as the new schema remain accessible; compare figure 16.

Figure 16: Stepwise Copy-Strategy [B1]
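A hedged sketch of such a combining view (hypothetical names; a status column on the old model is assumed to mark rows that have already been migrated):

    -- Strategy [B1]: already migrated rows come from the new model,
    -- all other rows still come from the old model.
    CREATE OR REPLACE VIEW PERSON_COMBINED AS
      SELECT ID, NAME FROM PERSON_NEW
      UNION ALL
      SELECT ID, NAME FROM PERSON_OLD WHERE MIGRATED = 0;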
The stepwise move strategy [B2] (see figure 17) is similar to copying, but here the new model is the leading provider for access to migrated rows. During migration (either by access trigger or by batch) the current row is moved to the new model and deleted in the old one. Because this procedure stores migrated data only in the new model and not-yet-migrated data only in the old model, the old as well as the new view need to access both models. The old model is only read by transactions; changes are performed on the new model only.

Figure 17: Stepwise Move-Strategy [B2]
4.3 Two Phase Migration [C]
Strategy [B] shows which challenges have to be faced when cutting migration transactions into pieces. Creating these small transactions, which migrate only parts of the data, is difficult; therefore a migration strategy that allows single migration scripts is desirable. In the first step of this strategy [C], the database (or single tables) is exported into a database system that is not accessed by users. On this database copy the migration script is executed, upgrading it to the new schema. In the still-running original database, triggers intercept every action and log it. In a short downtime period this log information is integrated into the copied database, which afterwards is made productive for all applications. The integration of the logs is not difficult, because the introduced mechanism of views and instead-of triggers mimics the old schema exactly and may be used during migration.
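The change log could be realized roughly as follows (hypothetical names, Oracle-style trigger syntax):

    -- Log table recording every change to PERSON during the migration.
    CREATE TABLE PERSON_LOG (
      ACTION VARCHAR2(6),
      ID     NUMBER,
      NAME   VARCHAR2(100)
    );

    -- Trigger on the still-running original database; the log entries
    -- are replayed on the migrated copy during the short downtime.
    CREATE OR REPLACE TRIGGER PERSON_LOG_TRG
      AFTER INSERT OR UPDATE OR DELETE ON PERSON
      FOR EACH ROW
    BEGIN
      IF DELETING THEN
        INSERT INTO PERSON_LOG (ACTION, ID) VALUES ('DELETE', :OLD.ID);
      ELSIF INSERTING THEN
        INSERT INTO PERSON_LOG (ACTION, ID, NAME)
        VALUES ('INSERT', :NEW.ID, :NEW.NAME);
      ELSE
        INSERT INTO PERSON_LOG (ACTION, ID, NAME)
        VALUES ('UPDATE', :NEW.ID, :NEW.NAME);
      END IF;
    END;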
4.4 Comparison of Migration Strategies
Strategy [A] is best for small databases and applications that tolerate frequent but short downtimes. It has downsides for large databases, however, with respect to Requirement 4. Strategy [B] is a good option because it allows a concurrent migration in the live system, running alongside the normal transactions. However, the view definition requires a detailed analysis of the type of refactoring and might become very complex when aggregating queries use both schemas extensively. Furthermore, a separate column for managing the current migration status of a row is required. Strategy [C] achieves short downtimes and is therefore better than [A], but cannot match [B] in this respect. The possibility to define single migration scripts makes [C] the intended migration strategy. [23]

[23] The tool which has been developed in the context of this paper uses strategy [A] to perform migrations on the database, for simplicity.
5 Refactoring Framework
This section deals with general considerations on how to design a refactoring framework used in a refactoring tool for developers. Since larger migrations are composed of a series of applications of smaller refactorings, it is essential that the framework supports a preview of the refactored database. Each relational database stores two types of data which are affected by a refactoring: the table schema and the stored data. The data in the database will be migrated by the migration procedure (see section 4); a representation of it in a refactoring framework cannot be established, as data “lives” and changes during the development process. The database schema, in contrast, is fixed, and a refactoring framework can maintain a virtual database schema to which each refactoring is applied and in which the impact of refactorings can be previewed by the developer. A complex refactoring requires multiple refactoring steps. Undoing single steps after previewing the result requires a stack of virtual schemas; this will be discussed in section 5.2.

Furthermore, in future investigations, a dependency analysis between selected steps of a complex refactoring can be done on the basis of this layered virtual schema, and optimisations of refactoring steps may be discovered.
5.1 Virtual database schema
To support a simple [24] refactoring design process, a refactoring framework has to offer a virtualisation of the actual database schema. After a refactoring has been applied, the framework has to reflect the changes in the virtual database schema. Therefore a virtual database schema has to represent all schema elements of an actual schema, e.g. tables, views, columns, primary keys, foreign keys, unique constraints, sequences, functions, datatypes and triggers. The schema elements themselves may consist of other schema elements, e.g. a table consists of a collection of columns, a primary key, foreign keys and unique constraints.

A virtual database schema supports the refactoring development process as follows:

1. The actual schema is converted into a virtual one.
2. The virtual schema is presented to the designer in a comfortable graphical user interface.
3. The framework supports the designer by displaying the possible refactorings for each schema element.
4. The designer chooses a refactoring and applies it to the virtual schema.
5. The designer verifies the result.
6. The designer continues the development of further refactorings until the migration is complete.
7. The refactoring framework applies the SQL statements that it generates for the designed refactoring to the actual database (this is the actual migration).
8. The designer verifies the result.

[24] A refactoring design process is simple if the result of applying a refactoring can be verified without modifying a “real” database, which would otherwise have to be restored after a mistake in the design process.
In this approach, the virtual schema is changed by every refactoring. Thus, undoing the last refactoring requires reloading the actual schema and reapplying all preceding refactorings to the virtual schema.
5.2 Layered virtual database schema
To prevent the reapplication of all refactorings after an undo of the last refactoring, it is
preferable to represent a virtual database schema as a stack of layers, each created by
the application of a refactoring. The first layer is given by the virtualized actual database
schema.
In contrast to a plain virtual database schema, the representation of a schema element is divided into two parts:

1. Every schema element has to be uniquely identified by an id, e.g. the name of the table, view or foreign key constraint. Schema elements without an id of their own additionally inherit the id of their “container”, e.g. the id of a column is a tuple consisting of the column id and the id of the table the column is part of.
2. The actual schema element consists of properties (e.g. not-null constraints) and/or the ids of other schema elements, e.g. a table consists of a collection of column ids, a primary key id, foreign key ids and unique constraint ids.

In addition to the usual schema elements, manager schema elements have to be added to manage the top-level schema elements (tables, views, sequences, functions and triggers); for example, the table manager consists of a set of table ids.
With these concepts the refactorings can produce separate layers. An example layer is presented in figure 18 [25], which is constructed in the following way:

Figure 18: Refactoring framework layer example

1. All schema elements of the actual database schema are loaded and converted into the described format (see layer 1).
2. A refactoring which, for example, deletes a table and creates a new column in another table creates a new virtual database layer. Afterwards it copies the table manager without the id of the deleted table and adds it to the new virtual database layer. In a following step it adds the id of the new column and the column itself to the virtual database layer. Finally, it copies the table, adds the new column to it and inserts the table into the virtual database layer (see layer 2).
3. The new virtual database layer is stacked on top of the existing layers.
4. The actual representation of the virtual database schema is now given by the chain of layers in the following way: to represent the tables, the top layer is asked for the table manager, which it provides directly, since it contains a new version. Then, for each id in the table manager, the layer is asked for the corresponding table. Since the id of the removed table is not contained in the table manager, that table is no longer represented by the virtual database schema. The table with its new column is loaded from the top layer, while the other columns of that table are loaded from the underlying layers (see actual representation).
5. The next refactoring can be designed on the basis of this new representation. Its application will create a new layer.

[25] Every column in the figure represents a schema element in each layer, e.g. the table with the id “tableId1” consists of two columns in layer 1.
Having created this layered virtual schema, the withdrawal of a refactoring is straightforward: the layer created by the refactoring is popped off the stack of layers. Finally, this layered virtual database schema offers the possibility to examine the sequential independence of refactorings in a uniform way. For two refactorings to be sequentially independent, the actual representations after applying both refactorings in either order have to be equal. An example situation is presented in figure 19: here, the additions of two columns to two different tables are sequentially independent. [26]

Figure 19: Refactoring framework layer independence example

[26] Further aspects of the examination of the sequential independence of refactorings will not be covered, since this requires the unification of compensating views. The addition of two columns to the same table by different refactorings will result in the creation of two views which are not independent of each other, since one view will restore the other. Therefore, two whole refactorings which add columns to the same table are not sequentially independent.
6 Conclusion
Finally, we summarize the results and observations made throughout this paper and give an outlook on future development plans. First of all, this paper provides a catalogue of refactorings (see section 2.4) which are inspired by the theoretical work of S. Ambler and P. Sadalage (compare section 2). The principal feature of these refactorings is that they do not cause information loss, in contrast to some refactorings introduced in [AS06]; see appendix A for more details. Their practical applicability is shown by a case study in section 3. Secondly, this paper provides a hands-on approach to providing backward compatibility for legacy applications by restoring the old schema with views and instead-of triggers, see section 2.1. This approach is extended by considerations of different migration strategies for applying these refactorings to a big data stock, see section 4. Additionally, all presented refactorings were implemented in DELTA, an open-source Java database migration tool, which is a side result of this paper. The technical and implementation details of this application are presented in appendix B.
The following subjects will be topics of future work and research:

Evaluating the application: The created refactoring tool has been successfully tested on the presented use case and a small data stock. In the future this application has to be tested with various database models and larger amounts of data. Especially the performance of the instead-of triggers with real legacy applications is a special point of interest. For further information on open technical issues, see section B.3.

Developing sophisticated database migration support: In section 4, several considerations on different migration strategies were made. These strategies have to be analysed in greater detail and implemented in the refactoring tool.

Analysing the composition and reordering of refactorings: Section 5 introduces a sophisticated framework that copes with the technical aspects of the database through a virtual database model and divides refactorings into deltas (see appendix B.2.2), which can be composed into more complex transformations. This approach has to be developed further to be able to compose refactorings. The main benefit of such compositions is that the execution of a composition might be less complex than the execution of all the small contained transformations, because some intermediate steps are redundant; e.g. an intermediate renaming of a table is redundant if the table is renamed again in the next step. For further theoretical work on the composition of refactorings see [Löw11]. Another important aspect to take into consideration is the reordering of refactorings and deltas, respectively. If two refactorings are sequentially independent, their order inside a composed refactoring can be adjusted to achieve advantages like better performance or less complex transformation steps.
References

[Amb03] Scott W. Ambler. Agile Database Techniques: Effective Strategies for the Agile Software Developer. Wiley Publishing Inc., 2003.

[Amb07] Scott W. Ambler. Test-driven development of relational databases. IEEE Software, 24(3):37–43, 2007.

[AS06] Scott W. Ambler and Pramod J. Sadalage. Refactoring Databases: Evolutionary Database Design. Pearson Education, 2006.

[Blo08] J. Bloch. Effective Java. Java Series. Pearson Education, 2008.

[BSSW07] Andreas M. Böhm, Dietmar Seipel, Albert Sickmann, and Matthias Wetzka. Squash: A tool for designing, analyzing and refactoring relational database applications. In 17th International Conference on Applications of Declarative Programming and Knowledge Management (INAP 2007) and 21st Workshop on (Constraint) Logic Programming, page 113. University of Würzburg, Germany, 2007.

[CMZ08] Carlo A. Curino, Hyun J. Moon, and Carlo Zaniolo. Graceful database schema evolution: the PRISM workbench. Proceedings of the VLDB Endowment, 1(1):761–772, 2008.

[Cod71] Edgar F. Codd. Further normalization of the data base relational model. Technical Report RJ909, IBM, August 1971.

[DACS14] Marcia Beatriz Pereira Domingues, Jorge Rady de Almeida Junior, Wilian França Costa, and Antonio Mauro Saraiva. A workflow for database refactoring. International Journal of Innovative Computing, Information and Control, 10(6):2209–2220, 2014.

[DKF11] Helves Humberto Domingues, Fabio Kon, and João Eduardo Ferreira. Asynchronous replication for evolutionary database development: a design for the experimental assessment of a novel approach. In On the Move to Meaningful Internet Systems: OTM 2011, pages 818–825, 2011.

[Fow99] Martin Fowler. Refactoring: Improving the Design of Existing Code. The Addison-Wesley Object Technology Series. Addison-Wesley, Reading, Mass., 1999.

[Fow02] Martin Fowler. Patterns of Enterprise Application Architecture. Addison-Wesley, 2002.

[GHJV94] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Pearson Education, 1994.

[GHOS96] Jim Gray, Pat Helland, Patrick O'Neil, and Dennis Shasha. The dangers of replication and a solution. SIGMOD Record, 25(2):173–182, June 1996.

[Löw11] Michael Löwe. Refactoring information systems - association folding and unfolding. Forschungsberichte der FHDW Hannover, 2011.

[Mar90] James Martin. Information Engineering, Planning & Analysis: Book 2. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1990.

[PdAJS12] Marcia Beatriz Carvalho Pereira, Jorge Rady de Almeida Junior, and Jose Reinaldo Silva. Evolution of databases using Petri nets. In Anais do XIX Congresso Brasileiro de Automática, CBA 2012, pages 5232–5238, 2012.

[SVZ15] Ioannis Skoulis, Panos Vassiliadis, and Apostolos V. Zarras. Growing up with stability: How open-source relational databases evolve. Information Systems, 53:363–385, 2015.
A Abandoned refactorings
DELTA implements only a small selection of the refactorings described by Scott W. Ambler
and Pramod J. Sadalage in their seminal book “Refactoring Databases” [AS06]. This
section justifies why the majority of the refactorings in “Refactoring Databases” is not
implemented.
Information loss
Some of the refactorings in [AS06] result in an information loss if they are not applied in
the right context. Thus, they do not constitute refactorings in isolation. Refactorings of
this category are: Drop Default Value, Apply Standard Code, Introduce Common Format,
Drop Foreign Key Constraint and Introduce Hard Delete. Also, Drop Table and Drop
Column can cause information loss if the table or column does not contain completely
derived data.
Schema modifications
Many of the refactorings are schema modifications only and do not change the structure
of the data at all. Some of them are schema reductions, like Drop Column Constraint
and Drop Non-Nullable Constraints, which is a special case of Drop Column Constraint.
Others are schema extensions: Introduce Column Constraint, Introduce Default Value,
Make Column Non-Nullable, Introduce Index, Encapsulate Table With View and Add
Foreign Key Constraints. Because code refactoring is not in the focus of DELTA (see
below), Replace Method(s) With View is only a creation of a new view and is consequently
a schema extension. The same applies to Replace View With Method(s), which is the
opposite of Replace Method(s) With View. Some other refactorings like Drop Table and
Drop Column are schema restrictions, if the data being dropped is completely derived.
If the table or column to drop was created by a refactoring such as Add Calculated
Column, the column can safely be deleted without information loss. Replace Column
and Apply Standard Types can be both schema restrictions or schema extensions,
depending on the new type of the column. Because all of these refactorings modify the
stored data, they do not satisfy Definition 1 in section 1. Drop View is also a schema
modification, but one that can be repaired by using the defining select of the view
instead of using the dropped view.
Not used for normalised tables
In section 2.3 we described that we require the database to be at least in second normal
form. Thus, Replace LOB with Table and Split Column do not occur in our set-up.
Behaviour modification
Introduce Cascading Delete causes a modification of the behaviour of the database,
because after this change “the database automatically deletes the appropriate child
records when a parent record is deleted” [AS06]. We do not want to have behaviour
modifications, because the applications which are working with the database have to
continue working stably after the change. All refactorings in the chapter Transformations
of [AS06] describe changes to the semantics and the behaviour of the database schema.
This is not in the focus of DELTA.
Not atomically implemented refactorings
Some of the refactorings can be implemented by using other refactorings. Therefore
we do not implement them as such. Introduce Surrogate Key can be replaced by Add
Calculated Column and then Change Foreign Key (see section 2.4). Replace Surrogate
Key With Natural Key can be replaced in the same way. Consolidate Key Strategy can
be substituted by using Change Foreign Key and Compose Foreign Key several times.
Replace Type Code With Property Flags can be replaced by using Add Calculated Column.
Add Mirror Table can be replaced by using Add Calculated Column several times. Replace
One-To-Many With Associative Table is not needed, since it can be replaced by Spin-off
Empty Table and Move Column: First you spin off an empty table and move the foreign
key into the empty table. Second you have to copy the primary key of the referencing
table as foreign key column into the new table. Move Data moves the data contained in
a table, either all or a subset of its columns, to another table. This refactoring can only
be applied in the context of other refactorings (for example Move Column, see section
B.2.1.3).
Trigger
Some of the refactorings use triggers for their implementation. We prefer stored
procedures instead. We do not implement refactorings with triggers, because we act
on the assumption that triggers should not be used this way in databases (although
for implementing the refactorings themselves, triggers are useful and needed, see
section 2.3). Therefore we do not implement Add Trigger For Calculated Column and
Introduce Trigger For History.
Code Refactorings
Some of the refactorings are code refactorings, because they do not change the database
structure but the application source code. Add CRUD Methods, Add Read Method,
Introduce Calculation Method, Migrate Method From Database (rehost an existing
database method in the application), Migrate Method To Database and all refactorings
of the chapter Method Refactorings belong to this category. Code refactoring is not in
the focus of DELTA.
Other reasons
Introduce Soft Delete is the opposite of Introduce Hard Delete. Instead of deleting a
row, there is a flag that indicates that the row has been deleted. This is a preparation
for Introduce History. The problem with implementing this refactoring is that you have
to implement a cascading delete on your own, because the data would otherwise be
inconsistent under foreign key constraints. Therefore we do not implement this refactoring.
Introduce Read-Only Table creates a read-only data store based on existing tables in
the database. This is not a refactoring of a productive transactional database system;
it can be used for a data warehouse instead. Because DELTA is a refactoring tool for
online transactional processing (OLTP) systems, it does not support this refactoring. Use
Official Data Source is not needed, because using an official data source for a given
entity should be natural for a good programmer.
B DELTA
Applicability of the theoretical results of this report in practical projects cannot be
shown without proper tool support. The refactoring tool DELTA, developed by the
master course HFP414 at FHDW Hannover, provides a first version of such a tool by
implementing the introduced refactoring catalogue of section 2.4 and the refactoring
framework of section 5. (The name DELTA stems from the Greek letter ∆, which is often
used in engineering to denote the change of a physical quantity. In our tool, the name
DELTA denotes an atomic change of the database; these changes can be composed to
form our refactorings.) In the current version, it uses an offline migration strategy (see
section 4.1). More advanced techniques like the two-phase migration strategy (see
section 4.3) and the composition/reordering of refactorings (see section 6) are planned
for forthcoming releases.
DELTA is developed with the Java programming language (https://www.java.com/) in
version 1.8 and currently supports the Oracle Database (http://www.oracle.com/index.html).
The tool is currently not available on one of the major open-source-hosting platforms;
we are working on an open-source release. A pre-release version can be requested at
any time at patrick.stuenkel@htp-tel.de.
The following sections describe the architecture and the available refactorings of DELTA.
B.1 Architecture
The architecture of DELTA is made up of three layers: the database layer, the model
layer and the view layer.
B.1.1 Database layer
The first layer is accessible by the interface Database, which extends the interfaces
DatabaseConnector, Operator and StatementExecutor. Database represents the
connection to the database via the interface DatabaseConnector, provides access to the
database structure via the interface Operator and offers a possibility to execute SQL
statements via the interface StatementExecutor.
The Operator is used to build up the domain model (the schema elements) reflecting
the database structure, e.g. it reads out the static data dictionary of an Oracle DB.
Besides that, the Operator interface can easily be used to mock an actual database
implementation for test purposes.
The Database interface is implemented by the class OracleDatabase. Figure 20 shows
the important interfaces and classes of the database layer.
Figure 20: Database layer
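To illustrate what an Operator implementation might read, the following is a minimal sketch against Oracle's static data dictionary; the exact queries issued by DELTA are not specified in this report, and the table name is hypothetical.

-- list all tables of the current schema
SELECT table_name FROM user_tables;

-- read the column metadata of one (hypothetical) table
SELECT column_name, data_type, nullable
FROM user_tab_columns
WHERE table_name = 'CUSTOMER';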
B.1.2 Model layer
The second layer is divided into two packages: one package for schema elements and
one package for deltas and refactorings.
B.1.2.1 Schema elements
The package schemaElements represents the domain model of a database structure.
It contains e.g. a Table class, a Column class and a TableManager class. At the same
time, it provides the necessary separation of schema elements into an id part and the
actual schema element, needed for a layered virtual database schema (compare section
5.2). These structures are presented in Figure 21 (unimportant ID elements, e.g. ID
elements which are not referenced by other elements, are omitted for the sake of clarity
and ease of comprehension).
Figure 21: Schema elements
To represent the data type of a Column, there is the interface DataType with
implementations like NumberDataType and Varchar2DataType, which can be constructed
by a factory (see [GHJV94] for the factory pattern) which parses the string representation
of the data type offered by a call to the function getColumns of the Operator.
For a coherent representation of all different schema elements in a treeview structure,
there is the operation getChildren in Element. It takes a SchemaElementProvider as
input and returns a collection of SchemaElements. In most cases these elements are
SchemaElementProxies referencing an actual element (this proxy pattern [GHJV94] is
necessary since the tree structure gets lazily loaded, because loading the whole database
structure at once would unnecessarily prolong the system start).
The SchemaElementProvider is the abstraction for a layered virtual database. With
the operations getElement and getKeys it provides the view to the actual representation
of the stacked database layers. Besides that, it offers the operation getNext, which
provides the last SchemaElementProvider, i.e. the actual representation of the stacked
database without the top layer. There are two different implementations of the
SchemaElementProvider.
The StandardSchemaElementProvider administers an ElementStore for each subclass
of ElementID. An ElementStore is a container in which corresponding Elements are
stored and from which Elements get deleted. Therefore all ElementStores of a
StandardSchemaElementProvider represent a layer.
The DatabaseSchemaElementProvider represents the actual database. For performance
reasons, it manages a CachedElementContainer for each class of an ElementID. The
actual Elements are loaded by a corresponding DatabaseElementProvider, which uses
an implementation of the Operator interface to load the information from the database.
The corresponding class diagram is shown in Figure 22. Common implementations are
abstracted into the AbstractSchemaElementProvider, which uses ElementProviders.
Figure 22: Schema element provider
B.1.2.2 Deltas and Refactorings
The second key concept of the refactoring tool architecture is the notion of Delta and
Refactoring. A Delta represents an arbitrary change to a virtual database layer, while a
Refactoring represents a change that is not noticeable for an application knowing the
old version of the schema only. Thus all possible Refactorings are a subset of all possible
Deltas. Consequently, the interface Refactoring extends the interface Delta in Figure 23
(which shows the most commonly used Deltas only) and represents a marker interface
(further information on marker interfaces can be found in [Blo08]).
A Delta has a name and can be applied to a database layer, i.e. a
SchemaElementProvider, to produce an AppliedDelta. Deltas can be composed for
reusability (see [GHJV94] for the composite pattern), whereby the application of a
DeltaComposition produces an AppliedDeltaComposition. An AppliedDelta represents
the change to a database layer in two dimensions: It represents the new actual database
layer by the association actualProvider, and it contains a collection of SQL statements.
These statements produce the corresponding database schema when applied to the
actual database.
The impact of a Delta on these dimensions can vary considerably, which is shown by the
following sample Deltas:
•CreateColumn creates a column only in the virtual database layer, since it is always
part of a DeltaComposition which produces a whole Table with a CREATE TABLE
statement.
•InsertIntoTable only produces an INSERT INTO statement, without element addition
to or removal from a virtual database layer.
•AddNewColumn alters an existing layer with an ALTER TABLE statement. In addition,
it adds a new Column to the virtual database layer and updates the virtual
representation of the Table.
•CreateInsteadOfTrigger creates a script for an INSTEAD OF TRIGGER (instead-of
triggers can be used to modify complex views, see http://docs.oracle.com/cd/B19306_01/appdev.102/b14251/adfns_triggers.htm).
Furthermore it adds a corresponding Trigger to the virtual database layer and
updates the TriggerManager.
Figure 23: Deltas
To understand the creation of a real Refactoring, the following example describes the
refactoring RenameTable more precisely.
Example 1 (RenameTable). The RenameTable refactoring is a DeltaComposition
composed of the Deltas SimpleRenameTable and CreateView.
When applied, the SimpleRenameTable delta produces the simple statement ALTER
TABLE 'oldTableName' RENAME TO 'newTableName';. The changes to the virtual
database layer need more effort and can be decomposed into the following steps:
RemoveOldAndAddNewTable First it is necessary to get a renamed copy of the old
table. Therefore all ColumnIDs of the table have to be renamed, since a ColumnID
contains the TableID of the table which contains it (the reason for that is that a
column name is not unique in a database schema but only in a table). The
PrimaryKeyID, UniqueConstraintIDs and ForeignKeyIDs can be adopted without
renaming, since they are uniquely identified in the whole database schema. The
new table can be added and the old table deleted via the ElementStore<TableID,
Table> of the StandardSchemaElementProvider.
DeleteAndCreateColumns All old columns have to be removed from the
ElementStore<ColumnID, Column> of the StandardSchemaElementProvider and
copies of the columns with the renamed ColumnID have to be added.
RemoveOldAndAddNewTableToTableManager To represent the renaming of the
table in the actual presentation of the schema, the TableManager has to be copied
with the removed and the added TableID and inserted into the
ElementStore<TableManagerId, TableManager> of the
StandardSchemaElementProvider (the TableManagerId is a singleton for the whole
database schema; see [GHJV94] for further details on singletons).
UpdateForeignKeys The changes of the ColumnIDs demand changes of the
ForeignKeys, since they consist of ColumnIDs of the table which contains the
ForeignKeys. Therefore the ForeignKeys get copied and added to the
ElementStore<ForeignKeyId, ForeignKey> of the StandardSchemaElementProvider.
Note that no ForeignKey has to be deleted, since the id of a foreign key does not
change, as it is unique in the whole database schema.
UpdateUniqueConstraints This step is analogous to UpdateForeignKeys but with
UniqueConstraints.
UpdatePrimaryKeys This step is analogous to UpdateForeignKeys and
UpdateUniqueConstraints but with PrimaryKeys.
The CreateView delta produces the statement CREATE VIEW 'oldTableName' ('all
column names of the table') AS SELECT 'all column names of the table' FROM
'newTableName';. The changes to the virtual database layer are produced by the
following steps:
CreateNewView creates a new View with 'oldTableName' as ViewID and all column ids
of the Table with the TableId 'newTableName', and renames (back) the TableId part
of the ColumnIDs from 'newTableName' to 'oldTableName'. This View is added to
the ElementStore<ViewId, View> of the StandardSchemaElementProvider.
CreateNewColumns For the next step, copies of the columns with the renamed
ColumnID have to be added to the ElementStore<ColumnID, Column> of the
StandardSchemaElementProvider.
AddViewToViewManager The last step adds a new version of the ViewManager,
including the previously added ViewID 'oldTableName', to the
ElementStore<ViewID, View> of the StandardSchemaElementProvider.
The application of these two Deltas results in a Refactoring, since the old schema is
recreated by the view. Insertions, updates and deletions on the view will result in
corresponding changes to the new table; this is true since the view is an updatable view
supported by Oracle (see http://docs.oracle.com/cd/B19306_01/appdev.102/b14251/adfns_triggers.htm).
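Concretely, for a hypothetical table CUSTOMER with columns ID and NAME that is renamed to CUSTOMER_NEW, the script produced by this refactoring takes the following shape (the names are illustrative only):

ALTER TABLE CUSTOMER RENAME TO CUSTOMER_NEW;
CREATE VIEW CUSTOMER (ID, NAME) AS
  SELECT ID, NAME FROM CUSTOMER_NEW;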
B.1.3 View layer
The third layer contains the main view of the application (see Figure 24). JavaFX is used
as the graphical user interface framework (docs.oracle.com/javafx/). The main view is
divided into five parts:
Figure 24: Sample view
TOP The top part includes the connection settings to the database and the
authentication information.
CENTER The center part shows the actual virtual database layer. It is presented as a
tree with the root element schema, which includes the tables, views, etc. The tree
structure represents the complete structure of the schema, i.e. the decomposition
of the elements into sub-elements and the reference structure of foreign keys.
Since an expanded foreign key shows the referenced table, which can be expanded
as well, the presented structure can be cyclic and some elements can occur several
times in the tree view (this is the reason why SchemaElementProxies are included,
cf. section B.1.2.1). A context-sensitive menu presents the possible refactorings
which are applicable to the selected schema element.
RIGHT After a selection has been made, the next refactoring can be configured and
submitted for execution in the right part of the main view.
LEFT The submitted refactoring produces a new virtual database layer and updates the
center part of the view. In addition, the submitted refactoring (including its deltas)
is displayed in the left part of the main view as a tree.
BOTTOM With the control buttons in the bottom part, the listed SQL script of the
refactorings can be exported or directly executed on the database. A progress bar
shows the progress of the execution.
The class diagram which realizes the view is shown in Figure 25. The five parts are
implemented by the NodeControllers
•ConnectionController (TOP),
•SchemaElementTreeController (CENTER),
•SchemaElementOperationController (RIGHT),
•RefactoringsController (LEFT) and
•ProgressObserver (BOTTOM).
Figure 25: View layer
Every NodeController manages a JavaFX Node which represents the GUI elements
described by an fxml file. The RootController represents the complete layout, which is
composed of the other five nodes.
The MainApp is the entry point of the application. It initializes the NodeControllers such
that the view is presented to the user. If the user chooses a schema element from the
view of the SchemaElementTreeController, the OperationProvider offers a collection
of all possible Operations for the schema element (this functionality is realized with
the help of the visitor pattern, which is described in [GHJV94]). If an Operation gets
chosen by the user, the corresponding SchemaElementOperationController is built and
its Node gets presented in the right part of the view via the PanePresenter. The
submission of the filled TextFields of the Node triggers the creation of a Refactoring,
which is added to the ScriptPerformer and updates the virtual database layer, i.e.
rebuilds the trees of the SchemaElementTreeController and the RefactoringsController.
If the corresponding SQL script of the refactoring gets evaluated on the database, the
progress is presented by the ProgressObserver and the virtual database layer will be
built again with cleared caches (the loaded schema elements of the database are usually
cached, cf. section B.1.2.1; after actual changes to the database, the cache has to be
cleared for a valid representation of the schema).
B.1.4 Layer integration
Since the MainApp is the entry point of the application, it integrates all described layers.
Therefore, the class diagram in Figure 26 shows the associations of the MainApp into
the different layers as well as the important operations it implements to provide the
layer integration. When the application starts, the MainApp creates the OracleDatabase
(in future versions of the application, further database implementations can be
supported) and wraps it into a DatabaseSchemaElementProvider. This provider is used
as the oldProvider for an empty AppliedDeltaComposition, i.e. one which contains no
AppliedDeltas and returns the oldProvider as the actual virtual database schema. The
special DeltaComposition ComposedRefactoring represents the container to which new
refactorings are added. On the basis of the ComposedRefactoring, a new
AppliedDeltaComposition is created whenever a new refactoring gets added.
Additionally, the MainApp creates the OperationsManager, with itself as implementor
of the PanePresenter and ScriptPerformer interfaces. Finally the view gets built with
the help of the NodeControllers, cf. section B.1.3.
Figure 26: Layer integration
B.2 Available Refactorings
The available refactorings in DELTA are introduced in this chapter. Each refactoring
consists of multiple smaller steps, the so-called deltas. These deltas, in contrast to the
refactorings, do not preserve the original functionality and semantics of the database
themselves (because backward compatibility is not given, they are not called
refactorings), but they can be re-used in many refactorings.
B.2.1 Refactorings
In this chapter the implemented refactorings of section 2.4 and their technical realization
are described in detail. The explanation is divided into the following parts:
1. The steps of the refactoring are described and the used deltas are referenced (see
section B.2.2).
2. The parameters which are needed by the refactoring are listed.
3. The preconditions that must be fulfilled before the refactoring can be used are
specified.
B.2.1.1 Add Calculated Column
Add Calculated Column realizes the refactoring Introduce Calculated Column (see
section 2.4.2). It adds a new column to a table. The values of the cells in the new column
can be calculated in one of the following ways:
•ConstantValue sets all cells to a constant value.
•CopyColumn copies the value of a corresponding cell in another column of the
same table.
•Sequence sets the next available number of a sequence.
•Function calculates the value in the new cell by a function which is applied to the
data in the same row of the table.
The following steps describe the implementation of the refactoring with the help of
deltas (a sketch of a possible resulting script is shown after the list):
1. The first step of the refactoring has to determine the data type of the added
column. The data type for
•ConstantValue is given by a corresponding parameter (see section B.2.2.9),
•CopyColumn is inherited from the data type of the column which will be copied
(see section B.2.2.8),
•Sequence is NUMBER (see section B.2.2.9),
•Function is given by the return type of the function (see section B.2.2.36).
2. Because legacy applications shall work with the old database structure, the table
has to be renamed (see section B.2.2.35).
3. Then the new column is added to the table (see section B.2.2.4).
4. The data in the individual cells of the new column is calculated depending on the
variant.
•For ConstantValue, the constant value is given by a parameter (see section
B.2.2.29).
•For CopyColumn, the value for each record is copied from the other column
(see section B.2.2.28).
•For Sequence, the value for each cell is the next available sequence number
(see section B.2.2.32).
•For Function, the value is calculated by a function (an available PL/SQL function
in the database) and corresponding columns which represent the parameters
of the function (see section B.2.2.31). The function and the columns to be
used are specified by appropriate parameters.
5. To represent the old table schema to the legacy applications, a view (with the
original table name) is added which excludes the added column (see section
B.2.2.24).
6. An insert into or update on the view requires different handling for each variant,
with the help of instead-of-insert triggers (see section B.2.2.2) and instead-of-update
triggers (see section B.2.2.3).
•For ConstantValue, an insert to the view fills the new column with the constant
value. An update to the view does not affect the table with the new column.
•For CopyColumn, an insert to the view fills the cell in the new column with
the value of the copied column. An update in the view results in a distribution
of the new value to two cells in the new table, namely to the cell in the original
column and the cell in the added column.
•For Sequence, an insert to the view fills the new column with the next available
sequence value. An update to the view does not affect the table with the new
column.
•For Function, the value is calculated by the provided function on the basis of
the values provided for the cells in the other columns. An update in the view
results in a recalculation of the value of the new column by the function on
the basis of the current values of the cells in the other columns.
7. Additionally, an optional unique constraint can be created for the calculated value
(see section B.2.2.17).
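As an illustration of steps 2 to 5, a minimal sketch of the emitted script for the Function variant might look as follows; the table, column and function names (CUSTOMER, TOTAL, CALC_TOTAL, NET, TAX) are hypothetical, and the triggers of step 6 are omitted:

ALTER TABLE CUSTOMER RENAME TO CUSTOMER_NEW;           -- step 2
ALTER TABLE CUSTOMER_NEW ADD TOTAL NUMBER;             -- step 3
UPDATE CUSTOMER_NEW SET TOTAL = CALC_TOTAL(NET, TAX);  -- step 4
CREATE VIEW CUSTOMER (ID, NET, TAX) AS                 -- step 5
  SELECT ID, NET, TAX FROM CUSTOMER_NEW;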
Parameters To add a calculated column, the following parameters have to be specified.
The first parameter is the name for the new table; the old name is used for the view
as described above. The second parameter is the name of the new column. The third
parameter chooses the type of calculation. Depending on the chosen type, the following
additional parameters have to be added.
•For ConstantValue, the data type of the constant value and the constant value
itself have to be given.
•For CopyColumn, the name of the column that contains the originals for the copies
has to be specified.
•For Sequence, the sequencer to use has to be specified.
•For Function, the PL/SQL function to use and the columns of the table which are
used as parameters for the calculation.
Additionally, a check box offers the possibility to constrain the new column as unique,
with a specified constraint name.
Precondition The preconditions for this refactoring are simple. The names of the new
table, the column and the optional unique constraint have to be unique in the
corresponding context (in the following refactorings, it is assumed that the names for
new schema elements are always unique in the corresponding context).
•For ConstantValue, the constant value has to match the chosen data type.
•For Function, the count and the data types of the parameters have to correspond
to the count and the data types of the chosen columns.
B.2.1.2 Merge Columns
Merge Columns merges two columns within a table. It realizes the identically named
refactoring from section 2.4.3. The columns must have the same type. If one column is a
single-column foreign key, the other column has to be a foreign key column referencing
the same column in the same target table. At least one of the columns must be NULL for
each row of data. The pair of columns is called left and right in the following.
1. The table is renamed (see section B.2.2.35).
2. The data type of the merged column is stored to the element store (see section
B.2.2.8).
3. A new data column is added to the table (see section B.2.2.4).
4. The integer type for the discriminator column, compare (5) below, is stored to the
element store (see section B.2.2.9).
5. A new discriminator column is added to the table (see section B.2.2.4).
6. Data from the left column is transferred to the new data column (see section B.2.2.27).
7. Data from the right column is transferred to the new data column (see section
B.2.2.27).
8. The old table is restored by a view with the old table name, which contains both
merged columns (see section B.2.2.18; a sketch of a possible shape of this view is
shown after the steps).
9. The left column is dropped from the table (see section B.2.2.25).
10. The right column is dropped from the table (see section B.2.2.25).
11. For inserts, updates and deletes on the view, three instead-of triggers are created.
a) The insert trigger of the view inserts the data set into the renamed table and
fills the value of the merged columns into the new column. The discriminator
column is filled according to the left or right column respectively.
b) The update trigger of the view updates the data set in the renamed table.
If one of the merged columns gets updated in the view, the new column is
updated instead.
c) The delete trigger of the view deletes the data set from the renamed table.
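A minimal sketch of the restoring view of step 8, assuming hypothetical names (table PERSON renamed to PERSON_NEW, merged columns LEFT_COL and RIGHT_COL, data column DATA_COL, and a discriminator DISCR encoded as 0 for left and 1 for right; the actual encoding is not specified in this report):

CREATE VIEW PERSON (ID, LEFT_COL, RIGHT_COL) AS
  SELECT ID,
         CASE WHEN DISCR = 0 THEN DATA_COL END,  -- NULL otherwise
         CASE WHEN DISCR = 1 THEN DATA_COL END
  FROM PERSON_NEW;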
Parameters The refactoring Merge Columns needs six parameters.
•Table: The table which contains the merged columns.
•Left column: The first column to be merged.
•Right column: The second column to be merged.
•Data name: The name of the new column that contains the merged data.
•Discriminator name: The name of the new column that contains the discriminator.
•New table name: Name of the table after the refactoring.
Precondition
•The most important precondition for this refactoring is that the data types have
to be the same for the left and the right column. If one of them is a foreign key
column, the other has to reference the same column.
•Both columns have to be in the same table.
•For each data row, at least one of the two columns has to be NULL.
B.2.1.3 Move Column
Move Column moves a selected column from one table to another table. It realizes the
same-named refactoring from our catalogue (see section 2.4.5). In the following, the
table from which the column is removed is called source table; the other table is called
target table. The column is moved along a foreign key. This means that you can either
move one column from the table that contains the foreign key column to the table
which is referenced by the foreign key, or the other way around. For the time being, only
columns without any constraints can be moved. To move a column from one table to
another, you have to select the foreign key that connects the source and the target
table. The refactoring executes the following steps (a sketch of a possible resulting
script is shown after the list):
1. The data type of the moved column is stored to the element store (see section
B.2.2.8).
2. A new column is added to the target table (see section B.2.2.4).
3. The new column is filled with values by a join over the foreign key to the source
table (see section B.2.2.30).
4. Because legacy applications shall work with the old database structure, the target
table gets renamed (see section B.2.2.35).
5. A view with the old table name of the target table, which contains all columns of
the target table without the moved column, is created to restore the old database
structure (see section B.2.2.24).
6. The column is dropped from the source table (see section B.2.2.25).
7. The source table is renamed (see section B.2.2.35).
8. The old source table is restored by a view with the old table name, which contains
all columns of the original source table (including the moved column, see section
B.2.2.24).
9. The created views are not directly updatable because they join two tables.
Therefore three instead-of triggers are created for each view, handling updates,
inserts and deletes. They extend the abstract triggers for update (see section
B.2.2.3), insert (see section B.2.2.2) and delete (see section B.2.2.1).
a1) Direction A, move against the foreign key: The insert trigger of the target table
view inserts the data set into the target table and initially fills the moved
column with a null value. The insert trigger of the source table view inserts all
data, except the value of the moved column, into the source table. In the
target table a new row with a primary key assigned by the sequencer is added
and the value of the moved column is stored in this row.
a2) Direction B, move with the foreign key: The insert trigger of the target table
view inserts the data set into the target table and initially fills the moved
column with a null value. The insert trigger on the source table inserts a new
row into the underlying table and updates a (necessarily previously inserted)
row in the target table.
b) The update triggers of both views update the tables. If the moved column
gets updated in the view of the source table, the target table is updated. An
update of the foreign key in the source table is not allowed.
c) The delete triggers of both views delete the data set from the tables.
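A minimal sketch of steps 1 to 8, assuming a hypothetical column STREET moved from ORDERS (the table holding the foreign key CUST_ID) to the referenced table CUSTOMER; all names and types are assumptions, and the triggers of step 9 are omitted:

ALTER TABLE CUSTOMER ADD STREET VARCHAR2(100);          -- steps 1-2
UPDATE CUSTOMER c SET STREET =                          -- step 3
  (SELECT o.STREET FROM ORDERS o WHERE o.CUST_ID = c.ID);
ALTER TABLE CUSTOMER RENAME TO CUSTOMER_NEW;            -- step 4
CREATE VIEW CUSTOMER (ID, NAME) AS                      -- step 5
  SELECT ID, NAME FROM CUSTOMER_NEW;
ALTER TABLE ORDERS DROP COLUMN STREET;                  -- step 6
ALTER TABLE ORDERS RENAME TO ORDERS_NEW;                -- step 7
CREATE VIEW ORDERS (ID, CUST_ID, STREET) AS             -- step 8
  SELECT o.ID, o.CUST_ID, c.STREET
  FROM ORDERS_NEW o JOIN CUSTOMER_NEW c ON o.CUST_ID = c.ID;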
Parameters The refactoring Move Column needs five parameters.
•Direction: The first parameter is the direction in which the column is moved.
•New names for target table and source table: These parameters provide the names
for the new tables. As mentioned above, the refactoring renames the tables in
order to allow legacy applications to work with the old database structure.
•Column to be moved: This parameter provides the column which shall be moved.
•Sequencer: The last parameter is the sequencer for the target table. This sequencer
is needed for the insert trigger of the view for the source table. If a new data set
gets inserted into the view of the source table, the data has to be split up between
both tables (value of the moved column in the target table and all other values
in the source table). Until the insert into the view of the target table happens,
we have to insert an empty data set with only a primary key, a foreign key pointing
to the inserted data set in the source table, and the value of the moved column.
For this insert into the target table, a new primary key from the sequencer is
needed. (This parameter is only used if you move a column against a foreign key.)
Precondition
•The most important precondition for this refactoring is that the foreign key
relationship is restricted to multiplicity one on both sides. This means the foreign
key is not allowed to be nullable and the referenced column has to have a unique
constraint on it.
•Because the insert trigger of the view for the source table inserts a data set with
only the primary key, the foreign key and the value of the moved column (all other
values are unknown and therefore null) into the target table, the columns of the
target table have to be nullable, besides the primary key and the foreign key.
B.2.1.4 Rename Column
Rename Column realizes the refactoring Rename Column from section 2.4.1 (the delta
action “Rename Column” has the same name and is used during this refactoring). It can
rename any column in a database. The old table is restored by a view. The refactoring is
done this way:
1. It renames the table to the new table name (see section B.2.2.35).
2. It renames the column to the new column name (see section B.2.2.34).
3. It represents the old table schema to the legacy applications as a view with the
old table name (see section B.2.2.19). Instead-of triggers are not needed for this
refactoring, since Oracle is able to update a simple view without joins.
Parameters This refactoring requires a new name for the column and a new name for
the table.
B.2.1.5 Rename Table
Rename Table realizes the refactoring Rename Table from section 2.4.1 (the delta action
“Rename Table” has the same name and is used during this refactoring). It renames a
table and restores the old table as a view. The refactoring is simple:
1. It renames the table (see section B.2.2.35).
2. It represents the old table schema to the legacy applications as a view with the
old table name (see section B.2.2.24). Instead-of triggers are not needed for this
refactoring, since Oracle is able to update a simple view without joins.
Parameters The only necessary parameter is the new name for the table.
B.2.1.6 Spin-Off-Empty-Table
From a table (called source table) an empty spin-off-table can be created in two different
ways. This refactoring therefore realizes the two variants of Spin-off Empty Table
described in section 2.4.4. The first way (Spin-Off-Empty-Table) creates a table which
references back to a primary or unique key of the source table. Therefore the
spin-off-table is composed of two columns: the first column is the primary key column
and the second one is the foreign key column pointing back to the source table. The
second way of creating a spin-off-table (Spin-Off-Empty-Table-Inverse) creates a
spin-off-table which is referenced by the source table via a new foreign key. Therefore,
a new column is added to the source table and the spin-off-table contains a primary
key column only.
The first four steps of both variants coincide. Thus, they are described only once. Steps
5 and 6 are described separately for each variant.
1. A calculated column with a sequence (see section B.2.1.1) is added to the source
table to determine the primary keys of the new spin-off-table.
2. The data type NUMBER is stated for the primary key column of the spin-off-table
(see section B.2.2.8).
3. The column gets created in the virtual database schema (see section B.2.2.7).
4. The primary key gets created (see section B.2.2.13).
5. To create the new spin-off-table and fill the new columns, the order of the following
steps is important.
•Spin-Off-Empty-Table
a) The column with which the spin-off-table references back has to be added
to it (see section B.2.2.6).
b) The foreign key constraint can be stated on the column (see section B.2.2.11).
c) The spin-off-table can be created (see section B.2.2.16).
d) The spin-off-table can be filled (see section B.2.2.33).
e) The calculated column from above can be dropped (see section B.2.2.25),
since it was only used to provide the primary keys for the spin-off-table.
•Spin-Off-Empty-Table-Inverse
a) The spin-off-table gets directly created (see section B.2.2.15).
b) The spin-off-table is filled with the values of the calculated column from
the source table (see section B.2.2.33).
c) The foreign key constraint can be added to the calculated column of the
source table (see section B.2.2.5).
6. The views have to be restored for the refactorings (a sketch of a possible insert
trigger of the inverse variant is shown after the list).
•Spin-Off-Empty-Table
a) The table can be directly renamed by the corresponding refactoring (see
section B.2.1.5). (Since the Spin-Off-Empty-Table refactoring does not
change the source table, one can argue that it is not necessary to introduce
a view to represent its old state. It is, however, not possible in Oracle to
use an instead-of trigger on a table. Thus, this indirection is necessary to
be able to delete a corresponding record of the spin-off-table, since it
references the source table by a foreign key.)
b) The instead-of-insert trigger is added (see section B.2.2.2). It has an
additional script which inserts a corresponding record into the
spin-off-table.
c) The delete trigger is added (see section B.2.2.1). The instead-of-delete
trigger has a before script which deletes the corresponding record from
the spin-off-table. Thus the foreign key remains valid. (Note that an update
trigger is not necessary, since we do not allow updates on the foreign
keys, as the introduced indirection is one-to-one.)
•Spin-Off-Empty-Table-Inverse
a) The table is renamed by the corresponding delta (see section B.2.2.35).
b) A view is created which represents the legacy structure of the table (see
section B.2.2.24). It has to exclude the new column, i.e. the calculated
column from above which references the primary key of the spin-off-table.
c) The instead-of-insert trigger is added (see section B.2.2.2). It declares
an additional variable t$$DELTA$Primkey which is set to the next free
sequence number from the sequence which is used to generate the
primary keys for the spin-off-table. Then a before script inserts into the
spin-off-table using the variable as primary key. Then the record for the
source table gets inserted, which references the inserted spin-off-table
record. Thus the foreign key constraint stays valid.
d) The instead-of-delete trigger is added (see section B.2.2.1). It declares
an additional variable t$$DELTA$SPINPRIMCOL in which the primary key
of the referenced spin-off-table record gets saved in a before script. After
the record of the source table gets deleted, the corresponding record in
the spin-off-table gets identified by the variable and deleted by an
additional script.
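As an illustration of step 6 c) of the inverse variant, a hypothetical instance of the generated insert trigger might look as follows (all names except the declared variable are assumptions; the report does not specify the generated names):

CREATE OR REPLACE TRIGGER person$insert
INSTEAD OF INSERT ON PERSON
REFERENCING NEW AS n
FOR EACH ROW
DECLARE
  t$$DELTA$Primkey NUMBER;
BEGIN
  SELECT SPINOFF_SEQ.NEXTVAL INTO t$$DELTA$Primkey FROM dual;
  -- before script: insert the referenced spin-off record first
  INSERT INTO SPINOFF (ID) VALUES (t$$DELTA$Primkey);
  -- then insert the source record referencing it
  INSERT INTO PERSON_NEW (ID, NAME, SPINOFF_ID)
    VALUES (:n.ID, :n.NAME, t$$DELTA$Primkey);
END;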
Parameters
•The Spin-Off-Empty-Table refactoring is applied on a primary or unique key of the
source table.
•The Spin-Off-Empty-Table-Inverse refactoring is applied on the source table.
The parameters for both refactorings (Spin-Off-Empty-Table / Spin-Off-Empty-Table-Inverse)
are
•New table name
•Spin-off-table name
•Primary key column name
•Primary key constraint name
•Sequence for primary key
•Foreign key column name
•Foreign key constraint name
B.2.1.7 Split Table
Split Table divides the data of a table, in the following called source-table, into two
separate groups and moves these data into two different tables, called target-tables. It
thereby realizes the refactoring with the same name from section 2.4.6. To improve the
usability, this refactoring has two work modes, which affect the columns:
1. In the first work mode, all columns of the source-table are moved to the
target-tables, except the primary key column, which is only copied to the
target-tables.
2. The second work mode only copies the primary key to the target-tables.
The primary key remains in the source-table, so that foreign keys referencing the
primary key of the source-table stay intact. The division of the data is made along a
column of the source-table, called discriminator-column, and a function which contains
the logic to choose the correct target-table. The function must return a number and
accept one parameter with the data type of the discriminator-column. If the result is 0,
the data is moved into the first target-table, otherwise it is moved to the second
target-table.
The refactoring performs the following steps (a sketch of a possible restored view is
shown after the list).
1. The two new target-tables are created via the delta CreateWholeTable (see section
B.2.2.23).
2. After creating the target-tables, the data is moved from the source-table to the
target-tables with the delta InsertIntoTableByFunction (see section B.2.2.33).
3. Before recreating the old schema as a view, the source-table has to be renamed.
This is done by the delta RenameTable (see section B.2.2.35).
4. The old database structure is restored with a view that is created by the delta
CreateViewWithUnionTable (see section B.2.2.22).
5. Finally, the instead-of triggers for the view are created to handle data changes. For
inserts, an instead-of-insert trigger is created (see section B.2.2.2). This trigger
inserts the new data into the source-table and also into the correct one of the
target-tables, based on the value in the discriminator-column. An update changes
the data in the source-table and also in the target-table in which the according
row exists. This is done with an instead-of-update trigger (see section B.2.2.3). If a
row is deleted, the associated rows in the source-table and the correct target-table
are deleted with an instead-of-delete trigger (see section B.2.2.1).
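One plausible shape of the view of step 4 in the first work mode, assuming a hypothetical source-table SRC (renamed to SRC_NEW) split into TARGET_A and TARGET_B with columns ID, STATUS and AMOUNT; the exact view definition produced by CreateViewWithUnionTable is not specified in this report:

CREATE VIEW SRC (ID, STATUS, AMOUNT) AS
  SELECT ID, STATUS, AMOUNT FROM TARGET_A
  UNION ALL
  SELECT ID, STATUS, AMOUNT FROM TARGET_B;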
Parameters
1. The first two parameters are the new names of the two target-tables.
2. The parameter discriminator-column contains the column of the source-table
which is used for dividing the data between the two new target-tables.
3. The next parameter contains the SQL function which is used for dividing the data
of the discriminator-column.
4. With the last parameter it can be controlled whether the columns are copied from
the source-table to the target-tables, as described by the two work modes.
Precondition This refactoring is only possible if the following conditions are fulfilled:
•There is no foreign key constraint that points to any column of the source-table
other than the primary key.
•The source-table does not contain any other unique constraint besides the primary
key.
B.2.1.8 Change Foreign key
This refactoring realizes the two refactorings
Transcode Foreign Key
(2.4.7) and
Compose Foreign Keys
(2.4.8) from section 2, by changing the target of a foreign
key constraint. This affects the structural layer (the foreign key constraint itself) and
the behavioural layer as well (the data inside foreign key column). To be able to access
the data before the refactoring, this refactoring creates a mapping table which holds a
mapping relation between the old and the new values of the foreign key column in the
according table. For the new target constraint there are the following possible options:
•A unique constraint (first direction of Transcode Foreign Key) or
•A primary key constraint (second direction of Transcode Foreign Key) or
•A foreign key constraint (Compose Foreign Keys).
If a primary key or unique constraint is selected the foreign key is directly transcoded
to that primary key. If a foreign key is selected the foreign key will be transcoded to
target of that foreign key which means that the foreign key will be changed to point to a
different table than before. The refactoring is executed in the following steps:
1.
Create and fill a mapping table with the columns primary key, old foreign key and
new foreign key, see section B.2.2.12.
2. Drop old foreign key constraint, see section B.2.2.26.
3.
Update the values of the foreign key column with the new value from the mapping
table, see section B.2.2.38.
4.
Recreate the foreign key constraint with the new target constraint, see sec-
tion B.2.2.5.
5. Rename the table to the given new name, see section B.2.2.35
6.
Restore the old data as an updatable view with the information from the mapping
table, see section B.2.2.21.57
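A minimal sketch of steps 1 to 4 for transcoding a hypothetical foreign key ORDERS.CUST_ID from the primary key CUSTOMER.ID to the unique column CUSTOMER.CUST_NO; all names are assumptions:

CREATE TABLE CUST_FK_MAP AS                          -- step 1
  SELECT o.ID, o.CUST_ID AS OLD_FK, c.CUST_NO AS NEW_FK
  FROM ORDERS o JOIN CUSTOMER c ON o.CUST_ID = c.ID;
ALTER TABLE ORDERS DROP CONSTRAINT FK_ORD_CUST;      -- step 2
UPDATE ORDERS o SET CUST_ID =                        -- step 3
  (SELECT m.NEW_FK FROM CUST_FK_MAP m WHERE m.ID = o.ID);
ALTER TABLE ORDERS ADD CONSTRAINT FK_ORD_CUST        -- step 4
  FOREIGN KEY (CUST_ID) REFERENCES CUSTOMER (CUST_NO);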
Parameters To switch a foreign key, the following parameters are needed:
•the Foreign-Key-Constraint which should be changed,
•a Primary-Key-Constraint, a Unique-Constraint or a Foreign-Key-Constraint of the
table which is referenced by the old constraint. If a primary key or unique constraint
is chosen, the foreign key is transcoded to this constraint. If a foreign key is chosen,
the foreign key is transcoded to the target constraint of the selected foreign key
(composition of foreign keys).
•the new name of the host table which is changed, in order to distinguish between
the new transformed table with the changed foreign key and the old table that is
restored for backward compatibility.
Preconditions For a successful execution, the given new target constraint has to be
a member of the host table of the target constraint which is referenced by the foreign
key which is going to be changed. Also, the columns of the changed foreign key and
the columns of the new target constraint must match, which means their count, order
and types correspond to each other.
B.2.1.9 Union Tables
The refactoring Union Tables realizes the refactoring Merge Tables from section 2.4.6.
It takes two tables with equal schema (called first-table and second-table) and calculates
their union in a new table (called new-table). A new column (called origin) is generated
in the new-table that indicates whether a row belongs to the first or the second table.
This column is introduced in both old tables and filled with the name of the respective
table. The data from the second-table is inserted into the first-table. Afterwards the
second-table is dropped and the first-table renamed. Foreign keys pointing at the
second-table are redirected to the first-table.
The following steps are performed by the refactoring (a sketch of the restored views is
shown after the list).
1. A new column (named Origin-Column) is introduced in both of the old tables
(First-Table and Second-Table). The value of this column is the name of the table
for each row. For this, the deltas CreateConstantDataType (see B.2.2.9),
AddNewColumn (B.2.2.4) and FillColumnByConstant (B.2.2.29) are used.
2. The data of the second-table is inserted into the first-table and the second-table
is dropped. The delta UnionTables (B.2.2.37) is used for this task.
3. The first-table is renamed with the delta RenameTable described in section B.2.2.35
to the name specified in the parameter New-Table.
4. For restoring the old tables, the delta CreateViewForUnionTables described in
section B.2.2.20 is used.
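A minimal sketch of the restoring views of step 4, assuming hypothetical tables EMPLOYEE and CONTRACTOR united into STAFF with origin column ORIGIN (filled with the respective table name):

CREATE VIEW EMPLOYEE (ID, NAME) AS
  SELECT ID, NAME FROM STAFF WHERE ORIGIN = 'EMPLOYEE';
CREATE VIEW CONTRACTOR (ID, NAME) AS
  SELECT ID, NAME FROM STAFF WHERE ORIGIN = 'CONTRACTOR';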
Parameters
•The First-Table is one of the two tables that takes part in the union. It is filled
automatically when selecting “Union Tables” in the context menu.
•The parameter Second-Table provides the second table for the union.
•Origin-Column specifies the name of the column which is introduced to describe
whether a row is from the first or the second table.
•The parameter New-Table defines the name of the new table.
Precondition There are preconditions that need to be fulfilled before executing the
Union Tables refactoring:
•The two tables need to have the same schema.
•The primary keys of the two tables need to be disjoint.
B.2.2 Deltas
In the following, all available deltas are described. They represent an elementary change
of the database, which does not necessarily have to preserve information or existing
behaviour. The formerly introduced refactorings are compositions of these deltas. A
delta can either change the physical database without affecting the virtual database
layer (see section 5.2), or it can change only the virtual database layer, or it can change
the physical as well as the virtual database layer. When a refactoring is selected in the
GUI, a sequence of respective deltas is created. Simultaneously, these deltas apply their
changes to the virtual layer in order. The user is then able to chain other deltas to this
sequence by selecting other refactorings. Eventually the user can execute this chain:
every delta produces its physical changes as a SQL script, which is applied to the
database.
The following list provides an alphabetical index of the deltas implemented so far. The
description of a delta comprises the following parts:
1. the name of the delta,
2. a short description of the delta,
3. the parameters of the delta,
4. the changes to the physical database described by the created SQL, if any,
5. the changes to the virtual database layer, if any.
For the sake of simplicity, we use some abstractions in the given SQL code: the varying
part of a script is denoted by variable names in angle brackets. These names can either
be the name of a parameter, a piece of information derived from a parameter (dot
notation), or some information dynamically computed by the respective delta.
B.2.2.1 AbstractCreateInsteadOfDeleteTrigger The delta creates an
instead-of-delete trigger which distributes the deletes on a view to the underlying table.
Note that this description refers to an abstract delta: in the prototype there are concrete
implementations of this delta for each of the major refactorings. The concrete
implementations control the behaviour of the trigger by providing a declareDivision,
beforeScriptDivision, whereClause and afterScriptDivision as PL/SQL code.
Parameter
•TableId oldTableId: The TableId of the old view.
•TableId newTableId: The TableId of the new table.
Note that we do not provide parameters directly in DELTA; we are using a Supplier
interface instead. This allows to calculate the parameter lazily. The most simple supplier
is based on a constant value, but more dynamic suppliers are possible as well, for
example a supplier that provides a primary-key-id given the table-id of the containing
table, or a supplier that provides the id of the referenced table given the id of a foreign
key.
Database
CREATE OR REPLACE TRIGGER <triggerName>
/* <triggerComment> */
INSTEAD OF DELETE ON <oldTableId>
REFERENCING OLD AS o
FOR EACH ROW
<declareDivision>
BEGIN
  <beforeScriptDivision>
  DELETE FROM <newTableId> WHERE <whereClause>;
  <afterScriptDivision>
END;
Virtual Database Layer
•Adds an instead-of-delete-trigger object to the virtual database layer. The columns
of the old and new table are calculated by the excludes and their replacement
function.
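To make the template concrete, the following is a minimal hypothetical instance of such a generated trigger; the trigger, view, table and column names are assumptions, as the report does not specify the generated names:

CREATE OR REPLACE TRIGGER person$delete
/* generated instead-of-delete trigger (hypothetical names) */
INSTEAD OF DELETE ON PERSON
REFERENCING OLD AS o
FOR EACH ROW
BEGIN
  DELETE FROM PERSON_NEW WHERE ID = :o.ID;
END;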
B.2.2.2 AbstractCreateInsteadOfInsertTrigger The delta creates an
instead-of-insert trigger which distributes the inserts on a view to the underlying table.
Again, this is an abstract delta: the concrete implementation has to provide a
declareDivision, a beforeScriptDivision and an afterScriptDivision, as well as the
columnsOfNewTable and the columnsOfOldTable, which are computed based on the
given excludes.
Parameter
•TableId oldTableId: The TableId of the old view.
•TableId newTableId: The TableId of the new table.
•ColumnIds excludes: ColumnIds which will be excluded from the normal logic.
Columns in this collection can be handled by overriding the method
getReplacementForExludedColumn.
Database
CREATE OR REPLACE TRIGGER <triggerName>
/* <triggerComment> */
INSTEAD OF INSERT ON <oldTableId>
REFERENCING NEW AS n
FOR EACH ROW
<declareDivision>
BEGIN
  <beforeScriptDivision>
  INSERT INTO <newTableId> (<columnsOfNewTable>)
  VALUES (<columnsOfOldTable>);
  <afterScriptDivision>
END;
Virtual Database Layer
•Adds an instead-of-insert trigger to the virtual database layer.
B.2.2.3 AbstractCreateInsteadOfUpdateTrigger The delta creates an
instead-of-update trigger which delegates an update on a view to the underlying table.
This is also an abstract delta: the concrete implementation has to provide the
updateScript, the whereClause, the beforeScriptDivision, the afterScriptDivision and
the declareDivision.
Parameter
•TableId oldTableId: The TableId of the old table.
•TableId newTableId: The TableId of the new table.
•ColumnIds excludes: ColumnIds which will be excluded from the update.
Database
CREATE OR REPLACE TRIGGER <triggerName>
/* <triggerComment> */
INSTEAD OF UPDATE ON <oldTableId>
REFERENCING NEW AS n OLD AS o
FOR EACH ROW
<declareDivision>
BEGIN
  <beforeScriptDivision>
  UPDATE <newTableId> SET <updateScript> WHERE <whereClause>;
  <afterScriptDivision>
END;
Virtual Database Layer
•Adds an instead-of-update trigger to the virtual database layer.
B.2.2.4 AddNewColumn The delta adds a column to a table in the database and in
the virtual database layer.
Parameter
•ColumnId columnId: The ColumnId with the information for the column to add.
(Note that in our model a column id is fully qualified; therefore the name of the
containing table can easily be determined.)
Database
ALTER TABLE <columnId.tableId>
ADD <columnId.name> <columnId.column.dataType>;
Virtual Database Layer
1. Adds the new column to the Column ElementStore (see section B.1.2.1) of the
virtual database layer.
2. Adds the new column to the columns of the destination table in the virtual database
layer.
B.2.2.5 AddNewForeignKeyConstraint The delta adds a new foreign key constraint
from one column to an existing table.
Parameter
•ColumnId columnId: The column on which the foreign key is created.
•AbstractConstraintId referencedConstraint: The target constraint the foreign key
refers to, which could be a PrimaryKey or a UniqueConstraint.
•ForeignKeyId foreignKeyName: The id for the new foreign key.
Database
ALTER TABLE <columnId.tableId> ADD CONSTRAINT <foreignKeyName>
FOREIGN KEY (<columnId>)
REFERENCES <referencedConstraint.table>
(<referencedConstraint.column>);
Virtual Database Layer
1. Adds the foreign key to the virtual database layer.
2. Creates a connection in the virtual database layer between the new foreign key
and the referenced target constraint that the new foreign key refers to.
B.2.2.6 CopyColumnsFromConstraintForNewTable (Only virtual) The delta copies
the columns of the given constraint, from the table where it is defined, to another table
and renames the columns in the target table. The constraint itself is not copied to the
target table.
Parameter
•TableId newTableId: The target table into which the columns are copied.
•AbstractConstraintId constraintId: The constraint that defines the set of columns
to be copied.
•String fkColumnNamePrefix: The new prefix for the names of the columns in the
target table.
Virtual Database Layer
•The new columns are added to the element store of the virtual database layer.
B.2.2.7 CreateColumn (Only virtual) The delta creates a new column in the virtual database layer. The DataType for the new column must be present in the virtual database layer prior to the execution of this delta, for example by executing the delta from section B.2.2.8 first.
Parameter
Type Name Description
ColumnId columnId The ColumnId with the information for the new column.
Boolean notNull Flag that indicates whether the new column is declared NOT NULL.
Virtual Database Layer
• Adds a new column with the given columnId to the virtual database layer.
B.2.2.8 CreateDatatypeByCopyFromColumn (Only virtual) The delta creates a copy of the data type of a specified column in the virtual database layer. This delta only creates an association between a data type and a newly created column.
Parameter
Type Name Description
ColumnId newColumnId The id of the new column for which the data type should be created.
TableId oldTableId The id of the table which is searched for a column with the same name as the newColumnId.
Virtual Database Layer
1. Determines the column in the virtual database layer whose data type is copied.
2. Adds a copy of the data type of the template column to the ElementStore of the virtual database layer.
B.2.2.9 CreateConstantDataType (Only virtual) This delta adds an association between the specified data type and the given column id to the virtual database layer.
Parameter
Type Name Description
DataType dataTypeToSet The data type which should be added to the virtual database layer.
ColumnId newColumnId The id of the column which should get the new data type.
Virtual Database Layer
1. Adds the new data type to the ElementStore of the virtual database layer.
2. Assigns the data type as the data type of the specified column in the virtual database layer.
B.2.2.10 CreateForeignKey (Only virtual) The delta creates a foreign key in the virtual database layer.
Parameter
Type Name Description
ForeignKey foreignKey The foreign key object which contains all information for creating the foreign key.
Virtual Database Layer
• Adds the foreign key to the virtual database layer using the information specified in the parameter foreignKey.
B.2.2.11 CreateForeignKeyForConstraint (Only virtual) The delta creates a foreign key using the information of a destination constraint. It also adds the columns in the target table that are needed for the foreign key.
Parameter
Type Name Description
ForeignKeyId foreignKeyId The id for the new foreign key.
TableId newTableId The target table for the foreign key.
AbstractConstraintId constraintId The destination constraint66 the foreign key is referencing.
String foreignColumnNamePrefix The new prefix for the names of the columns of the foreign key.
Virtual Database Layer
1. Creates the columns for the foreign key in the table specified in parameter newTableId.
2. Creates the foreign key on the created columns and adds it to the ElementStore of the virtual database layer.
66 The destination constraint could be a primary key or a unique key.
B.2.2.12 CreateMappingTable The delta creates a mapping table for a foreign key that is transcoded to another primary or unique key constraint. The mapping table contains the primary key columns of the table containing the foreign key, the columns of the old foreign key, and the column of the new target constraint.
Parameter
Type Name Description
ForeignKeyConstraintId changedForeignKey The id of the foreign key which is changed.
ConstraintId newTarget The new target constraint. If the new target is a primary key or unique constraint, the foreign key is transcoded directly to this constraint. If the new target is a foreign key constraint, the old foreign key is transcoded to the target constraint of this foreign key.
TableId mappingTableName The identifier for the created mapping table.
Database
CREATE TABLE <mappingTableName> (
  ID, OLD_<changedForeignKey.column>,
  NEW_<changedForeignKey.column>,
  CONSTRAINT <mappingTableName>_pk PRIMARY KEY (ID));

INSERT INTO <mappingTableName> (
  <changedForeignKey.table.primaryKey.column>,
  OLD_<changedForeignKey.column>,
  NEW_<changedForeignKey.column>)
SELECT A.<changedForeignKey.table.primaryKey.column>,
  A.<changedForeignKey.column>,
  R.<newTarget.column>
FROM <changedForeignKey.table> A
LEFT OUTER JOIN <newTarget.table> R
  ON A.<changedForeignKey.column> =
     R.<newTarget.table.primaryKey.column>;
Virtual Database Layer
•Adds the new mapping table with its columns to the element store.
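A minimal sketch with invented names: the foreign key ORDERS.CUSTOMER_NO, formerly referencing the unique key CUSTOMER.CUSTOMER_NO, is transcoded to the surrogate primary key CUSTOMER.ID:

CREATE TABLE ORDERS_CUST_MAP (
  ID NUMBER,
  OLD_CUSTOMER_NO NUMBER,
  NEW_CUSTOMER_NO NUMBER,
  CONSTRAINT ORDERS_CUST_MAP_pk PRIMARY KEY (ID));

INSERT INTO ORDERS_CUST_MAP (ID, OLD_CUSTOMER_NO, NEW_CUSTOMER_NO)
SELECT A.ID, A.CUSTOMER_NO, R.ID
FROM ORDERS A
LEFT OUTER JOIN CUSTOMER R
  ON A.CUSTOMER_NO = R.CUSTOMER_NO;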
B.2.2.13 CreatePrimaryKey (Only virtual) The delta creates a primary key in the virtual database layer which consists of the specified set of columns.
Parameter
Type Name Description
PrimaryKeyId primaryKeyId The id of the new primary key.
ColumnIds columns The set of columns on which the primary key is defined.
Virtual Database Layer
1. Adds the primary key to the ElementStore of the virtual database layer.
2. Assigns the columns specified in parameter columns to the primary key in the virtual database layer.
B.2.2.14 CreateSequence The delta creates a new sequence with the given name.
Parameter
Type Name Description
SequenceId sequenceIdToCreate Id for the new sequence
Database
CREATE SEQUENCE <sequenceIdToCreate>;
Virtual Database Layer
•Adds the new sequence to the virtual database layer.
B.2.2.15 CreateTable The delta creates a table using the information available for it in the virtual database layer, such as primary key or column specifications. This means that all the information about the table has to be created in the virtual database layer before the table itself gets created by this delta.
Parameter
Type Name Description
Table table The table which is created.
Database
CREATE TABLE <table.id>
(<table.columnsWithType>)
<table.constraints>;
Virtual Database Layer
• Adds the new table to the virtual database layer, which contains references to the existing elements.67
B.2.2.16 CreateTableOnConstraints The delta creates a table based on the given information, i. e. the given constraints. It represents a shortcut for the delta CreateTable from section B.2.2.15 by skipping the preceding creation of the table object in the virtual database layer.
Parameter
Type Name Description
TableId tableId The id of the table.
PrimaryKeyId primaryKeyId The primary key for the new table.
UniqueConstraintIds uniqueConstraintIds A collection of unique constraints that are added to the table.
ForeignKeyIds foreignKeyIds A collection of foreign keys that are added to the table.
Database68
CREATE TABLE <tableId> (<tableId.columns>)
<primaryKey>
<uniqueConstraints>
<foreignKeys>;
Virtual Database Layer
•Adds the created table to the ElementStore of the virtual database layer.
B.2.2.17 CreateUniqueConstraint The delta creates a unique constraint on a specified table.
Parameter
Type Name Description
UniqueConstraint uniqueConstraint The uniqueConstraint object with the information for creating the constraint.
Database
ALTER TABLE <uniqueConstraint.table>
ADD CONSTRAINT <uniqueConstraint>
UNIQUE (<uniqueConstraint.column>);
Virtual Database Layer
• Adds the unique constraint to the element store in the virtual database layer.
67 The elements are created with a matching tableId for the new table; during the creation of the elements the containing table does not exist yet.
68 Note that the information for primaryKey, uniqueConstraints and foreignKeys can easily be determined in the virtual database by a lookup with their respective ids.
B.2.2.18 CreateViewForMergeColumn The delta creates a view with almost the same schema as a given new table; instead of the value and type columns, it contains the two columns first and second. The values in the original value column are distributed to the first and second columns as specified by the discriminator column type.
Parameter
Type Name Description
TableId viewName The name for the new view.
TableId newTableId The table containing the value and type columns.
ColumnId first The first column that is restored.
ColumnId second The second column that is restored.
ColumnId value The column that contains the values that are distributed to first and second.
ColumnId type The discriminator column that specifies the split into first and second.
Database
CREATE VIEW <viewName> (<oldColumns>)
AS SELECT
  <newTableColumnWithoutMergedColumn>,
  (CASE WHEN <type> = <value> THEN <mergedColumn> ELSE NULL END)
    AS <first>,
  (CASE WHEN <type> = <value> THEN NULL ELSE <mergedColumn> END)
    AS <second>
FROM <newTableId>;
Virtual Database Layer
1. Creates the view in the virtual database layer.
2. Adds the columns of the view to the virtual database layer.
3. Assigns the created columns to the new view in the virtual database layer.
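A hypothetical instantiation: assume the former columns PHONE_PRIVATE and PHONE_BUSINESS of a table CONTACT were merged into a value column PHONE with discriminator PHONE_TYPE; the restoring view could then read:

CREATE VIEW CONTACT_OLD (ID, NAME, PHONE_PRIVATE, PHONE_BUSINESS)
AS SELECT
  ID, NAME,
  (CASE WHEN PHONE_TYPE = 'PRIVATE' THEN PHONE ELSE NULL END),
  (CASE WHEN PHONE_TYPE = 'PRIVATE' THEN NULL ELSE PHONE END)
FROM CONTACT;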
B.2.2.19 CreateViewForRenamedColumns The delta creates a view with almost the same schema as a given table in which the column names are changed as specified in the parameter newViewMapping.
Parameter
Type Name Description
ViewId viewName The name for the new view.
TableId newTableId The table for which the view gets constructed.
Map<String, String> newViewMapping A mapping from the column names in the table to the column names in the new view.
Database69
CREATE VIEW <viewName> (<oldColumns>)
AS SELECT <mappedTableNames> FROM <newTableId>;
Virtual Database Layer
1. Creates the view in the virtual database layer.
2. Adds the columns of the view to the virtual database layer.
3. Assigns the created columns to the new view in the virtual database layer.
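For instance, with the hypothetical mapping {NAME → CUST_NAME, CITY → CUST_CITY} on a table CUSTOMER, the generated view might be:

CREATE VIEW CUSTOMER_OLD (ID, CUST_NAME, CUST_CITY)
AS SELECT ID, NAME, CITY FROM CUSTOMER;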
B.2.2.20 CreateViewForUnionTables The delta creates a view for a table with identical format. The view reduces the original table by selecting only those rows that carry the value viewName in the typeColumn.
Parameter
Type Name Description
ViewId viewName The name for the new view.
TableId newTable The table for which the view is constructed.
ColumnId typeColumn The column which provides the information for the reduction of the original table.
Database
CREATE VIEW <viewName> (<newTable.columns>) AS SELECT <newTable.columns>
FROM <newTable>
WHERE <typeColumn> = <viewName>;
Virtual Database Layer
1. Creates the view in the virtual database layer.
2. Adds the columns of the view to the virtual database layer.
3. Assigns the created columns to the new view in the virtual database layer.
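A sketch with invented names: if the tables PERSON and COMPANY were previously unified into a table PARTY with the discriminator column PARTY_TYPE, the reduced view for PERSON could be:

CREATE VIEW PERSON (ID, NAME) AS SELECT ID, NAME
FROM PARTY
WHERE PARTY_TYPE = 'PERSON';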
69 In the following code the column names of the new table are translated back to the old names by the use of the given map newViewMapping.
B.2.2.21 CreateViewWithMappingTable The delta creates a view for a table by substituting the values of the transcoded foreign key columns with the values of the given mapping table.
Parameter
Type Name Description
TableId viewName The name of the newly created view.
TableId hostTable The id of the table for which this view is created.
ForeignKeyId transcodedForeign The id of the foreign key whose columns have been transcoded.
TableId mappingTableId The id of the mapping table which is needed to restore the old values of the transcoded columns.
Database70
CREATE VIEW <viewName> AS SELECT <hostTableColumnSubstituted>
FROM <hostTable> A
INNER JOIN <mappingTableId> M
  ON A.<hostTable.primaryKey.column> =
     M.<hostTable.primaryKey.column>;
Virtual Database Layer
1. Creates the view in the virtual database layer.
2. Creates the columns of the new view, adds them to the virtual database layer and assigns them to the view.
B.2.2.22 CreateViewWithUnionTable Within this delta a view is created that performs the union of two given tables. This is only possible if the two given tables have the same format.
Parameter
Type Name Description
TableId viewName The name of the created view.
TableId firstTableId The first table of the union.
TableId secondTableId The second table of the union.
Database
CREATE VIEW <viewName> (<firstTableId.columns>)
AS SELECT <firstTableId.columns> FROM <firstTableId>
UNION SELECT <secondTableId.columns> FROM <secondTableId>;
70 The value for hostTableColumnSubstituted is calculated by taking all columns from the host table A but replacing the column with the foreign-key constraint transcodedForeign with the respective column from the mapping table M.
Virtual Database Layer
1. Creates the view in the virtual database layer.
2. Creates the columns of the new view, adds them to the virtual database layer and assigns them to the view.
B.2.2.23 CreateCompleteTable This delta creates a table with a single primary key, columns, foreign keys and unique constraints as specified by the corresponding parameters. It is a composition of the following other deltas. The difference to the delta CreateTable (see section B.2.2.15) is that all subordinated elements, like the primary key or the columns, are also created.
•CreateConstantDataType defined in section B.2.2.9
•CreateColumn defined in section B.2.2.7
•CreatePrimaryKey defined in section B.2.2.13
•CreateForeignKey defined in section B.2.2.10
•CreateTable defined in section B.2.2.15
Parameter
Type Name Description
TableId newTableId The id of the new table.
PrimaryKey primaryKey The primary key of the new table.
Columns columns A collection of columns for the new table.
UniqueConstraints uniqueConstraints A collection of unique constraints which will be applied to the new table.
ForeignKeys foreignKeys A collection of foreign keys which will be applied to the new table.
B.2.2.24 CreateCompleteViewForRenamedTable The delta creates a view on a sub-schema of a given table under a different name. The delta allows the specification of a set of columns that are excluded from the generated view.
Parameter
Type Name Description
String oldTableID The old table id, which contains the name of the view.
TableId newTableId The id of the table for which the view is created.
Collection<ColumnId> excludes A subset of the columns of the table identified by newTableId which are not added to the schema of the generated view.
Database71
CREATE VIEW <oldTableId> (<substitutedColumns>)
AS SELECT <substitutedColumns> FROM <newTableId>;
Virtual Database Layer
1. Creates a view and adds it to the virtual database layer.
2. Creates the columns, assigns them to the new view, and adds them to the virtual database layer.
B.2.2.25 DropColumn This delta removes a single column from a table.
Parameter
Type Name Description
ColumnId columnID The id of the column which is dropped.
Database
ALTER TABLE <columnId.table> DROP COLUMN <columnId>;
Virtual Database Layer
1. Removes references to the deleted column from the containing table in the virtual database layer.
2. Removes the deleted column from the element store of the virtual database layer.
B.2.2.26 DropConstraint This delta removes a constraint from a table.
Parameter
Type Name Description
ConstraintId constraintId The id of the constraint which is dropped.
Database
ALTER TABLE <constraintId.table> DROP CONSTRAINT <constraintId>;
Virtual Database Layer
1. Removes the reference to the deleted constraint from the containing table in the virtual database layer.
2. Removes the deleted constraint from the element store of the virtual database layer.
71 In the following snippet, the value for substitutedColumns is calculated by taking all columns of the new table and removing the columns which should be excluded.
B.2.2.27 FillColumnAndDiscriminatorByColumn (Only database) Fills the data column with the values of the left column and the discriminator column with a constant value; a row is updated only if the value of the right column is not NULL.
Parameter
Type Name Description
ColumnId data The column which gets filled with the data.
ColumnId discriminator The column into which the constant value is written.
String constant The constant value which is written into the discriminator column.
ColumnId left The column which provides the values that are written into the data column.
ColumnId right The column which determines whether a row is updated or not.
Database
UPDATE <data.table> SET <data> = <left.table>.<left>,
<discriminator> = <constant> WHERE <right> IS NOT NULL;
B.2.2.28 FillColumnByColumn (Only database) Sets the value of each row of a column to the value of a provider column.
Parameter
Type Name Description
ColumnId columnToFill The column which is filled with the data.
ColumnId columnToCopyId The column which provides the values that are written into the columnToFill.
Database
UPDATE <columnToFill.table>
SET <columnToFill> = <columnToCopyId.table>.<columnToCopyId>;
B.2.2.29 FillColumnByConstant (Only database) The delta updates all cells in a provided column to a specified constant value.
Parameter
Type Name Description
ColumnId columnId The column which is updated.
String72 constant The value which is written into the column.
Database
UPDATE <columnId.table> SET <columnId> = <constant>;
72 We are using String here because every database data type can be represented as a string.
B.2.2.30 FillColumnByFK (Only database) The delta updates all cells of a column by setting them to the value of a referenced column. The name of the given column and the name of the column which is referenced by the given constraint have to be the same.
Parameter
Type Name Description
ColumnId columnId The id of the column which shall be updated.
ForeignKeyId fkId The id of the foreign key which connects the rows in the source table with the column in the target table.
Database
UPDATE <columnId.table> SET <columnId> =
(SELECT <columnId> FROM <fkId.referenced.table> WHERE
<fkId.column> = <fkId.referenced.column>);
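A hypothetical instantiation, copying the equally named column CUSTOMER_NAME from CUSTOMER into ORDERS along the foreign key ORDERS.CUSTOMER_ID → CUSTOMER.ID; note that the subquery is correlated per row:

UPDATE ORDERS SET CUSTOMER_NAME =
(SELECT CUSTOMER_NAME FROM CUSTOMER WHERE
ORDERS.CUSTOMER_ID = CUSTOMER.ID);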
B.2.2.31 FillColumnByFunction (Only database) The delta fills each cell in a column with the result of a function that is applied to the values in the row that contains the cell.
Parameter
Type Name Description
ColumnId columnToFill The id of the column whose cells are updated.
FunctionId functionToUse The id of the function which generates the new values.
Collection<ColumnId> columnsToUse The columns which provide the input parameters for the function.
Database
UPDATE <columnToFill.table> SET <columnToFill> =
<functionToUse>(<columnsToUse>);
B.2.2.32 FillColumnBySequence The delta updates each cell in a column with the next value of an existing sequence.
Parameter
Type Name Description
ColumnId columnId The column which is updated.
SequenceId sequenceId The sequence which is used for the new values.
Database
UPDATE <columnId.table> SET <columnId> = <sequenceId>.NEXTVAL;
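For example, filling a hypothetical surrogate key column with fresh values from a sequence customer_id_seq:

UPDATE CUSTOMER SET ID = customer_id_seq.NEXTVAL;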
B.2.2.33 AbstractInsertIntoTable (Only database) This delta represents the abstract supertype of all deltas which insert new values into a given table. Concrete implementations can determine the inserted values by specifying a SELECT statement for the valueDetermination part, which can either be a call to a function or a select from another table.
Parameter
Type Name Description
TableId toInsert The id of the table into which the rows are inserted.
ColumnIds filledColumns The columns of the table which are filled.
Database
INSERT INTO <toInsert> (<filledColumns>)
SELECT <valueDetermination>;
B.2.2.34 RenameColumn This delta renames a column of a table.
Parameter
Type Name Description
ColumnId oldColumnId The id of the column which is renamed.
ColumnId newColumnId The new id of the column.
Database
ALTER TABLE <oldColumnId.table> RENAME COLUMN <oldColumnId> TO <newColumnId>;
Virtual Database Layer
1. Removes the changed table and the assigned columns from the virtual database layer.
2. Adds the changed table, the renamed column and all other, non-renamed columns to the virtual database layer.
B.2.2.35 RenameTable This delta changes the name of a table.
Parameter
Type Name Description
TableId oldTableId The table id of the table which will be renamed.
TableId newTableId The table id with the new name.
Database
ALTER TABLE <oldTableId> RENAME TO <newTableId>;
Virtual Database Layer
1. Removes the table with the old name from the virtual database layer and adds a new table with the new name.
2. Deletes all columns and constraints assigned to the table with the old name and recreates them assigned to the table with the new name.
B.2.2.36 ReturnTypeOfFunction (Only virtual) Uses the return data type of a function as the new data type of a specified column in the virtual database layer.
Parameter
Type Name Description
FunctionId functionId The id of the function whose return type is used.
ColumnId newColumnId The id of the column to which the return type is assigned.
Virtual Database Layer
1. Assigns the return type of the specified function as the new data type of the column in the virtual database layer.
B.2.2.37 UnionTables The delta performs the union of two tables with the same schema. The first table is updated to contain the union; the second table is deleted.
Parameter
Type Name Description
TableId firstTable The id of the first table. The data from the second table will be inserted into this table.
TableId secondTable The id of the second table, from which the data to be inserted stems.
Database73
ALTER TABLE <firstTable> DROP CONSTRAINT <foreignKeyReferencingFirstTable>;
ALTER TABLE <secondTable> DROP CONSTRAINT <foreignKeyReferencingSecondTable>;

INSERT INTO <firstTable> (<firstTable.columns>)
SELECT <secondTable.columns>
FROM <secondTable>;

DROP TABLE <secondTable>;

ALTER TABLE <firstTable> ADD CONSTRAINT <foreignKeyReferencingFirstTable>;
ALTER TABLE <firstTable> ADD CONSTRAINT <foreignKeyReferencingSecondTable>;
Virtual Database Layer
• Removes the secondTable from the virtual database layer.
• Removes the foreign-key constraints referencing the second table from the virtual database layer.
• Adds the new foreign-key constraints to the first table in the virtual database layer.
73 Note that, as there are probably multiple foreign-key constraints referencing the affected tables, the drop operation has to be performed multiple times as well.
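A minimal sketch under the assumption of two identically shaped tables CUSTOMER_DE and CUSTOMER_AT and a single foreign key in ORDERS referencing the second table (all names hypothetical):

ALTER TABLE ORDERS DROP CONSTRAINT orders_customer_at_fk;

INSERT INTO CUSTOMER_DE (ID, NAME, CITY)
SELECT ID, NAME, CITY
FROM CUSTOMER_AT;

DROP TABLE CUSTOMER_AT;

ALTER TABLE ORDERS ADD CONSTRAINT orders_customer_at_fk
FOREIGN KEY (CUSTOMER_ID) REFERENCES CUSTOMER_DE (ID);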
B.2.2.38 UpdateColumnWithMappingTable The delta updates the specified column of a table by means of a mapping table: each old value in the column is exchanged for the new value taken from the mapping table.
Parameter
Type Name Description
ColumnId columnId The id of the column whose values are exchanged.
TableId mappingTableId The id of the mapping table, which provides the new column content.
Database
UPDATE <columnId.table> SET <columnId> =
(SELECT NEW_<columnId> FROM <mappingTableId> M WHERE
<columnId.table.primaryKey.column> = M.ID);
Virtual Database Layer
• The virtual database schema is not affected because it does not have any information about the values in the database.
B.3 Open improvements in DELTA
This section gives an overview of open issues in the tool's current implementation and of possible improvements.
• To extend the field of application of DELTA, the tool must support more database management systems like MySQL, PostgreSQL, SQL Server or DB2.
• The graphical user interface needs improvements to enhance usability. For example, every refactoring must match an already existing element in the virtual database layer, which should be selectable from a suitable drop-down list. For the time being, some refactorings require typing the name of an existing element, which is uncomfortable and error-prone.
• The tool shall provide useful and standardised suggestions for the names of new elements.
• To assist the user, the graphical user interface should validate the given parameters early. For example, the name for a new table must not be used in the database already, the length of given parameters has to be checked, and the data types of columns or functions must match the expected ones. The preconditions of a refactoring listed in section B.2.1 sometimes require an in-depth analysis of all records in the respective tables. For the sake of simplicity, DELTA currently only supports checking the preconditions on the structural layer. In a future release, a mechanism for checks on the record layer has to be implemented.
• The implemented virtual database layer provides the ground for many useful functions that are not implemented yet. These include:
– An undo or redo function
– A reordering of the refactorings, for example to improve performance.
• During execution of refactorings against the database, the error handling has to be improved. In case of an error, this shall enable a fast detection of the failing refactoring. Currently, only the error message is displayed without any further information.
• The tool shall be extended by an export function for SQL scripts of the selected refactorings. Only direct application is supported in the current version.
• For the developer who plans a huge refactoring, the tool shall provide a comfortable way to save and re-use functions for refactorings and refactoring fragments.
• Move Column does not support the move of constraints, for example foreign key constraints. This shall be possible if the constraint contains only one column: the values of this column can already be moved, but additionally the constraint has to be moved, which is not implemented yet. To move a constraint which contains more than one column, the move column refactoring has to be extended such that multiple columns can be moved at once.
• Automated tests for the generated triggers shall be implemented. An application accessing the old schema via views shall be tested in an integration test to simulate legacy application behaviour.
• The final tool shall not be restricted to refactorings only. It shall support the complete database change process. This can be achieved by different modes, like a mode for extending the database, in which not only refactorings are available.
• The final tool shall create its artifacts in the context of a new schema.74 With this feature, objects in the old schema can have the same names as objects in the new schema. For the time being, this is not possible, so a lot of renaming and name-engineering is required.
74 A new database user in Oracle.