Quantcast
Channel: OracleDB – The Wiert Corner – irregular stream of stuff
Viewing all articles
Browse latest Browse all 20

Generating a million sequential numbers on the fly in a Firebird query – some solutions and speed measurements

$
0
0

The testing was done with Firebird 2.5.x x86 on Windows 7 x64.

Where other relational database platforms have plenty of opportunities to generate sequences (see for instance the below links on Oracle and SQL Server), with Firebird you can use a WITH RECURSIVE query construct that normally is being used to manage tree structures ([WayBackPkLab – Firebird: Tree data mangement with recursive CTE).

However, that uses query stack which has a depth limit of 1024 levels. When you reach the limit, Firebird gives you an error like this:

with 
  recursive 
  sequence(n) as (        
    -- When you select more than 1024 values, this error occurs:
    -- Error while fetching data:  Too many concurrent executions of the same request    
    select 0 -- start
    from rdb$database
    union all
    select sequence.n + 1
    from sequence
    where sequence.n < 1023 -- finish
  )
select sequence.n 
from sequence
--where sequence.n in (24, 38) 
order by sequence.n

It however is a pretty quick and CU bound solution: on my system ([WayBackAMD A8-7600 @ 3.1 Ghz), it runs 1000 records within ~0.1 seconds.

In such a short time, it’s hard to see how the speed is bound, so I wanted to go for some orders of magnitude more. In ~0.1 seconds, the processor executes about 0.3 * 10^9 cycles generating 1000 numbers which is ~ 300-thousand cycles per number. That sounds like a lot of cycles for so few numbers. Would this become a better ratio for more numbers?

So I dug a bit deeper to see if I could generate 1 million numbers sequential numbers. It turns out there are various ways to combine various common table expressions (CTE). Firebird 2.1 added the CTE capability [WayBack] Common Table Expressions (“WITH … AS … SELECT”); Available in: DSQL, PSQL; Added in: 2.1 and it’s pretty cool.

The basic idea of getting many numbers is to make a CROSS JOIN between queries A and B so you get COUNT(A.) times COUNT(B.) records then narrow it down to the actual amount of numbers you want.

Do not confuse the CROSS and FULL OUTER versions of JOIN: [WayBackSQL Server: What is the difference between CROSS JOIN and FULL OUTER JOIN? – Stack Overflow.

[WayBackSELECT supports CROSS JOIN as of Firebird 2.0

Usually, a CROSS JOIN is semantically equivalent a JOIN without an ON clause [WayBackSQL Cross Join – w3resource so you get the [WayBackCartesian product – Wikipedia. But not always: [WayBacksql – What is the difference between using a cross join and putting a comma between the two tables? – Stack Overflow, so be careful.

Below are some tries that vary in:

  1. how to get the base queries before they get assembled into the final cartesian product
  2. in how many steps to assemble the cartesian products

A complicating factor is Firebird (nor InterBase) do allow you to select values without a table or view name. For that, most people query literal values from the rdb$database table and you can even use it to call functions exactly one time:

The rdb$database table is called a “Singleton Table. In relational terms it is a relation with primary key ∅ (the empty set).”

Note it is different from the [Ardhive.is] LÖNE singleton dining table IKEA April 1st prank picture at the top of the post.

Other relational databases systems have them too:

  • DUAL (Oracle),
  • SYSIBM.SYSDUMMY1 (DB2, Derby)
  • RDB$DATABASE (Interbase, Firebird).

Source: [WayBackoracle – Term for a one-row table? – Database Administrators Stack Exchange

A notable exception is SQL Server, so people make their own singleton table: [WayBackScriptling: Creating a Singleton table-row

Before digging into the million numbers, a small example on how to generate a range of dates from just 1000 numbers:

with
  recursive
  tripledigits(n) as (
    select 0 -- start
    from rdb$database
    union all
    select tripledigits.n + 1
    from tripledigits
    where tripledigits.n < 999 -- finish
  ) 
select date 'now' - tripledigits.n 
from tripledigits 
where 1=1
  and tripledigits.n >= 100 
  and tripledigits.n < 105 
order by tripledigits.n

I ran this a while ago, with these results: 5 dates from and including a 100 to but not including 105 days ago.

SUBTRACT
----------
6-1-2017
5-1-2017
4-1-2017
3-1-2017
2-1-2017 

Getting a million numbers

Here is what I came up with:

with 
  recursive
  tripledigits(n) as (        
    select 0 -- start
    from rdb$database
    union all
    select tripledigits.n + 1
    from tripledigits
    where tripledigits.n < 999 -- finish
  ),
  -- prepared in 0.043-0.078 seconds, processed in 3.705-5.023 seconds - rows fetched: 250
  sextupledigits(n) as (
    select tripledigitsBare.n 
          + tripledigits000.n * 1000
    from tripledigits tripledigitsBare
    cross join tripledigits tripledigits000
    order by tripledigitsBare.n 
          + tripledigits000.n * 1000
  )
select sextupledigits.n 
from sextupledigits
--where sextupledigits.n in (24, 38) 
order by sextupledigits.n
/*
PLAN SORT (SORT (JOIN ((SEXTUPLEDIGITS TRIPLEDIGITS RDB$DATABASE NATURAL)
PLAN JOIN ()(SEXTUPLEDIGITS TRIPLEDIGITS RDB$DATABASE NATURAL)
*/

The journey there

My start query was this:

with 
  digits(n) as ( -- for sextupledigits, this takes ~35-45 seconds
          select 0 from rdb$database             
    union select 1 from rdb$database
    union select 2 from rdb$database
    union select 3 from rdb$database
    union select 4 from rdb$database
    union select 5 from rdb$database
    union select 6 from rdb$database
    union select 7 from rdb$database
    union select 8 from rdb$database
    union select 9 from rdb$database
  ),
  tens(n) as (
    select 10*digits.n
    from digits
  ),
  hundreds(n) as (
    select 100*digits.n
    from digits
  ),
  tripledigits(n) as (
    select digits.n 
          + tens.n
          + hundreds.n
    from digits
    cross join tens
    cross join hundreds
    order by digits.n 
          + tens.n
          + hundreds.n
  ), 
  -- prepared in 0.065-0.0368 seconds, processed in 34.241-53.835 seconds - rows fetched: 250
  sextupledigits(n) as (
    select tripledigits.n 
          + tripledigits1000.n * 1000
    from tripledigits
    cross join tripledigits as tripledigits1000
    order by tripledigits.n 
          + tripledigits1000.n * 1000
  )
select sextupledigits.n 
from sextupledigits
--where sextupledigits.n in (24, 38) 
order by sextupledigits.n 

/*
PLAN SORT (SORT (JOIN (SORT (JOIN ((SEXTUPLEDIGITS TRIPLEDIGITS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS DIGITS RDB$DATABASE NATURAL)(SEXTUPLEDIGITS TRIPLEDIGITS TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS TENS DIGITS RDB$DATABASE NATURAL)(SEXTUPLEDIGITS TRIPLEDIGITS HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS HUNDREDS DIGITS RDB$DATABASE NATURAL), SORT (JOIN ((SEXTUPLEDIGITS TRIPLEDIGITS1000 DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 DIGITS RDB$DATABASE NATURAL)(SEXTUPLEDIGITS TRIPLEDIGITS1000 TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 TENS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 TENS DIGITS RDB$DATABASE NATURAL)(SEXTUPLEDIGITS TRIPLEDIGITS1000 HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN (SEXTUPLEDIGITS TRIPLEDIGITS1000 HUNDREDS DIGITS RDB$DATABASE NATURAL))))))))
*/

Then I started optimizing the digits(n) part with this recursive solution:

with
  recursive 
  digits(n) as ( -- for sextupledigits, this takes ~10-16 seconds
    select 0 -- start
    from rdb$database
    union all
    select digits.n + 1
    from digits
    where digits.n < 9 -- finish
  ),

This used 60% less time because the query PLAN became much simpler:

PLAN SORT (SORT (JOIN (SORT (JOIN ((SEXTUPLEDIGITS TRIPLEDIGITS DIGITS RDB$DATABASE NATURAL)
PLAN JOIN ()(SEXTUPLEDIGITS TRIPLEDIGITS TENS DIGITS RDB$DATABASE NATURAL)
PLAN JOIN ()(SEXTUPLEDIGITS TRIPLEDIGITS HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN JOIN (), SORT (JOIN ((SEXTUPLEDIGITS TRIPLEDIGITS1000 DIGITS RDB$DATABASE NATURAL)
PLAN JOIN ()(SEXTUPLEDIGITS TRIPLEDIGITS1000 TENS DIGITS RDB$DATABASE NATURAL)
PLAN JOIN ()(SEXTUPLEDIGITS TRIPLEDIGITS1000 HUNDREDS DIGITS RDB$DATABASE NATURAL)
PLAN JOIN ())))))))

So I started to focus on making the PLAN simpler step by step.

So I moved away from digits, tens, hundreds, tripledigits and sextupledigits by doubledigits, doubledigits00, quadrupledigits and sextupledigits:

with 
  recursive
  doubledigits(n) as (
    select 0 -- start
    from rdb$database
    union all
    select doubledigits.n + 1
    from doubledigits
    where doubledigits.n < 99 -- finish
  ),
  doubledigits00(n) as (
    select doubledigits.n * 100
    from doubledigits
  ),
  quadrupledigits(n) as (
    select doubledigits.n 
          + doubledigits00.n
    from doubledigits
    cross join doubledigits00
    order by doubledigits.n 
          + doubledigits00.n
  ), 
  -- prepared in 0.061-0.078 seconds, processed in 6.493-8.381 seconds - rows fetched: 250
  sextupledigits(n) as (
    select doubledigits.n 
          + quadrupledigits.n * 100
    from doubledigits
    cross join quadrupledigits
    order by doubledigits.n 
          + quadrupledigits.n * 100
  )

This reduced the PLAN from the previous 7-JOINone into a 4-JOIN one:

PLAN SORT (SORT (JOIN ((SEXTUPLEDIGITS DOUBLEDIGITS RDB$DATABASE NATURAL)
PLAN JOIN (), SORT (JOIN ((SEXTUPLEDIGITS QUADRUPLEDIGITS DOUBLEDIGITS RDB$DATABASE NATURAL)
PLAN JOIN ()(SEXTUPLEDIGITS QUADRUPLEDIGITS DOUBLEDIGITS00 DOUBLEDIGITS RDB$DATABASE NATURAL)
PLAN JOIN ())))))

Though lessening the number of joins makes the query simpler to parse, the growing number of results in each query could increase complexity. I wondered when this would stop, so first tried to lessen the JOIN complexity even further:

 with 
  recursive
  tripledigits(n) as (        
    -- When you select more than 1024, this error occurs:
    -- Error while fetching data:  Too many concurrent executions of the same request    
    select 0 -- start
    from rdb$database
    union all
    select tripledigits.n + 1
    from tripledigits
    where tripledigits.n < 999 -- finish
  ),
  tripledigits000(n) as (
    select tripledigits.n * 1000
    from tripledigits
  ),
  -- prepared in 0.0055-0.126 seconds, processed in 3.617-4.575 seconds - rows fetched: 250
  sextupledigits(n) as (
    select tripledigits.n 
          + tripledigits000.n
    from tripledigits
    cross join tripledigits000
    order by tripledigits.n 
          + tripledigits000.n
  )

The reduction from doubledigits, doubledigits00, quadrupledigits and sextupledigits to tripledigits, tripledigits000 and sextupledigits both decreased the processing time and the PLAN size:

PLAN SORT (SORT (JOIN ((SEXTUPLEDIGITS TRIPLEDIGITS RDB$DATABASE NATURAL)
PLAN JOIN ()(SEXTUPLEDIGITS TRIPLEDIGITS000 TRIPLEDIGITS RDB$DATABASE NATURAL)
PLAN JOIN ())))

The next step was cutting the JOIN complexity by one third by going from tripledigits, tripledigits000 and sextupledigits to just tripledigits and sextupledigits:

with 
  recursive
  tripledigits(n) as (        
    select 0 -- start
    from rdb$database
    union all
    select tripledigits.n + 1
    from tripledigits
    where tripledigits.n < 999 -- finish
  ),
  -- prepared in 0.043-0.078 seconds, processed in 3.705-5.023 seconds - rows fetched: 250
  sextupledigits(n) as (
    select tripledigitsBare.n 
          + tripledigits000.n * 1000
    from tripledigits tripledigitsBare
    cross join tripledigits tripledigits000
    order by tripledigitsBare.n 
          + tripledigits000.n * 1000
  )

What you see is that it doesn’t get faster, actually it can even become a tiny bit slower, this despite the smaller PLAN:

PLAN SORT (SORT (JOIN ((SEXTUPLEDIGITS TRIPLEDIGITS RDB$DATABASE NATURAL)
PLAN JOIN ()(SEXTUPLEDIGITS TRIPLEDIGITS RDB$DATABASE NATURAL)

Growing this to 10 million increases the query time 20 fold, not 10-fold:

-- via :
-- - https://stackoverflow.com/questions/39751643/fastest-way-to-generate-range-of-numbers
-- - https://stackoverflow.com/questions/42750834/how-to-create-range-from-1-to-100-in-firebird-select-statement
-- - https://firebirdsql.org/refdocs/langrefupd21-select.html 

with 
  recursive 
  digits(n) as (
    select 0 -- start
    from rdb$database
    union all
    select digits.n + 1
    from digits
    where digits.n < 9 -- finish
  ),
  tripledigits(n) as (        
    select 0 -- start
    from rdb$database
    union all
    select tripledigits.n + 1
    from tripledigits
    where tripledigits.n < 999 -- finish
  ),
  -- prepared in 0.116-2.226 seconds, processed in 75.695-99.443 seconds - rows fetched: 250
  septupledigits(n) as (
    select tripledigitsBare.n 
          + tripledigits000.n * 1000
          + digits.n * 1000000
    from tripledigits tripledigitsBare
    cross join tripledigits tripledigits000
    cross join digits
    order by tripledigitsBare.n 
          + tripledigits000.n * 1000
          + digits.n * 1000000
  )
select septupledigits.n 
from septupledigits
--where septupledigits.n in (24, 38) 
order by septupledigits.n

This is probably because PLAN shows a JOIN complexity that is twice as large:

PLAN SORT (SORT (JOIN ((SEPTUPLEDIGITS TRIPLEDIGITS RDB$DATABASE NATURAL)
PLAN JOIN ()(SEPTUPLEDIGITS TRIPLEDIGITS RDB$DATABASE NATURAL)
PLAN JOIN ()(SEPTUPLEDIGITS DIGITS RDB$DATABASE NATURAL)
PLAN JOIN ())))

 

 

What fails

Initially when trying the fastest solution, I came up with the below query. But it fails.

I’m not sure why this fails with Dynamic SQL Error SQL error code = -204 alias TRIPLEDIGITS conflicts with an alias in the same statement:

with recursive 
  tripledigits(n) as (        
    select 0 -- start
    from rdb$database
    union all
    select tripledigits.n + 1
    from tripledigits
    where tripledigits.n < 999 -- finish
  ),
  sextupledigits(n) as (
    select tripledigits.n 
          + tripledigits000.n * 1000
    from tripledigits
    cross join tripledigits tripledigits000 -- causes "Dynamic SQL Error SQL error code = -204 alias TRIPLEDIGITS conflicts with an alias in the same statement"
    order by tripledigits.n 
          + tripledigits000.n * 1000
  )
select sextupledigits.n 
from sextupledigits
order by sextupledigits.n

As I could not find anything on the cause, for instance not with “conflicts with an alias in the same statement” CTE so I’ve reported it in the Firebird tracker as it happens with a simplified query as well:

with 
  recursive 
  tripledigits(n) as (        
    select 0
    from rdb$database
    union all
    select tripledigits.n + 1
    from tripledigits
    where tripledigits.n < 999
  )
select tripledigits.n 
      + tripledigits000.n * 1000
from tripledigits
cross join tripledigits tripledigits000 -- causes "Dynamic SQL Error SQL error code = -204 alias TRIPLEDIGITS conflicts with an alias in the same statement"
order by tripledigits.n 
      + tripledigits000.n * 1000

Report: [WayBack[#CORE-5519] WI-V2.5.7.27050 generates “alias X conflicts with an alias in the same statement” on using both non-aliased and aliased CTE – Firebird RDBMS Issue Tracker

Other platforms

Repository: https://gitlab.com/wiert.me/examples/sql-examples/tree/master/sequences

–jeroen

References:

Comment on G+ https://plus.google.com/+JeroenPluimers/posts/5UtE9DaHQAv

I’m not 100% sure of what you want to accomplish. But would not GTTs or CONTEXT variables be a better solution?


Viewing all articles
Browse latest Browse all 20

Trending Articles