Home How to generate a date range + count earlier dates from another table in PostgreSQL?
Reply: 6

How to generate a date range + count earlier dates from another table in PostgreSQL?

Matic Jurglič
1#
Matic Jurglič Published in 2017-09-13 21:54:42Z

I have the following table:

links:

created_at           active 
2017-08-12 15:46:01  false
2017-08-13 15:46:01  true
2017-08-14 15:46:01  true
2017-08-15 15:46:01  false

When given a date range, I have to extract time series which tells me how many active links were created on a date equal or smaller than current (rolling) date.

Output (for date range 2017-08-12 - 2017-08-17):

day          count
2017-08-12   0 (there are 0 active links created on 2017-08-12 and earlier)
2017-08-13   1 (there is 1 active link created on 2017-08-13 and earlier)
2017-08-14   2 (there are 2 active links created on 2017-08-14 and earlier)
2017-08-15   2 ...
2017-08-16   2
2017-08-17   2

I came up with the following query for generating dates:

SELECT date_trunc('day', dd):: date
FROM generate_series
    ( '2017-08-12'::timestamp 
    , '2017-08-17'::timestamp
    , '1 day'::interval) dd

But the rolling counts confuse me and am unsure how to continue. Can this be solved with a window function?

Erwin Brandstetter
2#
Erwin Brandstetter Reply to 2017-09-17 01:26:54Z

This should be fastest:

SELECT day::date
     , sum(ct) OVER (ORDER BY day) AS count
FROM   generate_series (timestamp '2017-08-12'
                      , timestamp '2017-08-17'
                      , interval  '1 day') day
LEFT   JOIN  (
   SELECT date_trunc('day', created_at) AS day, count(*) AS ct
   FROM   tbl
   WHERE  active -- fastest
   GROUP  BY 1
   ) t USING (day)
ORDER  BY 1;

dbfiddle here

count() only counts non-null rows, so you could use count(active OR NULL). But the fastest option for counting is to exclude irrelevant rows with a WHERE clause to begin with. Since we are adding all days with generate_series() anyway, this is the best option.

Compare:

  • For absolute performance, is SUM faster or COUNT?

Since generate_series() returns timestamp (not date) I use date_trunc() to get matching timestamps (very slightly faster).

Gordon Linoff
3#
Gordon Linoff Reply to 2017-09-13 21:57:04Z

I would just use aggregation and cumulative sums -- assuming you have at least one per day:

select date_trunc('day', created_at)::date as created_date,
       sum(active::int) as actives,
       sum(sum(active::int)) over (date_trunc('day', created_at)) as running_actives
from t
group by created_date;

You only need to generate the dates if you have holes in the data. If you do, though, I would recommend including where active -- you can include it now, I just want to be sure there are no holes.

shA.t
4#
shA.t Reply to 2017-09-16 09:42:16Z

I think a query like this can help you:

;with t as (SELECT date_trunc('day', dd):: date
FROM generate_series
    ( '2017-08-12'::timestamp 
    , '2017-08-17'::timestamp
    , '1 day'::interval) dd
)
select distinct t.date_trunc
  , count(case when links.active = 'true' then 1 end) over (order by links.created_at) count
from t
left join links
on t.date_trunc = cast(links.created_at as date)
order by t.date_trunc;

SQL Fiddle Demo

peufeu
5#
peufeu Reply to 2017-09-16 12:16:01Z

If you have missing days in your table, you'll need to use a generate_series() to create them. Since this is basically putting together the two previous answers, credit is given ;;)

However, this join is better done after the GROUP BY, which will only return one row per day, instead of before, which would result in a larger JOIN.

WITH dailydata AS (
  SELECT 
    d::DATE, COALESCE(n,0) n
  FROM
    generate_series( 
      '2000-01-01'::DATE, 
      '2000-10-01'::DATE,
      '1 DAY'::INTERVAL ) d
  LEFT JOIN
    (SELECT created_at::DATE d, count(*) AS n
    FROM links WHERE active
    GROUP BY d) data
    USING (d)
)
SELECT d, n, sum(n) OVER (ORDER BY d) FROM dailydata;
wildplasser
6#
wildplasser Reply to 2017-09-16 13:01:06Z
CREATE TABLE links
        ( created_at           timestamp
        , active boolean
        );
INSERT INTO links(created_at,active)VALUES
 ('2017-08-12 15:46:01', false)
,('2017-08-13 15:46:01', true)
,('2017-08-14 15:46:01', true)
,('2017-08-15 15:46:01', false)
        ;

WITH cal AS (
        select gs AS deet
        FROM generate_series('2017-08-11'::date,'2017-08-16'::date, '1day'::interval)gs
        )
SELECT cal.deet
        , SUM(1) FILTER (WHERE l.active =True) OVER(ORDER BY l.created_at) AS cumsum
FROM cal
LEFT JOIN links l ON date_trunc('days', l.created_at)= cal.deet
ORDER BY created_at
        ;
Steve Chambers
7#
Steve Chambers Reply to 2017-09-16 13:52:03Z

Demo

http://rextester.com/OGZV44492

SQL

SELECT date_trunc('day', dd):: date AS day,
       (SELECT COUNT(*) FROM links
        WHERE active = true
          AND date(created_at) <= date_trunc('day', dd)) AS "count"
FROM generate_series
    ( '2017-08-12'::timestamp 
    , '2017-08-17'::timestamp
    , '1 day'::interval) dd

Explanation

The SQL above does a simple subselect to count the number of rows in the links table whose date part is less than or equal to each date in the generated range.

You need to login account before you can post.

About| Privacy statement| Terms of Service| Advertising| Contact us| Help| Sitemap|
Processed in 0.389246 second(s) , Gzip On .

© 2016 Powered by cudou.com design MATCHINFO