Hello good people of stackoverflow!
I need help with writing a query, to identify online time grouped by users, and also to identify which are the periods where there were 0 users online and for how long.
Note that I have only read access for this database.
The challenge I am facing right now, is that there can be simultaneous sessions recorded for a user.
For example within a day’s observation period(there is also a primary key of log_id that follows the chronological order of login_ts for every new session started):
User | login_ts | logout_ts |
---|---|---|
Kadaj | 2023-05-01 08:00:00 | 2023-05-01 13:00:00 |
Kadaj | 2023-05-01 10:00:00 | 2023-05-01 12:00:00 |
Kadaj | 2023-05-01 11:00:00 | 2023-05-01 15:00:00 |
Kadaj | 2023-05-01 16:00:00 | 2023-05-01 18:00:00 |
Challenge 1 – Total Online Time in Day
Observation period: ‘2023-05-01 00:00:00’ to ‘2023-05-01 23:59:59’ or ‘2023-06-01 00:00:00’
If only 1 session is allowed per user I could have gotten the sum of the duration of each session:
User | login_ts | logout_ts | duration |
---|---|---|---|
Kadaj | 2023-05-01 08:00:00 | 2023-05-01 13:00:00 | 5 |
Kadaj | 2023-05-01 10:00:00 | 2023-05-01 12:00:00 | 2 |
Kadaj | 2023-05-01 11:00:00 | 2023-05-01 15:00:00 | 3 |
Kadaj | 2023-05-01 16:00:00 | 2023-05-01 18:00:00 | 2 |
Which will make up of 12 hours, but because the sessions overlap each other the actual online time is: 8am to 3pm + 4pm to 6pm = 9 hours
Expected Results
User | duration |
---|---|
Kadaj | 9 |
Yazoo | 10 |
Loz | 3 |
Challenge 2 – No Users Online Time
Assuming I want to find out for 2023-05-01(Observation period same as above), what were the periods where there were no users online:
Expected results
no_user_start_time | no_user_end_time | duration |
---|---|---|
2023-05-01 00:00:00 | 2023-05-01 08:00:00 | 8 |
2023-05-01 15:00:00 | 2023-05-01 16:00:00 | 1 |
2023-05-01 18:00:00 | 2023-05-01 23:59:59 (or 2023-06-01 00:00:00) | 6 |
The examples I have given is to simplify the obstacle I am trying to cross, I have used a series of sorting with LAG, LEAD, GENERATE_SERIES, PARTITION functions but I keep hitting roadblocks and cannot come up with the logic to achieve my results(depending if a second/third session wraps around the original or if it overlaps at the head or tail of the session, lag and lead can produce very different results).
The table is much much larger than my example with more users and also have concurrent login sessions with overlapping timestamps.
I have something that works for Challenge 1, just looking for a second opinion to improve my query.
I need help with Challenge 2 where I am not sure how I can approach the observation period where there are no active login sessions.
All second opinions are greatly appreciated, please enlighten me on your thought process please 😀