Last Updated: 2021-08-08
Data: NSW COVID-19 tests data from two different sources
In this post, I am bringing together two sources of data on COVID-19 testing in NSW. To explain how they differ, let me use an example:
Imagine that 1,000 tests are done today in NSW.
Data.NSW releases a dataset that has one record per test. It gives the date of the test and the postcode/LGA where the test was taken (not where the person lives, but where the test was done). For today, that dataset would have 1,000 rows; I count those rows and report 1,000 tests for today. You can access the data here. One key challenge with this data is that it is released with a one-day lag and the last observation is usually incomplete. This means that when it is released today, your last reliable data point is for three days ago.
I call this trend “Tests Done”.
Next, of those 1,000 tests done today, suppose 700 are processed and the people who were tested are notified by 8pm today. That number, 700, is then reported by NSW Health or the Premier as the number of tests reported in the 24 hours to 8pm today. However, since that number is announced tomorrow morning, it is assigned to tomorrow’s line. If the other 300 are processed after 8pm today and the people are notified before 8pm tomorrow, that number, 300, will be reported as the number of tests the day after tomorrow.
This data can be accessed via CovidLive.
I call this trend “Tests Reported”.
So, all in all, 1,000 tests are done today, 700 of them are reported tomorrow at the press conference and 300 of them are reported the day after that.
Do we need both of these numbers?
Yes. We need the first one to track testing trends and better understand how the public is responding, and we need the second one to calculate the positivity rate (positive cases out of the number of tests processed).
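As a minimal sketch, the positivity rate is a per-day division of positive cases by tests processed, here using Tests Reported as the denominator. The function name and the numbers are hypothetical, and both inputs are assumed to be dicts keyed by date.

```python
from datetime import date


def positivity_rate(positive_cases, tests_processed):
    """Daily positivity rate: positive cases / tests processed, per date.

    Both arguments are dicts keyed by date. Dates with no tests processed
    are skipped to avoid dividing by zero.
    """
    return {
        d: positive_cases[d] / tests_processed[d]
        for d in positive_cases
        if tests_processed.get(d)
    }


# Hypothetical numbers: 7 positive cases out of 700 tests processed.
cases = {date(2021, 8, 8): 7}
tests = {date(2021, 8, 8): 700}
rates = positivity_rate(cases, tests)
```

With these made-up figures the rate for 8 August is 0.01, i.e. 1%.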
A small but important change to the data
I have made a small but important change to the data: I have shifted the “Tests Reported” trend back by one day. The reason is that when you see the number 700 for tomorrow, that is actually the number of tests that were processed by 8pm today. So, to assign the value back to the date on which the tests were processed, I have made this change to the data.
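The one-day shift is just a relabelling of dates: every announced figure moves to the previous day. A minimal sketch, assuming the series is a dict keyed by `datetime.date` (the function name is hypothetical):

```python
from datetime import date, timedelta


def shift_back_one_day(reported_by_announcement):
    """Re-date each "Tests Reported" figure from its announcement date
    to the day the tests were processed (one day earlier)."""
    return {d - timedelta(days=1): n for d, n in reported_by_announcement.items()}


announced = {date(2021, 8, 9): 700, date(2021, 8, 10): 300}
processed = shift_back_one_day(announced)
```

If the series is instead a pandas `Series` with a `DatetimeIndex`, `s.shift(-1, freq="D")` moves the index back one day in the same way without touching the values.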
Here are a couple of plots using these two trends. I will add more figures to this post in the coming days.