Last Updated: 2021-01-19
Data: NSW COVID-19 tests data from two different sources
In this post, I am putting together two sources of data on COVID-19 testing in NSW. To clarify the difference between them, let me use an example:
Imagine that 1,000 tests are done today in NSW.
There is a dataset released by Data.NSW which has one record per test and gives you the date of the test and the postcode/LGA where the test was taken (not where the person lives, but where the test was done). So that dataset, for today, would have 1,000 rows, and what I do is count those rows and report 1,000 tests for today. You can access the data here. One key challenge with this data is that it is released with a one-day lag and the last observation is usually incomplete. This means when it is released today, your last reliable data point is for three days ago.
I call this trend “Tests Done”.
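As a minimal sketch of the counting step, here is how the daily "Tests Done" series could be built from record-level rows. The rows and the (test_date, lga_name) layout are made up for illustration and are not the actual Data.NSW schema; only the idea of one row per test comes from the description above.

```python
from collections import Counter
from datetime import date

# Hypothetical record-level rows, one tuple per test: (test_date, lga_name).
# These values are invented for illustration, not real Data.NSW records.
records = [
    (date(2020, 12, 23), "Northern Beaches"),
    (date(2020, 12, 23), "Sydney"),
    (date(2020, 12, 24), "Northern Beaches"),
    (date(2020, 12, 24), "Parramatta"),
    (date(2020, 12, 24), "Sydney"),
    (date(2020, 12, 25), "Sydney"),
]

# "Tests Done" for a day is simply the number of rows carrying that date.
tests_done = Counter(test_date for test_date, _ in records)

# The most recent date is usually incomplete, so leave it out of the trend.
latest = max(tests_done)
tests_done_reliable = {
    day: count for day, count in sorted(tests_done.items()) if day != latest
}
```

Dropping only the single latest date is an assumption of this sketch; depending on how late records arrive, a longer cut-off may be more appropriate.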
Next, of those 1,000 tests done today, 700 may be processed by 8pm today, with the people who were tested notified of their results. That number, 700, is then reported by NSW Health or the Premier as the number of tests reported today. However, since that number is announced tomorrow morning, it is actually assigned to tomorrow’s line. Then, if the other 300 are processed after 8pm today and those people are notified before 8pm tomorrow, this number, 300, will be reported as the number of tests for the day after tomorrow.
This data can be accessed via CovidLive.
I call this trend “Tests Reported”.
So, all in all, 1,000 tests are done today, 700 of them are reported tomorrow at the press conference and 300 of them are reported the day after that.
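The 1,000 / 700 / 300 example can be written out as a tiny sketch. The function name `announcement_schedule` and its parameters are hypothetical, and the two-day split is just the simplified scenario from the example, not a rule about how processing always works.

```python
from datetime import date, timedelta

def announcement_schedule(test_day, tests_done, processed_by_cutoff):
    """Map announcement dates to counts for one day's tests.

    Assumes the simplified scenario in the text: tests processed by the
    8pm cut-off are announced the next day, the rest the day after.
    """
    if not 0 <= processed_by_cutoff <= tests_done:
        raise ValueError("processed_by_cutoff must be between 0 and tests_done")
    return {
        test_day + timedelta(days=1): processed_by_cutoff,
        test_day + timedelta(days=2): tests_done - processed_by_cutoff,
    }

today = date(2021, 1, 19)
schedule = announcement_schedule(today, tests_done=1000, processed_by_cutoff=700)
```

Here `schedule` assigns 700 to tomorrow's line and 300 to the line for the day after, which is exactly the gap between the two trends.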
Do we need both of these numbers?
Yes. We need the first one to track testing trends and better understand how the public is responding, and we need the second one to calculate the positivity rate (positive cases out of the number of tests processed).
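For reference, the positivity rate is just positive cases divided by tests processed. A minimal sketch, with a hypothetical helper name and made-up inputs (these are not actual NSW case counts):

```python
def positivity_rate(positive_cases, tests_reported):
    """Return positive cases as a percentage of tests processed."""
    if tests_reported <= 0:
        raise ValueError("tests_reported must be positive")
    return 100 * positive_cases / tests_reported

# Illustrative inputs only: 8 positives out of 40,000 processed tests.
rate = positivity_rate(positive_cases=8, tests_reported=40_000)
```

The denominator is the "Tests Reported" series because positives are announced together with the tests processed in the same window, not with the tests done that day.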
A small but important change to the data
I have made a small but important change to the data: I have shifted the Tests Reported trend back by one day. The reason is that when you see the number 700 for tomorrow, that is actually the number of tests that were processed by 8pm today. So, to assign the value back to the actual date, I have made this change to the data.
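The shift itself is a one-liner. A sketch, assuming the announced numbers are held in a dictionary keyed by announcement date (the variable names and values are illustrative):

```python
from datetime import date, timedelta

# Announced counts keyed by the date of the announcement (made-up values).
announced = {
    date(2021, 1, 20): 700,
    date(2021, 1, 21): 300,
}

# Move every value back one day, to the date the tests were processed.
tests_reported = {
    day - timedelta(days=1): count for day, count in announced.items()
}
```

After the shift, the 700 announced on the 20th sits on the 19th, the day those tests were processed by 8pm.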
Here are a couple of plots using these two trends. I will add more figures to this post in the coming days.
Daily Number of Tests Done vs. Number of Tests Reported
As the first figure shows, while the trend for reported tests suggested that the number of tests in NSW started to drop on Christmas Day, the decline seems to have started on the 23rd of December. On Christmas Day, when 40,000 tests were processed and reported, only about 7,700 tests were done in NSW. That means that the majority of the tests reported between the 23rd and the 25th were actually done on prior days.
One concern is that the number of tests done in NSW is currently very close to the levels prior to the outbreak. As I have shown here, the current test numbers are very close to what NSW had during its July outbreak and what VIC had back in June. The other issue is the fast decline in the number of tests outside the Northern Beaches area. See my plots here on that.
When I shared an earlier version of this plot on Twitter, there was a comment suggesting that many people in NSW may have decided to get tested on the 21st and 22nd before visiting others on Christmas Day and, as the data shows, a large portion of them should have been notified in time.
I will write more about this data here.
Cumulative Number of Tests Done vs. Number of Tests Reported
Some of you may want to see the cumulative number of tests for this period. Here is the plot for that, showing that overall the two trends converge and, as a result, there are no major issues with the data.
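A rough sketch of that convergence check, using made-up daily counts for five consecutive days rather than the real series:

```python
from itertools import accumulate

# Made-up daily counts for five consecutive days (not real NSW figures).
tests_done = [1000, 800, 600, 400, 200]
tests_reported = [0, 700, 900, 800, 600]

# Running totals of each trend.
cumulative_done = list(accumulate(tests_done))
cumulative_reported = list(accumulate(tests_reported))

# Tests done but not yet reported at the end of each day.
backlog = [done - rep for done, rep in zip(cumulative_done, cumulative_reported)]
```

If the two sources are consistent, the backlog should shrink towards zero once testing volumes settle, which is what "the trends converge" means here.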