Question: In snowflake s web interface, Create a worksheet that selects data from the medical labs table DXRX _ DATA _ PROD.MEDICAL _ LABS.PATIENT _ PROCEDURES.

In snowflakes web interface, Create a worksheet that selects data from the medical labs table
DXRX_DATA_PROD.MEDICAL_LABS.PATIENT_PROCEDURES. From this data table we would like to select columns:
DV_PATIENT_ID
TEST_ORDER_ID
ACCESSION_NUMBER
REFERRING_PROVIDER_NPI_NUMBER
ORDERING_PROVIDER_NPI_NUMBER
TEST_ORDERED_DATE
TEST_REPORTED_DATE
PERFORMING_ORGANIZATION_NAME
PANEL_NAME
TEST_NAME
TEST_CODE
RESULT_NAME
RESULT_CODE
RESULT_VALUE
RESULT_COMMENTS.
1. We want to filter for rows that contain HER2 or HER-2 in the results fields for tests reported
in 2023. This filter infers that the patients have been tested for the biomarker HER2. In cases
where there is no test report date, please make use of the ordered date. This filter should be
used in the further activities. Create a new column that selects test_order_id as default, if
missing then uses the accession number and name the column as TEST_ID.
The solution for this question is :
WITH her2_patients AS(
SELECT
DV_PATIENT_ID,TEST_ORDER_ID,ACCESSION_NUMBER,REFERRING_PROVIDER_NPI_NUMBER,
ORDERING_PROVIDER_NPI_NUMBER,TEST_ORDERED_DATE,TEST_REPORT_DATE,
PERFORMING_ORGANIZATION_NAME,PANEL_NAME,TEST_NAME,TEST_CODE,
RESULT_NAME,RESULT_CODE,RESULT_VALUE,RESULT_COMMENTS,
COALESCE (TEST_ORDER_ID,ACCESSION_NUMBER) AS TEST_ID,
COALESCE (TEST_ORDERED_DATE,TEST_REPORT_DATE) AS TEST_DATE
FROM DXRX_DATA_PROD.MEDICAL_LABS.PATIENT_PROCEDURES
WHERE
(
REGEXP_LIKE(RESULT_NAME, '.*HER-?2.*|.*ERBB-?2.*','i') OR
REGEXP_LIKE(RESULT_VALUE, '.*HER-?2.*|.*ERBB-?2.*','i') OR
REGEXP_LIKE(RESULT_COMMENTS, '.*HER-?2.*|.*ERBB-?2.*','i')
)
AND
(
TEST_DATE LIKE '2023%'
)
)
SELECT *
FROM her2_patients;
From the data retrieved in exercise 1, you will have the same test results reported over
multiple dates and sometimes performing organisation names. This duplication happens in a
system were the same result maybe captured multiple times and seems to cluster within
~35 days periods. How would we collapse this data to show 1 record where the results are
the same? Please explain the approach in bullet points and show in SQL. Please show 3
examples of before and after your deduplication process for individual patients results.
a) The question is done on Snowflake SQL Worksheet. So the SQL query should be compatible to it.
b) Please explain how the question works and also the logic behind the question?
c) Please give me a solution and the explanation for this question as well.
I can give some snippet of a possible solution :
Q10 AS (SELECT DV_PATIENT_ID, TEST_CODE, TEST_NAME, ORDERING PROVIDER_NPI_NUMBER, PERFORMING ORGANIZATION_NAME, RESULT_CODE, RESULT_NAME,
RESULT_VALUE, RESULT_COMMENTS, TEST_DATE,
-- Add a column to calculate the date difference in days using LAG and DATEDIFF
DATEDIFF (day, LAG(TEST_DATE) OVER (PARTITION BY DV_PATIENT_ID, TEST_CODE, TEST_NAME, ORDERING PROVIDER_NPI_NUMBER, RESULT_CODE,
RESULT_NAME, RESULT_VALUE, RESULT_COMMENTS ORDER BY TEST_DATE), TEST_DATE) AS DATE_DIFF
FROM her2_tests)
-Filter out the rows where the date difference is less than 35 days
SELECT FROM Q10
WHERE DATE_DIFF IS NULL OR DATE_DIFF >=35
ORDER BY DV_PATIENT_ID, TEST_CODE, TEST_NAME, ORDERING PROVIDER_NPI_NUMBER, PERFORMING ORGANIZATION_NAME, RESULT_CODE, RESULT_NAME,
RESULT_VALUE, RESULT_COMMENTS, TEST_DATE;
explanation:
LAG function to lag the test date over the partition by clause on on all the columns which make up the
duplicate values
CHECK for date difference >=35, or NULL for the first case where no
209.6 k to 96.8k rows after dedupliucation based on columns: DV_PATIENT_ID, TEST_CODE, TEST_NAME,
ORDERING PROVIDER_NPI_NUMBER, RESULT_CODE, RESULT_NAME, RESULT_VALUE,
RESULT_COMMENTS.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!