Please read the following statement and the case study and answer the question followed by it.
CASE STUDY:
Optimisation of addresses for European football
The organisation of the European Football Championships applied strict guidelines during the sale of tickets to the international public in 2008. Each individual was allowed to buy only one ticket for themselves and one for a guest, and the personal data of both were to be registered. To avoid tickets falling into the hands of hooligans and unofficial traders, Human Inference was asked for assistance.
'We knew beforehand people would try to get hold of more than one ticket,' said Jos de Kruif, ticketing sales manager. The simple de-duplication tool the organisation used to identify those people in the database produced a list of suspicious cases. Customer service, however, kept on facing weird requests.
What EURO 2008 wanted then was first validation and standardisation of the names and addresses in the database. The tickets would be sent to customers shortly before the game. Therefore it was of utmost importance for the personalised tickets to be distributed to the right addresses and the right persons.
Names were put into a standard format with a program called HIquality Name, whose function was to apply capitals in the proper way. Many names were corrected. Hundreds of names were rejected because the family name was missing or a company name was mentioned. Controlling and standardising the addresses was done with the help of HIquality Address. HIquality Address compared the filled-in address with the most similar address in the postcode databases of a specific country. An error margin was incorporated and all addresses with minor differences were printed on a list. For countries without available postcode databases other solutions were invented.
Many of the addresses were standardised in the format the organisation desired. There were numerous reasons for adjusting the addresses (see Table 6.1).
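The standardisation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how HIquality Name or HIquality Address work internally: the reference list, the particle list, the 0.85 threshold and the function names are all hypothetical, and `difflib.SequenceMatcher` stands in for whatever matching technique the real products use.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for a national postcode database.
REFERENCE_ADDRESSES = [
    "Hoofdstraat 12, 1234 AB Utrecht",
    "Kerkstraat 7, 5678 CD Arnhem",
]

# Hypothetical list of name particles that stay lowercase.
PARTICLES = {"de", "van", "der", "den", "von"}


def normalise_name(name: str) -> str:
    """Collapse whitespace and capitalise each part, keeping particles lowercase."""
    parts = name.split()
    return " ".join(
        p.lower() if p.lower() in PARTICLES else p.capitalize() for p in parts
    )


def match_address(entered: str, threshold: float = 0.85):
    """Compare an entered address with the most similar reference address.

    Returns (best_reference, score, status). The threshold plays the role
    of the error margin; its value here is an assumption.
    """
    scored = [
        (SequenceMatcher(None, entered.lower(), ref.lower()).ratio(), ref)
        for ref in REFERENCE_ADDRESSES
    ]
    score, best = max(scored)
    if score == 1.0:
        status = "exact"
    elif score >= threshold:
        status = "review"    # minor difference: goes on the printed list
    else:
        status = "rejected"
    return best, score, status
```

For example, `match_address("Hoofdstrat 12, 1234 AB Utrecht")` returns the Utrecht reference address with the status `"review"`, because only one letter is missing. Hyphenated or otherwise unusual names would need more careful handling than `normalise_name` provides.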
To de-duplicate the database, the Football Championship organisation applied HIquality Identify. Experience shows that most fraudulent people only make minor adjustments in their personal data; the system HIquality Identify assesses this similarity between records. The database for sold tickets contained, before analysis, approximately 400,000 records.
The de-duplication exercise produced 9 per cent potential duplicate names and addresses. From a safety control perspective this was a significant filtering outcome. The same names with different first letters were found many times. First and last names were inverted, or the names were given with and without the maiden name; a neighbour's house number was given rather than the person's own; or a slightly adjusted birth date was given.
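A similarity score that tolerates these small adjustments might look like the sketch below. It is not HIquality Identify's method, which the case study does not describe; the field names, the equal weighting of the three fields and the use of `difflib.SequenceMatcher` are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def name_key(name: str) -> str:
    """Lowercase the name and sort its parts, so inverted names compare equal."""
    return " ".join(sorted(name.lower().split()))


def similarity(a: dict, b: dict) -> float:
    """Average string similarity over name, address and birth date (0.0 to 1.0)."""
    pairs = [
        (name_key(a["name"]), name_key(b["name"])),
        (a["address"].lower(), b["address"].lower()),
        (a["birth_date"], b["birth_date"]),
    ]
    return sum(SequenceMatcher(None, x, y).ratio() for x, y in pairs) / len(pairs)


# Illustrative records: inverted name, neighbouring house number, shifted birth date.
original = {"name": "John Smith", "address": "123 Main St", "birth_date": "1970-05-01"}
adjusted = {"name": "Smith John", "address": "125 Main St", "birth_date": "1970-05-02"}
unrelated = {"name": "Anna Berg", "address": "9 Ocean Drive", "birth_date": "1988-11-23"}
```

Here `similarity(original, adjusted)` is above 0.9, while `similarity(original, unrelated)` is clearly lower. Pairs above a chosen cut-off would be listed as potential duplicates for manual review rather than rejected automatically.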
To prevent organised fraud, not only individual records were compared but also groups of records. If person 1 resembled person 4, person 2 matched person 4 and person 5 showed similarities with person 1, the group relations were presented. In this way a street in Reykjavik was identified where several persons together had requested 60 tickets.
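One common way to turn pairwise matches into groups is a union-find (disjoint-set) structure. The sketch below uses the person numbers from the example in the text; the function name is hypothetical and the case study does not say which technique was actually used.

```python
def group_records(ids, matches):
    """Merge pairwise matches into groups of connected records (union-find)."""
    parent = {i: i for i in ids}

    def find(i):
        # Follow parent links to the group representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in matches:
        parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    # Only groups with more than one member are of interest.
    return [g for g in groups.values() if len(g) > 1]


# Person 1 resembles 4, person 2 matches 4, person 5 resembles 1.
groups = group_records([1, 2, 3, 4, 5], [(1, 4), (2, 4), (5, 1)])
```

With these inputs, persons 1, 2, 4 and 5 end up in a single group even though persons 2 and 5 were never compared directly, and person 3 is left out.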
There were three sources of personal data: the internet, a data-entry agency and the organisation's own office. Since all three sources had their own database characters, diacritics were presented in varying ways. After the data were imported into an SQL file, some accented characters had turned into other symbols, in one case a ';'. Although this complicated data comparison, it turned out that several people had ordered tickets through multiple channels.
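When the diacritics themselves survive the import, records from different sources can be made comparable by folding accented letters to their base letters before matching. The following is a minimal sketch using Python's standard `unicodedata` module; the function name is hypothetical, and it cannot repair characters that were already replaced by other symbols during import.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Return a comparison key with accents removed and case folded."""
    # NFKD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()
```

With this key, `fold_diacritics("José Müller")` and `fold_diacritics("Jose Muller")` compare equal. Letters that have no Unicode decomposition, such as "ø", are left unchanged and would need an explicit mapping.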
Hooligans formed a special risk category during the analysis. Using specialised technology, people who were most likely trying to prevent correct identification by using variations on their names were traced. The German, English, Belgian and Dutch football associations provided databases with names of hooligans who were no longer allowed to enter a stadium. Fifty hooligans were identified this way. De Kruif: 'To avoid criticism afterwards we had to do everything to get these people out of our databases!' Each of these 50 hooligans had ordered one or more tickets and all of them were cancelled.
The resulting data set was used by the Football Championship organisation to accept and fulfil orders. The list with minor errors was processed manually. This also holds for people who potentially ordered more than two tickets for a match. Many of them probably would not have known they were only allowed to book two tickets or made an unintended error. They were not necessarily cheats. The organisation in total refused 1,100 orders. In the end, only those people in Reykjavik can tell you whether they acted in bad faith. This is also the case for the 48-year-old man from a little village in the country who lives with his 47-year-old brother at the same address . . .
STATEMENT:
1. Manual data entry errors
People are prone to making errors, and even a small data set that includes data entered manually by humans is likely to contain mistakes. Data entry errors such as typos, data entered in the wrong field and missed entries are virtually inevitable.
2. OCR errors
Machines can make mistakes when entering data, too. When organisations must digitise large amounts of data quickly, they often rely on Optical Character Recognition (OCR) technology to do so. OCR technology scans images and extracts text from them automatically. It can be very useful when, for example, you want to take thousands of addresses that are printed on paper and enter them into a digital database so that you can analyse them. The problem with OCR is that it is almost always imperfect.
If you are OCR'ing thousands of lines of text, you are very likely to have some characters or words that are misinterpreted: zeroes that are read as eights, for example, or proper nouns that are read as common words because the OCR tool fails to distinguish properly between capital and lowercase letters. Similar problems arise with other kinds of automated machine entry of data, such as speech-to-text.
3. Lack of complete information
When compiling a data set, you often run into the problem of not having all the information available for every entry.
For example, a database of addresses may be missing the postal codes for some entries because the postal codes could not be determined by the method that was used to compile the data set.
4. Ambiguous data
When building a database, you may find that some of your data is ambiguous, leading to uncertainty about whether, how and where to enter it.
For example, if you are creating a database of telephone numbers, some of the numbers you want to enter may be longer than the usual ten digits of a United States telephone number. Are those longer numbers simply typos, or are they international telephone numbers that include more digits? In the latter case, does the number contain complete international dialling information?
These are the kinds of questions that are hard to answer quickly and systematically when you are working with a large volume of data.
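A simple triage rule can at least separate the clear cases from the ambiguous ones, so that only the latter need manual attention. The sketch below is an assumption-laden illustration: the category names and the rules (a leading "+" or "00" signals an international prefix, exactly ten digits signals a United States national number) are hypothetical and deliberately conservative.

```python
import re


def classify_phone(raw: str) -> str:
    """Sort a raw phone number into a coarse category for further handling."""
    text = raw.strip()
    digits = re.sub(r"\D", "", text)  # keep digits only
    if text.startswith("+") or digits.startswith("00"):
        return "international_prefix"
    if len(digits) == 10:
        return "us_national"
    # Too long or too short without a prefix: typo or incomplete dialling data?
    return "needs_review"
```

A number such as `"212555010099"` has twelve digits and no prefix, so it is returned as `"needs_review"` instead of being guessed at.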
5. Duplicate data
You may find that two or more data entries are mostly or entirely identical.
For example, your database may contain two entries for a John Smith living at 123 Main St. Based on this information, it is hard to tell whether these entries are simply duplicates (perhaps John Smith's information was entered twice by mistake) or whether there are two John Smiths (a father and son, perhaps) living at the same address. You need to sort out apparent duplicates like this before you can make effective use of your data.
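Because identical-looking entries may still be different people, it is safer to flag apparent duplicates for review than to delete them automatically. A minimal sketch, assuming records are dictionaries with hypothetical `name` and `address` fields:

```python
from collections import defaultdict


def find_apparent_duplicates(records):
    """Group record indexes that share the same normalised name and address."""
    buckets = defaultdict(list)
    for index, record in enumerate(records):
        key = (record["name"].strip().lower(), record["address"].strip().lower())
        buckets[key].append(index)
    # Keep only keys seen more than once; these go to manual review.
    return [indexes for indexes in buckets.values() if len(indexes) > 1]


records = [
    {"name": "John Smith", "address": "123 Main St"},
    {"name": "john smith ", "address": "123 main st"},
    {"name": "Jane Doe", "address": "9 Elm Rd"},
]
flagged = find_apparent_duplicates(records)
```

Here the first two records are flagged together. Whether they are one John Smith entered twice or a father and son cannot be decided from these two fields alone.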
Question:
2. How do you evaluate the way EURO 2008 addressed these quality issues? (10 marks) (The answer must relate to the statement and case study above, and its length must be proportionate to the marks given.)
