Here's another bit of data weirdness:
This is from the 'Non-derivable fares override table'
R1016126800000 7DSO030120130201201316112012N0000318000001590 YNN
R1016126800000 7DSO180520130401201316112012Y
That probably looks impossible to read, but character 2-5 is the origin (Cannock), and 6-9 is the destination (Stafford). The next five digits are the route code (00000 - any permitted), the spaces are the railcard (none), and 7DS is the ticket type, the next character is always O, and then the next twelve characters are the end and start date.
The two lines are not the same length, because the second, which ends with 'Y' is intended to suppress the previous record!
However, if we look at the previous record, we can see:
03012013
02012013
I.e. this record is only valid between 2/1/2013, and 3/1/2013, that is for one day. The given fare is £31.80 for an adult ticket, and £15.90 for a child ticket.
whereas the suppress record is valid between 4/1/2013 and 18/5/2013.
If you look at the non-override Non-derivable fares file, there are no records
Looking in the flows file there is one flow defined, for Cannock - Stafford Any Permitted, which is flow id 0033454. This is defined as valid from 02/01/2013 through eternity.
The 7DS fare defined in the flow file is £51.30
So to sum up we have
Cannock - Stafford flow fare Any Permitted 7DS, defined as £51.30 from 2/1/13 onwards
Cannock - Stafford non-derivable override fare Any Permitted, 7DS, defined as £31.80 from 2/1/13 to 3/1/13
Cannock - Stafford suppress the '7DS record Any Permitted', between 4/1/13 and 18/5/2013
There is something wrong with this, since the supress record would appear to suppress the flow fare of £51.30 meaning that no Any Permitted 7DS is available. I believe however it is only intended to suppress the £31.80 fare.
According to
http://ojp.nationalrail.co.uk/service/seasonticket/tickets, there is an available 7DS Any Permitted, and the price is £51.30
It does say in the spec
Suppress_mkr
"If the value is Y this indicates that the type N record in the derivable fares file with the same ORIGIN_CODE, DESTINATION_CODE, ROUTE_CODE, TICKET_CODE and RAILCARD_CODE should be suppressed (ie treated as if it does not exist) between
the START_DATE and END_DATE on this record. Records with this field set to Y will contain spaces in all the fields following this one."
I had assumed that 'suppress' records could suppress flow fares, but the spec does not say this and indeed it appears to be the case that 'suppress' records only relate to other override records.
Hence regarding my previous clean-up script, this can be extended to drop all the suppress records, given that the script has already dropped the suppressed override records for the given reference data.
The 'end date' is still an issue though:
'END_DATE Last date for which this record can be used.
Format is ddmmyyyy. A high date
(31122999) is used to indicate records
which have no defined end date. '
This would suggest that end date is an inclusive data, i.e. that the £31.80 season would be valid up to the end of 3/1/2013.
However the question is "did the Cannock - Stafford 'Any Permitted' season really increase from £31.80 to £51.30 on January 4th"? It seems quite an excessive increase.
Either way, as the reference data is arbitrary and the NFM is valid for an open-ended period, it perhaps might be better to choose a later date than 3rd January for the reference date?
--- old post above --- --- new post below ---
Counties are also used for fares to Douglas (Isle of Man). And the spec also seems to say that clusters can contain references to all locations within a given zone (presumably a London zone), but doesn't say how that is indicated and there don't seem to be any current examples.
Yes correct on both counts.
My memory was a bit hazy on Douglas, I looked at this a few days ago so was doing it from memory....
All the records join to either a county, location, or cluster, so the 'zone' seems to be unused.
--
I re-imported the Non-Derivable fares, including all dates, due to my concerns about the data quality, as above. This gives 18,455,566 records.
I ran this query:
delete n2
from nonderivablefares n2
where n2.suppress = 1
and not exists (select *
from NonDerivableFares n1
where
n1.OriginCode = n2.OriginCode and
n1.DestinationCode = n2.DestinationCode and
isnull(n1.RailcardCode,'') = isnull(n2.RailcardCode ,'') and
n1.RouteCode = n2.RouteCode and
n1.TicketCode = n2.TicketCode
and n1.Suppress = 0)
which deletes all suppress records for which there is no matching non-derivable fare.
This eliminated 1,445,777 records.
Regarding the issue of records being suppressed after they had expired, I ran this query:
select
count(*), n2.StartDate, n2.EndDate, n1.StartDate, n1.EndDate
from NonDerivableFares n2
join NonDerivableFares n1 on
n1.OriginCode = n2.OriginCode and
n1.DestinationCode = n2.DestinationCode and
isnull(n1.RailcardCode,'') = isnull(n2.RailcardCode,'') and
n1.RouteCode = n2.RouteCode and
n1.TicketCode = n2.TicketCode
where
n2.suppress = 1
and n1.Suppress = 0
and n2.StartDate > n1.EndDate
and not exists
(select * from NonDerivableFares n3
where
n3.OriginCode = n2.OriginCode and
n3.DestinationCode = n2.DestinationCode and
isnull(n3.RailcardCode,'') = isnull(n2.RailcardCode,'') and
n3.RouteCode = n2.RouteCode and
n3.TicketCode = n2.TicketCode
and n3.Suppress = 0 and n2.StartDate <= n3.EndDate)
group by
n2.StartDate, n2.EndDate, n1.StartDate, n1.EndDate
order by count(*) desc
It gives this output:
318682 2013-01-22 2013-03-31 2012-12-27 2013-01-21
1590 2013-01-04 2013-05-18 2013-01-02 2013-01-03
42 2014-01-02 2999-12-31 2013-01-02 2014-01-01
28 2014-01-02 2999-12-31 2012-09-02 2013-01-01
7 2013-01-02 2999-12-31 2012-09-02 2013-01-01
As you can see there are 318,682 override records that expired on 21st January that were suppressed from 22nd January, 1,590 records that expired on 3 January but were suppressed from 4th January, and so on.
I am not sure what this is supposed to mean.
But the data definitely need some clean up.
--- old post above --- --- new post below ---
Ok, so I'm an idiot.
The date for the fares download was 30th January 2013.
Therefore there's no point in looking at data for 3rd January 2013, because the data were valid on 30th January, but not before.
Excluding non-derivable fares that expired before 30th January, or become valid after 30th January and clean-up of the overrides, I got 11.4 million non-derivable fares.