• Our booking engine at tickets.railforums.co.uk (powered by TrainSplit) helps support the running of the forum with every ticket purchase! Find out more and ask any questions/give us feedback in this thread!

Public NFM

Status
Not open for further replies.

johnnycache

Member
Joined
3 Jan 2012
Messages
421
Can anyone tell me whether there is a website that would enable the fares database to be interrogated so that you could find out the following sorts of information-

All fares with a particular route code
All fares using a particular cluster
All fares set by a particular operator

Or if not any clues as to how to set this up
 
Sponsor Post - registered members do not see these adverts; click here to register, or click here to log in
R

RailUK Forums

maniacmartin

Established Member
Fares Advisor
Joined
15 May 2012
Messages
5,395
Location
Croydon
This would be possible to set up using the recently released ATOC fares information
 

maniacmartin

Established Member
Fares Advisor
Joined
15 May 2012
Messages
5,395
Location
Croydon
I'm sure a web view of the data will be online in due course, however the specification file for the data is 81 pages long as it is a somewhat tedious task to process all this and set up the necessary database schemas. Watch this space
 

soil

Established Member
Joined
28 May 2012
Messages
1,956
Can anyone tell me whether there is a website that would enable the fares database to be interrogated so that you could find out the following sorts of information-

All fares with a particular route code
All fares using a particular cluster
All fares set by a particular operator

Or if not any clues as to how to set this up

No there is not.

The data has been released by ATOC and can be imported into a database and queried on an ad hoc basis, if you have the relevant technical skills.

For example, this query shows that there are 42,174 flows owned by Southwest Trains

select
count(*)
from
Flows
where
TOC = 'SWT'

This query gives the available standard class season tickets from Woking:

declare @Station char(4)
declare @NLCCode char(4)
select @Station = 'WOK' -- Change this

select
@NLCCode = NLCCode
from
Locations
where
CRSCODE = @Station

Declare @FlowCodes Table
(
Code char(4) primary key
)

insert into @FlowCodes
select
ClusterId
from
Clusters
where
ClusterNLC = @NlcCode
union
select @NLCCode

select
*
from
(select
isnull(Destination.Description, DestinationClusterLocation.Description) as Destination,
FlowFares.Fare
from
@FlowCodes as FlowCodes
join Flows on Flows.OriginCode = FlowCodes.Code
join FlowFares on FlowFares.FlowId = Flows.FlowId
left join Locations Destination on Flows.DestinationCode = Destination.NLCCode
left join Clusters DestinationCluster on Flows.DestinationCode = DestinationCluster.ClusterId
left join Locations DestinationClusterLocation on DestinationClusterLocation.NLCCode = DestinationCluster.ClusterNLC
where
FlowFares.TicketCode = '7DS'
union
select
isnull(Origin.Description, OriginClusterLocation.Description) as Destination,
FlowFares.Fare
from
@FlowCodes as FlowCodes
join Flows on Flows.DestinationCode = FlowCodes.Code
join FlowFares on FlowFares.FlowId = Flows.FlowId
left join Locations Origin on Flows.OriginCode = Origin.NLCCode
left join Clusters OriginCluster on Flows.OriginCode = OriginCluster.ClusterId
left join Locations OriginClusterLocation on OriginClusterLocation.NLCCode = OriginCluster.ClusterNLC
where
FlowFares.TicketCode ='7DS'
) as Fares
order by Fares.Fare asc

(note, some of these are overridden, e.g., Woking - Tonbridge - this query does not check this)

This returns 1,261 rows.

Here are the first 50 (in pence per week):

WEST BYFLEET 1490
WORPLESDON 1490
BROOKWOOD 1890
BYFLEET & NEW H 1890
GUILDFORD 2380
ADDLESTONE 2600
WEYBRIDGE 2610
CHERTSEY 3260
SHALFORD SURREY 3270
LONDON RD GUILFD 3290
WALTON ON THAMES 3290
FARNBOROUGH STNS 3500
ASH VALE 3600
HERSHAM 3600
VIRGINIA WATER 3840
CHILWORTH 3860
FARNCOMBE 3860
ESHER 3880
GODALMING 4160
ALDERSHOT. 4190
CLANDON 4190
WANBOROUGH 4190
MILFORD SURREY 4440
FRIMLEY 4480
LONGCROSS 4510
ASH 4640
BERRYLANDS 4690
GOMSHALL 4690
HAMPTON COURT 4690
NEW MALDEN 4690
SURBITON 4690
THAMES DITTON 4690
EGHAM 4710
FLEET 4710
BLACKWATER 4730
NORTH CAMP 4730
BERRYLANDS 4960
SUNNINGDALE 5030
CAMBERLEY. 5070
FARNHAM 5070
HORSLEY 5070
BRENTFORD 5150
CHISWICK 5150
FELTHAM. 5150
GUNNERSBURY 5150
HOUNSLOW 5150
ISLEWORTH 5150
KEW BRIDGE 5150
KEW GARDENS 5150
MORTLAKE 5150

Obviously some of these fares are clustered, but the query is showing each available destination separately. Also a few of these fares are available only in one direction, but because it's a season that's irrelevant (however it might not be when buying the ticket) Note I haven't extracted any route data (not via London, etc.).

A raw dump of available seasons is obviously somewhat useful, but you could make it more useful. E.g., - join in routeing data - what are the associated routeing points, are there any anomalies. Going a step further, you could use a routeing engine to figure out permitted routes for each ticket. For example, say you want a season from Woking to Portsmouth Harbour. BRFares tell us in an instant that the standard season on this route is £102.60/week.

Quite a lot of money.

A fairly simple (note: this is relative!) modification to the code above would give you a list of all Woking seasons costing less than £102.60 and their associated routeing points, but beyond. We could eyeball this, looking for useful tickets, but rather than do that, a computer could iterate through all of them, calculating permitted routes and scanning for cheaper alternatives.

The database is obviously non-trivial.

You could obviously define an arbitrary set of queries and make them available on a website, the issue is that any set of queries would obviously be rather arbitrary, and wouldn't necessarily match any user's requirements, and might in terms of data volume exceed what is practical over the web.

I think therefore that it would be most useful to turn the raw data into a database. This could either be done on a server somewhere, and the resultant relational database redistributed to anyone who wanted it, or otherwise be done on the user's computer acting directly on the raw data.

I don't really have a clear idea on how best to query the data, in terms of providing a set of standard SQL queries that users could run, a reporting interface, or something else.

I would note that the routeing data has not been published by ATOC, and the cost is around £900/year currently.
 
Last edited:

Death

Established Member
Joined
23 Oct 2006
Messages
1,639
Location
Sat at the control desk of 370666...
...however the specification file for the data is 81 pages long as it is a somewhat tedious task to process all this and set up the necessary database schemas.
I read that same spec file in full myself the other day, and the bulk of it is references to the records found in each file and how they are to be interpreted.
I would say that - If ye are looking to skim it - Do pages 1-8 and jot down the page numbers for each subsequent file/section reference on a bit of paper for rapid access. 9 and 10 specify which files are which, which might be handy printed out and referenced at dev time. :)

On the brighter side, the RJIS timetable export reference is only 22 pages long (Though ye'll need to do a lot more reading than that if ye don't already know BR's CIF file format! ;) )

Go to http://data.atoc.org/fares-data and you can download it.
A sign up is required for this, but it is completely free - ATOC just want to know who's using the data and for what purposes.
Please remember to be sparing with the downloads though...We don't want ATOC finding themselves forced to limit access due to resource strain caused by too many enthusiasts trying to do full file refreshes (FFRs) at the same time! :shock:

Just as an additional caution, the three datasets available from that site are deceptively small on download, and decompress to MUCH larger sizes than ye'd ever guesstimate:
  • London Terminals interchange validity data: 20.1KB downloaded, c. 1.45MB decompressed.
  • TTF008.ZIP = Passenger timetable feed/FFR with WTT-style train data: 25MB downloaded, 488MB decompressed.
  • RJFAF915.ZIP = Passenger fares feed/FFR with all fares info: 92MB downloaded, a whopping 1.34GB decompressed! :shock:
...So make sure that ye have plenty of fast disk space to hand when starting to play around with it. Just in case anyone's looking for better compression though, 7-Zip on regular settings can get all three down to a total of 51MB. :)

Just out of curiosity (To the floor) does anyone know how ATOC are likely to view a torrent share of this info around enthusiast groups? It should be OK under the CC license they've released the data under, and it'd remove a heavy chunk of load from their servers, too. :)

Its clearly not just a case of load it into excel and away you go!
Definitely not. The CIF format - As far as I can make out - Was originally designed for use with (And export from) an IBM 360 mainframe! 8-)

Curiously enough though, the NR CIF spec still contains references to mainframes and supply of updates by dialling-in to the former. The document was last updated in 2007CE! :lol:

I would note that the routeing data has not been published by ATOC, and the cost is around £900/year currently.
This would be possible to set up using the recently released ATOC fares information
I was just about to ask: Isn't there enough information in the fares data for us to compile our own versions of the Routeing Guide, as it appears from the RJIS side of things?
It'd never match any printed manuals that Guards or RPOs have been issued with route-for-route...But if there was a way of verifying such an RG as having been dumped from legitimate RJIS data, then it would most likely be usable in that context. :)

By the by; When was this fare and timetable data released to the public by ATOC? I see hints around pointing to last July, but I only discovered it by chance on Friday mourning! :eek:

Farewell... <D
>> Death <<
 

greatkingrat

Established Member
Joined
20 Jan 2011
Messages
2,785
I'm not sure there is any point creating a torrent of the data. ~90MB really is not that much anymore, and I doubt there are enough people who want to download it to cause any significant impact on the ATOC server.
 

radamfi

Established Member
Joined
29 Oct 2009
Messages
9,267
When the files became available I did try to develop my own simple version of Avantix Traveller using Access and an Excel front end, as an academic exercise tbh, as brfares.com seems to satisfy my requirements.

Probably 99% of fares are available from the 'flow' file but some fares are not in the flow file so require the huge files RJFAF842.NDF and RJFAF915.NFO, which make the Access database much bigger. I'm sure it could be done much better using SQL Server or MySQL or similar.
 

soil

Established Member
Joined
28 May 2012
Messages
1,956
I was just about to ask: Isn't there enough information in the fares data for us to compile our own versions of the Routeing Guide, as it appears from the RJIS side of things?
It'd never match any printed manuals that Guards or RPOs have been issued with route-for-route...But if there was a way of verifying such an RG as having been dumped from legitimate RJIS data, then it would most likely be usable in that context. :)

A sample CD of RJIS routeing data is available (free). I ordered a copy, and it is up-to-date as of Y/E 2011.

It is supposed to use the 1997 Fares Manual, which would be supplied as part of the £900/yr contract, but not with the sample.

Unfortunately the RJIS electronic maps from 2011 does not match the printed (contractual) Routeing Guide - there are far more maps, as I understand it the Routeing Guide became obsolete over a decade ago and at some point, they stopped bothering to update it properly.

Should one spend £900 on the current RJIS data, it is my understanding that the routes obtained out of the current system wouldn't be enforceable anyway, because they do not match the printed Routeing Guide, and because the content of RJIS is not published anywhere, even though it is supposed to be used by train booking systems for determining valid routes.

Likewise, I'm not sure if NFM64, which they use internally, is necessarily contractual, so that may not be relevant either....

I think that using the sample (2011) RJIS routeing data, if you deleted the new routeing points and maps from the database, it would probably match the current printed Routeing Guide, which at least has contractual force, and then you could use it with the current NFM.
 

Death

Established Member
Joined
23 Oct 2006
Messages
1,639
Location
Sat at the control desk of 370666...
Probably 99% of fares are available from the 'flow' file but some fares are not in the flow file so require the huge files RJFAF842.NDF and RJFAF915.NFO, which make the Access database much bigger. I'm sure it could be done much better using SQL Server or MySQL or similar.
Has anyone tried porting the whole lot into an Access database yet? I'm toying with this as an option for my system (Which would also support using just the flow file for standard consumer installations) but I won't have the time to do this for quite a while yet, and I'm curious as to how big the resulting database would be. :)

I suppose for systems based entirely on back-ends (Websites, query by e-mail, TTY/SSH based services etc) one could get away with using MySQL running locally, but - Based on my experience of using MaNGOS (An open World of Warcraft server) - That wouldn't be practical for end user installations.

I'm not sure there is any point creating a torrent of the data. ~90MB really is not that much anymore, and I doubt there are enough people who want to download it to cause any significant impact on the ATOC server.
It really depends on timing and demand, though. At present I assume the datafeeds aren't too widely known about, and the nature of their format - As one that isn't really usable without a decent degree of computer and database experience - Is understood by those who do...Meaning that at present, the only persons likely to be downloading it will be those who are certain that they can use the data in one form or another - Which I would guesstimate to being under 1,000 casual users, or about 15GB of bandwidth.
I intend to use the current datafeed for building my systems on, but don't intend to update it again until I've got something stable that makes an up-to-date patch justifiable. :)

Fast forward to a time when a utility exists (As I assume it eventually will) that allows non-technical and/or non-railway folk to run fares lookups using the raw RJIS data downloaded from ATOC. Assuming it became popular and attracted over a million users, that would bump up ATOCs resource load exponentially - About 15TB per million users, if my maths is correct - And that would lead to massive bandwidth bills at ATOCs end. :shock:
Don't forget of course that the day a new NFM/WTT is placed on the server for download and becomes known about, the bulk of users trying to download and update their NFMs all at once could potentially DDoS ATOC if enough users tried doing it all at the same time. :(

Personally; If I do eventually manage to create something that will be able to make offline NFM lookups easy for end users (So basically, something like the NR website on ones computer or smartphone) then that program will receive its updates in 7-Zip format via either a separate server and/or BitTorrent - Just to try and be considerate to ATOCs bandwidth bills if nothing else. :)

A sample CD of RJIS routeing data is available (free). I ordered a copy, and it is up-to-date as of Y/E 2011. It is supposed to use the 1997 Fares Manual, which would be supplied as part of the £900/yr contract, but not with the sample.
When did ye order and receive that, if I may ask? Sounds like something that might interest me initially and I might request one from ATOC if I feel that I can use it to maximum effect.
That said, I think I'll keep myself to processing the NFM/WTT data alone for the time being - I have enough in my intray as it is, and the later I request an RJIS sample the more likely it is to be up to date. :)

By the by: I suppose that £900/year contract includes access to daily/hourly NFM and WTT "push" updates (See the tail end of the RJIS NFM spec) doesn't it? Granted that's *far* too much for end users like you or I, but for the average company that would be an acceptable routine expense. Besides, once ye chop the VAT off that brings it down to about £720. :)

Unfortunately the RJIS electronic maps from 2011 does not match the printed (contractual) Routeing Guide - there are far more maps, as I understand it the Routeing Guide became obsolete over a decade ago and at some point, they stopped bothering to update it properly. Should one spend £900 on the current RJIS data, it is my understanding that the routes obtained out of the current system wouldn't be enforceable anyway, because they do not match the printed Routeing Guide, and because the content of RJIS is not published anywhere, even though it is supposed to be used by train booking systems for determining valid routes.
Personally - If I signed up to the RJIS feed for that price - I'd consider a printed copy of the RG as issued to RPOs to be a more than reasonable inclusion with it. That said, if the paper and RJIS RGs don't tally up with each other anyway, then what would be the point of having either? :|

In troth, it would help immensely if ATOCs dump included a companion to the flow file that specified permitted and excluded routes for the fares within that file, so basic permitted and known excluded routes (Such as FNB-WAT via WKM, RDG, GLD and WOK for an "Any Permitted" and GLD, WOK for a "Not Reading") could be referenced using that file for simplicity. It'd mean that - Provided ye travel along the routes specified in that file - There would be absolutely no question as to route validity at all.

That said, I could see ATOC considering such data to be "proprietary" and a part of the £900 RJIS subscription, as otherwise the only bonus to the paid RJIS feed would be the possibility of discovering less common permitted routes (I.E: BHM-RDG Any Permitted is valid via OXF, BAN-MYB-PAD, and EUS-PAD if I understand rightly) which wouldn't be a big enough "carrot" for the majority of people who might otherwise buy it.
 

radamfi

Established Member
Joined
29 Oct 2009
Messages
9,267
Has anyone tried porting the whole lot into an Access database yet? I'm toying with this as an option for my system (Which would also support using just the flow file for standard consumer installations) but I won't have the time to do this for quite a while yet, and I'm curious as to how big the resulting database would be. :)

My first version, importing just the flow information, along with the minimum files necessary, such as the cluster and location files, was about 80 MB in Access (2003 format). But now I've added the NFO and NDF files the file is now over 800 MB, compacted. And that is after removing all the railcard related fares and fares that expired before January.
 

soil

Established Member
Joined
28 May 2012
Messages
1,956
In troth, it would help immensely if ATOCs dump included a companion to the flow file that specified permitted and excluded routes for the fares within that file, so basic permitted and known excluded routes (Such as FNB-WAT via WKM, RDG, GLD and WOK for an "Any Permitted" and GLD, WOK for a "Not Reading") could be referenced using that file for simplicity. It'd mean that - Provided ye travel along the routes specified in that file - There would be absolutely no question as to route validity at all.

That said, I could see ATOC considering such data to be "proprietary" and a part of the £900 RJIS subscription, as otherwise the only bonus to the paid RJIS feed would be the possibility of discovering less common permitted routes (I.E: BHM-RDG Any Permitted is valid via OXF, BAN-MYB-PAD, and EUS-PAD if I understand rightly) which wouldn't be a big enough "carrot" for the majority of people who might otherwise buy it.

I am not sure the data is especially proprietary, nor that the subscription is intended as a major profit centre, but I think there are a few issues:

(1) ATOC have not kept up with their duty to maintain the printed routeing guide, and it might be an embarrassment to them to supposedly equivalent RJIS routeing data
(2) The Routeing Guide is being replaced anyway, as per this: http://www.railforums.co.uk/showthread.php?t=78627 so they can't be bothered to release the RJIS data since the new system is 'just round the corner' (probably a year away I would think?)
 

barrykas

Established Member
Joined
19 Sep 2006
Messages
1,579
(1) ATOC have not kept up with their duty to maintain the printed routeing guide, and it might be an embarrassment to them to supposedly equivalent RJIS routeing data

As far as ATOC are concerned, the printed routeing guide was withdrawn in the early 2000s due to the prevelance of Avantix Mobile (for on-train staff) and RJIS (for stations), and there was a notice to that effect in Newsrail Express at the time.

Incidentally when I was taught about the RG during my retail training, we were told to use the current Fares Manual when doing fares checks, given that using old fares may give a different result (which may not be in the passenger's favour).

Apart from anything else, we only used to keep a year's worth of Fares Manuals, though of course those wouldn't reflect changes made after fares were frozen for print (such as when the main increase was reduced from RPI+3% to RPI+1% this year and the year before).
 

johnnycache

Member
Joined
3 Jan 2012
Messages
421
Some fascinating responses. What I would really like to do is to compare NFMs and see what fares have increased and by how much.
 

Paul Kelly

Verified Rep - BR Fares
Joined
16 Apr 2010
Messages
4,134
Location
Reading
I agree with soil that the exclusion of the electronic version of the Routeing Guide from the data downloads is likely due to the continuing uncertainty over its future rather than a sneaky scheme to make more money. I hear rumours that Fujitsu have been developing some kind of new data schema for it.

I agree too that the data files are extremely bulky in the format they are exported in, but would note that with they can be compressed to an extremely manageable size by (a) only loading data for one date - the effective date of each new NFM (2nd January 2013 for the current data download from ATOC) is likely a good choice, and (b) devising a better data structure than the one RJIS uses. librailfare, which I have developed over the past few years in order to read and interrogate the fares data from the published NFM CD, uses custom data structures written in C and can hold an entire NFM in around 45 megabytes of memory, including all fares records from normal and non-derivable flows, locations, railcards, tickets, restrictions etc., together with non-standard discounts and rovers and rangers. So it is possible!
 

Death

Established Member
Joined
23 Oct 2006
Messages
1,639
Location
Sat at the control desk of 370666...
My first version, importing just the flow information, along with the minimum files necessary, such as the cluster and location files, was about 80 MB in Access (2003 format). But now I've added the NFO and NDF files the file is now over 800 MB, compacted. And that is after removing all the railcard related fares and fares that expired before January.
Hmm...So piling literally everything (Both fares and times data) into that and using indexed tables where appropriate would probably bump that up to being about 2.2GB worth of Access database, if not more. My habitual use of Access 2000 might save a few meg, but not enough to be notable.

That said, the data itself is in a streamed format; IIUC each line is a separate instruction intended for evaluation on a blow-by-blow basis. It's possible therefore that one specific fare could have it's "R" record in the flow file changed eight times by "A" records in the non-derivables, only to then be the subject of a "D" record toward the end of the data stream!
In that specific instance that would be ten different records that wouldn't need to touch the database at all, if the file was scanned for such "Fares that will be deleted" before main processing. :)

Am dying (Pun intended) for a cigarette, so will tack the rest of my reply on afterward. :)
 

maniacmartin

Established Member
Fares Advisor
Joined
15 May 2012
Messages
5,395
Location
Croydon
Access databases have a maximum file size of 2GB.

For that amount of data, I'd be using a proper SQL server (Microsoft, MySQL, PostgreSQL etc).
 

Death

Established Member
Joined
23 Oct 2006
Messages
1,639
Location
Sat at the control desk of 370666...
Some fascinating responses. What I would really like to do is to compare NFMs and see what fares have increased and by how much.
It's been quite a while since I had a go at data-mining the fares databases supplied with Avantix Traveller, but from what I can recall they were quite similar (If not identical) to the ASCII records that we can now download from ATOC.
Presumably these could be extracted from the older NFMs and ported back to ASCII for further processing without any major trouble, though I doubt I'd have the time to do this myself alas.

As far as ATOC are concerned, the printed routeing guide was withdrawn in the early 2000s due to the prevelance of Avantix Mobile (for on-train staff) and RJIS (for stations), and there was a notice to that effect in Newsrail Express at the time.
As the RG is data used to determine the validity of fares and tickets though, it really would help to have this data made available to passengers too. I once got lucky with a journey from LPG to FNB via MAN, SHF (For BoJ) and STP, but I don't know if those moves would be permitted under todays manual.

That said, I did download the publicly released version of the RG from the NR site a year or two ago and the thing read like a MENSA entrance test. I was only able to accurately compute simpler journeys using no more than three maps at a time - The sort of journeys that could be accurately evaluated through the application of simple common sense. :)

The Routeing Guide is being replaced anyway, as per this: http://www.railforums.co.uk/showthread.php?t=78627 so they can't be bothered to release the RJIS data since the new system is 'just round the corner' (probably a year away I would think?)
I agree with soil that the exclusion of the electronic version of the Routeing Guide from the data downloads is likely due to the continuing uncertainty over its future rather than a sneaky scheme to make more money. I hear rumours that Fujitsu have been developing some kind of new data schema for it.
Hmm, I'll have to have a peek at that thread when I can as I didn't know the RG was being replaced. It'd be nice if they de-complicated it and made it easy enough for the general public to understand and use...But considering my perspective of the industry, chance'll be a fine thing! :shock::lol:

I agree too that the data files are extremely bulky in the format they are exported in, but would note that with they can be compressed to an extremely manageable size by
  • only loading data for one date - the effective date of each new NFM (2nd January 2013 for the current data download from ATOC) is likely a good choice, and
  • devising a better data structure than the one RJIS uses.
From what I've seen in looking at the data dumps so far, it seems to me that they're still being produced in a format that's compatible with systems that one would consider obsolete today, but could still be in service across the NR network here and there even to this day.

I mentioned above that the output seemed consistent with UNIX output from an IBM 360 (Everything's in upper-case, for a start!) and most of the original BR systems (And those with which they interfaced) seem to have been based on VAX/VMS-esque hardware and systems for the most part. The BR/NR CIF spec also refers to data being available in EDCBIC as well as ASCII, which would require 6/7-bit plain-text transfer anyway to avoid compatibility headaches.

In many cases, the RJIS output takes up eight bits for records that could easily be stored in less (The R/I/A/D commands for example could be stored in four bits) and it's presumably that which leads to such surprising compression in latter-day utilities like 7-Zip.

librailfare, which I have developed over the past few years in order to read and interrogate the fares data from the published NFM CD, uses custom data structures written in C and can hold an entire NFM in around 45 megabytes of memory, including all fares records from normal and non-derivable flows, locations, railcards, tickets, restrictions etc., together with non-standard discounts and rovers and rangers. So it is possible!
Nice! I'll have to have a peek at that and see if it can be made to work as well in Visual Basic (Which is what I normally dev in) and see if I can combine it with directly accessible compression like BZip2 for quicker, more efficient access to the raw RJIS info if any of my projects require it. :)
 

OwlMan

Established Member
Joined
25 Jun 2008
Messages
3,206
Location
Bedworth, Warwickshire
I agree with soil that the exclusion of the electronic version of the Routeing Guide from the data downloads is likely due to the continuing uncertainty over its future rather than a sneaky scheme to make more money. I hear rumours that Fujitsu have been developing some kind of new data schema for it.

I doubt it as they have apparantly lost the contract to iblocks and smart421
 

radamfi

Established Member
Joined
29 Oct 2009
Messages
9,267
Access databases have a maximum file size of 2GB.

For that amount of data, I'd be using a proper SQL server (Microsoft, MySQL, PostgreSQL etc).

I encountered that problem when trying to import the NFO and NDF files. I ended up having to split those files into small chunks, import one, strip out the railcard and old fares, compact it, then import the next chunk etc.

The NFO and NDF files should really be scrapped and everything should be put into the flow files.
 

Paul Kelly

Verified Rep - BR Fares
Joined
16 Apr 2010
Messages
4,134
Location
Reading
The NFO and NDF files should really be scrapped and everything should be put into the flow files.

The hard thing about that, as far as I can see, is that then details of which railcards were and weren't allowed would have to be put into the non-standard discounts table. The nice thing about the non-derivable fares table is that the fare for every railcard is specified explicitly, and if a given railcard is not included there then it can not be used to discount that fare - no further database lookups needed.

For the normal flow table on the other hand, checking whether a given railcard is valid involves lots of tedious wildcard matching in the non-standard discounts table. In theory there is a flag that says a given flow is subject to non-standard discounts (and it would only be necessary to check if it was set), but in my experience it is not always honoured (e.g. a lot of flows to Manchester Metrolink stations disallow most railcards and have entries in the non-standard discounts table to indicate that, but they don't have the non-standard discounts flag set against the relevant flows).
--- old post above --- --- new post below ---
I doubt it as they have apparantly lost the contract to iblocks and smart421

Thanks for that - interesting to know where things are going. Some interesting reading here: http://www.railway-technology.com/n...-sales-management-system-rail-settlement-plan (nothing about routeing guide though...)
 

johnnycache

Member
Joined
3 Jan 2012
Messages
421
New fares system being built as well
I'm not sure of the latest position but ATOS under pressure from new entrants such as IPL
I think in phase 1 the new system will have to replicate what we have now but the strategy is to move to "thin TIS" ie the TIS becomes a dumb (and hopefully cheap) device which gets its information from the centre. This should also lead to standardisation of presentation as at the moment each TIS supplier uses the fares feed in a slightly different way
 

Paul Kelly

Verified Rep - BR Fares
Joined
16 Apr 2010
Messages
4,134
Location
Reading
New fares system being built as well
I'm not sure of the latest position but ATOS under pressure from new entrants such as IPL
I think in phase 1 the new system will have to replicate what we have now but the strategy is to move to "thin TIS" ie the TIS becomes a dumb (and hopefully cheap) device which gets its information from the centre. This should also lead to standardisation of presentation as at the moment each TIS supplier uses the fares feed in a slightly different way

So interesting to hear that, as it is a complete reversal of what happened in the 1990s, when the move was away from all data being held on a centralised mainframe accessed over the network through individual terminals, to the fares data being exported from that and held in individual TIS, TVMs etc. - and the mainframe only really being used as a backend database.

In 1998 Sema (now Atos) was positively advertising the new standalone capability as a benefit, not a hindrance. Their big achievement was replicating the logic used for combining flows, clusters, non-derivable fares etc, together into the list of fares available to the passenger. That meant that once the raw data was exported from the mainframe into a standardised schema, anybody who had access to it could build their own fare query engine. But even though, as maniacmartin said, the spec document is 81 pages long, there are so many grey areas and unspecified things that it is inevitable that there are going to be loads of minor implementation differences among different TIS and booking engine suppliers.

What goes around comes around. But any move towards fares data only being accessible in a "ready-to-use" format through some kind of web service would not be good news for people working on split ticketing engines - for optimium performance on that sort of thing it is necessary to have a copy of all the fares data available locally. So I hope that will continue to be available whatever happens.
 

soil

Established Member
Joined
28 May 2012
Messages
1,956
The data structure used by ATOC smells a bit.

The Locations table contains these data:

12,422 'L' records - these are the actual ticket origins/destinations
The Primary Key here should be the UIC Code. However there are about 100 or so foreign stations with duplicate records.

E.g, 'BASEL SBB'
has two records, both UIC 8500010
One record valid from 1/1/1991 to 31/12/2999, ERS Country null, ERS code null, Fare Group 000100, long description null
The other valid from 13/8/2003 to 31/12/2999, ERS Country CH, ERS code AJP, Fare Group null, long description Basel SBB

It looks like these duplicate records can/should be merged, and the UIC is the primary key used by the RJIS system. The UIC is not always entirely numeric.

--
0 'A' records - ignore this

--
9,433 'R' records - these are simply UICs and Railcard Codes.

Each 'L' Location will have zero more Railcard codes valid for that location

This could be stored e.g. in a
RailcardLocation table

RailcardCode character
LocationUIC integer

which would be the composite primary key. Does not belong here.

--
837 'G' records - these are essentially 'Group' markers for 'L' records. In other words a record here indicates that a given 'L' record is a 'Group' destination. This includes things like 'Barnstaple+Bus', where the link is made to the relevant station for the bus service.

--
1238 'M' records.This represents a mapping between 'L' records that are flagged 'G' type and other 'L' records. For each 'L' of type 'Group' will be linked to one or more 'L' records not of type 'Group'.

For example,
RHIGOS BUS
TRECYNON BUS
ABERDARE BUS
ABERDARE+BUS
are all 'L' records marked as Groups using a 'G' record

Each of these has one 'M' record, which links them to the ABERDARE 'L' record

EDENBRIDGE STNS is also an 'L' record marked as a 'G'
It has two 'M' records linking it to Edenbridge Town and Edenbridge.

--
332 'S' records.

These are synonyms, mostly for foreign stations, but for UK station groups it appears to include a denormalised count of the number of the stations in the group; e.g., for HERTFORD STATIONS there is a synonym HERTFORD (2), for LONDON TERMINALS there is LONDON (19).

Junk data.
--

The Clusters table consists of a set of Cluster Ids which reference one or more 'L' Locations OR one or more counties.

These data are not necessarily properly referential either, as North Woolwich is still contained in one cluster.

This structure is a bit of a cluster f***, so to speak, because a cluster can be linked to one plus counties. This appears to have been done purely for the convenience of Southwest Trains on their fares to Jersey and Guernsey - rather than linking to each station, they link to a county, which then links to multiple stations.

The applicable records all have a Cluster NLC with prefix 'CC'.

The counties are not explicitly defined in the database.

The counties can be determined by querying the 'L' Location, so the clusters linking to CC39 reference all Location records with County equal to '39'

--
The Flows table consists of just over half-a-million entries.

A flow has an origin and a destination and may or may not be reversable (valid in the opposite direction). About 60% of flows are not reversable.

An origin/destination can be one of the following:

* The NLC Code for an 'L' Location as described above
or
* A Cluster Id

Generally speaking, a Cluster is not a valid ticket location, but just to make the database more of a mess, there are three cases where this is not the case:

LONDN CTL FCC-C1
FCC E SUSSEX S1
FCC SUSSXDWNS S1

The NLC codes for these 3 are also clusters, and hence these are virtual locations linked to multiple stations.

There is at least one flow where there is an active flow but the location is inactive; this is MANCHSTRMTLK-C11 to cluster S081, so care needs to be taken with referential integrity.

In addition to origin and destination, a flow is defined by the route code ('any permitted', 'not via London', etc.)

Note that 'status code' is always Adult in this table. In addition, certain records are defined as the sum of sub-records - these are mostly London fares (fixed add-on component for Z12* etc.) and plus buses. The sub-records, type 'C' do not appear to be useful, and should be dropped. Most records are NOT summed, so keeping these is not likely to be a useful optimization.

Each flow is unique to a TOC; obviously another TOC could have a duplicate flow.

A flow has one or more fares: a fare consists of a flow, ticket type (SVR, SDS, etc.) and restriction code (e.g., 'BE') indicating when it is valid.

There are 3.9 million flow fares.

--
The non-derivable flow fares consist of two files containing around 18 million records, non-derivable fares and non-derivable fares overrides.

The two files can be stored in the same file, but include a flag to mark those that came from the second file.

Discarding those not applicable on the first day of the fares period seems wise, and reduces this to 8.4 million records.

BRfares does not get this right, for example, there is a record in the non-derivable fares file, which is Newark Stations - Nottingham CDR, Family +Friends Railcard discount, Adult Fare £3.75, Child Fare £1.10; however in the Non-Derivable fare override file, there is a 'supress' marker on this fare.

Sure enough, at networkrail.co.uk, a journey from Newark to Nottingham for 2 adults and 1 child suggests CDRs at £14.80. Adding a Family +Friends Railcard causes the ticket type to change to SDR, because the CDR is suppressed.

Anyway, after importing the 8.4 million records, running this clean-up script

select
n1.id n1id, n2.id n2id, n2.suppress
into #junk
from NonDerivableFares n1
join NonDerivableFares n2 on
n1.OriginCode = n2.OriginCode and n1.DestinationCode = n2.DestinationCode
and n1.RouteCode = n2.RouteCode and n1.TicketCode = n2.TicketCode
where n1.override = 0 and n2.Override = 1
and (n1.railcardcode = n2.RailcardCode
or (N1.RailcardCode is null and n2.railcardcode is null))

delete nonderivablefares
from nonderivablefares
where id in (select n1id from #junk where suppress = 1) or id in (select n2id from #junk where suppress = 1)

delete nonderivablefares
from nonderivablefares
where id in (select n1id from #junk where suppress = 0)

This cleans out 2.3 million+ records. (BRfares misses some of these.)

Leaving just over 6 million.
 
Last edited:

Paul Kelly

Verified Rep - BR Fares
Joined
16 Apr 2010
Messages
4,134
Location
Reading
The Clusters table consists of a set of Cluster Ids which reference one or more 'L' Locations OR one or more counties.

These data are not necessarily properly referential either, as North Woolwich is still contained in one cluster.

This structure is a bit of a cluster f***, so to speak, because a cluster can be linked to one plus counties. This appears to have been done purely for the convenience of Southwest Trains on their fares to Jersey and Guernsey - rather than linking to each station, they link to a county, which then links to multiple stations.

Counties are also used for fares to Douglas (Isle of Man). And the spec also seems to say that clusters can contain references to all locations within a given zone (presumably a London zone), but doesn't say how that is indicated and there don't seem to be any current examples.

BRfares does not get this right, for example, there is a record in the non-derivable fares file, which is Newark Stations - Nottingham CDR, Family +Friends Railcard discount, Adult Fare £3.75, Child Fare £1.10; however in the Non-Derivable fare override file, there is a 'supress' marker on this fare.

Sure enough, at networkrail.co.uk, a journey from Newark to Nottingham for 2 adults and 1 child suggests CDRs at £14.80. Adding a Family +Friends Railcard causes the ticket type to change to SDR, because the CDR is suppressed.
Hmmm - Avantix Traveller (at least NFM12, the newest one I have installed on the computer I'm sitting at right now) also shows this Newark to Nottingham CDR as available with FAM discount. I wonder does Avantix Mobile???
 

soil

Established Member
Joined
28 May 2012
Messages
1,956
Here's another bit of data weirdness:

This is from the 'Non-derivable fares override table'

R1016126800000 7DSO030120130201201316112012N0000318000001590 YNN
R1016126800000 7DSO180520130401201316112012Y

That probably looks impossible to read, but character 2-5 is the origin (Cannock), and 6-9 is the destination (Stafford). The next five digits are the route code (00000 - any permitted), the spaces are the railcard (none), and 7DS is the ticket type, the next character is always O, and then the next twelve characters are the end and start date.

The two lines are not the same length, because the second, which ends with 'Y' is intended to suppress the previous record!

However, if we look at the previous record, we can see:

03012013
02012013

I.e. this record is only valid between 2/1/2013, and 3/1/2013, that is for one day. The given fare is £31.80 for an adult ticket, and £15.90 for a child ticket.

whereas the suppress record is valid between 4/1/2013 and 18/5/2013.

If you look at the non-override Non-derivable fares file, there are no records

Looking in the flows file there is one flow defined, for Cannock - Stafford Any Permitted, which is flow id 0033454. This is defined as valid from 02/01/2013 through eternity.

The 7DS fare defined in the flow file is £51.30

So to sum up we have

Cannock - Stafford flow fare Any Permitted 7DS, defined as £51.30 from 2/1/13 onwards
Cannock - Stafford non-derivable override fare Any Permitted, 7DS, defined as £31.80 from 2/1/13 to 3/1/13
Cannock - Stafford suppress the '7DS record Any Permitted', between 4/1/13 and 18/5/2013

There is something wrong with this, since the supress record would appear to suppress the flow fare of £51.30 meaning that no Any Permitted 7DS is available. I believe however it is only intended to suppress the £31.80 fare.

According to http://ojp.nationalrail.co.uk/service/seasonticket/tickets, there is an available 7DS Any Permitted, and the price is £51.30

It does say in the spec

Suppress_mkr
"If the value is Y this indicates that the type N record in the derivable fares file with the same ORIGIN_CODE, DESTINATION_CODE, ROUTE_CODE, TICKET_CODE and RAILCARD_CODE should be suppressed (ie treated as if it does not exist) between
the START_DATE and END_DATE on this record. Records with this field set to Y will contain spaces in all the fields following this one."

I had assumed that 'suppress' records could suppress flow fares, but the spec does not say this and indeed it appears to be the case that 'suppress' records only relate to other override records.

Hence regarding my previous clean-up script, this can be extended to drop all the suppress records, given that the script has already dropped the suppressed override records for the given reference data.

The 'end date' is still an issue though:

'END_DATE Last date for which this record can be used.
Format is ddmmyyyy. A high date
(31122999) is used to indicate records
which have no defined end date. '

This would suggest that end date is an inclusive data, i.e. that the £31.80 season would be valid up to the end of 3/1/2013.

However the question is "did the Cannock - Stafford 'Any Permitted' season really increase from £31.80 to £51.30 on January 4th"? It seems quite an excessive increase.

Either way, as the reference data is arbitrary and the NFM is valid for an open-ended period, it perhaps might be better to choose a later date than 3rd January for the reference date?
--- old post above --- --- new post below ---
Counties are also used for fares to Douglas (Isle of Man). And the spec also seems to say that clusters can contain references to all locations within a given zone (presumably a London zone), but doesn't say how that is indicated and there don't seem to be any current examples.

Yes correct on both counts.

My memory was a bit hazy on Douglas, I looked at this a few days ago so was doing it from memory....

All the records join to either a county, location, or cluster, so the 'zone' seems to be unused.

--
I re-imported the Non-Derivable fares, including all dates, due to my concerns about the data quality, as above. This gives 18,455,566 records.

I ran this query:

delete n2
from nonderivablefares n2
where n2.suppress = 1
and not exists (select *
from NonDerivableFares n1
where
n1.OriginCode = n2.OriginCode and
n1.DestinationCode = n2.DestinationCode and
isnull(n1.RailcardCode,'') = isnull(n2.RailcardCode ,'') and
n1.RouteCode = n2.RouteCode and
n1.TicketCode = n2.TicketCode
and n1.Suppress = 0)

which deletes all suppress records for which there is no matching non-derivable fare.

This eliminated 1,445,777 records.

Regarding the issue of records being suppressed after they had expired, I ran this query:

select
count(*), n2.StartDate, n2.EndDate, n1.StartDate, n1.EndDate
from NonDerivableFares n2
join NonDerivableFares n1 on
n1.OriginCode = n2.OriginCode and
n1.DestinationCode = n2.DestinationCode and
isnull(n1.RailcardCode,'') = isnull(n2.RailcardCode,'') and
n1.RouteCode = n2.RouteCode and
n1.TicketCode = n2.TicketCode
where
n2.suppress = 1
and n1.Suppress = 0
and n2.StartDate > n1.EndDate
and not exists
(select * from NonDerivableFares n3
where
n3.OriginCode = n2.OriginCode and
n3.DestinationCode = n2.DestinationCode and
isnull(n3.RailcardCode,'') = isnull(n2.RailcardCode,'') and
n3.RouteCode = n2.RouteCode and
n3.TicketCode = n2.TicketCode
and n3.Suppress = 0 and n2.StartDate <= n3.EndDate)
group by
n2.StartDate, n2.EndDate, n1.StartDate, n1.EndDate
order by count(*) desc

It gives this output:

318682 2013-01-22 2013-03-31 2012-12-27 2013-01-21
1590 2013-01-04 2013-05-18 2013-01-02 2013-01-03
42 2014-01-02 2999-12-31 2013-01-02 2014-01-01
28 2014-01-02 2999-12-31 2012-09-02 2013-01-01
7 2013-01-02 2999-12-31 2012-09-02 2013-01-01

As you can see there are 318,682 override records that expired on 21st January that were suppressed from 22nd January, 1,590 records that expired on 3 January but were suppressed from 4th January, and so on.

I am not sure what this is supposed to mean.

But the data definitely need some clean up.
--- old post above --- --- new post below ---
Ok, so I'm an idiot.

The date for the fares download was 30th January 2013.

Therefore there's no point in looking at data for 3rd January 2013, because the data were valid on 30th January, but not before.

Excluding non-derivable fares that expired before 30th January, or become valid after 30th January and clean-up of the overrides, I got 11.4 million non-derivable fares.
 
Last edited:
Status
Not open for further replies.

Top