Information and how to make it useful

by Richard 22. May 2009 18:31

Well ok, maybe not the whole topic in one blog post! What I would like to do is start a conversation about recommendation 14 – a UK version of data.gov.

The Power of Information Task Force flagged up that one of the main problems with UK government information is finding out what we have published, what form it is in, and how it can be used; we are looking at how we might do this.

Any solution must support open standards and would ideally be open source, but there are a couple of other questions we are pondering at the moment:

  • What characteristics would be most useful to you – feeds (ATOM or RSS) or bulk download by e.g. FTP, etc?
  • Should this be an index or a repository?
  • Should this serve particular types of data e.g. XML, JSON or RDF?
  • What examples should we be looking at (beyond data.gov, e.g. http://ideas.welcomebackstage.com/data)?
  • Does this need its own domain, or should it sit on an existing supersite (e.g. http://direct.gov.uk)?

Let us know any and all thoughts – we will pick up twitter comments with #poit or #opendata. In the meantime, you can find some of the government's published data sources on this data wiki (thanks to Rewired State).

Comments

5/27/2009 5:35:36 PM #

The main issue is to get access to real, raw data. I was just looking over some crime stats, and those are all predigested into statistics that I can't learn anything from that hasn't already been reported. Give us raw data, not reports in Excel tables. This is really the biggest issue that makes the most difference for how useful the data will be.

As for your questions: Yes, please provide feeds. Bulk downloads (and, thus, a repository) would be awesome, but if that means it'll take three years to get started, I'd say go for the index - at least at first.

A lot of data is best presented in CSV. XML and JSON are more for structured data or for API responses. For tabular data in files, CSV is the best choice.

Great to see these kinds of things happening, I hope more European countries will follow your lead!

Robert Kosara

5/27/2009 9:16:55 PM #

Personally I think you need to give more careful consideration to actually getting the data from the wonks into some sort of format online. Really, Robert above has it right, and the barrier to overcoming that is not nice open standards or syndication methods. It's explaining to the data owners why this is important and how to do it, and perhaps providing a platform to make the data available easily. In my experience, trying to explain to the economists and statisticians (and sometimes, politicians) that it is good, open and un-massaged data that we need to release, rather than lovely Excel spreadsheets with artfully built pivot tables or 'reports' with expert commentary, is going to be a bigger battle than us all agreeing in geek world that it should be done and how.

The technology can wait, and is easily solved; getting the data is the key.

Whitehall Webby

5/27/2009 10:42:57 PM #

Pingback from blog.helpfultechnology.com

Cui bono? The problem with opening up data at Helpful Technology

blog.helpfultechnology.com

5/28/2009 12:32:55 AM #

Regarding formats: RDF please! And not just RDF: link it in to existing resources as part of Linked Data. If you get stuck, irc.freenode.net #swig

Please, no new government formats. I've seen so many badly designed formats put together by large government bodies or industry bodies with government breathing down their neck. They are open in as much as you can sit and read 9,000 pages of XML muck.

Regarding where to put the data: on the existing sites. There are plenty of places to look for data already. For something like crime stats, each police force has their own site, and the Home Office has a site. For education figures, put them on the DfES site. Put them at a reliable, sensible URI that won't be changed willy-nilly. Cool URIs don't change after all, just like cool telephone numbers and cool e-mail addresses. You may then want to pull together a central index. The fact is that each government body ought to be responsible for putting out decent quantities of data.

Examples: OpenLibrary.org, bbc.co.uk/programmes, dbpedia.org, libris.kb.se (the Swedish version of the British Library), GovTrack.us and, yes, data.gov. And, best of all, the London Gazette - look at the work done by Jeni Tennison and John Sheridan at TSO and OPSI respectively. Don't create new stuff: follow the work done on these services. And don't be afraid to ask - there's a big community of people who can help.

Data isn't just for mashups. There are plenty of really easy wins: add hCards to addresses all across government websites. There are plenty of things that can be done with this.

Tom Morris

5/28/2009 1:07:19 AM #

There is a company in the States, Socrata, that is focused on this very problem. They call the category "Social Data Discovery".

www.socrata.com

Mark

5/28/2009 7:49:54 AM #

Me three. It's the data that's the key. Look at the school exam results fiasco where the fourth estate 'complain' that they need a 2 week embargo in order to analyse and publish the data, but the same data is NOT freely available to the public. And the departmental response is that "we'll see what we can do" and is looking to create an exemption. Answers to bullets in order:
1. It depends on the data. Both.
2. Index. Require each data holder to make it available in their own domain. Hold a central index.
3. No opinion. Sorry.
4. http://www.osgeo.org/geodata/repository
5. No. Cabinet Office, Direct Gov, and local domains will be fine.

Feargal Hogan

5/28/2009 8:02:12 AM #

I've two comments to add to the mix, relating to RSS and domains.

A large or complex website can have a number of different RSS feeds. Whilst this allows an individual to subscribe only to the areas of the site that provide a specific interest or subject matter, very often an aggregation of the site's feeds would be useful yet is not made available. One site that does this very well is Island Blogging, a digital community of (currently) 60 individual blogs where an aggregated RSS feed has been made available of the latest blog posting for each blog. See feed://islandblogging.co.uk/?bdprssfeed=2

This means that any interested party can add the aggregated feed to their news reader and see at a glance which blogs have been updated, without having to subscribe to each one in turn. I think of this as an added-value resource, simplifying information gathering and concentrating the resource into a single place. It works because all 60 blogs are focused on the lives of Scottish islanders and, whilst the subjects of individual blogs vary wildly, there is a common theme and so the aggregation makes sense.

Thus I believe an aggregation of information streams from government sources, loosely grouped into subjects of interest, policies, and so forth, would be a valuable addition to any information dissemination policy.
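As a rough illustration of the aggregation described above, the following Python sketch takes the latest item from each of several RSS 2.0 feeds and lists them newest first. The two inline feeds, their titles and the example.org links are hypothetical stand-ins; this is not how Island Blogging builds its feed, only one way such an aggregate could be assembled.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny stand-in RSS 2.0 documents (hypothetical content, inlined so the
# sketch runs without network access).
FEED_ONE = """<rss version="2.0"><channel><title>Blog one</title>
<item><title>Older post</title><link>http://example.org/one/1</link>
<pubDate>Mon, 25 May 2009 09:00:00 GMT</pubDate></item>
<item><title>Ferry timetable</title><link>http://example.org/one/2</link>
<pubDate>Wed, 27 May 2009 18:30:00 GMT</pubDate></item>
</channel></rss>"""

FEED_TWO = """<rss version="2.0"><channel><title>Blog two</title>
<item><title>Harbour news</title><link>http://example.org/two/1</link>
<pubDate>Tue, 26 May 2009 12:00:00 GMT</pubDate></item>
</channel></rss>"""


def latest_item(feed_xml):
    """Return (published, feed title, item title, link) for the newest item."""
    channel = ET.fromstring(feed_xml).find("channel")
    source = channel.findtext("title")
    items = [
        (
            parsedate_to_datetime(item.findtext("pubDate")),
            source,
            item.findtext("title"),
            item.findtext("link"),
        )
        for item in channel.findall("item")
    ]
    # Tuples compare on their first element, so max() picks the latest date.
    return max(items)


def aggregate(feeds):
    """One entry per feed (its latest post), newest first."""
    return sorted((latest_item(feed) for feed in feeds), reverse=True)


aggregated = aggregate([FEED_ONE, FEED_TWO])
for published, source, title, link in aggregated:
    print(published.date(), source, title, link)
```

Grouping by subject, as suggested above, would then be a matter of choosing which feeds go into each call to `aggregate`.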

With regard to your question about domains and supersites such as Directgov or Business Link, I believe the supersite approach is the way to go, but getting the information architecture right is paramount or the site will just be a muddle. The government policy of rationalising the explosion of government domains down to a one-stop shop, Directgov for citizens and Business Link for businesses, was a brilliant concept, even if it has a major flaw that has yet to be fixed.

Directgov works well, but its weakness is that officially it only applies to England. This is the cause of much irritation in Scotland and Wales. One example of this is Local Directgov (www.direct.gov.uk/en/Diol1/DoItOnline/DG_073190). For example a resident in the New Forest using Directgov can access a deeplink to NFDC's website to arrange a bulky waste collection, by entering a postcode (eg SO45 1YG) and being provided with a link. Use a Scottish postcode (try HS2 9DU) and an unrecognised postcode message is returned, and a list of Scottish councils provided, and the citizen is expected to find their way from there (hardly helpful nor especially 'empowering'!).

Councils are encouraged to link their websites to Directgov for services such as the brilliant car tax renewal facility (a UK resource), but it's very much a one-way street. There is little content on Directgov relating to devolved or local government services in Scotland and Wales, so councils in these areas question, with valid justification, why they should be promoting Directgov at all.

I hope this helps.

John Fox

5/28/2009 8:05:40 AM #

Taking each question in turn…

"What characteristics would be most useful to you – feeds (ATOM or RSS) or bulk download by e.g. FTP, etc?"

Different applications here. Feeds are useful for modern web apps and the like. FTP is good for browsing and mirroring. If you want to make the geeks really happy, offer rsync support so that it's all easy to mirror.

"Should this be an index or a repository?"

Both, potentially, but it should aim to be a repository.

"Should this serve particular types of data e.g. XML, JSON or RDF?"

I think this needs to be investigated on a per-application basis. XML/REST and JSON interfaces are good for web services, and some XML-based formats are good for raw information. RDF is really more about the ontologies than the data itself, and would presumably sit as a richer layer between something like RSS/Atom and the guts of the data.

"What examples should we be looking at (beyond data.gov e.g. http://ideas.welcomebackstage.com/data)?"

For APIs (rather than data downloads), look at the web apps out there that excel at this: Twitter, Backpack, Lighthouse, and so on.

For structured information, the work the BBC did on bbc.co.uk/programmes/ (where every brand/series/programme can be retrieved as RDF, amongst other things) is a good place to start.

"Does this need its own domain, or should it sit on an existing supersite (e.g. http://direct.gov.uk)?"

I think it's worthwhile keeping it distinct and separate from any specific branding which could well be transient. While DirectGov is the endpoint du jour, there's no guarantee that the next parliament (or even this one) wouldn't decide that the public doesn't respond well to the name and decide to change it. In contrast, a data.gov equivalent needs to be put somewhere and stay there - changing it undermines a good chunk of the effort.

Other thoughts:—

• Host downloads of example applications (open source, in various languages) which build on the data or otherwise manipulate it. There's no requirement that these would have to be developed internally, though.

• Documentation. In plain English. Avoid buzzwords like the plague: be clear, concise, explicit, and most of all, current.

• Any software/scripts/etc. produced as part of the development of the effort which would be potentially valuable to others should be released as open source under a permissive (e.g., MIT-style) license.

Mo McRoberts

5/28/2009 9:43:23 AM #

I spent a large chunk of my life trying to persuade the Office for National Statistics to do precisely this, from the inside. Overall, I failed.

As Jeremy rightly observes, the starting point shouldn't be the (delivery) technology. Once we've got the raw data coming into a decent database, it should be easy to spit it out in whatever format people might want. JSON? XML? RDF? CSV? RSS? Atom? Yes, yes, yes, yes, yes, yes. Get one of them working, and the others will be easy.
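A minimal Python sketch of the "get one working and the others will be easy" point: once rows are held in one structure, each output format is a few lines of serialisation. The `ROWS` data and the function names are hypothetical stand-ins for a real database query, not anything ONS actually runs.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical rows standing in for the result of a database query.
ROWS = [
    {"area": "North", "year": 2008, "count": 120},
    {"area": "South", "year": 2008, "count": 95},
]


def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialise the rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


def to_xml(rows):
    """Serialise the rows as a simple <rows><row>...</row></rows> document."""
    root = ET.Element("rows")
    for row in rows:
        row_element = ET.SubElement(root, "row")
        for key, value in row.items():
            ET.SubElement(row_element, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


print(to_csv(ROWS))
print(to_json(ROWS))
print(to_xml(ROWS))
```

The hard part named in the next paragraph, getting the data into `ROWS` in the first place, is exactly what this sketch leaves out.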

No, the hard part is getting the data in the first place. Part of it is cultural. Statisticians are protective perfectionists. Most simply don't like the idea of letting ordinary people get at the data. (There's some justification for this, when you see how media or politicians have twisted their work in the past.)

And part of it is historic. Issuing data as data to the public has never really been possible before. There haven't been any direct precedents... until now.

There's actually plenty of data available - if you know where to look. The National Statistics website does contain a pretty thorough index of what's available; but usability is still a serious problem, despite my best efforts, and the outcome is usually PDF or XLS (although for the record, there's a fair bit of CSV in there too).

What we need, I think, are some demonstrators. Grab a few datasets likely to interest the geeks, get them 'out there' - using Google Spreadsheets if you have to. Then let's see what the community does with them.

And keep in mind, the next Census is just around the corner (March 2011, results probably a year later). There's already a commitment to offer an API on this data. Maybe we - in the broadest possible definition - should be targeting that.

Simon Dickson

5/28/2009 12:16:39 PM #

A good place to start would be opening up information about international development projects and programmes.

Owen Barder

5/28/2009 8:41:04 PM #

Characteristics & Types: It completely depends on the kind of data that's held and on what you want to do with it -- but this doesn't really matter now. To echo several other comments: it's most important to get it out there. Formats can be considered later. For now, just zip up the raw data, in whatever format it is held, and get it online.

Index or Repos: Both.

Examples: ONS have some plans for releasing data along with canonical interpretations of it, which is an interesting idea. If I were government, though, I wouldn't worry about this bit yet. Once the data is out there people will do things with it, and useful things that government could do to help will become clear.

Domain: couldn't matter less.

Harry

5/29/2009 6:54:41 AM #

Why is government and its workers such a bunch of uber-perfectionists ?

Rather than worrying about the detail, release some, see what happens and learn for the next time.

When you learned to walk, you were allowed to fall over a few times.

Please understand that we will forgive you if what you release requires work on our part to improve.

We would all just quite like to get started.

Failure is an option ; delay and prevarication ( whilst your most likely preference ) would be sad.

alex

5/29/2009 4:20:05 PM #

I agree with Alex. That is what we're doing in Kent with our Pic & Mix project. I've thrown in some thoughts on Steph Gray's blog here blog.helpfultechnology.com/.../

Noel

5/29/2009 7:33:30 PM #

[FULL DISCLOSURE - I'm the CEO of Socrata and we offer technology solutions that help government agencies publish data online]

Lots of great comments in the discussion thread. The key to success is to err on the side of being pragmatic over idealistic. Get some datasets online as quickly as you can. Evolve the number of datasets over time. Offer them in as many human and machine readable formats as you can, but don't let the absence of any specific format hold you back from getting something online straight away.

As for JSON, XML, API, RDF and other formats - yes to all of the above, plus let people download in CSV and XLS, let them propagate data by embedding it in their blog posts and websites. Let people comment, discuss and rate the data. Let them create and save filters against the data. Let them subscribe to updates to the data via RSS/ATOM. Let citizens suggest new datasets.

Socrata has authored a whitepaper describing how governments can unlock vast amounts of public data by implementing social data discovery solutions. You can download the whitepaper at http://www.socrata.com/about/download-whitepaper as well as learn more about Socrata and our Socrata Social Data Network at http://www.socrata.com.

Kevin Merritt

5/31/2009 8:21:16 PM #

How about supplementing this with "appropriate data types". For example, if the data includes geodata, publish it as geoRSS and KML?
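As a hedged sketch of what publishing geodata as KML could look like, the Python below turns a list of named points into a KML document using only the standard library. The place name and coordinates are hypothetical; the namespace is the KML 2.2 one, and KML writes coordinates as longitude,latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
# Serialise the KML namespace as the default namespace (no prefix).
ET.register_namespace("", KML_NS)

# Hypothetical data: (name, latitude, longitude).
PLACES = [("Example site", 51.5, -0.12)]


def tag(name):
    """Qualify an element name with the KML namespace."""
    return f"{{{KML_NS}}}{name}"


def to_kml(places):
    """Build a KML document with one Placemark per place."""
    kml = ET.Element(tag("kml"))
    document = ET.SubElement(kml, tag("Document"))
    for name, lat, lon in places:
        placemark = ET.SubElement(document, tag("Placemark"))
        ET.SubElement(placemark, tag("name")).text = name
        point = ET.SubElement(placemark, tag("Point"))
        # KML expects longitude first, then latitude.
        ET.SubElement(point, tag("coordinates")).text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


kml_text = to_kml(PLACES)
print(kml_text)
```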

Tony Hirst

6/1/2009 5:28:22 PM #

I would be most interested to see REALTIME data sets: energy usage, climate figures, industrial output, even immigration figures etc. To me this is way more useful than curated static historic data (not that I don't see the value in those - just that realtime is what's often missing). This should be provided via a service like Pachube (www.pachube.com) so that it can adequately cope with both realtime-brokering of XML, JSON, RSS, ATOM (etc) and cross-analysis with other international or publicly contributed datasets.

Historic data provided by the government is fine for determining things retrospectively, and creating beautiful visualisations and graphs, but realtime data enables on-the-fly decision making and response. In other words, it enables creating a new class of interconnectedness between people and their political structures.

Terry M

6/1/2009 8:05:57 PM #

+1 to RDF, it simplifies merging and querying disparate datasets.
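To illustrate the merging claim in miniature, here is a toy Python model of RDF-style triples. It is not an RDF library and the example.org identifiers are hypothetical; it only shows that when every statement is a (subject, predicate, object) triple with shared identifiers, merging two datasets is a set union and a query is a pattern match.

```python
# Hypothetical identifiers; a real dataset would use its own stable URIs.
EX = "http://example.org/"

dataset_a = {
    (EX + "carpark/23", EX + "name", "Market St"),
    (EX + "carpark/23", EX + "spaces", 56),
}
dataset_b = {
    (EX + "carpark/23", EX + "district", EX + "district/A"),
    (EX + "district/A", EX + "name", "District A"),
}

# Merging needs no schema negotiation: it is just the union of statements.
merged = dataset_a | dataset_b


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


# A two-step query across both sources: which district is car park 23 in,
# and what is that district called?
district = next(o for _, _, o in match(merged, s=EX + "carpark/23", p=EX + "district"))
district_name = next(o for _, _, o in match(merged, s=district, p=EX + "name"))
print(district_name)
```

The query only works because both datasets use the same identifier for the car park, which is the point of linking to existing resources.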

As well as the linked data cloud, note the existing work in UK government and at the BBC:

esw.w3.org/.../

webbackplane.com/.../more-rdfa-goodness-from-uk-government-web-sites

blogs.talis.com/.../...ng-coherence-at-bbccouk.php

Danny Ayers

6/1/2009 8:53:11 PM #

Definitely RDF. There are now so many tools that support RDF, and I think it's time to start really pushing RDF into the mainstream. RDF is so much easier to use than ye olde XML or RDBMS when it comes to being the ultimately flexible format for integrating and sharing data.

I'd also recommend contacting the Ordnance Survey geosemantics team:

http://www.ordnancesurvey.co.uk/oswebsite/ontology/

as we are doing lots of work with RDF. We have RDF for the administrative geography of GB that might be useful for you to reuse - though we will be updating it shortly to account for recent boundary changes. The version as it stands can be found here:

http://www.ordnancesurvey.co.uk/oswebsite/ontology/AdministrativeGeography/v2.0/AdministrativeGeography.rdf

John Goodwin

6/1/2009 8:54:06 PM #

Mmm...left hand/right hand...

Two surprising questions. First:

> Should this serve particular types of data e.g. XML, JSON or RDF?

It's odd not to mention RDFa, given that some UK government sites are already leaders in this space.

And the second surprising question:

> What examples should we be looking at (beyond data.gov e.g.http://ideas.welcomebackstage.com/data)?

How about looking at some of your own web-sites?

(Try http://bit.ly/QQ0Uf for some pointers.)

Ok, I'm being a little flippant.

But it does surprise me that the question is being posed as 'what can we learn from elsewhere', when actually there are many places close to home where we're already leading the way.

Regards,

Mark

Mark Birbeck

6/1/2009 9:00:15 PM #

@Tony Hirst - curious about what you mean when you say RDF is too cluttered??

@Mo McRoberts - no, RDF is very much about the data. RDFS or OWL is about the ontologies.

John Goodwin

6/2/2009 10:37:01 AM #

@John Goodwin

I know RDF makes life easy for writing all sorts of wonderful queries, but if I look at some RDF, or look at a SPARQL query, there is so much syntactic stuff floating around it makes me go bleurghhh and give up before I start.

I have a hard enough job persuading people that it's not that hard to write site:ac.uk in a Google query to do a site limited search, or persuade them they haven't broken their browser if they see raw (or even templated) RSS.

So horses for courses. Tools and representations need to be appropriate for the audience you want to be able to use them.

If gov data is just released as RDF, I probably won't play with it for a long time, because there are lots of other things around that are closer to being usable by real people, and I only have so much time to play...

Tony Hirst

6/2/2009 2:23:53 PM #

"Any solution must support open standards and would ideally be open source"

+1 vote to RDF without doubt.

The owner of the content should only be responsible for creating and maintaining semantically enabled RDF data - the data they decide to make public.

Along then comes an intermediary and creates other formats of that data which are of value to the mashers, journalists, citizens and aggregators like RSS or .csv files which can then be derived from the RDF.

OK, I grant you that might be seen as a more skilled job at the moment (learning SPARQL is not a barrel of laughs) but this is partly because the languages are so new and tools are only now being formed.

"Should this be an index or a repository?"

If organisations were generating RDF - maybe alongside traditional RDBMS stores - then the place to keep it would be as close as possible to where they generate it.

I have no experience to share but keeping RDF data in a central place seems counter productive to me - you create all kinds of problems about currency and send mixed messages about perceived ownership... when you update a record in an RDBMS then the corresponding RDF file should update too - why wait for another http or ftp process to go wrong?

yourdept.gov.uk/buildings.html
yourdept.gov.uk/buildings.rdf
yourdept.gov.uk/people.html
yourdept.gov.uk/people.rdf

"The POI Task Force flagged up that one of the main problems with UK government information is finding out what we have published, what form it is in, and how it can be used; we are looking at how we might do this."

Stick to RDF using an open format that anyone can dereference and understand, and then tell the world where it is kept, e.g. at the moment you go tell an RDF directory:

http://pingthesemanticweb.com/

It is quite possible that search engines will eventually come looking for your RDF data anyhow; you will signpost it from robots.txt, sitemaps, and then from meta tags in the head of the HTML versions of your content.

There should be no problem in Direct.gov.uk spidering all gov uk domains for instances of rdf files if a central directory needed to be kept.

I can of course quite see that in central government departments the question of 'to RDF or not' might seem like a moot point if interoperability has little or no value, but when you shift the question to local government then surely the weight of argument becomes simply overwhelming in favour of using the open standards of RDF.

District A publishes their car park information as a csv like this:

id, name, lat, lng, spaces, handicapped
23, "Market St", -0.655, 5.34, "56", "2"

District B publishes their car park information as a csv like this:

car_park, easting, northing, total_spaces
'Market Street, Leamington Spa', 134644, , 475688, 67

Now District A and District B may well have "published open data" as csv files, but it's not in a shared public format that will allow anyone else (including themselves!) to compare or easily merge or extract data from a set of them.

Which is a string, which an integer, do we convert eastings to degrees lat, how do we handle mismatched values, etc.? After trying to mash 5 of them you'd be ready for a lie down, and we KNOW that the formats would change as would the file addresses.
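To make the pain concrete, here is a small Python sketch that reads the two sample files exactly as given above and checks what they have in common. The function names are hypothetical; note that each file needs its own quoting rule before it will even parse, and nothing in either file says which values are numbers.

```python
import csv
import io

# The two samples, copied as published above.
DISTRICT_A = (
    'id, name, lat, lng, spaces, handicapped\n'
    '23, "Market St", -0.655, 5.34, "56", "2"\n'
)
DISTRICT_B = (
    "car_park, easting, northing, total_spaces\n"
    "'Market Street, Leamington Spa', 134644, , 475688, 67\n"
)


def read_table(text, quotechar):
    """Parse CSV text into (header, rows); the quote character is per file."""
    reader = csv.reader(io.StringIO(text), quotechar=quotechar, skipinitialspace=True)
    header = next(reader)
    return header, list(reader)


def to_records(header, rows):
    """Pair rows with the header, setting aside rows of the wrong width."""
    records, problems = [], []
    for row in rows:
        if len(row) == len(header):
            records.append(dict(zip(header, row)))
        else:
            problems.append(row)
    return records, problems


header_a, rows_a = read_table(DISTRICT_A, '"')
header_b, rows_b = read_table(DISTRICT_B, "'")
records_a, problems_a = to_records(header_a, rows_a)
records_b, problems_b = to_records(header_b, rows_b)

# No column names are shared, so there is nothing to merge on automatically.
shared_columns = set(header_a) & set(header_b)
print("shared columns:", shared_columns)
print("District A records:", records_a)
print("District B rows that do not fit their own header:", problems_b)
```

Every value arrives as a string, so deciding that "56" is a count and 134644 an easting is left to whoever writes the mash-up, once per publisher.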

File addresses of rdf files might change too, I agree, but they are more likely to mirror the output of your RDBMS, whereas the csv file could just be lobbed into a shared space by a temp.

"We need a standard!" We've got one, it's called RDF.
"We need a family of ontologies to describe the data" We've got them, and what's more they are mature enough to use (IPSV, LSSL, DC).
"We need something to add to our CMS to create RDF files", ah, now that's trickier ... but there are some Open Source folks, and a wedge of EU money addressing just that now ...

http://www.iks-project.eu/

[full disclosure: I attended the requirements gathering workshop]

We need more than a shared space containing csv files.

Paul Geraghty

6/2/2009 9:24:41 PM #

@Tony Hirst - interesting. I'm not a database expert by background - got into RDF/semantics long before RDBMS. I just genuinely find RDF/SPARQL sooo much easier to use than RDBMS/SQL or XML/XQuery.

John Goodwin

6/3/2009 8:57:23 AM #

Also meant to say that our (beta) RDF for the administrative geography of GB is hosted as linked data by the University of Southampton here:

http://os.rkbexplorer.com/

John Goodwin

6/3/2009 2:42:17 PM #

@John, neat, didn't know that existed.

BTW the url http://www.rkbexplorer.com/explorer/ is returning a blank page.

Paul Geraghty

6/3/2009 7:37:42 PM #

@Paul - thanks. Currently updating it to account for GB's ever-changing administrative boundaries.

Southampton were kind enough to host this for us, but at some stage we would like to host it ourselves in a proper linked data way.

http://www.rkbexplorer.com/explorer/ seems to work for me in Firefox. It does require Java and can take a while to load. If you have problems let me know and I'll mention it to the Southampton Uni guys.

John Goodwin

6/4/2009 1:56:19 PM #

I read a mention of offering Excel files up there somewhere - please, please don't! Stick to standard formats.

As long as the formats and APIs are open, standard and well documented, I'll be happy to handle any translation / conversion I need. Offering XLS files, even in addition to standard formats, will only prop up this harmful assumption that everyone has MS Office, and is a waste of resources.

Excel can read CSV. Please don't pander to Redmond.

Geoff

6/4/2009 2:09:01 PM #

The less *you* have to do, the sooner *we* get the data.

What is the very least you can do to get as much online as soon as possible?

Phase one - one big website with everything dumped on it as it is (csv, mdb, xls, xml, doc, etc) - open it to the public.

Phase two - create a taxonomy, migrate all data files into the taxonomy - open it to the public.

Phase three - create any conversion processes required - open it to the public.

The government should not be putting data in here as an afterthought at the end of processing.

The government should be putting the raw data here as their very first step. Then they should be processing it from this repository just as the public will.

No government document should ever cite or include any data that is not referenced directly from the new publicly accessible store.

It must be the source of all data and an input to every process/action (apart from first capture of course!) - it should not be a repository to dump databases in when they are finished with.

Postcode, OS and tidal data should all be made public domain and included immediately.

pp

6/4/2009 2:56:34 PM #

Use FOSS Principles:

Release Early,
Release Often.

The Open Sourcerer

6/4/2009 5:46:19 PM #

Two comments

@John Fox. There are devolved versions of direct.gov.uk, such as nidirect.gov.uk. But if you didn't realise that Northern Ireland is still part of the United Kingdom you might not have looked for it. Nevertheless, a search on one site, with a postcode from another region, should automatically drop you into their tools - something that requires some work.

On RDF - the point here is that semantic web data is searchable and computable by programs running on other computers. See, e.g., the winners at http://challenge.semanticweb.org/ . So marking up your information using semantic web standards is a good thing. You could even build your wiki sites using Semantic MediaWiki. The next step is producing human interfaces to all that data. To do that, give money to projects offering to do that, be they social efforts (e.g. MySociety), or projects seeking EU Framework Funding. In either case, involve the semantic web experts in the British Isles, DERI at University College Galway.

David Newman

6/4/2009 9:57:48 PM #

* Release early release often.
* Stick the data up in whatever format is a reasonable format for that data. If someone wants JSON rather than XML then that can be added later (more importantly, they'll be able to do it themselves if they cannot wait).
* Make it rsyncable. FTP is pretty horrible, enable sftp for non-geeks.
* Over time make the data put out by different people have the same semantics. Require police forces outputting data to use certain csv/rdf fields and data formats.

Dean

6/5/2009 5:48:33 AM #

So that is now 38 comments from the web users and interested parties, and I am not sure if the digital engagement team has acknowledged one of them?

Is this digital engagement or digital one-way: "we will ask your opinion but it does not matter what you say"?

Please try and show by your actions that you are sincere about this.

Should I eat my hat if anyone in government says thank you to the 38 commenters?

alex

6/5/2009 10:36:26 AM #

I think the key here is to make use of collective intelligence. We are talking about a large volume of information that will be accessed by a large audience. You need to be able to categorise your audience and then let them tell you what information they found the most useful. You can then use this information to build a more intelligent search capability, so that information is easier to find.

Andy Charlton

6/5/2009 11:02:04 AM #

Putting to one side the issues about the non-technical mechanisms for getting at this data (undoubtedly the hardest bit..), making it available as *easily as possible* via standards and systems that have emerged and proved themselves to be successful seems the sensible route. I'm talking REST rather than SOAP, RSS/JSON rather than RDF. (The comparisons aren't fair, but the point is that easy is good, hard is bad)

Actually, I think the Guardian have done an interesting thing putting all their data on Google Docs - given it's likely that the input documents are going to be spreadsheets and the output can be tweaked and queried in myriad different ways, this seems powerful.

If we really want this data to be embraced and used, go imperfect/lightweight rather than perfect/heavy...  

Mike Ellis

6/5/2009 5:17:48 PM #

Pingback from weloveyourwalls.com

Data Is A Dish Best Served Raw | weloveyourwalls design blog

weloveyourwalls.com

6/5/2009 11:47:30 PM #

Pingback from blog.josema.net

Data.gov, ¿un nuevo estándar en gobierno abierto? «  el ñasco a la barrapan

blog.josema.net

6/6/2009 12:18:00 PM #

Thank you for all your comments so far, please keep them coming. There will be an update next week on what we have been up to and our thinking so far.

Richard

6/6/2009 2:01:53 PM #

Pingback from futuregovconsultancy.com

FutureGov » Useful links » links for 2009-06-06

futuregovconsultancy.com

6/8/2009 9:28:50 AM #

First, very impressed you are reaching out to the public for help in putting data.gov together.  The effort in the U.S. was very much a silo effort, done inside the government and presented as a fait accompli.  Somewhat ironic that the open government/transparency effort at the White House is by far the most secretive one they have going.

The biggest recommendation I could make for your site is to focus on expanding the amount of raw data available.  Instead of simply building a portal pointing to existing services, you should focus on "freeing" some key databases that have not been available to developers such as the magicians at My Society.  In the U.S., that bulk data might be the full text of all patents or perhaps the source code (XML) to the Official Journals of Government.

Issues such as JSON v. XML or RSS v. Atom are much less important than making bulk data available. As you can see, I come firmly down on the repository side of the repository/index debate!

Thanks again for taking input from the public on your system, even those of us located outside the Commonwealth!

Carl Malamud

6/8/2009 10:27:38 AM #

A few comments:

- I would concur with the idea that PSIHs (public sector information holders) should publish in RDF - nightly ideally; there are plenty of tools available to do subsequent translations; however, for most people who really want to use the data "discovery" is key and an effective way to do this would be to mandate metadata creation and then publish that metadata as RDF; once discovered users can quickly ascertain whether the data set behind the metadata will be of any utility to them rather than download the data first; this would in no way constrain users from access to near real time data (and the nightly updates would be small, change only type data, reducing server and bandwidth demands)

- "raw" data would need to be de-personalised - some level of aggregation and anonymisation of data should be a given

- POIT recommendation 14: "public information data sets should be easy to find and use" itself suggests a central repository though it does also suggest catalogue; even the UK Location Strategy moots centralised repositories; it might be possible to dynamically reverse engineer such a thing as each PSIH makes their data discoverable (per the above) but why would you?

- viz "surveillance society" - PSIHs have increasing oversight or control of embedded, ubiquitous data feeds themselves from CCTV, ANPR, environmental monitoring networks etc - a veritable potential feast... if Google StreetView can cause the upset it has in some quarters, you ain't seen nothing yet

- PSIHs come in various "flavours" from central government agency, local authority, executive agency to trading fund and others, each with varying remits as to "trading" behaviour; with that comes varying copyright and licensing provisions - this links back to the "use" term in POIT recommendation 14.  This then opens up debate about how PSIH data could/should be licensed for use beyond government

- "selective" use of available PSIH data in the media and by the political class brings pause amongst data custodians - "what will they do with it?", "they don't understand sampling methods and errors" - that informs the status quo

- on top of this most (not all) PSIHs are appalling at any kind of pro-active disclosure about any data they have and how you might gain access to, let alone use, it; it is going to take massive cultural change in government and especially within the civil service to alter this reality - that is the biggest challenge of digital engagement, as others have observed the technical solutions are pretty much understood (if not universally agreed upon)

James Cutler
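
To make James Cutler's first point a little more concrete, here is a minimal sketch of publishing dataset metadata as RDF in N-Triples form, so that a crawler can discover a dataset without downloading it. It assumes the Dublin Core terms vocabulary is an acceptable choice for the property names; the dataset URI and the metadata values are hypothetical, and a real publisher would more likely use an RDF library than build the lines by hand.

```python
# Dublin Core terms namespace, assumed here as the metadata vocabulary.
DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def metadata_to_ntriples(dataset_uri, metadata):
    """Emit one N-Triples line per metadata entry, keyed by Dublin Core term."""
    lines = []
    for term, value in sorted(metadata.items()):
        lines.append(
            '<%s> <%s%s> "%s" .' % (dataset_uri, DCTERMS, term, escape_literal(value))
        )
    return "\n".join(lines)


# Hypothetical dataset description, for illustration only.
print(
    metadata_to_ntriples(
        "http://example.org/dataset/crime-2009",
        {"title": "Recorded crime 2009", "publisher": "Example Constabulary"},
    )
)
```

Because only the metadata is published this way, the nightly update stays small even when the underlying data set is large.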

6/8/2009 12:29:17 PM #

Tony Hirst

6/9/2009 8:36:29 AM #

My advice is KISS, keep it simple, use whatever media the people you want to talk to are using, i.e. if it's Twitter then tweet. If they want PDFs then give them that. Put stuff up on websites and make links easy to find and negotiate, make it easy for people to reply, and acknowledge their responses quickly or they lose interest in you.

Mrs Doyle

6/10/2009 11:52:29 AM #

I agree with you, Mrs Doyle. KISS is a great way to make things easier for your visitors. Giving them exactly what they want, being precise and providing them with the link they need will make the user journey so much easier. I use the Eyetracker program to scan and locate where people focus a lot. Then I ensure that I have exactly what people are looking for in the right place at the right time.

Mauritius

7/7/2009 8:54:27 AM #

Pingback from blog.okfn.org

Open Knowledge Foundation Blog  » Blog Archive   » Speaking at OpenTech 2009

blog.okfn.org

7/17/2009 12:27:31 AM #

@Mauritius: I use Eyetracker also and I have to say it's a great, useful tool.

Nice Blogger

7/23/2009 10:54:48 PM #

Pingback from rfahey.org

The need for KPIs versus FOIs | Talkin' bout a revolution

rfahey.org

8/5/2009 8:40:43 PM #

rfahey.org
ha that would be a REVOLUTION huh?

Miguel A.

8/15/2009 4:34:26 PM #

re Does this need its own domain, or should it sit on an existing supersite (e.g. http://direct.gov.uk)

I would say it needs to be a subdomain of direct.gov, rather than a new domain that is unfamiliar to users.

William Contracting

8/20/2009 8:34:59 AM #

Pingback from ideapolicy.wordpress.com

Linked data, swings and roundabouts « Policy and Performance

ideapolicy.wordpress.com

8/26/2009 4:25:45 PM #

I think for tabular data in files CSV is the best choice.  JSON is perfect for small amounts of data with simple structure, used only as an internal implementation detail of an AJAX application. It is not appropriate as a stable interchange format between multiple applications. No new government formats, please…

Calling Cards
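
One way to read the CSV-versus-JSON argument is that publishing CSV does not shut out anyone who wants JSON, since the conversion takes only a few lines on the consumer's side. The sketch below uses only the Python standard library; the column names and values in the sample are made up for illustration.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


# Invented sample data; every value stays a string, as CSV carries no types.
sample = "school,pupils\nExample Primary,210\nExample High,950\n"
print(csv_to_json(sample))
```

Note that the numbers come out as strings, which is one reason agreed field definitions matter more than the choice of container format.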

9/10/2009 11:11:08 AM #

Pingback from onlinejournalismblog.com

Data and the future of journalism panel discussion: Linked Data London

onlinejournalismblog.com

9/14/2009 7:56:56 PM #

Pingback from stalkked.com

Web Semantico: Linked Data e futuro del giornalismo

stalkked.com

9/15/2009 8:23:22 PM #

There are plenty of places to look for data already. For something like crime stats, each police force has their own site, and the Home Office has a site. For education figures, put them on the DfES site. Put them at a reliable, sensible URI that won't be changed willy-nilly. Cool URIs don't change after all, just like cool telephone numbers and cool e-mail addresses. You may then want to pull together a central index.

Instead of simply building a portal pointing to existing services, you should focus on "freeing" some key databases that have not been available to developers such as the magicians at My Society.

sim only

9/17/2009 6:42:32 PM #

It's a great idea to provide the data. I'm also impressed with his concern to provide them the best way. I believe traditional formats such as CSV, XLS and TXT are the most valid.
Congratulations for the article!

Souza Vilabol

10/2/2009 10:32:51 PM #

Pingback from trailkev.wordpress.com

Government wants developers to get excited and make things « A little Jack with that?

trailkev.wordpress.com

10/3/2009 10:05:11 AM #

Pingback from yuvablog.com

Government urges developers to get excited and make things | Yuvablog

yuvablog.com

10/15/2009 12:55:35 PM #

That's a great link Noel. The more transparency the better. Of course there will be a lot of commercial value but hopefully a lot of free and useful references will come out of it.

Business Report

11/3/2009 12:12:52 AM #

I would have thought a format that is accessible to as many people as possible: if developing applications, then XML or even CSV, or an Excel-compatible dataset, as there are so many free spreadsheets available. If written word, then PDF or DOC are good, but PDF gives a little more security.

Tim Driver

11/9/2009 10:41:55 AM #

Accessibility is key. Multiple data sets (JSON etc) and no proprietary formats such as XLS please. CSV, XML or plain text.

Cath Kidston

11/10/2009 8:35:39 PM #

I think it should be on an existing supersite, and if you used rss feeds that would be great for me and keeping up with posts.

white nightstands

11/17/2009 6:18:55 PM #

Pingback from blogs.talis.com

Nodalities  » Blog Archive   » data.gov.uk and the Talis Platform

blogs.talis.com

11/27/2009 2:33:02 PM #

Accessibility by the masses is essential and I agree any developers should be using XML; not sure about CSV though. It really needs to be downloadable so people can print it out and read it in their own time, not just when at work (for instance).

Morgan
http://www.morgan-photography.co.uk

Morgan Rushton

12/2/2009 2:54:52 PM #

In addition to accessibility please make sure it follows open standards.

Stoodleigh Court

12/2/2009 7:25:47 PM #

I think the ideal would be on an existing supersite, and if you used rss feeds that would be great for keeping up with posts.

Valerie

12/3/2009 1:22:46 AM #

Access to the raw data I believe is important, but we'll need something like FTP to access the data. I mean, RSS feeds and other things like that are nice (and will keep us updated), but they are all window dressing compared to the raw data

Antique Furniture

12/9/2009 12:10:10 PM #

I would prefer an FTP directory, but understand that in this day and age, most people would err towards an RSS.

To echo the comment above, a raw feed would probably be best on a separate site from directgov, which can be a nightmare to navigate.

Suffolk Photographer

12/23/2009 6:36:16 PM #

I agree with Mrs Doyle. Keeping it simple is the best way for users. Many users will find FTP more difficult than RSS.

Downtown Hotels

12/28/2009 9:58:04 PM #

This is similar to social data discovery. And I agree using rss feeds would be great for keeping up with posts.

Stephen
