
How to Open up Data


This section forms the core of this handbook. It gives concrete, detailed advice on how data holders can open up data. We’ll go through the basics, but also cover the pitfalls. Lastly, we will discuss the more subtle issues that can arise.

There are three key rules we recommend following when opening up data:

  • Keep it simple. Start out small, simple and fast. There is no requirement that every dataset must be made open right now. Starting out by opening up just one dataset, or even one part of a large dataset, is fine – of course, the more datasets you can open up the better.

    Remember that this is about innovation. Moving as rapidly as possible is good because it means you can build momentum and learn from experience. Innovation is as much about failure as success, and not every dataset will be useful.

  • Engage early and engage often. Engage with actual and potential users and re-users of the data as early and as often as you can, be they citizens, businesses or developers. This will ensure that the next iteration of your service is as relevant as it can be.

    It is essential to bear in mind that much of the data will not reach ultimate users directly, but rather via ‘infomediaries’. These are the people who take the data and transform or remix it for presentation. For example, most of us don’t want or need a large database of GPS coordinates; we would much prefer a map. Thus, engage with infomediaries first. They will re-use and repurpose the material.

  • Address common fears and misunderstandings. This is especially important if you are working with or within large institutions such as government. When opening up data you will encounter plenty of questions and fears. It is important to (a) identify the most important ones and (b) address them at as early a stage as possible.

There are four main steps in making data open, each of which will be covered in detail below. These are in very approximate order; many of the steps can be done simultaneously.

  1. Choose your dataset(s). Choose the dataset(s) you plan to make open. Keep in mind that you can (and may need to) return to this step if you encounter problems at a later stage.
  2. Apply an open license.
    1. Determine what intellectual property rights exist in the data.
    2. Apply a suitable ‘open’ license that licenses all of these rights and supports the definition of openness discussed in the section above on ‘What is Open Data’.
    Note: if you can’t do this, go back to step 1 and try a different dataset.
  3. Make the data available: in bulk and in a useful format. You may also wish to consider alternative ways of making it available, such as via an API.
  4. Make it discoverable: post it on the web and perhaps organize a central catalog to list your open datasets.

Choose Dataset(s)

Choosing the dataset(s) you plan to make open is the first step – though remember that the whole process of opening up data is iterative and you can return to this step if you encounter problems later on.

If you already know exactly what dataset(s) you plan to open up you can move straight on to the next section. However, in many cases, especially for large institutions, choosing which datasets to focus on is a challenge. How should one proceed in this case?

Creating a list should be a quick process that identifies which datasets could be made open first. There will be time at later stages to check in detail whether each dataset is suitable.

There is no requirement to create a comprehensive list of your datasets. The main point to bear in mind is whether it is feasible to publish this data at all (whether openly or otherwise); see the previous section.

Ask the community

We recommend that you ask the community in the first instance, that is, the people who will be accessing and using the data, as they are likely to have a good understanding of which data could be valuable.

  1. Prepare a short list of potential datasets that you would like feedback on. This list does not need to match your expectations; the main intention is to get a feel for the demand. It could be based on other countries’ open data catalogs.
  2. Create a request for comment.
  3. Publicise your request with a webpage. Make sure that the request can be accessed through its own URL. That way, when it is shared via social media, the request can be easily found.
  4. Provide easy ways to respond. Avoid requiring registration, as it reduces the number of responses.
  5. Circulate the request to relevant mailing lists, forums and individuals, pointing back to the main webpage.
  6. Run a consultation event. Make sure it is held at a convenient time, so that business owners, data users and officials can all attend.
  7. Ask a politician to speak on behalf of your agency. Open data is very likely to be part of a wider policy of increasing access to government information.
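Once responses come in, a simple tally is often enough to get a feel for the demand. The following is a minimal Python sketch, assuming the responses have already been collected as lists of requested dataset names; the response format and the dataset names are hypothetical.

```python
from collections import Counter

def rank_requests(responses):
    """Count how many respondents asked for each candidate dataset.

    `responses` is a hypothetical format: one list of dataset names
    per respondent. Names are normalised to lower case, and each
    respondent counts at most once per dataset.
    """
    tally = Counter()
    for requested in responses:
        # set() stops one respondent from voting twice for the same dataset
        tally.update(set(name.strip().lower() for name in requested))
    return tally.most_common()

responses = [
    ["Bus timetables", "Budget"],
    ["budget", "Crime statistics"],
    ["Budget", "Bus timetables", "budget"],
]
print(rank_requests(responses))
# [('budget', 3), ('bus timetables', 2), ('crime statistics', 1)]
```

Deduplicating per respondent is a design choice: it measures how many people want a dataset rather than how often it was mentioned.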

Cost basis

How much money do agencies spend on the collection and maintenance of data that they hold? If they spend a great deal on a particular set of data, then it is highly likely that others would like to access it.

This argument may be fairly susceptible to concerns about free-riding. The question you will need to answer is: “Why should other people get such expensive information for free?” The answer is that the expense is absorbed by the public sector in order to perform a particular function. Once the data has been collected, the cost of sending it to a third party is approximately nothing. Therefore, they should be charged nothing.
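If spend figures are available, the cost-basis heuristic reduces to sorting candidates by what the agency spends on them. A minimal sketch; the dataset names and amounts are hypothetical placeholders, not real budgets.

```python
# Hypothetical annual collection and maintenance spend per dataset.
annual_spend = {
    "traffic counts": 350_000,
    "land register": 2_400_000,
    "library opening hours": 5_000,
}

def rank_by_spend(spend):
    """Return dataset names ordered from highest to lowest spend."""
    return sorted(spend, key=spend.get, reverse=True)

print(rank_by_spend(annual_spend))
# ['land register', 'traffic counts', 'library opening hours']
```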

Ease of release

Sometimes, rather than deciding which data would be most valuable, it could be useful to take a look at which data is easiest to get into the public’s hands. Small, easy releases can act as the catalyst for larger behavioural change within organisations.

Be careful with this approach. It may be that these small releases are of so little value that nothing can be built from them. If this happens, the credibility of the entire project could be undermined.

Observe peers

Open data is a growing movement. There are likely to be many people in your area who understand what other areas are doing. Compile a list based on what those agencies are doing.
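One quick way to turn peers’ catalogs into a shortlist is to count which datasets several of them already publish. A minimal sketch, assuming you have gathered dataset titles from peer catalogs by hand; the agency names, titles and the `min_peers` threshold are hypothetical.

```python
from collections import Counter

# Hypothetical dataset titles taken from peer agencies' open data catalogs.
peer_catalogs = {
    "City A": {"budget", "bus stops", "parking fines"},
    "City B": {"budget", "bus stops"},
    "City C": {"budget", "tree inventory"},
}

def common_datasets(catalogs, min_peers=2):
    """List datasets published by at least `min_peers` peers, most common first."""
    counts = Counter()
    for titles in catalogs.values():
        counts.update(titles)
    return [(title, n) for title, n in counts.most_common() if n >= min_peers]

print(common_datasets(peer_catalogs))
# [('budget', 3), ('bus stops', 2)]
```

In practice titles rarely match exactly across catalogs, so some manual normalisation is likely to be needed first.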

Apply an Open License (Legal Openness)

In most jurisdictions there are intellectual property rights in data that prevent third parties from using, reusing and redistributing data without explicit permission. Even in places where the existence of rights is uncertain, it is important to apply a license simply for the sake of clarity. Thus, if you are planning to make your data available you should put a license on it – and if you want your data to be open this is even more important.

What licenses can you use? We recommend that for ‘open’ data you use one of the licenses conformant with the Open Definition and marked as suitable for data. This list (along with instructions for usage) can be found at:

A short 1-page instruction guide to applying an open data license can be found on the Open Data Commons site:
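Besides a human-readable license notice, it can help to state the license in machine-readable metadata. One existing convention (an assumption here, not something this guide requires) is the Frictionless Data Package `datapackage.json` file, whose `licenses` property lists license objects with `name`, `path` and `title`. The dataset name, file name and the choice of ODC-BY below are illustrative only.

```python
import json

def make_datapackage(name, resource_files):
    """Build minimal Data Package metadata that declares an open license."""
    return {
        "name": name,
        "licenses": [
            {
                # Illustrative choice: Open Data Commons Attribution License
                "name": "ODC-BY-1.0",
                "path": "https://opendatacommons.org/licenses/by/1-0/",
                "title": "Open Data Commons Attribution License 1.0",
            }
        ],
        # One resource per file; the resource name is the file name minus its extension
        "resources": [
            {"name": f.rsplit(".", 1)[0], "path": f} for f in resource_files
        ],
    }

# Hypothetical dataset; save this output as datapackage.json next to the data
print(json.dumps(make_datapackage("bus-stops", ["bus-stops.csv"]), indent=2))
```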

Make Data Available (Technical Openness)

Open data needs to be technically open as well as legally open. Specifically, the data needs to be available in bulk in a machine-readable format.

Available

Data should be priced at no more than a reasonable cost of reproduction, preferably as a free download from the Internet. This pricing model is achievable because providing data for re-use should not impose any significant additional cost on your agency.

In bulk

The data should be available as a complete set. If you have a register which is collected under statute, the entire register should be available for download. A web API or similar service may also be very useful, but it is not a substitute for bulk access.
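To make “the entire register” concrete: bulk access usually means one file holding every record, rather than a paginated view. A minimal sketch, assuming the register is already loaded as a list of dictionaries; the field names and file name are hypothetical. Publishing a SHA-256 checksum alongside the file is an optional extra that lets downloaders verify their copy.

```python
import csv
import gzip
import hashlib

def export_bulk(records, fieldnames, path):
    """Write every record to one gzip-compressed CSV and return its SHA-256 hex digest."""
    with gzip.open(path, "wt", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)  # the complete register, not a page of it
    # The digest can be published next to the file for integrity checks.
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()

# Hypothetical register contents
register = [{"id": 1, "name": "Acme Ltd"}, {"id": 2, "name": "Example SIA"}]
print(export_bulk(register, ["id", "name"], "company-register.csv.gz"))
```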

In an open, machine-readable format

Re-use of data held by the public sector should not be subject to patent restrictions. More importantly, providing machine-readable formats allows for the greatest re-use. To illustrate this, consider statistics published as PDF (Portable Document Format) documents, often used for high-quality printing. While these statistics can be read by humans, they are very hard for a computer to use. This greatly limits the ability of others to re-use that data.
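The contrast with PDF is easiest to see from the re-user’s side. This minimal sketch publishes the same small table as CSV and JSON, both open formats, and then shows that a program can read the CSV back and compute with it in a couple of lines. The regions and figures are hypothetical.

```python
import csv
import io
import json

# Hypothetical statistics table
rows = [
    {"region": "North", "year": 2011, "population": 120500},
    {"region": "South", "year": 2011, "population": 98200},
]

def to_csv(rows):
    """Serialise a list of dicts as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_json(rows):
    """Serialise the same rows as a JSON array."""
    return json.dumps(rows, indent=2)

# A re-user parses the CSV and computes a total directly,
# which is hard to do reliably with a table embedded in a PDF.
parsed = list(csv.DictReader(io.StringIO(to_csv(rows))))
total = sum(int(r["population"]) for r in parsed)
print(total)  # 218700
```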

Here are a few approaches that may help:

  • Keep it simple.
  • Move fast.
  • Be pragmatic.

It is far better to give out raw data now than perfect data in six months’ time.

There are many different ways to make data available to others. The most natural in the Internet age is online publication. There are many variations of this model. At its most basic, agencies make their data available via their websites and a central catalog directs visitors to the appropriate source. However, there are alternatives.

When connectivity is limited or the data is extremely large, distribution through other channels can be warranted. This section will also discuss alternatives, which can help keep prices very low.

Online methods

Via your existing website

The system which will be most familiar to your web content team is to provide files for download from webpages. Just as you currently provide access to discussion documents, data files can be made available in exactly the same way.

The drawback of this approach is that it can be very difficult for outsiders to discover where to find updated information. This option places some burden on the people creating tools with your data.
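To illustrate that burden: when nothing signals that a file on a webpage has changed, a tool built on the data typically re-downloads it periodically and compares it with the previous copy. A minimal sketch of the comparison step only; the download itself is assumed to happen elsewhere.

```python
import hashlib
from typing import Optional, Tuple

def has_changed(content: bytes, last_digest: Optional[str]) -> Tuple[bool, str]:
    """Compare freshly downloaded bytes with the digest stored on the previous run.

    Returns (changed, new_digest); the caller stores new_digest for next time.
    """
    digest = hashlib.sha256(content).hexdigest()
    return digest != last_digest, digest

changed, digest = has_changed(b"id,level\n1,182\n", None)
print(changed)  # True: nothing has been seen before
changed, digest = has_changed(b"id,level\n1,182\n", digest)
print(changed)  # False: same content as last time
```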

Via 3rd party sites

Many repositories have become hubs of data in particular fields. For example, pachube.com is designed to connect people with sensors to those who wish to access data from them. Sites like Infochimps.com and Talis.com allow public sector agencies to store massive quantities of data for free.

Third party sites can be very useful. The main reason for this is that they have already pooled together a community of interested people and other sets of data. When your data is part of these platforms, a type of positive compound interest is created.

Wholesale data platforms already provide the infrastructure which can support the demand. They often provide analytics and usage information. For public sector agencies, they are generally free.

These platforms can have two costs. The first is independence. Your agency needs to be able to yield control to others. This is often politically, legally or operationally difficult. The second cost may be openness. Ensure that your data platform is agnostic about who can access it. Software developers and scientists use many operating systems, from smart phones to supercomputers. They should all be able to access the data.

Via FTP servers

A more old-fashioned way of providing access to data is via the File Transfer Protocol (FTP). This may be suitable if your audience is mostly technical, such as software developers and scientists. FTP works in place of HTTP, but is specifically designed to support file transfers.

FTP has fallen out of favour. Rather than presenting a website, browsing an FTP server is much like browsing the folders on a computer. Therefore, even though FTP is fit for purpose, it offers web developers far less scope to charge for customisation.
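For a technical audience, fetching a directory of files over FTP takes only a few lines with Python’s standard `ftplib` module. A minimal sketch; the server name and path are hypothetical, and it assumes anonymous access and a directory that contains only plain files (no subdirectories).

```python
from ftplib import FTP
from pathlib import Path

def fetch_all(host, remote_dir, local_dir):
    """Download every file in one FTP directory using anonymous login."""
    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    with FTP(host) as ftp:          # connects; quit() is sent on exit
        ftp.login()                 # no arguments means anonymous login
        ftp.cwd(remote_dir)
        names = ftp.nlst()          # some servers return paths, not bare names
        for name in names:
            local_file = target / Path(name).name
            with open(local_file, "wb") as fh:
                ftp.retrbinary(f"RETR {name}", fh.write)
    return names

# Example (hypothetical server and directory; requires network access):
# fetch_all("ftp.example.org", "/pub/open-data", "downloads")
```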

As torrents

BitTorrent is a system which has become familiar to policy makers because of its association with copyright infringement. BitTorrent uses files called torrents, which work by splitting the cost of distributing files between all of the people downloading those files. Instead of servers becoming overloaded, supply increases as demand increases. This is the reason that this system is so successful for sharing movies. It is a wonderfully efficient way to distribute very large volumes of data.
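What makes this cost-sharing safe is that a file is split into fixed-size pieces, and the torrent (in the original BitTorrent v1 format) records a SHA-1 hash of each piece. A downloader can therefore fetch pieces from any peer and check each one independently. This simplified sketch shows only the piece hashing and verification; it does not produce a real `.torrent` file (no bencoding, no tracker information).

```python
import hashlib
from typing import List

PIECE_LENGTH = 256 * 1024  # 256 KiB, a common piece size

def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> List[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]

def verify_piece(index: int, piece: bytes, hashes: List[bytes]) -> bool:
    """Check a piece received from any peer against the published hash."""
    return hashlib.sha1(piece).digest() == hashes[index]

data = b"x" * 10
hashes = piece_hashes(data, piece_length=4)  # tiny pieces, just for illustration
print(len(hashes))                          # 3
print(verify_piece(2, data[8:12], hashes))  # True
print(verify_piece(0, b"yyyy", hashes))     # False: corrupted or tampered piece
```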

As an API

Data can be published via an Application Programming Interface (API). These interfaces have become very popular. They allow programmers to select specific portions of the data, rather than providing all of the data in bulk as a large file. APIs are typically connected to a database which is being updated in real time. This means that making information available via an API can ensure that it is up to date.
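The idea of “selecting specific portions of the data” can be sketched with Python’s standard library: a read-only HTTP endpoint that returns only the records matching the query parameters. This is a minimal illustration, not a production API. The river-level records are hypothetical and held in memory, whereas a real service would query a live database, and the handler ignores the request path.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Hypothetical dataset; a real API would read from a continuously updated database.
RECORDS = [
    {"station": "A1", "river": "Daugava", "level_cm": 182},
    {"station": "B7", "river": "Gauja", "level_cm": 95},
    {"station": "C3", "river": "Daugava", "level_cm": 176},
]

def select(records, params):
    """Return only the records whose fields match every query parameter."""
    return [r for r in records
            if all(str(r.get(k)) == v for k, v in params.items())]

class DataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        query = urlparse(self.path).query
        # parse_qs returns lists; keep the first value of each parameter
        params = {k: v[0] for k, v in parse_qs(query).items()}
        body = json.dumps(select(RECORDS, params)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To run locally (blocks until interrupted):
# HTTPServer(("localhost", 8000), DataHandler).serve_forever()
# then GET http://localhost:8000/?river=Gauja returns only station B7.
```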

Bulk availability of raw data should be the primary concern of every open data initiative. Providing an API carries a number of costs:

  1. Price. APIs require much more work to develop and maintain than providing files.
  2. Expectations. To build a community of users around the system, you need to offer reliability. If something breaks, you will be expected to find the resources to fix it.

Bulk access to the data ensures that:

  • there is no dependency on the original provider of the data, meaning that if a restructure or budget cycle changes the situation, the data are still available.
  • anyone can get a copy and redistribute it. This reduces the distribution costs for the source organization and means there is no single point of failure.
  • others can develop their own services, because they can be confident that the data will not be taken away from them.

Providing data in bulk allows others to use the data beyond its original purposes. For example, it allows it to be converted into a new format, linked with other resources, or versioned and archived in multiple places. While the latest version of the data may be made available via an API, raw data should be made available in bulk at regular intervals.
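Releasing “raw data in bulk at regular intervals” can be as simple as writing a dated snapshot on a schedule and keeping earlier ones, so that the versions can be archived elsewhere. A minimal sketch; the file naming scheme (`dataset-YYYY-MM-DD.csv` plus a `dataset-latest.csv` copy) and the directory are hypothetical conventions, and scheduling (e.g. a nightly job) is left out.

```python
import csv
import datetime as dt
import shutil
from pathlib import Path

def publish_snapshot(records, fieldnames, out_dir, today=None):
    """Write a dated bulk snapshot and refresh a 'latest' copy beside it."""
    today = today or dt.date.today()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    snapshot = out / f"dataset-{today.isoformat()}.csv"
    with open(snapshot, "w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    # Earlier snapshots are left in place, so past versions remain available.
    shutil.copyfile(snapshot, out / "dataset-latest.csv")
    return snapshot

# Hypothetical usage
print(publish_snapshot([{"id": 1, "value": 42}], ["id", "value"], "snapshots"))
```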

For example, the Eurostat statistical service has a bulk download facility offering over 4,000 data files. It is updated twice a day, offers data in tab-separated values (TSV) format, and includes documentation about the download facility as well as about the data files.

Another example is the District of Columbia Data Catalog, which allows data to be downloaded in CSV and XLS format in addition to live feeds of the data.
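Delimited text formats like these are easy for re-users to consume with standard tools. A minimal sketch that reads either CSV or TSV, choosing the delimiter from the file extension (a simplifying assumption). The sample data is hypothetical; real Eurostat files use their own specific layout, so their documentation should be consulted, and XLS files would need a separate spreadsheet library not shown here.

```python
import csv
import io

def read_table(text, filename):
    """Parse downloaded CSV or TSV text into a list of dicts keyed by the header row."""
    delimiter = "\t" if filename.lower().endswith(".tsv") else ","
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

# Hypothetical sample content
tsv = "geo\tyear\tvalue\nLV\t2010\t2.1\n"
csv_text = "ward,permits\n1,17\n2,9\n"
print(read_table(tsv, "example.tsv"))
print(read_table(csv_text, "example.csv"))
```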

Make data discoverable

“Open data” is useless without users. You need to make sure that people can find the source material. This section will cover different approaches.

The most important thing is to provide a neutral space which can overcome both inter-agency politics and future budget cycles. Jurisdictional borders, whether sectoral or geographical, can make cooperation difficult. However, there are significant benefits in joining forces. The easier it is for outsiders to discover data, the faster new and useful tools will be built.

Existing tools

There are a number of tools already live on the web that are specifically designed to make data more discoverable.

One of the most prominent is the DataHub, a catalog and data store for datasets from around the world. The site makes it easy for individuals and organizations to publish material and for data users to find material they need.

In addition, there are dozens of specialist catalogs for different sectors and places. Many scientific communities have created a catalog system for their fields, as data are often required for publication.

For government

Standard practice has emerged for a lead agency to create a catalog of the government’s data. When establishing a catalog, try to create some structure which allows many departments to easily keep their own information current.

Resist the urge to build the software to support the catalog from scratch. There are free and open source software solutions (such as CKAN) which have been adopted by many governments already. As such, investing in another platform may not be needed.
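Off-the-shelf catalogs also give re-users a programmatic way in. CKAN, for example, exposes an Action API in which the `package_search` action returns matching datasets as JSON under `result.results`. A minimal sketch using only the standard library; the catalog base URL is an assumption and the search terms are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def search_url(base_url, query, rows=10):
    """Build a CKAN Action API package_search URL for a given catalog."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"

def search_titles(base_url, query, rows=10):
    """Return the titles of datasets matching `query` in a CKAN catalog."""
    with urlopen(search_url(base_url, query, rows), timeout=30) as resp:
        payload = json.load(resp)
    return [pkg["title"] for pkg in payload["result"]["results"]]

print(search_url("https://catalog.example.org", "bus stops", 5))
# Example (requires network access; the catalog URL is hypothetical):
# print(search_titles("https://catalog.example.org", "transport"))
```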

There are a few things that most open data catalogs miss. Your programme could consider the following:

  • Providing an avenue to allow the private and community sectors to add their data. It may be worthwhile to think of the catalog as the region’s catalog, rather than the regional government’s.
  • Facilitating improvement of the data by allowing derivatives of datasets to be cataloged. For example, someone may geocode addresses and may wish to share those results with everybody. If you only allow single versions of datasets, these improvements remain hidden.
  • Being tolerant of your data appearing elsewhere. Your content is likely to be duplicated in catalogs run by communities of interest. For example, if you have river level monitoring data available, it may appear in a catalog for hydrologists.
  • Ensuring that access is equitable. Try to avoid creating a privileged level of access for officials or tenured researchers, as this will undermine community participation and engagement.
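Cataloging derivatives and non-government contributions mostly requires the catalog’s data model to allow them. A minimal sketch of one possible record structure, where a hypothetical `derived_from` field links a derivative (such as geocoded addresses) back to its source. The field names and entries are illustrative, not a schema used by any particular catalog software.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CatalogEntry:
    id: str
    title: str
    publisher: str                      # may be a community group or business
    derived_from: Optional[str] = None  # id of the source dataset, if any

def derivatives_of(catalog: List[CatalogEntry], dataset_id: str) -> List[CatalogEntry]:
    """Find entries that were built from the given dataset."""
    return [e for e in catalog if e.derived_from == dataset_id]

catalog = [
    CatalogEntry("addresses", "Street addresses", "City council"),
    CatalogEntry("addresses-geo", "Geocoded street addresses",
                 "Local civic tech group", derived_from="addresses"),
]
print([e.title for e in derivatives_of(catalog, "addresses")])
# ['Geocoded street addresses']
```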

For civil society

Be willing to create a supplementary catalog for non-official data.

It is very rare for governments to associate with unofficial or non-authoritative sources. Officials have often gone to great expense to ensure that there will be no political embarrassment or other harm caused by misuse of, or overreliance on, data.

Moreover, governments are unlikely to be willing to support activities that mesh their information with information from businesses. Governments are rightfully skeptical of profit motives. Therefore, an independent catalog for community groups, businesses and others may be warranted.