Sunday, May 27, 2018

#Wikidata - No #copyright on common knowledge

The approach to copyright for text and for data is, imho, utterly different. For a text you seek a reputable source and you cite it. All in all, a lot of work.

A proper approach to data is to seek confirmation of what you already know, and it is encouraging when many sources agree on what they have in common. When you add new data, you will typically find most of what you care about through links from existing shared data, probably in multiple sources. This is not done by hand because it is too much work. It is done by bot, and consequently there is not even any "sweat of the brow".

Arguably, data held in common exists as common knowledge. It is not proprietary to any one of these sources, and consequently claiming copyright, let alone a license, is problematic at the very least.

When data is specific to one data source, it is inherently problematic. It may be wrong, particularly when it differs from what other sources state, so care is needed before this data is used. You then get into a manual process of reconciling and curating the data, and you may even decide to diverge from what all the others say. Both the confirmation and the creation of new data are actually research. Neither is a reuse of data from the other source. To my mind this means that no burden of copyright applies.
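The split described above can be sketched in a few lines of Python. Values that several sources agree on count as shared knowledge, and values that only one source states are set aside for manual reconciliation. This is a minimal illustration under stated assumptions: the `reconcile` function, the source names and the threshold of two agreeing sources are all hypothetical. It is not how any particular Wikidata bot works.

```python
from collections import Counter


def reconcile(claims):
    """Split the values that sources give for one property.

    `claims` maps a source name to the value that source states.
    Hypothetical helper for illustration only, not a Wikidata or bot API.
    """
    counts = Counter(claims.values())
    # Values stated by two or more sources: treated as shared, common knowledge.
    consensus = {value for value, n in counts.items() if n >= 2}
    # Values stated by exactly one source: candidates for manual curation.
    needs_review = {
        source: value for source, value in claims.items() if counts[value] == 1
    }
    return consensus, needs_review


# Made-up example: three sources disagree on a year of birth.
consensus, needs_review = reconcile(
    {"source A": "1950", "source B": "1950", "source C": "1951"}
)
print(consensus)     # {'1950'}
print(needs_review)  # {'source C': '1951'}
```

In this sketch the consensus value needs no further work, and the research effort goes into the single-source value.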

When data from the Wikipedias is considered for Wikidata, the same considerations apply. When you think about it, it is quite bizarre: you take expressions in words and convert them into a qualifier that represents those words. These words can be in any language, and they may not even be the words you see on your screen. The processing of texts may be automated, and it is easy to understand that the input of all the Wikipedias alone creates a superset of data that is more than any one article contains. The notion that copyright can legitimately be claimed is problematic at best.

When you take all this on board, together with the fact that individual facts cannot be copyrighted, it is obvious to me that the choice of a CC0 license for Wikidata is fortunate. A license implies copyright, but with this license that copyright is given away. Claiming the copyright is at best a defensive strategy.

Thursday, May 24, 2018

#BRAVEedit - Lessons learned at the Amsterdam editathon

The Amsterdam editathon for female civil rights activists was a success, and this Listeria list shows it well. All the entries not in italics now have an article in Dutch. It may be that some of the Dutch articles were written in Brussels. Those who joined were asked to write or translate articles in English or Dutch. This is the same Listeria list for English.

Editathons took place in twenty countries concurrently. Organising such an event is a mammoth task, and it is easy for things to go wrong or to be unclear. With twenty countries, many languages are involved, and producing sufficient information for all of them is not easy. The question is how to do this optimally.

Information needs to be available in all the languages in which a particular person or organisation is targeted. Care needs to be taken that people are uniquely identified, to prevent the creation of duplicate articles (this happened for Fartuun Abdisalaan Adan, aka Fartuun Adan). You can make lists in a spreadsheet and you can write texts with the sources, but the challenge is to maintain all this for twenty languages!

Enter a Wikidata property: "on focus list of Wikimedia project". As you research a person who is to be on the focus list for the "Amnesty International Editathon", you add the property either to an existing item or to a new one. You can add the location as a "qualifier". This has been done for Amsterdam, enabling a list specific to this editathon.
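A list like the one for Amsterdam comes from a query on that property and its qualifier. The Python sketch below only builds the SPARQL text for the Wikidata Query Service and does not run it. It rests on several assumptions. P5008 is taken to be "on focus list of Wikimedia project". The location is assumed to be recorded with the P276 ("location") qualifier. Q727 is assumed to be the item for Amsterdam. "Q_PROJECT" is a hypothetical placeholder for the editathon's project item, whose id is not given here, and `focus_list_query` is likewise a made-up helper.

```python
def focus_list_query(project_qid, location_qid=None, lang="nl"):
    """Build SPARQL listing items on a Wikimedia project's focus list.

    Assumptions: P5008 = "on focus list of Wikimedia project" and the
    location is stored as a P276 ("location") qualifier on that statement.
    """
    location_filter = ""
    if location_qid:
        # Keep only focus-list statements qualified with this location.
        location_filter = f"  ?statement pq:P276 wd:{location_qid} .\n"
    return (
        "SELECT ?item ?itemLabel ?article WHERE {\n"
        "  ?item p:P5008 ?statement .\n"
        f"  ?statement ps:P5008 wd:{project_qid} .\n"
        f"{location_filter}"
        # ?article stays unbound when the chosen Wikipedia has no article yet.
        "  OPTIONAL {\n"
        "    ?article schema:about ?item ;\n"
        f"             schema:isPartOf <https://{lang}.wikipedia.org/> .\n"
        "  }\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang},en" . }}\n'
        "}"
    )


# Q_PROJECT is a placeholder; Q727 is assumed to be Amsterdam.
query = focus_list_query("Q_PROJECT", location_qid="Q727")
print(query)
```

In the result, rows without an `?article` are the people who still need an article on that Wikipedia.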

Articles that are still to be written are shown in italics in the list on a Wikipedia, and this helps prevent duplicate articles. Sources can be added as qualifiers as well, and with a small change they show in a list. Images and all kinds of other information can be added, and this shows really well in Reasonator. The family of Fartuun Adan, for instance, was already known, and she won the International Women of Courage Award in 2013.

So the editathons have happened, and the "on focus list" property can still be added for all the people and organisations. This will make it easier to analyse the impact of all the work. Queries can be made to learn how many articles exist, and it becomes easy to learn how many of these articles are actually read.
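One part of such an impact analysis is counting, for each wiki, how many focus-list items have an article. The sketch below assumes that the sitelinks for the focus-list items have already been retrieved from Wikidata. The item ids and data are invented, and the `articles_per_wiki` helper is hypothetical. Readership, that is page views, would need a separate source and is not covered here.

```python
from collections import Counter


def articles_per_wiki(sitelinks_by_item):
    """Count how many items have an article on each wiki."""
    counts = Counter()
    for sitelinks in sitelinks_by_item.values():
        # A set, so each wiki is counted at most once per item.
        counts.update(set(sitelinks))
    return counts


# Hypothetical snapshot: item id -> wikis that have an article about it.
snapshot = {
    "item-1": ["enwiki", "nlwiki"],
    "item-2": ["enwiki"],
    "item-3": [],
}
counts = articles_per_wiki(snapshot)
print(counts["enwiki"], counts["nlwiki"])  # 2 1

# Items still missing a Dutch article.
missing_nl = [item for item, links in snapshot.items() if "nlwiki" not in links]
print(missing_nl)  # ['item-2', 'item-3']
```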

One of the things these lists also show is that for many people we do not have an illustration. I think that when Amnesty makes images available for all of them, the articles will become more attractive and are likely to gain more attention.

For me, the Amsterdam editathon was wonderful. I enjoyed seeing people grapple with the idiosyncrasies of Wikipedia editing. I had it confirmed again that I am not a Wikipedian. I do not really edit, but being there had me experiment and think about organising multilingual projects. It also confirmed my understanding of how this could be done efficiently.

PS I do think that the people at the English chapter and the people at Amnesty did a great job.

Monday, May 14, 2018

#AfricaGap - #Wikidata; its quality as Wikidata matures

Currently I monitor 45 countries for their national politicians. When I add a specific national "position", I do several things: I add the existing politicians who are known in a particular category, and I include a definition of what that category contains.

I give hardly any attention to detail. My objective here is simple: I want to see how this (underdeveloped) data evolves. There is a huge gap in what we know about Africa, and as it is we hardly provide information about Africa. We need Africans to help us get even the most basic facts straight.

As Wikidata matures, we gain subsets of data of varying quality. The most mature living data are our interwiki links. This is live data and it serves a purpose. Changes require attention to detail because they have an immediate effect on the discoverability of information. When data comes alive, when it serves a purpose, there are people who will invest their time to get the data right. They give attention to detail because that serves their purpose.

For arcane subjects like the Ottoman Empire, or even Africa, few people find a purpose in the data. Arguably there is so little data that almost everything added is a 100% gain in quality (a person exists and is a member of parliament of ***; I do not understand African names, so I do not know whether the person is male or female). Sometimes there are whole lists of people, like these people from the Bosnian Eyalet, and it is easy enough to complete such a list. But will it serve a purpose? How do we give it a purpose?

There is no uniform quality to Wikidata. There are whole areas where we are 100% off the mark because we have neither the data nor the ability to link to data elsewhere. There are other areas, like biomedical literature, where our quality is such that the data is actually useful. As this becomes known thanks to its evangelists, a wider public pays more attention, and more attention to detail is given in the process.

Arguably, the quality of a subset of our data depends on its usefulness. When it is useful, people will come and give it attention to detail, because that serves their purpose.

Saturday, May 12, 2018

#Wikidata - #Copyright and linked data

There are many points of view when it comes to copyright and data. In the Wikipedia world the discussion is different because each text has its own copyright. Data is different because you cannot own, i.e. copyright, an individual fact.

When data is open or opened up, it turns out that much of the data that exists in multiple sources is identical. When the data is the same, there are two benefits. The first is quality: when multiple sources agree on something, it is more likely to be correct. The second is copyright: whose copyright?

Every now and again the license used by Wikidata is questioned, typically by Wikipedians who think they know their stuff. They will be the first to tell you about the importance of sources, and indeed many factoids in Wikidata do not have a source. When a factoid is sourced, say a statement like "John Doe died on Friday the 13th", it only links to the source. It hardly ever links to the place where the fact came to the attention of the person or bot adding it to Wikidata.

When I add the fact that someone is a member of the Somali parliament, using a list like this one, that information is sourced, but there is no added value beyond a name being on a list. It has been in the news that parliamentarians were murdered in the last year. There are no articles about them, and consequently even in Wikipedia they are only names on a list: no added value, no arguable reason for copyright.

The value is in the links, in knowing that the same data is true in many sources. Claiming copyright, particularly in data, is predatory because it prevents people from bringing facts together. Only when facts are brought together does informed knowledge exist. Only with linked, sourced data do we have a handle on fake facts and fake news.

Thursday, May 10, 2018

#Wikimedia - What I am willing to do for the #AfricaGap

Africa hardly gets attention in Wikimedia projects. This is obvious when Wikidata, the one project that brings everything together, does not know the people who are or were president of an African country. There is no reasonable argument to counter this.

What I can do is "watch the gap". To do this I have a growing list of African national politicians. The list is not complete; I am still adding countries. I do not add ministers, and I have not included "first wives". I leave those to reach out to people who care about that other gap, a gap that is no longer as wide.

When people add data about politicians, the Listeria lists will be updated. There are many of these lists, and their updates will show up on my watch list. This means that I can tweet about changes as they occur.
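In effect, the watch list surfaces the difference between two versions of a generated list. As a rough illustration, the Python sketch below compares two snapshots keyed by item id and reports added, removed and changed rows. This is not how Listeria or MediaWiki compute changes. The `list_changes` helper, item ids and values are all invented for the example.

```python
def list_changes(old_rows, new_rows):
    """Compare two list snapshots that map item id -> row contents.

    Returns sorted lists of (added, removed, changed) item ids.
    Hypothetical helper, not part of Listeria or MediaWiki.
    """
    added = sorted(new_rows.keys() - old_rows.keys())
    removed = sorted(old_rows.keys() - new_rows.keys())
    # Items present in both snapshots whose row contents differ.
    changed = sorted(
        item for item in old_rows.keys() & new_rows.keys()
        if old_rows[item] != new_rows[item]
    )
    return added, removed, changed


old = {"item-1": "Name A", "item-2": "Name B"}
new = {"item-1": "Name A", "item-2": "Name B (born 1950)", "item-3": "Name C"}
print(list_changes(old, new))  # (['item-3'], [], ['item-2'])
```

Each added or changed row is a candidate for a tweet.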

To be perfectly honest, I expect it to be like a railway station: typically you wait for the trains, and you are watching a chasm rather than a divide.

Friday, May 04, 2018

#Wikimedia - Introducing the #AfricaGap

Minding the gaps is important in all our projects. The #GenderGap program is an excellent project that shows the important and impressive results that are possible when we make a deliberate effort.

One area where we are weak is our coverage of everything related to Africa. We are particularly weak in providing support for our readers and editors in Africa.

There are many things that can be done to improve upon the current situation and I am grateful to the people who have worked so hard to get us where we are.

Minding a gap starts with awareness. My "Africa" page provides some insight into the politicians of African countries. Obviously most politicians are missing. Because my page links to Listeria lists, every time a new African politician becomes known in Wikidata, it will show up on my watch list.

I intend to include all African countries and their national politicians. I remain committed to bringing more information about Turkey and its history, but this project will show, through the daily Listeria updates, the extent of our African efforts. It would be cool if 1% of the humans we know were from Africa.

Tuesday, April 17, 2018

#Wikimedia - please mind the Africa data gap

A friend attended a Wikimedia conference in Africa. He asked me for the number of people known to be from Mozambique. A question like this is really relevant. I asked for a query, and I am happy there is a result. However, only 319 people known to be from Mozambique in Wikidata (that is, all Wikipedias together) is a really low number. It is not an exception: countries like Rwanda, Niger, Malawi or Gabon do not fare better.
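Which query produced the number above is not stated. A common way to ask the question is to count humans (`wdt:P31 wd:Q5`) with a given country of citizenship (`wdt:P27`). The Python sketch below builds that SPARQL text for the Wikidata Query Service. It is only one reasonable reading of "from Mozambique", since place of birth would give a different count. The item ids Q1029 (Mozambique) and Q228 (Andorra) are assumptions to verify on wikidata.org, and `count_people_query` is a hypothetical helper.

```python
def count_people_query(country_qid):
    """SPARQL counting humans (Q5) whose country of citizenship (P27) is country_qid.

    Assumption: "people from a country" is read as country of citizenship.
    """
    return (
        "SELECT (COUNT(DISTINCT ?person) AS ?count) WHERE {\n"
        "  ?person wdt:P31 wd:Q5 ;\n"
        f"          wdt:P27 wd:{country_qid} .\n"
        "}"
    )


# Assumed item ids; check them before relying on the resulting counts.
countries = {"Mozambique": "Q1029", "Andorra": "Q228"}
for name, qid in countries.items():
    print(f"# {name}")
    print(count_people_query(qid))
```

Run on the query service, each query returns a single number. Counts change daily as Wikidata grows.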

When you consider that more people are known to be from Andorra (339), it is obvious that there is a real issue with how we cover "the rest of the world".