Saturday, June 24, 2017

#Wikipedia - Sister projects in search results

The Wikipedia Signpost reports that the discovery team has extended the results for search on Wikipedia. What is new is that English Wikipedia now includes results from Wikisource, Wiktionary, Wikiquote and Wikivoyage, and that is indeed welcome news.

There is one puzzling part in the information; "Wikidata and Wikispecies are not within the scope of this feature." It is puzzling because including Wikidata search results is where search has been augmented for years in many Wikipedias including the English Wikipedia by the people who added this little bit of magic Magnus provided.

As you can see in the screenshot of the search for Wilbur R. Leopold, an award was conferred on him and the origin of this factoid is the article on the award. Thanks to Wikidata, information is available for Mr Leopold. There are so many references in Wikidata that have no article in a Wikipedia or any other project that from a search perspective it is probably the next frontier.

When wiki links, red links and even black links can be associated with Wikidata items, it becomes even easier to add precision to the search results. Adding these links is the low hanging fruit to improved quality in Wikimedia projects anyway. 
Thanks,
     GerardM


Sunday, June 18, 2017

#Wikidata - John P. A. Ioannidis and his awards

I am a self-confessed award junkie. Awards are imho important because they are an indication of who is notable and who is less so.

Three awards are associated with Professor Ioannidis in Wikidata. One award was also conferred on Hans Rosling and this gives me added confidence in Mr Ioannidis and other recipients of the Chanchlani Global Health Research Award.

Professor Ioannidis throws cold water on much of scientific practice and consequently on its practitioners. One of his papers has the title "Why Most Published Research Findings Are False", and it is inherently a challenge as well to what we write in the Wikipedias and Wikidata.

At Wikidata a wholesale import is happening of papers, science facts and their authors. This is a great idea, particularly when papers that dismiss much of the nonsense get a prominent place. The result will be that the Neutral Point Of View gets another twist; it balances what we include with actual science.
Thanks,
     GerardM

Saturday, June 17, 2017

#Wikidata vs #GeoNames - the first to throw a stone

Wikidata has some vocal people vilifying GeoNames. They insist that no data from GeoNames is included in Wikidata because "the quality is so bad". In my last post I wrote down assertions about Wikidata. One of them is that "Never mind how "bad" an external data source is, when they are willing to cooperate on the identification and curation of mutual differences, they are worthy of collaboration".

I wrote an email to Marc Wick, the founder of GeoNames, and with his permission I can publish our mail exchange.

Hoi,
The import of data from GeoNames into Wikipedia has been controversial. People say that the quality of the GeoNames data is not "good enough". It resulted in the deletion of thousands of articles from the Swedish Wikipedia. I am not Swedish and I did not follow their discussions, but the problem is that it sours collaboration with other parties because "their data might not be 100%".
This happened in the past, I care for the future. In Wikidata we do link to GeoNames (example Almere [1]).
There are several ways in which we can help each other and potentially even benefit from a collaboration. Wikidata is licensed with a CC-0 license and therefore GeoNames can have all our data and do with it as they please.
My initial proposal is for a comparison of the shared data. The data where GeoNames differs from Wikidata is potentially problematic. Concentrating on these differences together will improve both our and your data.
Would you be interested?
Thanks,
       GerardM
       Gerard Meijssen
His answer is everything I could hope for:
Hi Gerard
Thanks a lot for your email. A couple of weeks ago I have started to parse the wikidata extract and look for the matching attributes. Unfortunately I got interrupted and have not yet looked at the result of the parsing. I will continue as soon as I find the time.
The goal is to add the wikidata identifier to the alternatenames table with pseudo language code 'wkdt'. What I have noted so far is that sometimes the geonameids in wikidata go to the wrong concept. For instance going to the city feature when the article is speaking about the administrative division or vice versa. This is one of the things I would like to check before adding the wikidataid as alternatename. GeoNames also has links to wikipedia.
I don't think wikipedia should import all geonames features, not all of them are relevant enough to justify a wikipedia article.
Best Regards 
Not only is there an interest to collaborate; Marc is checking the links in Wikidata referring to GeoNames and as can be expected he finds issues. As I asserted, this is to be expected and collaboration is the only way forward for optimal results.
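The comparison proposed in the mail can be sketched in a few lines. This is only an illustration under assumptions: both sides are taken to be available as dictionaries keyed by GeoNames ID, and the function name, field names and sample records are hypothetical, not the actual format of either data set.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
Difference = Tuple[str, str, object, object]


def diff_shared_records(
    wikidata: Dict[str, Record],
    geonames: Dict[str, Record],
    fields: List[str],
) -> List[Difference]:
    """List (geonames_id, field, wikidata_value, geonames_value) for every mismatch.

    Only records present on both sides are compared; records known to just
    one side are a separate question and are ignored here.
    """
    differences: List[Difference] = []
    for geonames_id in sorted(wikidata.keys() & geonames.keys()):
        for field in fields:
            wd_value = wikidata[geonames_id].get(field)
            gn_value = geonames[geonames_id].get(field)
            if wd_value != gn_value:
                differences.append((geonames_id, field, wd_value, gn_value))
    return differences


# Hypothetical sample data, invented for this sketch.
wikidata_sample = {
    "1000001": {"label": "Exampletown", "feature": "city"},
    "1000002": {"label": "Sampleville", "feature": "city"},
}
geonames_sample = {
    "1000001": {"label": "Exampletown", "feature": "administrative division"},
    "1000002": {"label": "Sampleville", "feature": "city"},
    "1000003": {"label": "Elsewhere", "feature": "city"},
}

differences = diff_shared_records(wikidata_sample, geonames_sample, ["label", "feature"])
for difference in differences:
    print(difference)
```

The one record reported here is of the city versus administrative division kind that Marc mentions; the short list of differences is what both communities would look at together.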
Thanks,
      GerardM

Tuesday, June 13, 2017

#Wikidata some assertions

Wikidata is no different from any community; there are differences of opinion. Everybody has his or her own perspective but there are assertions that can be made that have a more universal resonance.

The assertions below represent the underlying arguments I use in my blog posts and in the discussions I take part in. They are the ones I feel are not necessarily "political" or have a negative impact.
Thanks,
       GerardM
  1. There is no data store without problems, this includes Wikipedia and Wikidata.
  2. The data we hold is best understood by applying set theory. The data in Wikidata consists of many subsets; probably the most valuable subset for the WMF are the interwiki links.
  3. The error rate in each subset can be assessed and is by definition different from the overall Wikidata error rate.
  4. The absence of data often indicates a bias in the data Wikidata holds. A good example is the lack of data relevant to the global south.
  5. Given the huge influx of data from Wikipedia, the biggest imports have been from English Wikipedia and it is one reason for the existing biases in Wikidata.
  6. An absence of data prevents the application of tools. Tools may suggest writing a Wikipedia article, tools may compare data with other sources.
  7. Concentrating on the differences between Wikidata and any other data source is the best way of improving the quality of existing data in either data set.
  8. Having an application for the data in Wikidata is the best way of improving the usefulness of a subset of data.
  9. Each contributor to Wikidata works on the data set(s) of his/her own choice; these data sets interact in the whole of Wikidata. This may raise issues and this cannot always be avoided.
  10. Examples of problematic data must be seen in the light of the total of the data set they are part of. Statistically they may be irrelevant.
  11. Never mind how "bad" an external data source is, when they are willing to cooperate on the identification and curation of mutual differences, they are worthy of collaboration.
  12. Wikidata improves continually and as such it is "purrfect" but it will never be perfect.
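
Assertions 3 and 10 come down to simple arithmetic: the overall error rate is the size-weighted average of the subset rates, so a subset can be much better or much worse than the whole. A minimal sketch follows; the subset names and all the numbers are invented for illustration and are not measurements of Wikidata.

```python
from typing import Dict, Tuple

# Hypothetical figures: subset name -> (number of statements, number of known errors).
subsets: Dict[str, Tuple[int, int]] = {
    "interwiki links": (1000, 10),
    "awards": (200, 20),
    "external identifiers": (800, 40),
}


def error_rate(total: int, errors: int) -> float:
    """Fraction of statements in a set that are wrong."""
    return errors / total


# Error rate of each subset on its own.
subset_rates = {name: error_rate(total, errors) for name, (total, errors) in subsets.items()}

# Overall error rate: all errors divided by all statements.
total_statements = sum(total for total, _ in subsets.values())
total_errors = sum(errors for _, errors in subsets.values())
overall_rate = error_rate(total_statements, total_errors)

for name, rate in subset_rates.items():
    print(f"{name}: {rate:.1%}")
print(f"overall: {overall_rate:.1%}")
```

With these invented numbers the "awards" subset is ten times worse than the "interwiki links" subset, yet it hardly moves the overall figure because it is small.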

Monday, June 12, 2017

#Causegraph, another way of looking at #Wikidata


Causegraph is a tool to visualize and analyze cause/influence relationships using Wikidata. If you have not seen it yet, give it a spin.

Randomly looking at the galaxy of relations, I found a Charles Frédéric Bassenge, he is in Wikidata because he is the father of Pauline Runge. He is in Wikidata because she has an entry in WikiTree. What amazes me most is the quality of the data for the father and his absence in WikiTree. 

Causegraph works on the basis of there being a direct relation between two persons. For Jacob Palis, the doctoral students and doctoral advisers are included and not the other TWAS award winners.

What is really good is that it is regularly updated. It would be even better if it were a Labs tool. This might enable real time updates .. <grin> there is always a wish for more and better </grin>
Thanks,
       GerardM

Sunday, June 11, 2017

How #Wikipedia gets into @Africa


This is a map showing how fiber is getting into Africa. The blind spots are where the Internet does not go. The red lines are where the future of Wikimedia lies.
Thanks,
        GerardM

#Wikidata - Premio Almirante Álvaro Alberto

The Premio Almirante Álvaro Alberto is named after admiral Álvaro Alberto da Mota e Silva. They are both notable for their own reasons.

The award was mentioned in an article on the German Wikipedia for César Camacho. The award was not known to Wikidata and was added. The website of the conferring organisation gives me the impression that it is the "National Council for Scientific and Technological Development" and part of the Brazilian ministry of sciences. When you look for it in Wikidata, it is embarrassing.

The admiral is probably a child of his time. He was a military man and also a very relevant scientist. In the navy he held the rank of vice admiral and as a scientist he was twice the president of the academy of scientists. He was also very much involved in the Brazilian nuclear program.

When you consider the notability of Brazil, it is astounding how little is known in Wikidata. Many politicians have been added for Brazil; national senators and deputies. 

Brazil is, I think, one of the top twenty countries in the world. When you consider any and all of the "lesser" countries, it is obvious that we know even less. When Wikipedia and by inference Wikidata is about the sum of all knowledge, there is a lot of white space where all our tools have no impact.
Thanks,
     GerardM

Saturday, June 10, 2017

#Wikidata - #diversity of #science - Professor Govind Swarup

Professor Swarup received the TWAS prize. The TWAS Prize is an annual award instituted in 1985 by The World Academy of Sciences to recognise excellence in scientific research in the global South. It follows that when attention is given to scientists like Mr Swarup, it should be easy to link to other scientists, particularly those from the global south.

With twenty-two awards Mr Swarup does not disappoint. Many of the awards are from India; one of the conferring organisations, the Indian Science Congress Association, lists 41 awards. Its rule that a scientist can now receive only one award in his lifetime indicates how many scientists are recognised by the ISCA.

Making the TWAS prize winners more complete by adding the awards helps to improve the diversity of scientists. It is not only women who have not been fully recognised; it is also the scientists from the global south.
Thanks,
      GerardM

Monday, June 05, 2017

#Wikimedia - Felix Andries Vening Meinesz and the sum of all #knowledge

Mr Vening Meinesz was an important Dutch scientist. There are a few pointers to his relevance; he was a member of several august bodies and he received many awards, from several countries.

One of the medals is the "Alexander Agassiz Medal". When you look at the English Wikipedia article, you find no red links while many award winners do not have an article. Otto S. Pettersson for instance is/was known to Wikidata but he was not associated with the award. When you google for another awardee, it is most likely that the 1926 award winner was Jacob Bjerknes, not Wilhelm Bjerknes.. Even sources get it wrong..

In many ways, awards are rather boring. Getting all the information right is a lot of work and when it is to be written in Wikipedia articles, there are too many Wikipedias and awards for all the awards to get an article in all of them.

When awards are fed to Wikipedia articles from Wikidata, as is done for sources, it becomes a lot more manageable. Increasingly Wikidata knows about more awards for more people. What does it take to reach the necessary tipping point? Which Wikipedia will consider this first?
Thanks,
     GerardM

#Wikidata - a #young face for #science


These are members of the "Jonge Academie". They are Dutch scientists and for this academy to remain young, membership expires after ten years. Similar academies exist in several other countries, such as Pakistan and Belgium.

With politicians riding roughshod over scientific facts, rejuvenation of science is important. The notion that scientists are old is all too easy and it is equally easy to dismiss young scientists for a lack of relevance. Check out these Dutch scientists, they are relevant and will be for a long time.
Thanks,
      GerardM

Saturday, June 03, 2017

#Wikipedia - Bias and the King #Faisal International #Prize

The King Faisal International Prize is an international award recognising five distinct areas. They are: Service to Islam, Islamic Studies, Arabic language and literature, Medicine and Science.

When a Wikipedia is to write in a NPOV way about the King Faisal International Prize, all five categories need to be included. Just listing Medicine and Science and having the article as a "science award" ignores the scientific realities in the other three categories or prevents the inclusion of other theological or literature awards.

This is an unfounded bias and remediation is needed in order to achieve NPOV.
Thanks,
     GerardM

Thursday, June 01, 2017

#Wikipedia - another #German #Award


It is funny in its own way that the only award winners that have no "red" or "blue link" on this award page are Wikipedians; German Wikipedians. They won the 2016 GDCh-Preis für Journalisten und Schriftsteller.

Typically we do not give much attention to our achievements and as such the understated attention can be understood. At Wikidata we need to have an item in order to recognise an award winner. As there was a photo of the award ceremony, it was obvious to add it to the item for these award winners as well.
Thanks,
      GerardM

Tuesday, May 30, 2017

#Wikidata - Thomas Scheibel - a German scientist

When Wikidata is filled, it is typically content from the English Wikipedia. This adds to an existing bias. Germany is known for the high quality of its scientists, and Mr Scheibel is one of them.

Recently all the German "science awards" had their data imported into Wikidata. Mr Scheibel is only one of many Germans whose career is now better represented. Four of the awards bestowed on him are now well represented. For recent awardees Wikidata was searched for their presence; something like 25% of these people already existed.

The diversity of Wikidata is better served when more data from the "other" Wikipedias is imported. This is best done by people who know a language and for me Dutch, German and French are obvious targets. I can manage in many other languages but it is easier to make mistakes.

What we need is champions for the data for countries and cultures. They can ensure that we are truly of global relevance. This is one way of ensuring that we will be of relevance in 2030.
Thanks,
      GerardM

Sunday, May 28, 2017

#Wikimedia #GLAM - donation of pictures from #Syria

The Tropenmuseum was asked to make available pictures they have from Syria. Today, it is my pleasure to inform you about a donation by the "Nationaal Museum van Wereldculturen".

At the time we celebrated the Erasmus Prize and at the time we worried about Bassel Khartabil. There is little hope for Bassel but there is hope for us to become informed about Syria.

It is Ramadan, and whether you believe or not, this is an auspicious time to consider the information we have on Syria.

Share the pictures you hold in Commons. As we learn just a little more about the country, its peoples and history we may find them to be just like us.
Thanks,
       GerardM

Friday, May 26, 2017

#Wikidata - Theodor Eschenburg award

Mr Eschenburg was a German journalist. He received nine awards (you will not yet find this in Wikidata) and had one award named after him: the Theodor-Eschenburg-Preis.

All the information about award winners etc is in the article of the conferring organisation; the Deutsche Vereinigung für Politikwissenschaft. That works fine in Wikipedia but the award is surely not an alias for the award in Wikidata.

When I find such issues, the situation is easily remedied by creating a new item.

NB when you look at the Reasonator page for Mr Eschenburg, check out the long line of notable people he is part of..
Thanks,
       GerardM

Tuesday, May 23, 2017

#Wikidata - Chuck Davis and #VIAF


Mr Chuck Davis used to dance, and make people dance. He first came to our attention because he was awarded the Capezio Dance Award and consequently a Wikidata item was created on March 26. His Wikipedia article was linked on 22 May 2017, seven minutes after the article first appeared.

The most fabulous thing is that when I checked on May 23, VIAF already had a link to his Wikidata item. It is proof positive that librarians are actively including Wikidata in VIAF. This is the perfect argument to intensify the collaboration with librarians to give readers of Wikipedia and readers of library books the best of our shared sum of all knowledge.
Thanks,
       GerardM

NB Mr Davis died on May 14, 2017.

Monday, May 22, 2017

#Wikimedia - Presenting #authors in #Scholia

At a fairly rapid pace more and more literature and its authors are included in Wikidata. Many publications are used as sources in a Wikipedia and others get included because scientific "facts" supported by sources find their way into Wikidata as well.

Scholia is a tool that indicates where authors fit in (it does more <grin> but this blog post is only about this </grin>).

When multiple publications are known for an author, it shows the distribution of the publications in time, the number of pages (when known), venue statistics, a co-author graph, the topics, associated images, a topics-works matrix, education, employer/affiliation, academic tree, locations, citation statistics, citations by year and finally citing authors. There are two ways of expressing an opinion: it is exhaustive or it is a bit much. Whatever your choice, a tool like Scholia is awesome. Just think: Wikidata already has a relevance that justifies a tool like this.
Thanks,
      GerardM

#Wikidata - One size fits all but only size 47 serves me well

When a Wikipedia decides on its policies, in the end it is a "one size fits all". It is the policy wonks who decide; all editors have to abide by it and all readers suffer the consequences. Shoes are made for walking but you only get the best mileage out of shoes when they fit.

When you look at the categories for different Wikipedias they are not the same. Some explicitly exclude the standard information of other Wikipedias. As a result there is no universal standard and this is detrimental to readers who frequent multiple Wikipedias.

At the same time, a Wikipedia community may define its policies and practices as it sees fit. This does not mean that it defines what individual readers actually prefer, only what they get presented. The number of categories in use and their structure is a good example of how editors define what information is given to or withheld from readers. Increasingly the combined information from the categories of the Wikipedias finds its way into Wikidata. When a Wikipedia does not include a category, the definition of that category makes it possible to present much if not most of what the category could have contained.

The question is not whether we can show what articles of a Wikipedia would be in a category; the question is whether our readers will be supported and, if not, what arguments we have to deny readers the structures they personally prefer.
Thanks,
      GerardM


Thursday, May 18, 2017

#Wikidata - Manfred Rudersdorf has no #Wikipedia article

Professor Manfred Rudersdorf (left) has no Wikipedia article. As a historian he is an expert on the history of "his" university. In the picture you see the presentation of this book to the rector of the University of Leipzig.

When you inspect the Reasonator page for Mr Rudersdorf, it is remarkably complete. It demonstrates that the inclusion of data from sources external to the Wikimedia Foundation slowly but surely results in proper information.

When you think of it, finding people like Mr Rudersdorf is obvious. There is only one sum of all knowledge and much of it is connected in one way or another. In fact it is a puzzle and we Wikimedians are all too familiar with puzzles.
Thanks,
     GerardM

Monday, May 15, 2017

#Wikidata - Johanna Mestorf is not a #German

Johanna Mestorf was the first "German" female Professor. She was however not German as Germany did not exist; Mrs Mestorf was from the Kingdom of Prussia. Wikipedia has it that Prussia existed from 1701 to 1918 and Mrs Mestorf died in 1903. In the totality of the German speaking world Mrs Mestorf was possibly the first female Professor.

Current nationalities and previous nationalities do not match. Trying to understand historic facts from a modern perspective produces a fake perspective.

Not calling Mrs Mestorf German may be problematic for some. But hey, is that not what a neutral point of view is about?
Thanks,
      GerardM

Sunday, May 14, 2017

#Wikipedia - #German #Science #Awards II

Adding awards to scientists makes it obvious that there are many scientists out there. The German Wikipedia knows about some 366 German science awards, and they are not provincial. For many awards any deserving scientist may be recognised.

As you move through the list, German Wikipedia practices are different. Some Wikipedians do not like red links so the award winners are just text. Luckily for me, others still allow for red links and this helps a lot.

The Heinrich-Emanuel-Merck-Preis article sees a lot of red. When an effort is made to connect these red links, Wikidata already knows about many of them. Petra Stephanie Dittrich is one such. Many scientists like her have been included because they are included in AcademiaNet.

When I add missing people, given that this is about German data I prefer to add the labels in German. Mr Jonathan V. Sweedler is of the "University of Illinois" and therefore likely American but that is a detail I frequently leave to others.

There is yet another group of scientists finding their way into Wikidata. They are the authors of papers that are used in citations or to establish fact in Wikidata. Awards are another relevant aspect of these scientists.
Thanks,
     GerardM

Saturday, May 13, 2017

#Wikimedia - #Classifying Saadia Zahidi

Mrs Saadia Zahidi came first to the attention of the Wikimedia movement because she featured in the BBC's 100 Women both in 2013 and 2014.

The BBC is really British but the conclusion that Mrs Zahidi is British is a stretch. She studied at three universities; two in the USA and one in Switzerland. She grew up in Pakistan and is a member of the Executive Committee of the World Economic Forum also in Switzerland.

It is easy to claim relevant people as being part of a group. The urge to classify is obvious but classification is inherently discriminatory. With people this is more or less accepted. For Mrs Zahidi her affiliation with the World Economic Forum is missing but she is at least recognised as an author. It was easy enough to add {{authority control}} in her Wikipedia article.

Classification is a hot button subject at Wikidata. There are those like me that resent this weird notion that subclasses are a good thing to have. There is a lengthy discussion about the validity of subclasses for guns, spacecraft and such stuff. It is so convoluted that you need to be an expert to understand the classes in the first place. What makes this nonsense so infuriating is that it makes Wikidata solidly a one-language, maybe few-language, resource. The argument that it combines things that are the same can be easily ignored because proper statements and a query provide the same result.

Classification is discriminatory. In the past an explanation was asked for and was not forthcoming. It is wrong to call Mrs Zahidi British. At best she lived or lives in the UK. It is wrong to have tiny subclasses; it largely prevents the use and usefulness of Wikidata.

As a movement we should hold back our urge to classify. Classification is a judgement; we should be more descriptive.
Thanks,
      GerardM

Monday, May 08, 2017

#Wikipedia - #German #Science #Awards

There are many different awards known on the German Wikipedia. The category for Science awards alone includes some 366 entries.

For whatever reason most of the implied data has not been transferred to Wikidata. It is probably because there are no or few categories for people who received an award. This is where Awarder, yet another tool by Magnus, can make a difference.

The Aby Warburg Prize for instance included much information and by running the tool missing recipients were added including the date it was awarded. The Adolf Windaus Medal did not know any recipients and it takes as long to add all of them. When you run on data from both the German and the English Wikipedia, the result is even better as it was for the Berlin-Brandenburg Academy Award.

To complete the information, there is the "conferred by" and the "named after" property to consider as well as looking for people the Wikipedia does not know. You then find the missing links and people known on another Wiki project. It is easy to add new items for the "red links".

The Albrecht-Ludwig-Berblinger-Preis is a redirect. There are no links in the article for the people who received the award. Adding an item for the award is easy; adding at least one recipient is easy as well. This is where Awarder does not help.

The Augsburger Wissenschaftspreis für Interkulturelle Studien award information is in two parts. The first part with data until 2007 can be read by the Awarder. The second part includes a more complex table and cannot be read. In the past the Linked Items tool did the job. It did not include all the associated dates but it did produce a list of all the Wikilinks. They could be processed by PetScan.

Adding information like this from the German Wikipedia takes some effort. In this way we improve the global reach of Wikidata. For awards like the August-Lösch-Ehrenring there is occasionally new information in the sources for the award. At some stage bots will pick up new information added in Wikidata to make suggestions to Wikipedia editors.

As a rule the quality of Wikipedia articles like this is good and it is worth the effort to promote science.
Thanks,
      GerardM

Saturday, May 06, 2017

Teaching #Wikipedia using local #news

One of the functions of Wikipedia is providing a background and, to understand what is in the news, you need a lot of background. This is one of the first things to overcome when children start reading the news.

When newspapers are introduced in the class, a first exercise is to just read and have the children select a few articles that are of interest to them. As a follow up they analyse the text for concepts that you have to know about to understand the article. They make lists for the selected articles.

On another day they select new articles, make similar lists but are asked if there is an overlap with other lists. They are asked to write a few lines for all the concepts in such a way that there is enough to understand the article coming from every original news article.

This is when Wikipedia is introduced. The children are shown that it provides basic and neutral information that helps them understand, for instance, the news. Their next challenge is to read Wikipedia and see how its articles help them understand the news. When subjects are missing, they make a list.

The last thing to do is write stubs for missing Wikipedia articles. What then may follow is the standard courseware for writing Wikipedia articles. The objective of this approach is that it helps children to better understand the news; to understand that news is a continuum. The news is compact and assumes basic knowledge and such information can be found in Wikipedia.
Thanks,
       GerardM

#Wikidata - Steven E. Petersen

Some say that Wikidata is only there to support Wikipedia. Maybe. Wikidata includes information about Mr Steven E. Petersen. The English Wikipedia has three red links for the Grawemeyer Award for recipients in the field of Psychology.

Mr Petersen was already known to Wikidata as the author of four publications. Some will argue that publications are the bread and butter of Wikipedia and they are.

The sum of all knowledge is one whole and as such all Wikimedia projects together are tools that bring all the information, all the knowledge to everyone. All do it in their own way and as such Wikidata does support Wikipedia and it supports Wikipedia among all the other things it is useful for.
Thanks,
      GerardM

Thursday, May 04, 2017

#Wikidata - Award for Scientific Freedom and Responsibility

Mrs Elizabeth Loftus won the Grawemeyer Award in 2005. When you read her Wikipedia article, she has been celebrated frequently for her work. One way to consider the relevance of people is in the similarities people share with others and Mrs Loftus shares many awards with many scientists.

As we all stand up for science, several of the awards celebrate science and taking a position that is not popular with the powers that be. One such award is the Award for Scientific Freedom and Responsibility; its recipients include Mrs Jean Maria Arrigo and in my opinion her Wikipedia article does not do justice to the cause that got her this award. Often the true heroes of science do not get the recognition they deserve.

Not everyone goes out to march for Science. I am not in the USA and I do sympathise with this cause. What Wikimedians can do for science is document science, scientists and the scientific process and use scientific practices to ensure that Wikipedia does not carry the false flags some !@#$ insist on.
Thanks,
      GerardM

Sunday, April 30, 2017

#Wikidata and #Libraries - RAMP and #WorldCat identifiers


There are books and there are authors. Libraries are first and foremost in the book business. They register their books because otherwise they do not know what they have. Authors are important but they are secondary. Particularly the not so well known authors, authors with one book do not always get the full treatment. There is a registration, sort of, and it is waiting in the wings to be fully registered.

Libraries and librarians are Wikimedia's friends. A presentation from the IUPUI University Library shows how their wish for good documentation works for them. They release the rights to their "finding aids" and add missing "authority records" for people and companies. They then create Wikipedia articles based on their "finding aids" and add Wikidata records. They have it down to an art so much so that their tool, RAMP (Remixing Archival Metadata Project) lives as a web-based tool on WMFLabs. This invites any librarian anywhere to join the fun.

One of the articles is about Hugh Ned Brown. The article is good and the Wikidata is quite good as well. RAMP is a tool for librarians. It is wonderful and if there is one question left, it is how Wikimedians can contact librarians like the ones at IUPUI to fix the issues we find at our end.
Thanks,
      GerardM

#Wikipedia #Research - World Famous in Holland

When category names are well chosen, they predict similarity between what is in a category. A research paper named "Recognizing Descriptive Wikipedia Categories for Historical Figures" came to this conclusion; it is complete with a lot of mathematics. They did their panel research so it must be good.

At the back of the paper there is a list of English categories and their "Surprise level". It focuses on balancing the effect of both the size of the category and the probability of inside-category pairs becoming close neighbours.

This makes the size of the category relevant. One of the categories is "International Tennis Hall of Fame inductees"; it has 222 entries. The German category knows about 22 additional inductees. The S-level is 163.84. For "24 Hours of Le Mans drivers" there are 1,247 entries and the German category knows about 501 additional drivers; the S-level is 138.47.

Categories in Wikidata may include a definition of their content. For instance: "is a list of" - "human" with a qualifier of "award received" - "International Tennis Hall of Fame". This definition can be used in tools, even bots, to include all the missing statements in Wikidata.

The absence of articles shows a bias; the articles that exist are what editors found notable enough to write about. It is however not that relevant. One question is whether this research translates to other Wikipedias and their categories; another is whether there is a predictive value for the relevance of missing articles in other Wikipedias for the same category.

For some categories, relevance exists because of the interest in a specific culture. For me Johan Cruyff is more relevant than any "wide receiver" in American Football; I cannot name one. This research is interesting but it does not give us the most famous people ever. This is obvious because of the distribution of topics of English Wikipedia.
Thanks,
       GerardM

Friday, April 28, 2017

#Wikidata user story - The Golden Brain #Award

I have added people who won the Grawemeyer Award. This award has many categories and I concentrate on the category "psychology". The people in this category get extra attention; all the information from categories and awards are included as well.

Often people turn out to be connected through multiple awards like Mrs Anne Treisman and Mrs Leslie Ungerleider. They both received the Golden Brain Award as well.

When you read the article on the Golden Brain Award, the winners are all in a nice table. For each of them there is either a blue or a red link and for some there is only a string of text.
2015 | Okihide Hikosaka | National Eye Institute | US
Adding the missing people in Wikidata is not hard, just some additional work. For Mr Okihide Hikosaka there is enough information to add an item to Wikidata. When this is done for all the award winners, it is possible to create a list with the same information in any Wikipedia. 

By adding all this information, people who are into what I concentrate on are better connected. They have more near links to data that links to other people who are relevant in the field of psychology and my hope is that this will trigger people to give attention to missing articles and information.
Thanks,
     GerardM

 

Thursday, April 27, 2017

#OCLC, #VIAF and #WorldCat - I love them and, they could be even better for me

Jimmy Wales is doing his thing for proper news and it is welcome news. When it pans out it will work but people have to read what WikiTribune will bring. As always it takes education and access to information. Libraries have always been the bedrock of available information to people and the OCLC is what connects the world's libraries. So it is important for it to be as good as it can be when it is to bring more people to libraries to read.

The OCLC brings two programs that are important to me as a person and as a Wikimedian. They are VIAF and WorldCat. VIAF is the "Virtual International Authority File"; it is a system that brings together the information the world's libraries have in their system and aims to connect them. VIAF is largely maintained by software but there are processes to fix issues that do occur. Wikidata is connected to VIAF because it is the link to information about authors that exists in Wikipedia and Wikisource in many languages. There are bots that do find VIAF identifiers thanks to identifiers known at Wikidata and once a month Wikidata identifiers are updated in the VIAF registry. Using VIAF on its own, you will find for instance a Uilyam Şekspir.

WorldCat is where it becomes interesting to readers. For me its information is available in Dutch. Echoing a blogpost on the OCLC blog, we can bring more joy to the library's website, and for people who come to the library world from a Wikipedia there are opportunities. I have a profile at WorldCat; it knows my library because I entered it as one of my favourites. WorldCat assumes that it knows my location and suggests a library that is not near to me, and that is not useful. By picking up on the cookie information it would not need to know my location and could offer an easy link to my library. This will help me. What WorldCat could do is ask people if the suggested library is indeed their library.

The blogpost mentioned earlier talks about web analytics. I would absolutely love to know how many Wikipedia readers get to VIAF or WorldCat. It would be wonderful to know if we get readers connected to their libraries. When we do, the effect of improvements will show and that will motivate Wikimedians even more to get people their facts, and have us share in the sum of all knowledge.
Thanks,
      GerardM

Monday, April 24, 2017

#OSM - Districts of #Kerala


A Wikipedian asked me to blog about this map. The map is shown from within the English Wikipedia. It works really well on my mobile (an iPhone). The next step: integrating multi-layered maps in articles... and on a mobile?
Thanks,
       GerardM

For some documentation..

Thursday, April 20, 2017

#Wikidata user stories - Suggesting Henry Putnam, a great #Librarian

As software suggests what articles to write, it is relevant to understand what logic it is based on. Phenomena like the "six degrees of separation", made popular around Kevin Bacon, have their scientific counterpart in the graph theory notion of "betweenness centrality". This is used as a basis in the research into which articles are important and what automated suggestions to make.
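To make "betweenness centrality" a little more concrete, here is a minimal sketch in Python. It uses Brandes' algorithm on a small unweighted, undirected graph; it is only an illustration of the measure, not the method used in the research, and the node names are made up for the example.

```python
from collections import deque


def betweenness_centrality(graph):
    """Betweenness for an unweighted, undirected graph.

    `graph` maps every node to the set of its neighbours.
    The result counts, per node, the shortest paths between
    other pairs of nodes that pass through it.
    """
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search from the source, counting shortest paths.
        order = []
        predecessors = {node: [] for node in graph}
        paths = {node: 0 for node in graph}
        distance = {node: -1 for node in graph}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            current = queue.popleft()
            order.append(current)
            for neighbour in graph[current]:
                if distance[neighbour] < 0:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[current] + 1:
                    paths[neighbour] += paths[current]
                    predecessors[neighbour].append(current)
        # Walk back from the farthest nodes and accumulate dependencies.
        dependency = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = paths[pred] / paths[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                centrality[node] += dependency[node]
    # Every undirected pair was counted once from each end.
    return {node: value / 2 for node, value in centrality.items()}


# Hypothetical example: three items in a chain, the middle one connects the others.
example = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
scores = betweenness_centrality(example)
```

In this tiny chain only the middle item lies on a shortest path between two others, so it is the only one with a score above zero. Adding statements that link people through shared awards or employers adds edges to such a graph, and that is how their betweenness can rise.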

Mr Putnam is one of the more relevant librarians. He developed an eponymous classification system, continued its development as the Librarian of Congress (it is still in use), was twice president of the American Library Association and was a knight of the order of the Polar Star. When weight is applied to references to a person, all this is of relevance in the right setting.

When an article is to be written or improved, it helps when it can be suggested what it is that can be improved. By including statements in Wikidata, suggestions can be made in the local language. Facts like date of birth and death are also easy and obvious.

So when people consider a particular subject to be of universal relevance, it helps when associated subjects are well developed in Wikidata. Consider including, for all the presidents of the American Library Association, many facts like where they studied, where they worked and what awards they received. When this is done for all the people who share categories, the betweenness of many influential librarians increases. This will have its influence on what is suggested for people to do.
Thanks,
       GerardM

Wednesday, April 19, 2017

#Wikidata user stories - the sum of all #knowledge


Map showing all places English Wikipedia covers


Map showing all places GeoNames covers

They say "a picture paints a thousand words". There is no argument; English Wikipedia covers only so much. With such a lack of coverage it is impossible to understand what is missing and its relevance particularly to people who do not read English.

LSJbot has created lots of articles for the places GeoNames knows about in several Wikipedias. As a consequence, through the backdoor, much of the missing information enters Wikidata. There have been some rumblings among Wikidatans that the GeoNames data is not perfect. But hey, let's make "Be bold", a Wikipedia quality, a Wikidata quality as well.

For many Wikipedians, the notion of bot generated articles is anathema. For others the fact that there is so much that we do not cover is as problematic. The good news is that more information in Wikidata will enable us to predict what is lacking in content. We only need to acknowledge that Wikipedia is not the sum of all knowledge... yet.
Thanks,
      GerardM

#Wikidata user story - Suggestions to #Wikipedia editors

Exciting is the #research done on "suggestions to Wikipedia editors". There is a paper and a great presentation. The bottom line is that when you know what to suggest to people; when you make it personal, the result is what you would hope. Consider, 3.2 times the number of articles created and two times more articles created than without personalised recommendations.

There is math involved, obviously, but the gist is that when suggestions are in line with previous activities, people will be triggered to do more. When you listen to the presentation, this first experiment asks people to translate from English. The assumption is that English covers more than most.

The slides of the presentation include visualisations showing the coverage of several Wikipedias. When you consider them, it becomes clear where the Wikimedia projects are challenged.

Leila Zia, the presenter, makes it clear; all this would not be possible without Wikidata. One thing where Wikidata is different from the assumptions of the research is that there is an increasing number of subjects that have no links to Wiki(m/p)edia articles at all. Many of these are connected to existing content as they share common statements, statements like "profession: soccer player" or "award received: whatever award".

When totally new subjects are to be considered, there is already plenty that might be suggested in Wikidata itself.
Thanks,
      GerardM

Monday, April 17, 2017

#Wikidata user story - #DBpedia, #death and #Federation

Federation between DBpedia and Wikidata became possible. As a consequence, the results of a query that runs on DBpedia can be linked to Wikidata.

Some time ago people at DBpedia created a wonderful query that shows differences between DBpedia and the Dutch and Greek Wikipedia. It received approval from the Dutch Wikipedia community.

With federation something much more interesting became possible; a federated query comparing Wikidata with one DBpedia at a time. When the query runs, current data from Wikidata and DBpedia is presented. When a Wikipedia associated with a DBpedia changes, DBpedia may import the differences from an RSS feed and consequently running the query again will show the latest differences.

Updating information about one particular type of statement like date of death, place of death or whatever, will always be based on the current differences. Experiencing the results in this way is truly motivating. Federation is an instrument that can help us improve the quality of either federated system.
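What "showing the differences" comes down to can be sketched in a few lines of Python. This is not the federated query itself; it assumes that the dates of death from both systems have already been retrieved into two simple mappings, and the item identifiers and dates below are invented for the example.

```python
def death_date_differences(wikidata_dates, dbpedia_dates):
    """Return (item, wikidata_value, dbpedia_value) for every disagreement.

    Both arguments map an item identifier to a date string;
    a missing key means the system has no date of death for it.
    """
    differences = []
    for item in sorted(set(wikidata_dates) | set(dbpedia_dates)):
        ours = wikidata_dates.get(item)
        theirs = dbpedia_dates.get(item)
        if ours != theirs:
            differences.append((item, ours, theirs))
    return differences


# Invented sample data, for illustration only.
wikidata = {"item-1": "1998-02-05", "item-2": "2001-07-14"}
dbpedia = {"item-1": "1998-02-05", "item-2": "2001-07-15", "item-3": "1987-03-01"}
report = death_date_differences(wikidata, dbpedia)
```

Because the comparison always works on the data as it is at that moment, every fix on either side makes the next report shorter.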
Thanks,
      GerardM

#Wikidata user story - #Wikipedia #diversity and diversity #research

Diversity, especially the "gender gap" is one of the best researched subjects of Wikipedia. There are many projects that have it as their goal to diminish the gap they object to.

Wikidata has the best and most up to date information about any Wikipedia. People are updating Wikidata all the time, typically its information is based on a Wikipedia.

Take gender; many a Wikipedia has a category for this so it is easy to update Wikidata based on what is in such categories. When a researcher is interested in the articles where Wikidata does not have such information, articles will be found and it is appreciated when Wikidata is updated by them as part of these activities. As a rule, the percentage of "humans" with no known gender is dropping anyway.

When a Wikipedia editor has an interest in female scientists that do not have an article in English, it is easy enough to have a query for that. Not all female scientists with or without a Wikipedia article can be found this way but it is just a matter of adding them in Wikidata. When another editor is interested in female scientists with no article in German or Kannada, it is just one change in the same query.
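As an illustration of that "one change", here is a sketch in Python that builds such a query for the Wikidata Query Service. It is my own minimal version, not the query referred to above: it assumes the usual Wikidata identifiers (P31 "instance of", Q5 "human", P21 "sex or gender", Q6581072 "female", P106 "occupation", Q901 "scientist"), and the function name and the limit are made up for the example.

```python
import re
from string import Template

# SPARQL text with two placeholders: the Wikipedia language code and a row limit.
QUERY_TEMPLATE = Template("""\
SELECT ?person ?personLabel WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P21 wd:Q6581072 ;
          wdt:P106 wd:Q901 .
  FILTER NOT EXISTS {
    ?article schema:about ?person ;
             schema:isPartOf <https://$language.wikipedia.org/> .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "$language,en". }
}
LIMIT $limit
""")


def female_scientists_without_article(language, limit=100):
    """Build a query for female scientists lacking an article in one Wikipedia."""
    if not re.fullmatch(r"[a-z]{2,3}(-[a-z0-9]+)*", language):
        raise ValueError(f"not a plausible language code: {language!r}")
    return QUERY_TEMPLATE.substitute(language=language, limit=int(limit))


# The only difference between these three queries is the language code.
english = female_scientists_without_article("en")
german = female_scientists_without_article("de")
kannada = female_scientists_without_article("kn")
```

The resulting text can be pasted into the query service. Only people with the plain occupation "scientist" are matched in this sketch; more specific occupations would need a broader pattern.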
Thanks,
        GerardM

#Wikidata user story - the #library

The OCLC is an organisation combining most of the libraries in the world. It used to connect to the English Wikipedia but as Wikidata connects all Wikipedias, the OCLC does a better job linking to Wikidata. Through Wikidata it can link to articles about authors in any language.

For many authors the connection between VIAF, the system used by the OCLC and Wikidata is still missing. Many people are adding VIAF identifiers and once a month the data is imported and all the new data pops up.

Best practice at English Wikipedia has it that an {{authority control}} template is added in the reference section of people. When a VIAF identifier is added in Wikidata not only a VIAF identifier but also Worldcat information is shown (the example is for William Keepers Maxwell Jr.). Doing this is possible for any Wikipedia.

Now to expand on this; when a reader opts in, we could show if a book of an author is available in the local library.. What do you think?
Thanks,
       GerardM

Why #Wikidata? Because it is useful!

Wikidata was useful from the start. It provides a service to all Wikipedias and after the startup, it now provides the same service to Commons and Wikisource. It connects information about the same subject; these connections are the interwiki links.

The next phase was to connect these subjects. This is an internal Wikidata project and it is not really used. This data could be useful but it is not always up to date and the requirements for the primary use cases are not realistic and almost impossible to fulfil. The challenge is to provide sourced information for every statement.

The challenge is: how do we provide a use for the Wikidata data? How do we get people to actually use Wikidata, have an interest in the data and maintain what is in their interest?

Software developers create "user stories" to explain what their software is to achieve. Why not write user stories that show how Wikidata can already be used and expand the stories on how to be even more useful and usable?
Thanks,
      GerardM

Sunday, April 16, 2017

#Wikipedia - The death of Lanier Meaders

Mr Meaders was a notable potter who died in February 1998 according to folkpottery.com. The English Wikipedia article however is in two minds about his death. Yes he is dead but when did he die?

According to the category he was one of the living dead for 10 years. In the text the year of his demise is correctly stated as 1998. By googling for a source another date was found.

As I am not an English Wikipedian, I do not know how to indicate sources in English Wikipedia. The date of death in Wikidata does have a reference. The question is how differences like the dates of death of Mr Meaders can be found, and how we improve the consistency of the information that we provide in all of our projects.
Thanks,
      GerardM

NB the information in Wikidata on Mr Meaders is not complete.

Thursday, April 13, 2017

#Wikidata - People die; implications for another #policy approach

People die, notable people die. It is natural and it happens all the time. Many a #Wikipedia has a category for the people who died in a specific year. Such categories are what makes a wonderful tool by Pasleim tick. It shows those Wikidata items that have no date of death while a Wikipedia knows about the demise of the person involved.

This is a wonderful tool; it allows Wikidata to take care of those who died and update its data. It leaves us with another option: to add one more tool, a tool that checks if the date of death exists in the Wikipedias that do not have such a category.

Consider this; a date of death is relevant when you consider the "Biographies of Living People". Having complete information for people is important. So why not flip our approach to the BLP and provide tools to improve the existing information in all of our projects?

First things first; the objective is to signal the death of a person. As is the current policy, it is up to every project to do with it as it likes. What should follow is looking for sources and, when one is available, preferably adding at least one to Wikidata for re-use.

What are the benefits? A positive approach to maintenance and an invitation for people to do something that actually matters now. It is an invitation to read the article and see what more can be done to get it into shape.

When the date for a death exists in an article, the article will be removed from the articles that need attention. There are plenty of valid approaches to this.

Improving user engagement is one of the objectives of the Wikimedia Foundation itself. I really want the WMF to include active engagement where it makes a difference and be as pro active as it can in this field. This is a positive approach and that is what we badly need.
Thanks,
      GerardM

Saturday, April 08, 2017

#WhiteHouse Fellows - Mrs Margarita Colmenares

Mrs Margarita Colmenares is a White House Fellow. A message was posted on Twitter that her article had been created and to support the message, it was easy enough to add her on Wikidata as well. The article mentioned that she was a White House Fellow and adding one layer of additional information is one way of making a person more relevant.

Adding this fellowship and adding other people who were a fellow was easy enough. The Wikipedia article referred to the website of the White House for information and when you visit its website you will be thanked for having an interest in this subject.

At a time like this it is good to consider Archive.org. Its crawler worked well at some dates; for other dates the message you will see is: "Got an HTTP 301 response at crawl time".

Anyway... Together, the information at whitehouse.gov and at archive.org provides enough of a reference.
Thanks,
     GerardM

Friday, April 07, 2017

#Wikidata - #Perfection or #progress

When you consider the intention of the "BLP" or the "Biographies of Living People", you will find that it is defensive. It is the result of court cases brought against the Wikimedia Foundation or Wikipedians by living people. The result was a restrictive policy that intends to enforce the use of "sources" for all statements on living people.

The upside was fewer court cases and the downside; administrators who blindly applied this policy particularly in the big Wikipedias. Many people left, they no longer edit Wikipedia.

At Wikidata there are proponents of enforcing a BLP explicitly so that they have the "mandate" to block people when they consider them too often in violation of such a policy.

For a reality check; there are many known BLP issues in Wikidata that are not taken care of. There are tools like the one by Pasleim that make it easy to do so. There have been no external complaints about Wikidata so far but internal complaints, complaints about the quality of descriptions for instance, are easily waved away.

The implementation of a "DLP" or "Data of Living People" where "sources" are mandatory would kill much of the work done at Wikidata and will not have an effect on the existing backlog. Killing the backlog removes much of the usability of Wikidata and will prove to be even worse.

In order to responsibly consider new policies, first reflect on the current state of a project. What issues need to be addressed? What can be done to focus attention on the areas where it is most needed? How can we leverage what we know in other projects and in external sources? When it is really urgent, make a cost analysis and improve the usability of our software to support the needed progress. And yes, stop insisting on perfection; it is what you aim for. None of us is in a position to throw the first stone.
Thanks,
      GerardM


Wednesday, April 05, 2017

#Wikimedia and our #quality

In Berlin, the Wikimedia Foundation deliberated about the future. A lot of noble intentions were expressed. People went home glowing in the anticipation of all the good things they want. It is good to talk the talk and follow up and walk the walk.

A top priority for Wikidata is that it is used and useful. As it becomes more useful, quality becomes more of a priority for the people who use it. They will actively curate the data and remedy issues because they have a stake in the outcome.

So far Wikidata is largely filled with information from all the Wikipedias and this process can be improved substantially. For this to happen there is a need for more complete and up to date data. So what use can we give this data so that it gains use, and thereby gains value?

What if... What if Wikidata could be used as an instrument to find the 4% of wiki links in Wikipedia that point to the wrong articles? With some minor changes to the MediaWiki software this can be done. This approach is described here, for instance. The beauty of this proposal is that not all the Wikipedians have to get involved; it is for those who care, for the rest it is mostly business as usual.

There are other benefits as well. When it is "required" to add a source to a statement like "spouse of", it should be or is a requirement on the Wikipedia as well. When the source is associated with the wiki link or red link for that matter, it should be possible for Wikidata to pick it up manually or with software.

When content of Wikidata more closely mirrors information of a Wikipedia in this way, it becomes easy and obvious to compare this information with other Wikipedias. Overall quality improves, but as relevant, the assurance we can give about our quality improves.

When we consider Wikimedia for the next 15 years, I expect that we will focus on quality and prevent bias not only by combining all our resources but also by reaching out to other trusted sources. By working together we will expose a lot more fake facts.
Thanks,
       GerardM

Sunday, April 02, 2017

#Wikidata - #Quality is a #perspective.

Forget absolutes. As an absolute, quality does not exist for Wikidata. At best quality has attributes, attributes that can be manipulated, that interact. With 25,430,779 items any approach to quality will have a potentially negative quality effect when quality is approached from a different perspective.

Yet, we seek quality for our data and aim for quality to measurably improve. There are many perspectives possible and they have value, a value that is strengthened when it is combined with other perspectives.

At the Wikimedia Foundation, the "Biographies of Living Persons" or BLP has a huge impact. When you consider this policy, it is about biographies, a Wikipedia thing, and this is not what Wikidata does. It is important to appreciate this as it is a key argument when a DLP "Data of Living Persons" is considered. Important is that the BLP focuses on articles for living people and its aim is to prevent lawsuits from articles that have a negative impact on living people.

Data is different; it is used differently and it has an impact in different ways. Take for instance notability; a person may be notable and relevant because of having held an office or receiving an award. In order to complete information on the succession of an office or an award, it is therefore essential to include all persons involved in Wikidata. At the same time, when information is incomplete it can have an impact on a person as well: "You did not get that award because Wikidata does not say so".

Wikidata is incomplete and immature. Given the different perspectives on a DLP, most of them are not achievable in short order. The people who insist on a "source" for any statement will wipe most of the Wikidata statements and force it to a standstill. The people who insist on completeness have an impossible full time job for many years to come.

So what to do? Nothing is not an option but seeking ways to improve both quality and quantity is. A key value of Wikidata is its utility. The "Black Lunch Table" is one example of giving utility to Wikidata. They use Wikidata to manage the Wikipedia articles they want to write and expand on the notability of artists by including information on Wikidata. All the information helps people to write Wikipedia articles. Quality is important. Being included on the Black Lunch Table means something; artists are considered to be notable and worthy of a Wikipedia article.

Another example is using the links to authors so that people can read a book.

Given the size of Wikidata, it is impossible to get everything right in short order. When we can get people to adopt subsets of our data, these will grow. Our data will be linked. When we get to the stage where people actually object to data in Wikidata, we have improved both our quantity and quality substantially. As it is, looking at all the data, typically there is little to object to and that is in itself objectionable.
Thanks,
     GerardM

#Wikimedia - First a #strategy, then #Action

The people at Open Library have books they love to share. They are in the process of opening what they have even more.

In a previous post it was mentioned that there is a JSON document for getting information on authors like Cicero. There are many works by Cicero and today they have a JSON document in production for the books as well.

So what possible scenario is there for the readers of any Wikipedia? They check in Open Library what books there are for Cicero (or any other author). They download a book and read it.

Where we are:
  • there is an API informing about authors and their books at Open Library based on the Open Library identifier.
  • an app can now be built that shows this information
    • this app could use identifiers of other "Sources" like Wikidata, VIAF or whatever on the assumption that Wikidata links these "Sources".
    • this app could show information based on Wikidata statements in any language using Wikidata labels.
    • this app may download the book (maybe not yet but certainly in the future)

What next:
  • investigate the JSON and see what we already can do with it
    • publish the results and iterate
  • Add more identifiers of authors known to Open Library to Wikidata
    • there are many OL identifiers in the Freebase information; they need to be extracted and a combined list of Wikidata identifiers and OL identifiers allows OL to curate it for redirects and we can then publish.
  • Raffaele Messuti pointed to existing functionality that retrieves an author ID for Wikidata and VIAF using an ISBN number.
    • Open Library knows about ISBN numbers for its books. When it runs the functionality for all the authors where it does not have a VIAF identifier it can enrich its database and share the information with Wikidata.
    • Alternatively someone does this based on exposed information at Open Library.. :)
  • We add a link to Open Library in the {{authority control}} in Wikipedia
  • We could add information for nearby libraries like they do in Worldcat [1].
  • We can measure how popular it is; how many people we refer to Open Library or to their library.
At the Wikimedia Foundation we aim to share in the sum of all knowledge. We aim to enable people to acquire information. Making this happen for people at Wikipedia, Open Library and their library is part of this mission; we just have to be bold and make it so.
Thanks,
      GerardM