Ten years later, in 2008, 50% of all the documents available on the internet were encoded in Unicode, with the other 50% encoded in ASCII. ASCII is still very useful, especially the original 7-bit plain ASCII, because it can be read, written, copied and printed by any text editor or word processor, and it is the only format compatible with 99% of all hardware and software.
First published in January 1991, Unicode "provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language" (excerpt from the website). This platform-independent encoding, originally designed as a double-byte (16-bit) scheme, provides a basis for the processing, storage and interchange of text data in any language, and in all modern software and information technology protocols. Unicode is maintained by the Unicode Consortium, and is a component of the W3C (World Wide Web Consortium) specifications.
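The idea of "a unique number for every character" can be illustrated with a short Python sketch. The helper names code_points and is_seven_bit_ascii are invented for this illustration and belong to no library. The sketch only shows how each character maps to one Unicode number (its code point), and how to tell whether a text stays within original 7-bit ASCII.

```python
from typing import List


def code_points(text: str) -> List[str]:
    """Return the Unicode code point of each character in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def is_seven_bit_ascii(text: str) -> bool:
    """True if every character fits in original 7-bit ASCII (code points 0 to 127)."""
    return all(ord(ch) < 128 for ch in text)


# "\u00f2" is the letter o with a grave accent, as in the word Kreyol
# written with its accent.
word = "Krey\u00f2l"
print(code_points(word))
print(is_seven_bit_ascii("Gaelic"), is_seven_bit_ascii(word))
```

The accented letter has a code point above 127, so a word containing it cannot be stored in plain 7-bit ASCII, while Unicode still assigns it a single number.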
= Language dictionaries
Logos is an international translation company with headquarters in Modena, Italy. In 1997, Logos had 200 in-house translators in Modena and 2,500 freelance translators worldwide, who processed around 200 texts per day. The company made a bold move, and decided to put on the web all the linguistic tools used by its translators, so that the internet community could freely use them as well. The linguistic tools were the Logos Dictionary, a multilingual dictionary with 7 billion words (in fall 1998); the Logos Wordtheque, a multilingual library with 300 billion words extracted from translated novels, technical manuals and other texts; the Logos Linguistic Resources, a database of 500 glossaries; and the Logos Universal Conjugator, a database for verbs in 17 languages.
When interviewed by Annie Kahn on December 7, 1997 for the French daily Le Monde, Rodrigo Vergara, head of Logos, explained: "We wanted all our translators to have access to the same translation tools. So we made them available on the internet, and while we were at it we decided to make the site open to the public. This made us extremely popular, and also gave us a lot of exposure. This move has in fact attracted many customers, and also allowed us to widen our network of translators, thanks to contacts made in the wake of the initiative."
In the same article, Annie Kahn wrote: "The Logos site is much more than a mere dictionary or a collection of links to other online dictionaries. The cornerstone is the document search program, which processes a corpus of literary texts available free of charge on the web. If you search for the definition or the translation of a word ('didactique', for example), you get not only the answer sought, but also a quote from one of the literary works containing the word (in our case, an essay by Voltaire). All it takes is a click on the mouse to access the whole text or even to order the book, including in foreign translations, thanks to a partnership agreement with the famous online bookstore Amazon.com. However, if no text containing the required word is found, the program acts as a search engine, sending the user to other web sources containing this word. In the case of certain words, you can even hear the pronunciation. If there is no translation currently available, the system calls on the public to contribute. Everyone can make suggestions, after which Logos translators check the suggested translations they receive."
Robert Beard, a language teacher at Bucknell University (in Lewisburg, Pennsylvania), founded the website "A Web of Online Dictionaries" (WOD) in 1995, and then included it in a larger project, yourDictionary.com, that he cofounded in early 2000. He wrote in January 2000: "The new website is an index of 1,200+ dictionaries in more than 200 languages. Besides the WOD, the new website includes a word-of-the-day-feature, word games, a language chat room, the old 'Web of Online Grammars' (now expanded to include additional language resources), the 'Web of Linguistic Fun', multilingual dictionaries; specialized English dictionaries; thesauri and other vocabulary aids; language identifiers and guessers, and other features; dictionary indices. yourDictionary.com will hopefully be the premiere language portal and the largest language resource site on the web. It is now actively acquiring dictionaries and grammars of all languages with a particular focus on endangered languages. It is overseen by a blue ribbon panel of linguistic experts from all over the world."
yourDictionary.com wants to be the premier portal for all languages without exception, and as such offers a specific section called Endangered Language Repository. Robert Beard explained in the same email interview: "Languages that are endangered are primarily languages without writing systems at all (only 1/3 of the world's 6,000+ languages have writing systems). I still do not see the web contributing to the loss of language identity and still suspect it may, in the long run, contribute to strengthening it. More and more Native Americans, for example, are contacting linguists, asking them to write grammars of their language and help them put up dictionaries. For these people, the web is an affordable boon for cultural expression."
The 6,700 languages of our planet are catalogued in "The Ethnologue: Languages of the World", an encyclopedia published by SIL International (SIL: Summer Institute of Linguistics). Barbara Grimes was the editor of the 8th to 14th editions, 1971-2000. She wrote in January 2000: "The Ethnologue is a catalog of the languages of the world, with information about where they are spoken, an estimate of the number of speakers, what language family they are in, alternate names, names of dialects, other socio-linguistic and demographic information, dates of published Bibles, a name index, a language family index, and language maps." The Ethnologue is freely available on the web. The print version and CD-ROM can be bought online.
= Minority languages
Caoimhín Ó Donnaíle teaches computing - through the Gaelic language - at the Institute Sabhal Mór Ostaig, located on the Isle of Skye, in Scotland. He also maintains the bilingual (English, Gaelic) college website, which is the main site worldwide with information on Scottish Gaelic, as well as the webpage European Minority Languages, a list of minority languages in alphabetical order and by language family. He wrote in May 2001: "There has been a great expansion in the use of information technology in our college. Far more computers, more computing staff, flat screens. Students do everything by computer, use Gaelic spell-checking, and a Gaelic online terminology database. There are more hits on our website. There is more use of sound. Gaelic radio (both Scottish and Irish) is now available continuously worldwide via the internet. A major project has been the translation of the Opera web browser into Gaelic - the first software of this size available in Gaelic."
What about the internet and endangered languages? "I would emphasize the point that as regards the future of endangered languages, the internet speeds everything up. If people don't care about preserving languages, the internet and accompanying globalisation will greatly speed their demise. If people do care about preserving them, the internet will be a tremendous help."
Guy Antoine is the founder of Windows on Haiti, a reference website about Haitian culture. He wrote in November 1999: "In Windows on Haiti, the primary language of the site is English, but one will equally find a center of lively discussion conducted in 'Kreyòl'. In addition, one will find documents related to Haiti in French, in the old colonial Creole, and I am open to publishing others in Spanish and other languages. I do not offer any sort of translation, but multilingualism is alive and well at the site, and I predict that this will increasingly become the norm throughout the web."
Guy added in June 2001: "Kreyòl is the only national language of Haiti, and one of its two official languages, the other being French. It is hardly a minority language in the Caribbean context, since it is spoken by eight to ten million people. (…) I have taken the promotion of Kreyòl as a personal cause, since that language is the strongest of bonds uniting all Haitians, in spite of a small but disproportionately influential Haitian elite's disdainful attitude to adopting standards for the writing of Kreyòl and supporting the publication of books and official communications in that language. For instance, there was recently a two-week book event in Haiti's Capital and it was promoted as 'Livres en folie' ('A mad feast for books'). Some 500 books from Haitian authors were on display, among which one could find perhaps 20 written in Kreyòl. This is within the context of France's major push to celebrate Francophony among its former colonies. This plays rather well in Haiti, but directly at the expense of Creolophony. What I have created in response to those attitudes are two discussion forums on my website, Windows on Haiti, held exclusively in Kreyòl. One is for general discussions on just about everything but obviously more focused on Haiti's current socio-political problems. The other is reserved only to debates of writing standards for Kreyòl. Those debates have been quite spirited and have met with the participation of a number of linguistic experts. The uniqueness of these forums is their non-academic nature."
= Translations
Henk Slettenhaar is a professor in communication technologies at Webster University, Geneva, Switzerland. He has regularly insisted on the need for bilingual websites, in the original language and in English. He wrote in December 1998: "I see multilingualism as a very important issue. Local communities that are on the web should principally use the local language for their information. If they want to present it to the world community as well, it should be in English too. I see a real need for bilingual websites. I am delighted there are so many offerings in the original language now. I much prefer to read the original with difficulty than getting a bad translation."
Henk added in August 1999: "There are two main categories of websites in my opinion. The first one is the global outreach for business and information. Here the language is definitely English first, with local versions where appropriate. The second one is local information of all kinds in the most remote places. If the information is meant for people of an ethnic and/or language group, it should be in that language first, with perhaps a summary in English. We have seen lately how important these local websites are - in Kosovo and Turkey, to mention just the most recent ones. People were able to get information about their relatives through these sites."
Jean-Pierre Cloutier was the editor of "Chroniques de Cybérie", a weekly French-language online report of internet news. Jean-Pierre wrote in August 1999: "The web is going to grow in non-English-speaking regions. So we have to take into account the technical aspects of the medium if we want to reach these 'new' users. I think it is a pity there are so few translations of important documents and essays published on the web - from English into other languages and vice versa. (…) In the same way, the recent spreading of the internet in new regions raises questions which would be good to read about. When will Spanish-speaking communication theorists and those speaking other languages be translated?"
Marcel Grangier is the head of the French Section of the Swiss Federal Government's Central Linguistic Services, which means he is in charge of organizing translations into French for the Swiss government. He wrote in January 1999: "We can see multilingualism on the internet as a happy and irreversible inevitability. So we have to laugh at the doomsayers who only complain about the supremacy of English. Such supremacy is not wrong in itself, because it is mainly based on statistics (more PCs per inhabitant, more people speaking English, etc.). The answer is not to 'fight' English, much less whine about it, but to build more sites in other languages. As a translation service, we also recommend that websites be multilingual. The increasing number of languages on the internet is inevitable and can only boost multicultural exchanges. For this to happen in the best possible circumstances, we still need to develop tools to improve compatibility. Fully coping with accents and other characters is only one example of what can be done."
2001: COPYRIGHT, COPYLEFT AND CREATIVE COMMONS
= [Overview]
Creative Commons (CC) was founded in 2001 by Lawrence Lessig, a professor at Stanford Law School, California. As explained on its website, "Creative Commons is a nonprofit corporation dedicated to making it easier for people to share and build upon the work of others, consistent with the rules of copyright. We provide free licenses and other legal tools to mark creative work with the freedom the creator wants it to carry, so others can share, remix, use commercially, or any combination thereof." There were one million Creative Commons licensed works in 2003, 4.7 million licensed works in 2004, 20 million licensed works in 2005, 50 million licensed works in 2006, 90 million licensed works in 2007, and 130 million licensed works in 2008. Science Commons was founded in 2005 to "design strategies and tools for faster, more efficient web-enabled scientific research." ccLearn was founded in 2007 as "a division of Creative Commons dedicated to realizing the full potential of the internet to support open learning and open educational resources."
= Copyright on the web
What did people think about copyright on the web, when there were heated debates about print articles and other copyrighted works being posted and re-posted without the consent of their authors? Here are some answers.
Based in San Francisco, California, Jacques Gauchey was a journalist in information technology and a "facilitator" between the United States and Europe. He wrote in July 1999: "Copyright in its traditional context doesn't exist any more. Authors have to get used to a new situation: the total freedom of the flow of information. The original content is like a fingerprint: it can't be copied. So it will survive and flourish."
Guy Antoine is the founder of Windows on Haiti, a reference website about Haitian culture. He wrote in November 1999: "The debate will continue forever, as information becomes more conspicuous than the air that we breathe and more fluid than water. (…) Authors will have to become a lot more creative in terms of how to control the dissemination of their work and profit from it. The best that we can do right now is to promote basic standards of professionalism, and insist at the very least that the source and authorship of any work be duly acknowledged. Technology will have to evolve to support the authorization process."
Alain Bron is a consultant in information systems and a novelist. He wrote in November 1999: "I regard the web today as a public domain. That means in practice the notion of copyright on it disappears: everyone can copy everyone else. Anything original risks being copied at once if copyrights are not formally registered or if works are available without payment facilities. A solution is to make people pay for information, but this is no watertight guarantee against it being copied."
Peter Raggett was the deputy-head (and now the head) of the OECD Central Library (OECD: Organisation for Economic Co-operation and Development). He wrote in August 1999: "The copyright question is still very unclear. Publishers naturally want their fees for each article ordered and librarians and end-users want to be able to download immediately the full text of articles. At the moment, each publisher seems to have its own policy for access to electronic versions and they would benefit from having some kind of homogenous policy, preferably allowing unlimited downloading of their electronic material."
Tim McKenna is an author who thinks and writes about the complexity of truth in a world of flux. He wrote in October 2000: "Copyright is a difficult issue. The owner of the intellectual property thinks that s/he owns what s/he has created. I believe that the consumer purchases the piece of plastic (in the case of a CD) or the bounded pages (in the case of book). The business community has not found a new way to add value to intellectual property. Consumers don't think very abstractly. When they download songs for example, they are simply listening to them, they are not possessing them. The music and publishing industry need to find ways to give consumers tactile vehicles for selling the intellectual property."
= Copyright and WIPO
Since the web became mainstream, the posting of thousands of electronic texts and other documents has been a headache for the organizations in charge of applying the rules relating to intellectual property.
The World Intellectual Property Organization (WIPO) is an intergovernmental organization, and one of the 16 specialized agencies of the United Nations. It is responsible for protecting intellectual property throughout the world through cooperation among countries. It is also responsible for implementing various multilateral treaties dealing with the legal and administrative aspects of intellectual property.
Intellectual property comprises industrial property and copyright. Industrial property relates to inventions, trademarks, industrial designs and appellations of origin. Copyright relates to literary, musical, artistic, photographic and audiovisual works. WIPO stated on its website in 1999: "As regards the number of literary and artistic works created worldwide, it is difficult to make a precise estimate. However, the information available indicates that at present around 1,000,000 books/titles are published and some 5,000 feature films are produced in a year, and the number of copies of phonograms sold per year presently is more than 3,000 million."
Copyright protection means that using a copyrighted work is lawful only if we get authorization from the copyright owner. As explained by WIPO on its website in the section "International Protection of Copyright and Neighbouring Rights", the authorizations granted by the copyright owner can be: "The right to copy or otherwise reproduce any kind of work; the right to distribute copies to the public; the right to rent copies of at least certain categories of works (such as computer programs and audiovisual works); the right to make sound recordings of the performances of literary and musical works; the right to perform in public, particularly musical, dramatic or audiovisual works; the right to communicate to the public by cable or otherwise the performances of such works and, particularly, to broadcast, by radio, television or other wireless means, any kind of work; the right to translate literary works; the right to rent, particularly, audiovisual works, works embodied in phonograms and computer programs; the right to adapt any kind of work and particularly the right to make audiovisual works thereof."
Under some national laws, some of these rights - which together are referred to as "economic rights" - are not exclusive rights of authorization but, in some specific cases, merely rights to remuneration. In addition to economic rights, authors - whether or not they own the economic rights - enjoy "moral rights" on the basis of which authors have the right to claim their authorship and require that their names be indicated on the copies of the work and in connection with other uses, and they have the right to oppose the mutilation or deformation of their works.
= Shrinking of public domain
Michael Hart created Project Gutenberg in July 1971 to make electronic versions of literary works and disseminate them for free. In 2009, Project Gutenberg had tens of thousands of downloads every day. As recalled by Michael in January 2009, "I knew [in July 1971] that the future of computing, and the internet, was going to be… 'The Information Age.' That was also the day I said we would be able to carry quite literally the entire Library of Congress in one hand and the system would certainly make it illegal… too much power to leave in the hands of the masses."
As defined by Project Gutenberg, "public domain is the set of cultural works that are free of copyright, and belong to everyone equally", i.e. for books, the ones that can be digitized and released on the internet for free. But the task of Project Gutenberg hasn't been made any easier by the increasing restrictions on the public domain. In former times, 50% of works belonged to the public domain, and could be freely used by everybody. Much tougher legislation was put in place over the centuries, step by step, especially during the 20th century, despite our so-called "information society". In 2100, 99% of works might be governed by copyright, with a meager 1% for the public domain.
In the "Copyright HowTo" section of its website, Project Gutenberg explains how to confirm the public domain status of books according to U.S. copyright laws. Here is a summary: (a) Works published before 1923 entered the public domain no later than 75 years from the copyright date: all these works belong to public domain; (b) Works published between 1923 and 1977 retain copyright for 95 years: no such works will enter the public domain until 2019; (c) Works created from 1978 on enter the public domain 70 years after the death of the author if the author is a natural person: nothing will enter the public domain until 2049; (d) Works created from 1978 on enter the public domain 95 years after publication or 120 years after creation if the author is a corporate one: nothing will enter the public domain until 2074.
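The four rules summarized above can be sketched as a small Python function. This is only an illustration of the summary as stated here, not of the full U.S. statute: the function name and its parameters are invented for the example, the publication year is used as a stand-in for the creation year unless one is given, rule (d) is read as "whichever term ends first", and each term is assumed to run to the end of its calendar year, which is what yields the dates 2019, 2049 and 2074 quoted in the summary.

```python
from typing import Optional


def first_public_domain_year(published: int,
                             author_death: Optional[int] = None,
                             corporate: bool = False,
                             created: Optional[int] = None) -> int:
    """First calendar year in which a work is in the U.S. public domain,
    following rules (a) to (d) of the summary (hypothetical helper)."""
    if published < 1923:
        # (a) Public domain no later than 75 years from the copyright date,
        # so this is the latest possible year, not the exact one.
        return published + 76
    if published <= 1977:
        # (b) Copyright retained for 95 years.
        return published + 96
    if corporate:
        # (d) 95 years after publication or 120 years after creation,
        # assumed here to mean whichever ends first.
        creation = published if created is None else created
        return min(published + 95, creation + 120) + 1
    if author_death is None:
        raise ValueError("rule (c) needs the author's year of death")
    # (c) 70 years after the death of the author.
    return author_death + 71


print(first_public_domain_year(1923))                     # rule (b)
print(first_public_domain_year(1978, author_death=1978))  # rule (c)
print(first_public_domain_year(1978, corporate=True))     # rule (d)
```

The three calls correspond to the earliest possible cases of rules (b), (c) and (d), and can be compared with the years given in the summary.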
Each new copyright law is more restrictive than the previous one. A major blow for digital libraries was the amendment to the 1976 Copyright Act signed on October 27, 1998. As explained by Michael Hart in July 1999: "Nothing will expire for another 20 years. We used to have to wait 75 years. Now it is 95 years. And it was 28 years (+ a possible 28-year extension, only on request) before that, and 14 years (+ a possible 14-year extension) before that. So, as you can see, this is a serious degrading of the public domain, as a matter of continuing policy."
John Mark Ockerbloom, founder of The Online Books Page in 1993, was also deeply concerned by the 1998 amendment. He wrote in August 1999: "I think it is important for people on the web to understand that copyright is a social contract that is designed for the public good - where the public includes both authors and readers. This means that authors should have the right to exclusive use of their creative works for limited times, as is expressed in current copyright law. But it also means that their readers have the right to copy and reuse the work at will once copyright expires. In the U.S. now, there are various efforts to take rights away from readers, by restricting fair use, lengthening copyright terms (even with some proposals to make them perpetual) and extending intellectual property to cover facts separate from creative works (such as found in the 'database copyright' proposals). There are even proposals to effectively replace copyright law altogether with potentially much more onerous contract law. (…) Stakeholders in this debate have to face reality, and recognize that both producers and consumers of works have legitimate interests in their use. If intellectual property is then negotiated by a balance of principles, rather than as the power play it too often ends up being ('big money vs. rogue pirates'), we may be able to come up with some reasonable accommodations."
Michael Hart wrote in July 1999: "No one has said more against copyright extensions than I have, but Hollywood and the big publishers have seen to it that our Congress won't even mention it in public. The kind of copyright debate going on is totally impractical. It is run by and for the 'Landed Gentry of the Information Age.' 'Information Age'? For whom?"
Sure enough. We regularly hear about the great "information age" we live in, while seeing the tightening of laws relating to dissemination of information. The contradiction is obvious. This problem has also affected several European countries, where the copyright law switched from "author's life plus 50 years" to "author's life plus 70 years", following pressure from content owners who successfully lobbied for "harmonization" of national copyright laws as a response to "globalization of the market". To regulate the copyright of digital editions in the wake of the relevant WIPO international treaties, the Digital Millennium Copyright Act (DMCA) was ratified in October 1998 in the United States, and the European Union Copyright Directive (EUCD) was ratified in May 2001 by the European Commission.
According to Michael Hart, and Project Gutenberg CEO Greg Newby, "as of January 2009, the total number of separate public domain books in the world is between 20 and 30 million, and that 5 million are already on the internet, and we expect another million per year from now until all the easy-to-find books are done. 10 million or so will be done before people start to think about the facts telling them the rate cannot continue to double as they come up to the point of already having done half. New copyrights lasting virtually for ever in the U.S. will bring the growth process to a screeching halt when The Mickey Mouse copyright laws, literally, copyright laws on Mickey Mouse, and Winnie-the-Pooh, etc., stop all current copyright from expiring for the foreseeable future."
= Copyleft and Creative Commons
The term "copyleft" was invented in 1984 by Richard Stallman, a computer scientist at MIT (Massachusetts Institute of Technology), who launched the GNU project to develop a complete Unix-like operating system called the GNU system.
As explained on the GNU website: "Copyleft is a general method for making a program or other work free, and requiring all modified and extended versions of the program to be free as well. (…) Copyleft says that anyone who redistributes the software, with or without changes, must pass along the freedom to further copy and change it. Copyleft guarantees that every user has freedom. (…) Copyleft is a way of using the copyright on the program. It doesn't mean abandoning the copyright; in fact, doing so would make copyleft impossible. The word 'left' in 'copyleft' is not a reference to the verb 'to leave' - only to the direction which is the inverse of 'right'. (…) The GNU Free Documentation License (FDL) is a form of copyleft intended for use on a manual, textbook or other document to assure everyone the effective freedom to copy and redistribute it, with or without modifications, either commercially or non commercially."
Creative Commons (CC) was founded in 2001 by Lawrence Lessig, a professor at Stanford Law School, California. As explained on its website: "Creative Commons is a nonprofit corporation dedicated to making it easier for people to share and build upon the work of others, consistent with the rules of copyright. We provide free licenses and other legal tools to mark creative work with the freedom the creator wants it to carry, so others can share, remix, use commercially, or any combination thereof."
There were one million Creative Commons licensed works in 2003, 4.7 million licensed works in 2004, 20 million licensed works in 2005, 50 million licensed works in 2006, 90 million licensed works in 2007, and 130 million licensed works in 2008.
Science Commons was founded in 2005. As explained on its website: "Science Commons designs strategies and tools for faster, more efficient web-enabled scientific research. We identify unnecessary barriers to research, craft policy guidelines and legal agreements to lower those barriers, and develop technology to make research, data and materials easier to find and use. Our goal is to speed the translation of data into discovery — unlocking the value of research so more people can benefit from the work scientists are doing."
ccLearn was founded in 2007. As explained on its website: "ccLearn is a division of Creative Commons dedicated to realizing the full potential of the internet to support open learning and open educational resources. Our mission is to minimize legal, technical, and social barriers to sharing and reuse of educational materials."
2002: A WEB OF KNOWLEDGE
= [Overview]
The MIT OpenCourseWare (MIT OCW) is an initiative launched by MIT (Massachusetts Institute of Technology) in 2002 to put its course materials on the web for free, as a way to promote open dissemination of knowledge. In September 2002, a pilot version was available online with 32 course materials. In November 2007, all 1,800 course materials were available, with 200 new and updated courses per year. From 2003 onwards, in the same spirit of free access to knowledge, the Public Library of Science (PLoS) launched several high-quality online periodicals. New kinds of encyclopedias were set up, for the general public to both use available articles and contribute to their writing. Wikipedia, launched in 2001, became the leading online cooperative encyclopedia worldwide, with hundreds and then thousands of contributors writing articles or editing and updating them, leading the way to other initiatives like Citizendium, launched in 2006, and the Encyclopedia of Life, launched in 2007.
= Culture, from print to digital
More and more computers connected to the internet were available in schools and at home in the mid-1990s. Teachers began exploring new ways of teaching. Going from print book culture to digital culture was changing the relationship to knowledge, and the way both scholars and students saw teaching and learning. Print book culture provided stable information whereas digital culture provided "moving" information. During a conference organized by the International Federation of Information Processing (IFIP) in September 1996, Dale Spender gave a lecture about "Creativity and the Computer Education Industry", with insightful comments on forthcoming trends.
Here are some excerpts:
"Throughout print culture, information has been contained in books - and this has helped to shape our notion of information. For the information in books stays the same - it endures. And this has encouraged us to think of information as stable - as a body of knowledge which can be acquired, taught, passed on, memorised, and tested of course. The very nature of print itself has fostered a sense of truth; truth too is something which stays the same, which endures. And there is no doubt that this stability, this orderliness, has been a major contributor to the huge successes of the industrial age and the scientific revolution. (…)
But the digital revolution changes all this. Suddenly it is not the oldest information - the longest lasting information that is the most reliable and useful. It is the very latest information that we now put the most faith in - and which we will pay the most for. (…) Education will be about participating in the production of the latest information. This is why education will have to be ongoing throughout life and work. Every day there will be something new that we will all have to learn. To keep up. To be in the know. To do our jobs. To be members of the digital community. And far from teaching a body of knowledge that will last for life, the new generation of information professionals will be required to search out, add to, critique, 'play with', and daily update information, and to make available the constant changes that are occurring."
Russon Wooldridge, a professor in the Department of French Studies at the University of Toronto, Canada, wrote in February 2001: "All my teaching makes the most of internet resources (web and email): the two common places for a course are the classroom and the website of the course, where I put all course materials. I have published all my research data of the last 20 years on the web (re-edition of books, articles, texts of old dictionaries as interactive databases, treatises from the 16th century, etc.). I publish proceedings of symposiums, I publish a journal, I collaborate with French colleagues by publishing online in Toronto what they can't publish online at home. In May 2000, I organized an international symposium in Toronto about French studies enhanced by new technologies (Les études françaises valorisées par les nouvelles technologies). (…)
I realize that without the internet I wouldn't have as many activities, or at least they would be very different from the ones I have today. So I don't see the future without them. But it is crucial that those who believe in free dissemination of knowledge make sure that knowledge is not 'eaten' by commercial ventures for them to sell it. What has happened in book publishing in France, in linguistics for example, where you can only find textbooks for schools and exams, should be avoided on the web. You don't go to Amazon.com and the likes to find disinterested science. On my website, I refuse any sponsorship."
= A few leading projects
# MIT OpenCourseWare
The MIT OpenCourseWare (MIT OCW) is an initiative launched by MIT (Massachusetts Institute of Technology) to put its course materials for free on the web, as a way to promote open dissemination of knowledge. In September 2002, a pilot version was available online with 32 course materials. The website was officially launched in September 2003. 500 course materials were available in March 2004. In May 2006, 1,400 course materials were offered by 34 departments belonging to the five schools of MIT. In November 2007, all 1,800 course materials were available, with 200 new and updated courses per year.
MIT also launched the OpenCourseWare Consortium (OCW Consortium) in November 2005, as a collaboration of educational institutions that were willing to offer free online course materials. One year later, it included the course materials of 100 universities worldwide.
# Public Library of Science
With the internet as a powerful medium to disseminate information, it seems quite outrageous that the results of research - original works requiring many years of effort - are "squatted" by publishers claiming ownership of these works, and selling them at a high price. The work of researchers is often publicly funded, especially in North America. It would therefore seem appropriate that the scientific community and the general public can freely enjoy the results of such research. In science and medicine for example, more than 1,000 new peer-reviewed articles are published daily.
The Public Library of Science (PLoS) was founded in October 2000 by biomedical scientists Harold Varmus, Patrick Brown and Michael Eisen, from Stanford University, Palo Alto, and University of California, Berkeley. Headquartered in San Francisco, PLoS is a non-profit organization whose mission is to make the world’s scientific and medical literature a public resource in free online archives. Instead of information disseminated in millions of reports and thousands of online journals, a single point would give access to the full content of these articles, with a search engine and hyperlinks between articles.
PLoS posted an open letter requesting the articles presently published by journals to be distributed freely in online archives, and asking researchers to promote the publishers willing to support this project. From October 2000 to September 2002, the open letter was signed by 30,000 scientists from 180 countries. The publishers' answer was much less enthusiastic, although a number of publishers agreed for their articles to be distributed freely immediately after publication, or six months after publication in their journals. But even the publishers who initially agreed to support the project made so many objections that it was finally abandoned.
Another objective of PLoS was to become a publisher while creating a new model of online publishing based on free dissemination of knowledge. In early 2003, PLoS created a non-profit scientific and medical publishing venture to provide scientists and physicians with free high-quality, high-profile journals in which to publish their work. The journals were PLoS Biology (launched in 2003), PLoS Medicine (2004), PLoS Genetics (2005), PLoS Computational Biology (2005), PLoS Pathogens (2005), PLoS Clinical Trials (2006) and PLoS Neglected Tropical Diseases (2007), the first scientific journal on this topic.
All PLoS articles are freely available online, on the websites of PLoS and in the public archive PubMed Central, run by the National Library of Medicine. The articles can be freely redistributed and reused under a Creative Commons license, including for translations, as long as the author(s) and source are cited. PLoS also launched PLoS ONE, an online forum meant to publish articles on any subject relating to science or medicine.
Three years after the beginning of PLoS as a publisher, PLoS Biology and PLoS Medicine had earned the same reputation for excellence as the leading journals Nature, Science and The New England Journal of Medicine. PLoS has received financial support from several foundations while developing a viable economic model from fees paid by published authors, advertising, sponsorship, and paid activities organized for PLoS members. PLoS also hopes to encourage other publishers to adopt the open access model, or to convert their existing journals to an open access model.
# Wikipedia
Wikipedia was launched in January 2001 by Jimmy Wales and Larry Sanger (Larry resigned later on). It has quickly grown into the largest reference website on the internet, financed by donations, with no advertising. Its multilingual content is free and written collaboratively by people worldwide, who contribute under a pseudonym. Its website is a wiki, which means that anyone can edit, correct and improve information throughout the encyclopedia. The articles stay the property of their authors, and can be freely used according to the GFDL (GNU Free Documentation License).
In December 2004, Wikipedia had 1.3 million articles (by 13,000 contributors) in 100 languages. In December 2006, it had 6 million articles in 250 languages. In May 2007, it had 7 million articles in 192 languages, including 1.8 million articles in English, 589,000 articles in German, 500,000 articles in French, 260,000 articles in Portuguese, and 236,000 articles in Spanish.
Wikipedia is hosted by the Wikimedia Foundation, founded in June 2003, which has run a number of other projects, beginning with Wiktionary (launched in December 2002) and Wikibooks (launched in June 2003), followed by Wikiquote, Wikisource (texts from public domain), Wikimedia Commons (multimedia), Wikispecies (animals and plants), Wikinews, Wikiversity (textbooks), and Wiki Search (search engine).
# Citizendium
Citizendium was launched in October 2006 as a pilot project to build a new encyclopedia, at the initiative of Larry Sanger, who was the cofounder of Wikipedia (with Jimmy Wales) in January 2001, but resigned later on, over policy and content quality issues. Citizendium - which stands for a "citizen's compendium of everything" - is a wiki project open to public collaboration, but combining "public participation with gentle expert guidance".
The project is expert-led, not expert-only. Contributors use their own names, not anonymous pseudonyms (as in Wikipedia), and they are guided by expert editors. As explained by Larry in his essay "Toward a New Compendium of Knowledge", posted in September 2006: "Editors will be able to make content decisions in their areas of specialization, but otherwise working shoulder-to-shoulder with ordinary authors." There are also constables who make sure the rules are respected.
Citizendium was publicly launched on March 25, 2007, with 1,100 articles, 820 authors and 180 editors. There were 9,800 high-quality articles in January 2009, and 11,800 articles in August 2009. Citizendium also wants to act as a prototype for upcoming large-scale knowledge-building projects that would deliver reliable reference, scholarly and educational content.
# Encyclopedia of Life
The Encyclopedia of Life was launched in May 2007 as a global scientific effort to document all known species of animals and plants (1.8 million), including endangered species, and expedite the documentation of the millions of species yet to be discovered and catalogued (about 8 million).
This collaborative effort is led by several main institutions: Field Museum of Natural History, Harvard University, Marine Biological Laboratory, Missouri Botanical Garden, Smithsonian Institution and Biodiversity Heritage Library (BHL). The initial funding came from the MacArthur Foundation (US $10 million) and the Sloan Foundation ($2.5 million). Total funding of $100 million over ten years will be necessary before the project becomes self-financing.
The multimedia encyclopedia will gather texts, photos, maps, sound and videos, with a webpage for each species. It will provide a single portal for millions of documents scattered online and offline. As a teaching and learning tool for a better understanding of our planet, the encyclopedia wants to reach everyone: researchers, teachers, students, pupils, media, policy makers and the general public.
The encyclopedia's honorary chair is Edward Wilson, professor emeritus at Harvard University, who was the first to express the wish for such an encyclopedia, in an essay dated 2002. Five years later, his project could become reality thanks to technology improvements for content aggregators, mash-up, wikis, and large scale content management.
As a consortium of the ten largest life science libraries, the Biodiversity Heritage Library (BHL) started the digitization of 2 million documents from public domain spanning over 200 years. In May 2007, when the project was officially launched, 1.25 million pages were already digitized in London, Boston and Washington DC, and available in the Text Archive section of the Internet Archive.
The Encyclopedia of Life is built on the work of thousands of experts around the globe, in a moderated wiki-style environment, so that the general public can contribute as well. The first pages were available in mid-2008. The encyclopedia should be fully "operational" in 2012 and completed with all known species in 2017. The English version will be translated into several languages by partner organizations. People will be able to use the encyclopedia as a "macroscope" to identify major trends from a considerable stock of information - in the same way they use a microscope for the study of detail.
2003: EBOOKS ARE SOLD WORLDWIDE
= [Overview]
First, publishers began to sell digital versions of their books online, on their own websites or on the new eBookstores of Amazon.com and Barnes & Noble.com. In 2000, new online bookstores were created to sell "only" digital books (ebooks), like Palm Digital Media (renamed Palm eBook Store), Mobipocket or Numilog. At the same time, publishers were digitizing their books by the hundreds, while the public was getting used to reading ebooks on computers, laptops, phones, smartphones and reading devices. 2003 was a turning point in an emerging market. More and more books were published simultaneously as a print book and a digital book, and thousands of new books, beginning with best-sellers, were sold as ebooks in various formats: PDF (to be read on Acrobat Reader, replaced by Adobe Reader), LIT (to be read on Microsoft Reader), PRC (to be read on Mobipocket Reader) and others, with the Open eBook format becoming a standard for ebooks.
= Books, from print to digital
The new online bookstores selling "only" digital books were also called aggregators because they were producing and selling ebooks from many publishers. It took them a few years (at least in Europe) to convince publishers that books should have two versions, print and digital, and to wait for the public to be ready to read on an electronic device, be it a computer, a laptop, a PDA, a mobile phone, a smartphone or a reading device. This emerging market took off in 2003, and more and more books were simultaneously published as a print book and a digital book.
In the 1990s, few people believed digital books would be commonplace in the near future. They thought people would still be attached to print books regardless of whatever happened, remembering this sentence of Robert Downs, a librarian who wrote in the 1980s: "My lifelong love affair with books and reading continues unaffected by automation, computers, and all other forms of the twentieth-century gadgetry." (excerpt from "Books in My Life", Library of Congress, 1985)
In an article published in February 1996 by the Swiss magazine "Informatique-Informations", Pierre Perroud, founder of the digital library Athena, explained that "electronic texts represent an encouragement to reading and a convivial participation to culture dissemination", particularly for textual research and text study. These texts are "a good complement to the print book, which remains irreplaceable for 'true' reading. (…) The book remains a mysteriously holy companion with profound symbolism for us: we grip it in our hands, we hold it against us, we look at it with admiration; its small size comforts us and its content impresses us; its fragility contains a density we are fascinated by; like man it fears water and fire, but it has the power to shelter man's thoughts from time."
Roberto Hernández Montoya, an editor of the electronic magazine Venezuela Analítica, wrote in September 1998: "The printed text can't be replaced, at least not for the foreseeable future. The paper book is a tremendous 'machine'. We can't leaf through an electronic book in the same way as a paper book. On the other hand electronic use allows us to locate text chains more quickly. In a certain way we can more intensively read the electronic text, even with the inconvenience of reading on the screen. The electronic book is less expensive and can be more easily distributed worldwide (if we don't count the cost of the computer and the internet connection)."
In the 2000s, while many people still prefer reading a print book, more and more readers enjoy reading their ebooks on their notebook, smartphone or any other electronic device. They buy their ebooks online from Amazon, Barnes & Noble, Yahoo, Palm, Mobipocket or Numilog.
In March 2000, Numilog was founded by Denis Zwirn near Paris, France, as a company specializing in the distribution of digital books. In September 2000, Numilog launched an online bookstore that became the main French-speaking aggregator of digital books over the years. Numilog has sold books and audiobooks in partnership with a number of publishers, including Gallimard, POL, Le Dilettante, Le Rocher, La Découverte, De Vive Voix, Eyrolles and Pearson Education France. Numilog was bought in May 2008 by Hachette Livre, a leading publishing group.
= Adobe Reader
Adobe launched PDF (Portable Document Format) in June 1993, with Acrobat Reader (free, to read PDF documents) and Adobe Acrobat (for a fee, to make PDF documents). As the "veteran" format, PDF was perfected over the years as a global standard for distribution and viewing of information. It "lets you capture and view robust information from any application, on any computer system and share it with anyone around the world. Individuals, businesses, and government agencies everywhere trust and rely on Adobe PDF to communicate their ideas and vision" (excerpt from the website). Adobe Acrobat gave the tools to create and view PDF files, in several languages and for several platforms (Windows, Mac, Linux).
In August 2000, Adobe bought Glassbook, a company specializing in digital books software for publishers, booksellers, distributors and libraries. Adobe also partnered with Amazon.com and Barnes & Noble.com to offer ebooks for the Acrobat Reader and the Glassbook Reader.
In January 2001, Adobe launched the Acrobat eBook Reader (free) and the Adobe Content Server (for a fee).
The Acrobat eBook Reader was used to read PDF files of copyrighted books, while adding notes and bookmarks, getting the book covers in a personal library, and browsing a dictionary.
The Adobe Content Server was intended for publishers and distributors for the packaging, protection, distribution and sale of copyrighted books in PDF format, while managing their access with DRM (Digital Rights Management), according to instructions given by the copyright holder, for example allowing or not the printing and loan of ebooks. (It was replaced with the Adobe LiveCycle Policy Server in November 2004.)
In April 2001, Adobe partnered with Amazon.com, for the online bookstore to include 2,000 copyrighted books for the Acrobat eBook Reader. These were titles from major publishers, travel guides, and children's books.
The same year, the Acrobat Reader was available for PDAs, beginning with the Palm Pilot (May 2001) and the Pocket PC (December 2001).
Between 1993 and 2003, over 500 million copies of Acrobat Reader were downloaded worldwide. In 2003, Acrobat Reader was available in many languages and for many platforms (Windows, Mac, Linux, Palm OS, Pocket PC, Symbian OS, etc.). Approximately 10% of the documents on the internet were available in PDF.
In May 2003, Acrobat Reader (5th version) merged with Acrobat eBook Reader (2nd version) to become Adobe Reader (starting with version 6), which could read both standard PDF files and secure PDF files of copyrighted books.
In late 2003, Adobe opened its own online bookstore, the Digital Media Store, with titles in PDF format from major publishers (HarperCollins, Random House, Simon & Schuster, etc.) as well as electronic versions of newspapers and magazines like The New York Times, Popular Science, etc. Adobe also launched Adobe eBooks Central as a service to read, publish, sell and lend ebooks, and Adobe eBook Library as a prototype digital library.
= Open eBook and ePub
In 1999, there were nearly as many ebook formats as ebooks, with each new company creating its own format for its own ebook reader (software) and its own electronic device, for example the Glassbook Reader, the Peanut Reader, the Rocket eBook Reader (for the Rocket eBook), the Franklin Reader (for the eBookMan), the Cytale ebook reader (for the Cybook), the Gemstar eBook Reader (for the Gemstar eBook), the Palm Reader (for the Palm Pilot), etc.
The digital publishing industry felt the need to work on a common format for ebooks. In September 1999, it released the first version of the Open eBook (OeB) format, based on XML (eXtensible Markup Language) and defined by the Open eBook Publication Structure (OeBPS). The Open eBook Forum was created in January 2000 to develop the OeB format and OeBPS specifications. Since 2000, most ebook formats have been derived from - or are compatible with - the OeB format, for example the PRC format from Mobipocket or the LIT format from Microsoft.
In April 2005, the Open eBook Forum became the International Digital Publishing Forum (IDPF). The OeB format was replaced with the ePub format, a global standard for ebooks alongside PDF. The PDF files created with recent versions of Adobe Acrobat are compatible with the ePub format.
= Microsoft Reader
Microsoft launched the Microsoft Reader in April 2000, for people to read books in LIT (from "literature") format on its new PDA, the Pocket PC. Four months later, in August 2000, the Microsoft Reader was available for computers, and then for any Windows platform, for example the Tablet PCs launched in November 2002.
Microsoft billed publishers and distributors for the use of its DRM technology through the Microsoft DAS Server, with a commission on each sale. Microsoft also partnered with major online bookstores - Barnes & Noble.com in January 2000 and Amazon.com in August 2000 - for them to offer ebooks for the Microsoft Reader in eBookstores soon to be launched. Barnes & Noble.com opened its eBookstore in August 2000, followed by Amazon in November 2000.
= Mobipocket Reader
Mobipocket was founded in March 2000 in Paris, France, by Thierry Brethes and Nathalie Ting, as a company specializing in ebooks for PDAs. The Mobipocket format (PRC, based on the OeB format) and the Mobipocket Reader were "universal" and could be used on any PDA - and also on any computer from April 2002. They quickly became global standards for ebooks on mobile devices.
In October 2001, the Mobipocket Reader received the eBook Technology Award from the International Book Fair in Frankfurt. Mobipocket partnered with Franklin for the Mobipocket Reader to be available on the eBookMan, Franklin's personal assistant, instead of the initially planned Microsoft Reader.
The Mobipocket Web Companion was a program (for a fee) for extracting content from partner news sites. The Mobipocket Publisher was used by individuals (free version for private use, and standard version for a fee) or publishers (professional version for a fee) to create ebooks using the Mobipocket DRM technology for controlling access to copyrighted ebooks. The Mobipocket Publisher could also create ebooks in LIT format for the Microsoft Reader.
In spring 2003, the Mobipocket Reader was available in several languages (French, English, German, Spanish, Italian) and could be used on any PDA and computer, and on the smartphones of Nokia and Sony Ericsson. 6,000 titles in several languages were available on Mobipocket's website and in partner online bookstores.
Mobipocket was bought by Amazon in April 2005. It now operates within the Amazon brand, with a multilingual catalog of 70,000 books in 2008.
2004: AUTHORS ARE CREATIVE ON THE NET
= [Overview]
Some authors have enjoyed creating websites, posting their works and communicating with readers by email. Other authors have begun exploring how hyperlinks could expand their writing in new directions, while linking it to images and sound. Jean-Paul switched from being a print author to being a hypermedia author, while enjoying the freedom given by online (self-)publishing: "The internet allows me to do without intermediaries such as record companies, publishers and distributors. Most of all, it allows me to crystallize what I have in my head: the print medium only allows me to partly do that. (…) Surfing the web is like radiating in all directions (I am interested in something and I click on all the links on a home page) or like jumping around (from one click to another, as the links appear). You can do this in the written media, of course. But the difference is striking. So the internet changed how I write. You don't write the same way for a website as you do for a script or a play."
= The internet as a research tool
Murray Suid is a writer of educational books and material living in Palo Alto, in the heart of Silicon Valley. He has also written books for kids, multimedia scripts and screenplays. How did using the internet change his professional life? He wrote in September 1998: "The internet has become my major research tool, largely - but not entirely - replacing the traditional library and even replacing person-to-person research. Now, instead of phoning people or interviewing them face to face, I do it via email. Because of speed, it has also enabled me to collaborate with people at a distance, particularly on screenplays. (I've worked with two producers in Germany.) Also, digital correspondence is so easy to store and organize, I find that I have easy access to information exchanged this way. Thus, emailing facilitates keeping track of ideas and materials. The internet has increased my correspondence dramatically. Like most people, I find that email works better than snail mail. My geographic range of correspondents has also increased - extending mainly to Europe. In the old days, I hardly ever did transatlantic penpalling. I also find that emailing is so easy, I am able to find more time to assist other writers with their work - a kind of a virtual writing group. This isn't merely altruistic. I gain a lot when I give feedback. But before the internet, doing so was more of an effort."
Murray was among the first authors to add a website to his books - an opportunity that many would soon adopt: "If a book can be web-extended (living partly in cyberspace), then an author can easily update and correct it, whereas otherwise the author would have to wait a long time for the next edition, if indeed a next edition ever came out. (…) I do not know if I will publish books on the web - as opposed to publishing paper books. Probably that will happen when books become multimedia. (I currently am helping develop multimedia learning materials, and it is a form of teaching that I like a lot - blending text, movies, audio, graphics, and - when possible - interactivity)."
He added in August 1999: "In addition to 'web-extending' books, we are now web-extending our multimedia (CD-ROM) products - to update and enrich them."
In October 2000, "our company - EDVantage Software - has become an internet company instead of a multimedia (CD-ROM) company. We deliver educational material online to students and teachers."
= The internet as a novel "character"
Alain Bron lives in Paris, France. He is a consultant in information systems and a writer. The internet is one of the "characters" of his second novel, "Sanguine sur toile" (Sanguine on the web), available in print from Editions du Choucas in 1999, and in PDF format from Editions 00h00 in 2000.
Alain wrote in November 1999: "In French, 'toile' means the web as well as the canvas of a painting, and 'sanguine' is the red chalk of a drawing as well as one of the adjectives derived from blood ('sang' in French). But would a love of colors justify a murder? 'Sanguine sur toile' is the strange story of an internet surfer caught up in an upheaval inside his own computer, which is being remotely operated by a very mysterious person whose only aim is revenge. I wanted to take the reader into the worlds of painting and enterprise, which intermingle, escaping and meeting up again in the dazzle of software. The reader is invited to try to untangle for himself the threads twisted by passion alone. To penetrate the mystery, he will have to answer many questions. Even with the world at his fingertips, isn't the internet surfer the loneliest person in the world? In view of the competition, what is the greatest degree of violence possible in an enterprise these days? Does painting tend to reflect the world or does it create another one? I also wanted to show that images are not that peaceful. You can use them to take action, even to kill."
What part does the internet play in his novel? "The internet is a character in itself. Instead of being described in its technical complexity, it is depicted as a character that can be either threatening, kind or amusing. Remember the computer screen has a dual role - displaying as well as concealing. This ambivalence is the theme throughout. In such a game, the big winner is of course the one who knows how to free himself from the machine's grip and put humanism and intelligence before everything else."
= The web and its hyperlinks
Like many artists, Jean-Paul began exploring how hyperlinks could expand his writing in new directions. He switched from being a print author to being a hypermedia author, and created "Cotres furtifs" (Furtive Cutters) as a website "telling stories in 3D". He enjoyed the freedom given by online (self-)publishing, and wrote in August 1999: "The internet allows me to do without intermediaries, such as record companies, publishers and distributors. Most of all, it allows me to crystallize what I have in my head: the print medium (desktop publishing, in fact) only allows me to partly do that."
He also insisted on the growing interaction between digital literature and technology. "The future of cyber-literature, techno-literature, digital literature or whatever you want to call it, is set by the technology itself. It is now impossible for an author to handle all by himself the words and their movement and sound. A decade ago, you could know well each of Director, Photoshop or Cubase (to cite just the better known software), using the first version of each. That is not possible any more. Now we have to know how to delegate, find more solid financial partners than Gallimard, and look in the direction of Hachette-Matra, Warner, the Pentagon and Hollywood. At best, the status of multimedia director (?) will be the one of video director, film director, manager of the product. He is the one who receives the golden palms at Cannes, but who would never have been able to earn them just on his own. As twin sister (not a clone) of the cinematograph, cyber-literature (video + the link) will be an industry, with a few isolated craftsmen on the outer edge (and therefore with below-zero copyright)."
Jean-Paul added in June 2004: "Surfing the web is like radiating in all directions (I am interested in something and I click on all the links on a home page) or like jumping around (from one click to another, as the links appear). You can do this in the written media, of course. But the difference is striking. So the internet changed how I write. You don't write the same way for a website as you do for a script or a play. (…)
In fact, it is not the internet which changed how I write, it is the first Mac that I discovered through the self-learning of HyperCard. I still remember how astonished I was during the month when I was learning about buttons, links, surfing by analogies, objects or images. The idea that a simple click on one area of the screen allowed me to open a range of piles of cards, and each card could offer new buttons and each button opened on to a new range, etc. In brief, the learning of everything on the web that today seems really banal, for me it was a revelation (it seems Steve Jobs and his team had the same shock when they discovered the ancestor of the Mac in the laboratories of Rank Xerox). Since then I write directly on the screen: I use the print medium only occasionally, to fix up a text, or to give somebody who is allergic to the screen a kind of photograph, something instantaneous, something approximate. It is only an approximation, because print forces us to have a linear relationship: the text is developing page after page (most of the time), whereas the technique of links allows another relationship to the time and space of imagination. And, for me, it is above all the opportunity to put into practice this reading/writing 'cycle', whereas leafing through a book gives only an idea - which is vague because the book is not conceived for that."
2005: GOOGLE GETS INTERESTED IN EBOOKS
= [Overview]
The beta version of Google Print went live in May 2005. In October 2004, Google launched the first part of Google Print as a project aimed at publishers, for internet users to be able to see excerpts from their books and order them online. In December 2004, Google launched the second part of Google Print as a project intended for libraries, to build up a world digital library by digitizing the collections of main partner libraries. In August 2005, Google Print was stopped until further notice because of lawsuits filed by associations of authors and publishers for copyright infringement. The program resumed in August 2006 under the new name of Google Books. Google Books has offered books digitized in the participating libraries (Harvard, Stanford, Michigan, Oxford, California, Virginia, Wisconsin-Madison, Complutense of Madrid and New York Public Library), with either the full text for public domain books or excerpts for copyrighted books. Google settled a lawsuit with associations of authors and publishers in October 2008, with an agreement to be signed in 2009.
= Google Print
In October 2004, Google launched the first part of Google Print as a project aimed at publishers, for internet users to be able to see excerpts from their books and order them online. In December 2004, Google launched the second part of Google Print as a project intended for libraries, to build up a digital library of 15 million books by digitizing the collections of main partner libraries, beginning with the universities of Michigan (7 million books), Harvard, Stanford and Oxford, and the New York Public Library. The planned cost in 2004 was an average of US $10 per book, and a total budget of $150 to $200 million for ten years. The beta version of Google Print went live in May 2005. In August 2005, Google Print was stopped until further notice because of lawsuits filed by associations of authors and publishers for copyright infringement.
= Google Books
The program resumed in August 2006 under the new name of Google Books. Google Books has offered books digitized by Google in the participating libraries, which now included Harvard, Stanford, Michigan, Oxford, California, Virginia, Wisconsin-Madison, Complutense of Madrid and New York Public Library. It provided the full text for public domain books and only excerpts for copyrighted books. According to media reports, Google was scanning 3,000 books a day.
The inclusion of copyrighted works in Google Books was widely criticized by authors and publishers worldwide. In the U.S., lawsuits were filed by the Authors Guild and the Association of American Publishers (AAP) for alleged copyright infringement. The assumption was that the full scanning and digitizing of copyrighted books infringed copyright laws, even if only snippets were made freely available. Google replied this was "fair use", referring to short excerpts from copyrighted books that could be lawfully quoted in another book or website, as long as the source (author, title, publisher) was mentioned. After three years of conflict, Google reached a settlement with the associations of authors and publishers in October 2008, with an agreement to be signed in 2009.
As of December 2008, Google had 24 library partners, including a Swiss one (University Library of Lausanne), a French one (Lyon Municipal Library), a Belgian one (Ghent University Library), a German one (Bavarian State Library), two Spanish ones (National Library of Catalonia and University Complutense of Madrid) and a Japanese one (Keio University Library). The remaining partner libraries, nearly all of them in the U.S., were, in alphabetical order: Columbia University, Committee on Institutional Cooperation (CIC), Cornell University Library, Harvard University, New York Public Library, Oxford University, Princeton University, Stanford University, University of California, University of Michigan, University of Texas at Austin, University of Virginia and University of Wisconsin-Madison.
2006: TOWARDS A WORLD PUBLIC DIGITAL LIBRARY
= [Overview]
Conceived by the Internet Archive to offer a universal public digital library, the Open Content Alliance (OCA) was launched in October 2005 as a group of cultural, technology, nonprofit and governmental organizations willing to build a permanent archive of multilingual digitized text and multimedia content. The project took off in 2006, with the digitization of public domain books around the world. Unlike Google Books, the OCA has made these books searchable through any web search engine, and has not scanned copyrighted books, except when the copyright holder has expressly given permission. The first contributors to the OCA were the University of California, the University of Toronto, the European Archive, the National Archives in the United Kingdom, O’Reilly Media and the Prelinger Archives. The digitized collections are freely available in the Text Archive section of the Internet Archive. By December 2008, one million ebooks had been posted under OCA principles by the Internet Archive.
= [In Depth]
The Internet Archive and Yahoo! conceived the Open Content Alliance (OCA) in early 2005 to offer broad public access to world culture. The OCA also wanted to address two shortcomings of the Google Print project: its copyright problems and its availability through one search engine only. The OCA was launched with the goal of digitizing only public domain books and making them searchable and downloadable through any search engine.
What exactly is the Internet Archive? Founded in April 1996 by Brewster Kahle, the Internet Archive is a non-profit organization that has built an "internet library" to offer permanent access to historical collections in digital format for researchers, historians and scholars. An archive of the web is stored every two months or so. In late 1999, the Internet Archive started to include more collections of archived webpages on specific topics. It also became an online digital library of text, audio, software, image and video content. In October 2001, with 30 billion stored webpages, the Internet Archive launched the Wayback Machine, for users to be able to surf the archive of the web by date. In 2004, there were 300 terabytes of data, with a growth of 12 terabytes per month. There were 65 billion pages (from 50 million websites) in 2006 and 85 billion pages in 2008. The Internet Archive now defines itself as "a nonprofit digital library dedicated to providing universal access to human knowledge."
In October 2005, the Internet Archive launched the Open Content Alliance (OCA) with other contributors as a collective effort for "building a digital archive of global content for universal access" (subtitle of the OCA home page) that would be a permanent repository of multilingual text and multimedia content.
As explained on its website in 2007, the OCA "is a collaborative effort of a group of cultural, technology, nonprofit, and governmental organizations from around the world that helps build a permanent archive of multilingual digitized text and multimedia material. An archive of contributed material is available on the Internet Archive website and through Yahoo! and other search engines and sites. The OCA encourages access to and reuse of collections in the archive, while respecting the content owners and contributors."
The project aims at digitizing public domain books around the world and making them searchable through any web search engine and downloadable for free. Unlike Google Books, the OCA scans and digitizes only public domain books, except when the copyright holder has expressly given permission. The first contributors to the OCA were the University of California, the University of Toronto, the European Archive, the National Archives in the United Kingdom, O’Reilly Media and the Prelinger Archives. The digitized collections are freely available in the Text Archive section of the Internet Archive. 100,000 ebooks were publicly available in December 2006 (with 12,000 new ebooks added per month), 200,000 ebooks in May 2007, and one million ebooks in December 2008.
Microsoft has been one of the partners of the OCA, while also developing its own project. The beta version of Live Search Books was released in December 2006, offering keyword search of non-copyrighted books digitized by Microsoft in partner libraries. The British Library and the libraries of the universities of California and Toronto were the first ones to join in, followed in January 2007 by the New York Public Library and Cornell University. Books were offered in full-text view and could be downloaded as PDF files. In May 2007, Microsoft announced agreements with several publishers, including Cambridge University Press and McGraw Hill, for their books to be available in Live Search Books. After digitizing 750,000 books and indexing 80 million journal articles, Microsoft ended the Live Search Books program in May 2008, to focus on other activities, and closed the website. These books are available in the OCA collections of the Internet Archive.