Shared posts

20 Apr 17:25

Congratulations to Heather Moulaison Sandy, winner of the 2018 LITA/Library Hi Tech Award

by Jenny Levine

Heather Moulaison Sandy has been named the winner of the 2018 LITA/Library Hi Tech Award for Outstanding Communication in Library and Information Technology. Emerald Publishing and the Library and Information Technology Association (LITA) sponsor the Award, which recognizes outstanding individuals or institutions for their long-term contributions in the area of Library and Information Science technology and its application.

The Award Committee selected Moulaison Sandy because it was impressed with her extensive contributions to ongoing professional development across the discipline, which include five books and more than 25 peer-reviewed journal articles. Her work has been presented at over 100 local, national, and international venues in nearly 15 countries, as well as in numerous online webinars and talks.

Moulaison Sandy is Associate Professor at the iSchool at the University of Missouri and works primarily at the intersection of the organization of information and the online environment. She is a recipient of this year’s JRLYA/YALSA Writing Award, as well as the 2016 ALISE/OCLC Research Grant and the 2016 ALA Carnegie-Whitney Grant.

An avid Francophile and traveler, she was named an Associated Researcher at the French national school for library and information science (Enssib) in 2014, and received a Fulbright Senior Scholar grant in 2008-2009 to teach at l’Ecole des sciences de l’information in Morocco. She holds a PhD in Information Science from Rutgers and an MSLIS and MA in French, both from the University of Illinois at Urbana-Champaign.

When notified she was this year’s recipient, Moulaison Sandy said, “Receiving this award is a true honor, and I am thrilled to join the ranks of LITA/Library Hi Tech award recipients whose work I admire so much.” She will receive a citation and a $1,000 stipend.

Members of the 2018 LITA/Library Hi Tech Award Committee are: Dr. Patrick T. Colegrove (Chair), Vanessa L. Ames (Past Chair), Holli Kubly, and Christina D. Mune.

Thank you to Emerald Publishing for sponsoring this award.

Emerald Publishing Group logo

 

19 May 13:41

Are uncompressed files better for preservation?

by Gary McGath

How big a concern is physical degradation of files, aka “bit rot,” to digital preservation? Should archives eschew data compression in order to minimize the effect of lost bits? In most of my experience, no one’s raised that as a major concern, but some contributors to the TI/A initiative consider it important enough to affect their recommendations.

Damaged Image File

Damaged Image file, Atlas of Digital Damages, placed on Flickr by Paul Wheatley. (CC BY-NC-SA 2.0)

Files can go bad, sometimes just by flipping a few bits. This can happen in the file system, the file header, the metadata, the structural elements, or the content data. Depending on where it happens, changing one bit can make the file unrenderable, degrade the image, or have no effect at all. The usual solution to this risk is digests and backups. The archive computes a digest, such as MD5 or SHA-1, of the file and stores it. When someone retrieves the file, the software recomputes its digest. If it doesn’t match, it warns the user that the file is damaged, and then it’s necessary to recover the backup copy. Not counting catastrophes that ruin whole files, the odds of file damage are low in a decent storage system, and the odds of the original and backup both being damaged are much lower.
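
To make the digest-and-verify workflow concrete, here is a minimal sketch in Python using only the standard library's hashlib. The paths and the choice of SHA-256 are illustrative assumptions (hashlib also provides the MD5 and SHA-1 mentioned above); a real repository would store the digest in its database or in a manifest alongside the file.

    import hashlib

    def compute_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
        """Compute a fixity digest of a file, reading it in chunks."""
        digest = hashlib.new(algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # At ingest: compute the digest and store it with the file's metadata.
    stored_digest = compute_digest("archive/master.tif")

    # At retrieval: recompute and compare. A mismatch means this copy is
    # damaged and the backup copy should be restored instead.
    if compute_digest("archive/master.tif") != stored_digest:
        print("Fixity check failed: restore this file from backup")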

Some people in the TI/A discussion argue against accepting compressed files as archival quality TIFF, because of their greater susceptibility to bit rot. In an uncompressed file that isn’t tiny, most of the data will be pixels, and flipping a bit will most likely just change a single pixel. Flipping a bit in a compressed data stream can mess up the decompression algorithm so that a large part of the image is damaged, or the application may crash. The argument is that a slightly damaged file is better than a seriously damaged one.

This theory looks like a bad one to me. First, it implies that the archive will trust damaged files to some extent. An uncompressed file with bit damage may just have a bad pixel, but the damage could be in the file header, the tags, or the ICC profile, seriously damaging the file or making it unusable. Second, the risk of bit damage to an uncompressed file is greater, simply because it’s bigger. At the same time, it takes up more storage space, so the archive can’t do as much backing up on a given budget. Lossless compression (LZW or ZIP) often reduces a file to less than half its original size, which means that an original file and a backup can be stored in the same amount of space as an uncompressed file.
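
To get a feel for how much space lossless compression actually recovers for a given image, the sketch below re-saves an uncompressed TIFF with LZW and compares file sizes. It assumes the Pillow imaging library is installed and that input.tif exists; the ratio depends heavily on the image content, so the "less than half" figure above should be read as typical rather than guaranteed.

    import os
    from PIL import Image  # Pillow, assumed to be installed

    src = "input.tif"        # hypothetical uncompressed TIFF
    dst = "input_lzw.tif"

    with Image.open(src) as img:
        # "tiff_lzw" is Pillow's identifier for LZW-compressed TIFF output.
        img.save(dst, compression="tiff_lzw")

    raw, packed = os.path.getsize(src), os.path.getsize(dst)
    print(f"uncompressed: {raw:,} bytes; LZW: {packed:,} bytes "
          f"({packed / raw:.0%} of original)")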

Not all compression is equal. Disallowing lossy compression in archival TIFF files may make sense for other reasons, and TIFF’s original JPEG compression scheme is deprecated. But insisting on uncompressed files to improve their ability to withstand bit rot strikes me as a foolish precaution.


Tagged: compression, preservation, TIFF
06 May 15:29

veraPDF 0.14 released with launch of demo website

by admin
veraPDF logo
Soon available also on the PREFORMA Open Source Portal
We are pleased to announce the latest release of veraPDF, the definitive, open source PDF/A validator. Version 0.14 features Transparency and Unicode character map validation in PDF/A-2 levels B and U. This is the first release of the final design phase which began on 19 April following the PREFORMA Project EC review at the Open Source Workshop. Please support our efforts by downloading and testing the software.
04 Feb 16:17

Memorial University of Newfoundland selects Ex Libris Solutions, including Rosetta

by Chris Erickson
Edward M. Corrado

Rosetta customer base is growing!

Memorial University of Newfoundland selects Ex Libris Solutions, including Rosetta. Press Release. Ex Libris. January 28, 2015.
Memorial University, in Newfoundland and Labrador, Canada, has adopted a suite of Ex Libris solutions comprising the Alma library management solution, the Primo discovery and delivery solution, and the Rosetta digital asset management and preservation system. These solutions replace multiple disparate legacy systems used by the Library.

Rosetta will enable Memorial University to manage and preserve its important collections of Newfoundland's history, including a huge collection of digitized newspapers. Using the Primo search interface for physical collections, digital and digitized assets, and electronic resources, Memorial will provide a seamless discovery experience to users, whatever their learning and teaching needs. 


01 Feb 19:51

University of Arizona selects Ex Libris Rosetta.

by Chris Erickson
Edward M. Corrado

Rosetta customer base is growing!

University of Arizona selects Ex Libris Rosetta. Press Release. Ex Libris. January 27, 2015.
The University of Arizona has adopted the Rosetta digital management and preservation solution. Rosetta will help the university provide sustained access to scholarly digital content and research to both university members and the broader academic community.

"After evaluating a number of commercial digital preservation systems, we found that Rosetta had the unique capabilities that Arizona requires. Our priorities for 2015 led us to seek a preservation solution that could be used collaboratively by a number of campuses. Rosetta's ability to provide end-to-end digital asset management and preservation for the vast array of assets and research data that the university possesses, its consortial architecture that allows participating institutions to maintain a degree of autonomy, and its ability to act as a transitional component between multiple display layers, made it the clear choice for Arizona."

27 Jan 03:06

Fine Arts Librarian (Binghamton University Libraries, New York)

Binghamton University Libraries, Binghamton, New York, is currently accepting applications for a Fine Arts Librarian. Binghamton University is part of the State University of New York (SUNY) system and is located in upstate New York. This tenure-track library faculty position develops and manages the print and digital Fine Arts collections in support of teaching and research in Art, Art History, and Music; provides reference and instructional services; and represents the Libraries to these departments. Required qualifications include an ALA-accredited MLS or equivalent and recent experience in one of the Fine Arts disciplines. Salary and rank will be commensurate with qualifications and experience. Excellent benefits, including TIAA/CREF. Review of applications will begin on March 2, 2015, and continue until the position is filled. For full qualifications, application instructions, and additional information, visit our website at www.binghamton.edu/libraries/about/employment/faculty.html. Binghamton University is an Equal Opportunity/Affirmative Action Employer.
27 Jan 03:06

Digital Initiatives Librarian (Binghamton University Libraries, New York)

Binghamton University Libraries, Binghamton, New York, is currently accepting applications for a Digital Initiatives Librarian. Binghamton University is part of the State University of New York (SUNY) system and is located in upstate New York. This tenure-track library faculty position will collaborate in the planning, implementation, and monitoring of digital projects, including digital curation, preservation, and digital exhibits. Required qualifications include an ALA-accredited MLS or equivalent, knowledge of and experience with current trends in digital preservation, experience developing web applications, and strong UNIX or Linux skills. Salary and rank will be commensurate with qualifications and experience. Excellent benefits, including TIAA/CREF. Review of applications will begin on March 2, 2015, and continue until the position is filled. For full qualifications, application instructions, and additional information, visit our website at www.binghamton.edu/libraries/about/employment/faculty.html. Binghamton University is an Equal Opportunity/Affirmative Action Employer.
27 Jan 02:56

Fine Arts Librarian

by rss@higheredjobs.com (HigherEdJobs)
Edward M. Corrado

Work @ BINGHAMTON!

Binghamton University (Binghamton, NY)
27 Jan 02:55

Digital Initiatives Librarian

by rss@higheredjobs.com (HigherEdJobs)
Edward M. Corrado

Work at Binghamton!

Binghamton University (Binghamton, NY)
23 Dec 15:13

CFP: Special Issue on Diversity in Library Technology (Code4Lib Journal)

by Corey Seeman
The Code4Lib Journal (C4LJ) exists to foster community and share information among those interested in the intersection of libraries, technology, and the future.
We are now accepting proposals for publication in our 28th issue, a special issue on diversity in library technology. Discussions on the Code4Lib listserv and keynotes by Valerie Aurora and Sumana Harihareswara at Code4Lib 2014 show that diversity is a topic of ongoing importance to the Code4Lib community. A recent editorial in the Code4Lib Journal by Ron Peterson originally sparked discussion of the idea for a special issue among the journal’s editorial committee; the demographic breakdown of both the author community and the committee itself laid bare the fact that diversity is a major challenge even in communities that are highly supportive. With this in mind, the C4LJ editorial committee hopes that this special issue will further the conversation around this important topic, while also encouraging a greater diversity amongst the Journal’s contributors for this and future issues.
C4LJ encourages creativity and flexibility, and the editors welcome submissions across a broad variety of topics that support the mission of the journal. For this issue, we would like to consider perspectives and topics that may not have been considered in-depth in the past, in the spirit of being open to diverse uses, interpretations, and needs of technology. In the context of structural inequalities and group/individual experiences (e.g. based on country, gender, race, ethnicity, class, disability, age, sexual orientation, etc.) people perceive, experience, and create technologies in different ways. It will strengthen our libraries if we enjoy and engage with these differences.
Possible topics could include, but are not limited to:
– Attracting and retaining diverse technology teams
– Implementing a code of conduct and/or assessing its efficacy
– Designing for accessibility
– Partnerships to foster inclusivity in the field
– Library tech programming for underserved populations
– Inclusive project management and communication
– Surfacing diverse items in digital libraries
– Digital projects and programs involving outreach to diverse communities
– International perspectives on library technology and access
– Intersections of social justice and library technology
– Theoretical consideration of digitally sharing information (e.g. big data, crowd work, surveillance, privacy) for different groups
– Critical examination of technology trends, and how they are perceived or adopted, by different groups
C4LJ strives to promote professional communication by minimizing the barriers to publication. While articles should be of a high quality, they need not follow any formal structure. Writers should aim for the middle ground between blog posts and articles in traditional refereed journals. Where appropriate, we encourage authors to submit code samples, algorithms, and pseudo-code. For more information, visit C4LJ’s Article Guidelines or browse articles from the first 26 issues published on our website: http://journal.code4lib.org.
Don’t miss out on this opportunity to share your ideas and experiences. To be included in the 28th issue, which is scheduled for publication in April 2015, please submit articles, abstracts, or proposals at http://journal.code4lib.org/submit-proposal or to journal@code4lib.org by January 12, 2015. When submitting, please include the title or subject of the proposal in the subject line of the email message and the acceptance of the Journal’s US CC-By 3.0 license in the body of the message. The editorial committee will review all proposals and notify those accepted by January 19, 2015. Please note that submissions are subject to rejection or postponement at any point in the publication process as determined by the Code4Lib Journal’s editorial committee.
Send in a submission. Your peers would like to hear what you are doing.
12 Dec 22:40

Documents

Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Copy of Untitled.doc
11 Dec 21:08

Talk at Fall CNI

by David.
I gave a talk at the Fall CNI meeting entitled Improving the Odds of Preservation, with the following abstract:
Attempts have been made, for various types of digital content, to measure the probability of preservation. The consensus is about 50%. Thus the rate of loss to future readers from "never preserved" vastly exceeds that from all other causes, such as bit rot and format obsolescence. Will persisting with current preservation technologies improve the odds of preservation? If not, what changes are needed to improve them?
It covered much of the same material as Costs: Why Do We Care, with some differences in emphasis. Below the fold, the text with links to the sources.

Introduction

I'm David Rosenthal from the LOCKSS (Lots Of Copies Keep Stuff Safe) Program at the Stanford University Libraries. As with all my talks, you don't need to take notes or ask for the slides. The text of the talk, with links to the sources, will go up on my blog shortly.

One of the preservation networks that the LOCKSS Program operates is the CLOCKSS archive, a large dark archive of e-journals and e-books. We operate it under contract to a not-for-profit organization jointly run by publishers and libraries. Earlier this year we completed a more than year-long process that resulted in the CLOCKSS Archive being certified to the Trusted Repository Audit Criteria (TRAC) by CRL. We equalled the previous highest score and gained the first-ever perfect score for Technology. At documents.clockss.org you will find all the non-confidential material upon which the auditors based their assessment. And on my blog you will find posts announcing the certification, describing the process we went through, discussing the lessons learned, and describing how you can run the demos we put on for the auditors.

Although CRL's certification was to TRAC, the documents include a finding aid structured according to ISO16363, the official ISO standard that is superseding TRAC. If you look at the finding aid or at the ISO16363 documents you will see that many of the criteria are concerned with economic sustainability. Among the confidential materials the auditors requested were "Budgets for last three years and projections for next two showing revenue and expenses".

We actually gave them five-year projections. This is an area where we had a good story to tell. The LOCKSS Program got started with grant funds from the NSF, the Andrew W. Mellon Foundation, and Sun Microsystems. But grant funding isn't a sustainable basis for long-term preservation. In 2005, the Mellon Foundation gave us a 2-year grant which we had to match, and after which we had to be off grant funding. For more than 7 years we have been in the black without grant funding. The LOCKSS software is free open source, the LOCKSS team charges for support and services.

Achieving this economic sustainability has required a consistent focus on minimizing the cost of every aspect of our operations. Because the LOCKSS system's Lots Of Copies trades using more disk space for using less of other resources (especially lawyers), I have been researching in particular the costs of storage for some years. In what follows I want to look at the big picture of digital preservation costs and their implications. It is in three sections:
  • The current situation.
  • Cost trends.
  • What can be done?

The Current Situation

How well are we doing at the task of preservation? Attempts have been made to measure the probability that content is preserved in some areas; e-journals, e-theses and the surface Web:
  • In 2010 the ARL reported that the median research library received about 80K serials. Stanford's numbers support this. The Keepers Registry, across its 8 reporting repositories, reports just over 21K "preserved" and about 10.5K "in progress". Thus under 40% of the median research library's serials are at any stage of preservation.
  • Luis Faria and co-authors (PDF) compare information extracted from journal publisher's web sites with the Keepers Registry and conclude:
    We manually repeated this experiment with the more complete Keepers Registry and found that more than 50% of all journal titles and 50% of all attributions were not in the registry and should be added.
  • The Hiberlink project studied the links in 46,000 US theses and determined that about 50% of the linked-to content was preserved in at least one Web archive.
  • Scott Ainsworth and his co-authors tried to estimate the probability that a publicly-visible URI was preserved, as a proxy for the question "How Much of the Web is Archived?" They generated lists of "random" URLs using several different techniques including sending random words to search engines and random strings to the bit.ly URL shortening service. They then:
    • tried to access the URL from the live Web.
    • used Memento to ask the major Web archives whether they had at least one copy of that URL.
    Their results are somewhat difficult to interpret, but for their two more random samples they report:
    URIs from search engine sampling have about 2/3 chance of being archived [at least once] and bit.ly URIs just under 1/3.
So, are we preserving half the stuff that should be preserved? Unfortunately, there are a number of reasons why this simplistic assessment is wildly optimistic.

An Optimistic Assessment

First, the assessment isn't risk-adjusted:
  • As regards the scholarly literature, librarians, who are concerned with post-cancellation access rather than with preserving the record of scholarship, have directed resources to subscription rather than open-access content, and within the subscription category, to the output of large rather than small publishers. Thus they have driven resources towards the content at low risk of loss, and away from content at high risk of loss. Preserving Elsevier's content makes it look like a huge part of the record is safe because Elsevier publishes a huge part of the record. But Elsevier's content is not at any conceivable risk of loss, and is at very low risk of cancellation*, so what have those resources achieved for future readers?
  • As regards Web content, the more links to a page, the more likely the crawlers are to find it, and thus, other things such as robots.txt being equal, the more likely it is to be preserved. But equally, the less at risk of loss.
Second, the assessment isn't adjusted for difficulty:
  • A similar problem of risk-aversion is manifest in the idea that different formats are given different "levels of preservation". Resources are devoted to the formats that are easy to migrate. But precisely because they are easy to migrate, they are at low risk of obsolescence.
  • The same effect occurs in the negotiations needed to obtain permission to preserve copyright content. Negotiating once with a large publisher gains a large amount of low-risk content, whereas negotiating once with a small publisher gains a small amount of high-risk content.
  • Similarly, the web content that is preserved is the content that is easier to find and collect. Smaller, less linked web-sites are probably less likely to survive.
Harvesting the low-hanging fruit directs resources away from the content at risk of loss.

Third, the assessment is backward-looking:
  • As regards scholarly communication it looks only at the traditional forms, books, theses and papers. It ignores not merely published data, but also all the more modern forms of communication scholars use, including workflows, source code repositories, and social media. These are mostly both at much higher risk of loss than the traditional forms that are being preserved, because they lack well-established and robust business models, and much more difficult to preserve, since the legal framework is unclear and the content is either much larger, or much more dynamic, or in some cases both.
  • As regards the Web, it looks only at the traditional, document-centric surface Web rather than including the newer, dynamic forms of Web content and the deep Web.
Fourth, the assessment is likely to suffer measurement bias:
  • The measurements of the scholarly literature are based on bibliographic metadata, which is notoriously noisy. In particular, the metadata was apparently not de-duplicated, so there will be some amount of double-counting in the results.
  • As regards Web content, Ainsworth et al describe various forms of bias in their paper.
As Cliff Lynch pointed out in his summing-up of the 2014 IDCC conference, the scholarly literature and the surface Web are genres of content for which the denominator of the fraction being preserved (the total amount of genre content) is fairly well known, even if it is difficult to measure the numerator (the amount being preserved). For many other important genres, even the denominator is becoming hard to estimate as the Web enables a variety of distribution channels:
  • Books used to be published through well-defined channels that assigned ISBNs, but now e-books can appear anywhere on the Web.
  • YouTube and other sites now contain vast amounts of video, some of which represents what in earlier times would have been movies.
  • Much music now happens on YouTube (e.g. Pomplamoose).
  • Scientific data is exploding in both size and diversity, and despite efforts to mandate its deposit in managed repositories much still resides in grad students' laptops.
Of course, "what we should be preserving" is a judgement call, but clearly even purists who wish to preserve only stuff to which future scholars will undoubtedly require access would be hard pressed to claim that half that stuff is preserved.

Preserving the Rest

Overall, it's clear that we are preserving much less than half of the stuff that we should be preserving. What can we do to preserve the rest of it?
  • We can do nothing, in which case we needn't worry about bit rot, format obsolescence, and all the other risks any more, because they only lose a few percent. The reason more than 50% of the stuff won't make it to future readers would be that we couldn't afford to preserve it.
  • We can more than double the budget for digital preservation. This is so not going to happen; we will be lucky to sustain the current funding levels.
  • We can more than halve the cost per unit content. Doing so requires a radical re-think of our preservation processes and technology.
Such a radical re-think requires understanding where the costs go in our current preservation methodology, and how they can be funded. As an engineer, I'm used to using rules of thumb. The one I use to summarize most of the research into past costs is that ingest takes half the lifetime cost, preservation takes one third, and access takes one sixth.

On this basis, one would think that the most important thing to do would be to reduce the cost of ingest. It is important, but not as important as you might think. The reason is that ingest is a one-time, up-front cost. As such, it is relatively easy to fund. In principle, research grants, author page charges, submission fees and other techniques can transfer the cost of ingest to the originator of the content, and thereby motivate them to explore the many ways that ingest costs can be reduced. But preservation and dissemination costs continue for the life of the data, for "ever". Funding a stream of unpredictable payments stretching into the indefinite future is hard. Reductions in preservation and dissemination costs will have a much bigger effect on sustainability than equivalent reductions in ingest costs.
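
A back-of-the-envelope way to see the leverage of each stage, using nothing but the one-half / one-third / one-sixth rule of thumb above (the shares are the only input; the point about up-front versus ongoing costs is annotated in the comments):

    # Rule-of-thumb shares of lifetime cost, from the talk.
    shares = {"ingest": 1 / 2, "preservation": 1 / 3, "access": 1 / 6}

    for stage, share in shares.items():
        saving = share / 2  # halving a stage saves half of its share
        print(f"Halving {stage:<12} saves {saving:.1%} of lifetime cost")

    # Halving ingest saves the most in absolute terms (25%), but ingest is a
    # one-time, up-front cost that can be pushed back to the originator.
    # The preservation and access savings (about 17% and 8%) recur for the
    # life of the data, which is the part that is hard to fund.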

Cost Trends

We've been able to ignore this problem for a long time, for two reasons. From at least 1980 to 2010 storage costs followed Kryder's Law, the disk analog of Moore's Law, dropping 30-40%/yr. This meant that, if you could afford to store the data for a few years, the cost of storing it for the rest of time could be ignored, because of course Kryder's Law would continue forever. The second is that as the data got older, access to it was expected to become less frequent. Thus the cost of access in the long term could be ignored.

But can we continue to ignore these problems?

Preservation

Kryder's Law held for three decades, an astonishing feat for exponential growth. Something that goes on that long gets built into people's model of the world, but as Randall Munroe points out, in the real world exponential curves cannot continue for ever. They are always the first part of an S-curve.

This graph, from Preeti Gupta of UC Santa Cruz, plots the cost per GB of disk drives against time. In 2010 Kryder's Law abruptly stopped. In 2011 the floods in Thailand destroyed 40% of the world's capacity to build disks, and prices doubled. Earlier this year they finally got back to 2010 levels. Industry projections are for no more than 10-20% per year going forward (the red lines on the graph). This means that disk is now about 7 times as expensive as was expected in 2010 (the green line), and that in 2020 it will be between 100 and 300 times as expensive as 2010 projections.

These are big numbers, but do they matter? After all, preservation is only about one-third of the total, and only about one-third of that is media costs.

Our models of the economics of long-term storage compute the endowment, the amount of money that, deposited with the data and invested at interest, would fund its preservation "for ever". This graph, from my initial rather crude prototype model, is based on hardware cost data from Backblaze and running cost data from the San Diego Supercomputer Center (much higher than Backblaze's) and Google. It plots the endowment needed for three copies of a 117TB dataset to have a 95% probability of not running out of money in 100 years, against the Kryder rate (the annual percentage drop in $/GB). The different curves represent policies of keeping the drives for 1,2,3,4,5 years. Up to 2010, we were in the flat part of the graph, where the endowment is low and doesn't depend much on the exact Kryder rate. This is the environment in which everyone believed that long-term storage was effectively free.

But suppose the Kryder rate were to drop below about 20%/yr. We would be in the steep part of the graph, where the endowment needed is both much higher and also strongly dependent on the exact Kryder rate.
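
The shape of that curve can be reproduced with a deliberately simplified, deterministic version of the endowment calculation. The real model is a Monte Carlo simulation targeting a 95% probability of solvency over 100 years; here the 3% return, the 100-year horizon, annual billing, and a fixed dataset size are illustrative assumptions.

    def endowment(kryder_rate, interest_rate=0.03, years=100):
        """Present value of `years` of storage bills for a fixed-size dataset,
        with the annual bill falling by `kryder_rate` per year and the
        endowment earning `interest_rate`. Unit: multiples of the year-1 bill."""
        total, cost = 0.0, 1.0
        for year in range(years):
            total += cost / (1 + interest_rate) ** year
            cost *= 1 - kryder_rate
        return total

    for k in (0.40, 0.30, 0.20, 0.10, 0.05, 0.00):
        print(f"Kryder rate {k:>4.0%}: endowment ~ {endowment(k):5.1f}x the year-1 cost")

Above roughly 20-30%/yr the endowment barely moves as the rate changes; below that it grows quickly and becomes very sensitive to the exact rate, which is the flat versus steep distinction in the graph.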

We don't need to suppose. Preeti's graph and industry projections show that now and for the foreseeable future we are in the steep part of the graph. What happened to slow Kryder's Law? There are a lot of factors, we outlined many of them in a paper for UNESCO's Memory of the World conference (PDF). Briefly, both the disk and tape markets have consolidated to a couple of vendors, turning what used to be a low-margin, competitive market into one with much better margins. Each successive technology generation requires a much bigger investment in manufacturing, so requires bigger margins, so drives consolidation. And the technology needs to stay in the market longer to earn back the investment, reducing the rate of technological progress.

Thanks to aggressive marketing, it is commonly believed that "the cloud" solves this problem. Unfortunately, cloud storage is actually made of the same kind of disks as local storage, and is subject to the same slowing of the rate at which it was getting cheaper. In fact, when all costs are taken into account, cloud storage is not cheaper for long-term preservation than doing it yourself once you get to a reasonable scale. Cloud storage really is cheaper if your demand is spiky, but digital preservation is the canonical base-load application.

You may think that the cloud is a competitive market; in fact it is dominated by Amazon.
Jillian Mirandi, senior analyst at Technology Business Research Group (TBRI), estimated that AWS will generate about $4.7 billion in revenue this year, while comparable estimated IaaS revenue for Microsoft and Google will be $156 million and $66 million, respectively.
When Google recently started to get serious about competing, they pointed out that while Amazon's margins may have been minimal at introduction, by then they had become extortionate:
cloud prices across the industry were falling by about 6 per cent each year, whereas hardware costs were falling by 20 per cent. And Google didn't think that was fair. ... "The price curve of virtual hardware should follow the price curve of real hardware."
Notice that the major price drop triggered by Google was a one-time event; it was a signal to Amazon that they couldn't have the market to themselves, and to smaller players that they would no longer be able to compete.

In fact commercial cloud storage is a trap. It is free to put data in to a cloud service such as Amazon's S3, but it costs to get it out. For example, getting your data out of Amazon's Glacier without paying an arm and a leg takes 2 years. If you commit to the cloud as long-term storage, you have two choices. Either keep a copy of everything outside the cloud (in other words, don't commit to the cloud), or stay with your original choice of provider no matter how much they raise the rent.

Unrealistic expectations that we can collect and store the vastly increased amounts of data projected by consultants such as IDC within current budgets place currently preserved content at great risk of economic failure.

Here's a graph that illustrates the looming crisis in long-term storage, its cost. The red line is Kryder's Law, at IHS iSuppli's 20%/yr. The blue line is the IT budget, at computereconomics.com's 2%/yr. The green line is the annual cost of storing the data accumulated since year 0 at the 60% growth rate projected by IDC,** all relative to the value in the first year. 10 years from now, storing all the accumulated data would cost over 20 times as much as it does this year. If storage is 5% of your IT budget this year, in 10 years it will be more than 100% of your budget. If you're in the digital preservation business, storage is already way more than 5% of your IT budget.
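
The arithmetic behind that graph is easy to check. The 60% data growth, 20% Kryder rate, and 2% budget growth are the stated inputs; everything else follows:

    data_growth, kryder_rate, budget_growth = 0.60, 0.20, 0.02

    def storage_cost(year):
        """Cost in a given year of storing all data accumulated since year 0,
        relative to the cost in year 0."""
        accumulated = sum((1 + data_growth) ** y for y in range(year + 1))
        price_per_unit = (1 - kryder_rate) ** year
        return accumulated * price_per_unit

    growth = storage_cost(10) / storage_cost(0)          # ~31x
    budget = (1 + budget_growth) ** 10                   # ~1.2x
    print(f"Storage cost in year 10: {growth:.0f}x year 0")
    print(f"Relative to the IT budget: {growth / budget:.0f}x")
    print(f"5% of the budget today -> {0.05 * growth / budget:.0%} in year 10")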

Dissemination

The storage part of preservation isn't the only on-going cost that will be much higher than people expect, access will be too. In 2010 the Blue Ribbon Task Force on Sustainable Digital Preservation and Access pointed out that the only real justification for preservation is to provide access. With research data this can be a real difficulty; the value of the data may not be evident for a long time. Shang dynasty astronomers inscribed eclipse observations on animal bones. About 3200 years later, researchers used these records to estimate that the accumulated clock error was about 7 hours. From this they derived a value for the viscosity of the Earth's mantle as it rebounds from the weight of the glaciers.

In most cases so far the cost of an access to an individual item has been small enough that archives have not charged the reader. Research into past access patterns to archived data showed that access was rare, sparse, and mostly for integrity checking.

But the advent of "Big Data" techniques mean that, going forward, scholars increasingly don't want to access a few individual items in a collection, they want to ask questions of the collection as a whole. For example, the Library of Congress announced that it was collecting the entire Twitter feed, and almost immediately had 400-odd requests for access to the collection. The scholars weren't interested in a few individual tweets, but in mining information from the entire history of tweets. Unfortunately, the most the Library could afford to do with the feed is to write two copies to tape. There's no way they could afford the compute infrastructure to data-mine from it. We can get some idea of how expensive this is by comparing Amazon's S3, designed for data-mining type access patterns, with Amazon's Glacier, designed for traditional archival access. S3 is currently at least 2.5 times as expensive; until recently it was 5.5 times.

Ingest

Almost everyone agrees that ingest is the big cost element. Where does the money go? The two main cost drivers appear to be the real world, and metadata.

In the real world it is natural that the cost per unit content increases through time, for two reasons. The content that's easy to ingest gets ingested first, so over time the difficulty of ingestion increases. And digital technology evolves rapidly, mostly by adding complexity. For example, the early Web was a collection of linked static documents. Its language was HTML. It was reasonably easy to collect and preserve. The language of today's Web is Javascript, and much of the content you see is dynamic. This is much harder to ingest. In order to find the links much of the collected content now needs to be executed as well as simply being parsed. This is already significantly increasing the cost of Web harvesting, both because executing the content is computationally much more expensive, and because elaborate defenses are required to protect the crawler against the possibility that the content might be malign.

It is worth noting, however, that the very first US web site in 1991 featured dynamic content, a front-end to a database!

The days when a single generic crawler could collect pretty much everything of interest are gone; future harvesting will require more and more custom tailored crawling such as we need to collect subscription e-journals and e-books for the LOCKSS Program. This per-site custom work is expensive in staff time. The cost of ingest seems doomed to increase.

Worse, the W3C's mandating of DRM for HTML5 means that the ingest cost for much of the Web's content will become infinite. It simply won't be legal to ingest it.

Metadata in the real world is widely known to be of poor quality, both format and bibliographic kinds. Efforts to improve the quality are expensive, because they are mostly manual and, inevitably, reducing entropy after it has been generated is a lot more expensive than not generating it in the first place.

What can be done?

We are preserving less than half of the content that needs preservation. The cost per unit content of each stage of our current processes is predicted to rise. Our budgets are not predicted to rise enough to cover the increased cost, let alone more than doubling to preserve the other more than half. We need to change our processes to greatly reduce the cost per unit content.

Preservation

It is often assumed that, because it is possible to store and copy data perfectly, only perfect data preservation is acceptable. There are two problems with this expectation.

To illustrate the first problem, let's examine the technical problem of storing data in its most abstract form. Since 2007 I've been using the example of "A Petabyte for a Century". Think about a black box into which you put a Petabyte, and out of which a century later you take a Petabyte. Inside the box there can be as much redundancy as you want, on whatever media you choose, managed by whatever anti-entropy protocols you want. You want to have a 50% chance that every bit in the Petabyte is the same when it comes out as when it went in.

Now consider every bit in that Petabyte as being like a radioactive atom, subject to a random process that flips it with a very low probability per unit time. You have just specified a half-life for the bits. That half-life is about 60 million times the age of the universe. Think for a moment how you would go about benchmarking a system to show that no process with a half-life less than 60 million times the age of the universe was operating in it. It simply isn't feasible. Since at scale you are never going to know that your system is reliable enough, Murphy's law will guarantee that it isn't.
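
The half-life claim can be checked in a few lines. Treating a petabyte as 8×10^15 bits and the age of the universe as about 1.4×10^10 years (both assumptions, stated here rather than in the text):

    bits = 8e15                  # one petabyte, in bits
    years = 100                  # the century
    age_of_universe = 1.38e10    # years, approximate

    # Model each bit like a radioactive atom: P(a bit survives t years) = exp(-lam*t).
    # Requiring a 50% chance that *no* bit flips in the whole petabyte over a century:
    #   exp(-lam * years) ** bits = 0.5  =>  lam = ln(2) / (bits * years)
    # so the required half-life, ln(2)/lam, is simply bits * years.
    half_life = bits * years

    print(f"required bit half-life: {half_life:.1e} years")                         # 8.0e+17
    print(f"that is {half_life / age_of_universe:.1e} x the age of the universe")   # ~5.8e+07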

Here's some back-of-the-envelope hand-waving. Amazon's S3 is a state-of-the-art storage system. Its design goal is an annual probability of loss of a data object of 10^-11. If the average object is 10K bytes, the bit half-life is about a million years, way too short to meet the requirement but still really hard to measure.

Note that the 10^-11 is a design goal, not the measured performance of the system. There's a lot of research into the actual performance of storage systems at scale, and it all shows them under-performing expectations based on the specifications of the media. Why is this? Real storage systems are large, complex systems subject to correlated failures that are very hard to model.

Worse, the threats against which they have to defend their contents are diverse and almost impossible to model. Nine years ago we documented the threat model we use for the LOCKSS system. We observed that most discussion of digital preservation focused on these threats:
  • Media failure
  • Hardware failure
  • Software failure
  • Network failure
  • Obsolescence
  • Natural Disaster
but that the experience of operators of large data storage facilities was that the significant causes of data loss were quite different:
  • Operator error
  • External Attack
  • Insider Attack
  • Economic Failure
  • Organizational Failure
To illustrate the second problem, consider that building systems to defend against all these threats combined is expensive, and can't ever be perfectly effective. So we have to resign ourselves to the fact that stuff will get lost. This has always been true, it should not be a surprise. And it is subject to the law of diminishing returns. Coming back to the economics, how much should we spend reducing the probability of loss?

Consider two storage systems with the same budget over a decade, one with a loss rate of zero, the other half as expensive per byte but which loses 1% of its bytes each year. Clearly, you would say the cheaper system has an unacceptable loss rate.

However, each year the cheaper system stores twice as much and loses 1% of its accumulated content. At the end of the decade the cheaper system has preserved 1.89 times as much content at the same cost. After 30 years it has preserved more than 5 times as much at the same cost.

Adding each successive nine of reliability gets exponentially more expensive. How many nines do we really need? Is losing a small proportion of a large dataset really a problem? The canonical example of this is the Internet Archive's web collection. Ingest by crawling the Web is a lossy process. Their storage system loses a tiny fraction of its content every year. Access via the Wayback Machine is not completely reliable. Yet for US users archive.org is currently the 150th most visited site, whereas loc.gov is the 1519th. For UK users archive.org is currently the 131st most visited site, whereas bl.uk is the 2744th.

Why is this? Because the collection was always a series of samples of the Web, the losses merely add a small amount of random noise to the samples. But the samples are so huge that this noise is insignificant. This isn't something about the Internet Archive, it is something about very large collections. In the real world they always have noise; questions asked of them are always statistical in nature. The benefit of doubling the size of the sample vastly outweighs the cost of a small amount of added noise. In this case more really is better.

Unrealistic expectations for how well data can be preserved make the best be the enemy of the good. We spend money reducing even further the small probability of even the smallest loss of data that could instead preserve vast amounts of additional data, albeit with a slightly higher risk of loss.

Within the next decade all current popular storage media, disk, tape and flash, will be up against very hard technological barriers. A disruption of the storage market is inevitable. We should work to ensure that the needs of long-term data storage will influence the result. We should pay particular attention to the work underway at Facebook and elsewhere that uses techniques such as erasure coding, geographic diversity, and custom hardware based on mostly spun-down disks and DVDs to achieve major cost savings for cold data at scale.

Every few months there is another press release announcing that some new, quasi-immortal medium such as fused silica glass or stone DVDs has solved the problem of long-term storage. But the problem stays resolutely unsolved. Why is this? Very long-lived media are inherently more expensive, and are a niche market, so they lack economies of scale. Seagate could easily make disks with archival life, but they did a study of the market for them, and discovered that no-one would pay the relatively small additional cost.

The fundamental problem is that long-lived media only make sense at very low Kryder rates. Even if the rate is only 10%/yr, after 10 years you could store the same data in 1/3 the space. Since space in the data center or even at Iron Mountain isn't free, this is a powerful incentive to move old media out. If you believe that Kryder rates will get back to 30%/yr, after a decade you could store 30 times as much data in the same space.

The reason that the idea of long-lived media is so attractive is that it suggests that you can be lazy and design a system that ignores the possibility of failures. You can't:
  • Media failures are only one of many, many threats to stored data, but they are the only one long-lived media address.
  • Long media life does not imply that the media are more reliable, only that their reliability decreases with time more slowly. As we have seen, current media are many orders of magnitude too unreliable for the task ahead.
Even if you could ignore failures, it wouldn't make economic sense. As Brian Wilson, CTO of Backblaze, points out, in their long-term storage environment:
Double the reliability is only worth 1/10th of 1 percent cost increase. ... Replacing one drive takes about 15 minutes of work. If we have 30,000 drives and 2 percent fail, it takes 150 hours to replace those. In other words, one employee for one month of 8 hour days. Getting the failure rate down to 1 percent means you save 2 weeks of employee salary - maybe $5,000 total? The 30,000 drives costs you $4m.
The $5k/$4m means the Hitachis are worth 1/10th of 1 per cent higher cost to us. ACTUALLY we pay even more than that for them, but not more than a few dollars per drive (maybe 2 or 3 percent more).
Moral of the story: design for failure and buy the cheapest components you can. :-)

Dissemination

The real problem here is that scholars are used to having free access to library collections and research data, but what scholars now want to do with archived data is so expensive that they must be charged for access. This in itself has costs, since access must be controlled and accounting undertaken. Further, data-mining infrastructure at the archive must have enough performance for the peak demand but will likely be lightly used most of the time, increasing the cost for individual scholars. A charging mechanism is needed to pay for the infrastructure. Fortunately, because the scholar's access is spiky, the cloud provides both suitable infrastructure and a charging mechanism.

For smaller collections, Amazon provides Free Public Datasets: Amazon stores a copy of the data at no charge, charging scholars who access the data for the computation rather than charging the owner of the data for storage.

Even for large and non-public collections it may be possible to use Amazon. Suppose that in addition to keeping the two archive copies of the Twitter feed on tape, the Library of Congress kept one copy in S3's Reduced Redundancy Storage simply to enable researchers to access it. For this year, it would have averaged about $4100/mo, or about $50K for the year. Scholars wanting to access the collection would have to pay for their own computing resources at Amazon, and the per-request charges; because the data transfers would be internal to Amazon there would not be bandwidth charges. The storage charges could be borne by the library or charged back to the researchers. If they were charged back, the 400 initial requests would each need to pay about $125 for a year's access to the collection, not an unreasonable charge. If this idea turned out to be a failure it could be terminated with no further cost; the collection would still be safe on tape. In the short term, using cloud storage for an access copy of large, popular collections may be a cost-effective approach. Because the Library's preservation copy isn't in the cloud, they aren't locked in.
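
The figures in that paragraph are simple arithmetic to verify; the only input taken from the text is the roughly $4,100/month storage bill (the underlying collection size and S3 Reduced Redundancy pricing are not restated here):

    monthly_storage = 4100      # USD/month for one copy in S3 RRS, from the text
    initial_requests = 400      # access requests the Library of Congress received

    annual_storage = 12 * monthly_storage
    per_scholar = annual_storage / initial_requests

    print(f"annual storage bill: ${annual_storage:,}")                      # $49,200, i.e. about $50K
    print(f"charged back evenly: about ${per_scholar:.0f}/scholar/year")    # ~$123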

In the near term, separating the access and preservation copies in this way is a promising way not so much to reduce the cost of access, but to fund it more realistically by transferring it from the archive to the user. In the longer term, architectural changes to preservation systems that closely integrate limited amounts of computation into the storage fabric have the potential for significant cost reductions to both preservation and dissemination. There are encouraging early signs that the storage industry is moving in that direction.

Ingest

There are two parts to the ingest process, the content and the metadata.

The evolution of the Web that poses problems for preservation also poses problems for search engines such as Google. Where they used to parse the HTML of a page into its Document Object Model (DOM) in order to find the links to follow and the text to index, they now have to construct the CSS object model (CSSOM), including executing the Javascript, and combine the DOM and CSSOM into the render tree to find the words in context. Preservation crawlers such as Heritrix used to construct the DOM to find the links, and then preserve the HTML. Now they also have to construct the CSSOM and execute the Javascript. It might be worth investigating whether preserving a representation of the render tree rather than the HTML, CSS, Javascript, and all the other components of the page as separate files would reduce costs.
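
As an illustration of what "executing the Javascript before you can find the links" involves (this is not how Heritrix itself works), here is a minimal sketch using the Playwright library to load a page in a headless browser, let its scripts run, and serialize the resulting DOM. The URL and output path are placeholders, and a real preservation crawler would also have to capture the resources, HTTP headers, and the CSSOM discussed above.

    from playwright.sync_api import sync_playwright  # assumes Playwright is installed

    URL = "https://example.org/"   # placeholder target

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(URL, wait_until="networkidle")  # let scripts run and the page settle
        rendered_dom = page.content()             # DOM serialized *after* JS execution
        links = page.eval_on_selector_all("a[href]", "els => els.map(e => e.href)")
        browser.close()

    with open("rendered.html", "w", encoding="utf-8") as out:
        out.write(rendered_dom)
    print(f"found {len(links)} links after rendering")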

It is becoming clear that there is much important content that is too big, too dynamic, too proprietary or too DRM-ed for ingestion into an archive to be either feasible or affordable. In these cases where we simply can't ingest it, preserving it in place may be the best we can do; creating a legal framework in which the owner of the dataset commits, for some consideration such as a tax advantage, to preserve their data and allow scholars some suitable access. Of course, since the data will be under a single institution's control it will be a lot more vulnerable than we would like, but this type of arrangement is better than nothing, and not ingesting the content is certainly a lot cheaper than the alternative.

Metadata is commonly regarded as essential for preservation. For example, there are 52 criteria for ISO 16363 Section 4. Of these, 29 (56%) are metadata-related. Creating and validating metadata is expensive:
  • Manually creating metadata is impractical at scale.
  • Extracting metadata from the content scales better, but it is still expensive.
  • In both cases, the resulting metadata is sufficiently noisy to impair its usefulness.
We need less metadata so we can have more data. Two questions need to be asked:
  • When is the metadata required? The discussions in the Preservation at Scale workshop contrasted the pipelines of Portico and the CLOCKSS Archive, which ingest much of the same content. The Portico pipeline is far more expensive because it extracts, generates and validates metadata during the ingest process. CLOCKSS, because it has no need to make content instantly available, implements all its metadata operations as background tasks, to be performed as resources are available.
  • How important is the metadata to the task of preservation? Generating metadata because it is possible, or because it looks good in voluminous reports, is all too common. Format metadata is often considered essential to preservation, but if format obsolescence isn't happening, or if it turns out that emulation rather than format migration is the preferred solution, it is a waste of resources. If the reason to validate the formats of incoming content using error-prone tools is to reject allegedly non-conforming content, it is counter-productive. The majority of content in formats such as HTML and PDF fails validation but renders legibly.
The LOCKSS and CLOCKSS systems take a very parsimonious approach to format metadata. Nevertheless, the requirements of ISO 16363 pretty much forced us to expend resources implementing and using FITS, whose output does not in fact contribute to our preservation strategy, and whose binaries are so large that we have to maintain two separate versions of the LOCKSS daemon, one with FITS for internal use and one without for actual preservation. Further, the demands we face for bibliographic metadata mean that metadata extraction is a major part of ingest costs for both systems. These demands come from requirements for:
  • Access via bibliographic (as opposed to full-text) search, for example OpenURL resolution.
  • Meta-preservation services such as the Keepers Registry.
  • Competitive marketing.
Bibliographic search, preservation tracking and bragging about exactly how many articles and books your system preserves are all important, but whether they justify the considerable cost involved is open to question. Because they are cleaning up after the milk has been spilt, digital preservation systems are poorly placed to improve metadata quality.

Resources should be devoted to avoiding spilling the milk rather than to cleaning it up. For example, given how much the academic community spends on the services publishers allegedly provide in the way of improving the quality of publications, it is an outrage that even major publishers cannot spell their own names consistently, cannot format DOIs correctly, get authors' names wrong, and so on.

The alternative is to accept that metadata correct enough to rely on is impossible, downgrade its importance to that of a hint, and stop wasting resources on it. One of the reasons full-text search dominates bibliographic search is that it handles the messiness of the real world better.

Conclusion

Attempts have been made, for various types of digital content, to measure the probability of preservation. The consensus is about 50%. Thus the rate of loss to future readers from "never preserved" will vastly exceed that from all other causes, such as bit rot and format obsolescence. This raises two questions:
  • Will persisting with current preservation technologies improve the odds of preservation? At each stage of the preservation process current projections of cost per unit content are higher than they were a few years ago. Projections for future preservation budgets are at best no higher. So clearly the answer is no.
  • If not, what changes are needed to improve the odds? At each stage of the preservation process we need to at least halve the cost per unit content. I have set out some ideas, others will have different ideas. But the need for major cost reductions needs to be the focus of discussion and development of digital preservation technology and processes.
Unfortunately, any way of making preservation cheaper can be spun as "doing worse preservation". Jeff Rothenberg's Future Perfect 2012 keynote is an excellent example of this spin in action. Even if we make large cost reductions, institutions have to decide to use them, and "no-one ever got fired for choosing IBM".

We live in a marketplace of competing preservation solutions. A very significant part of the cost of both not-for-profit systems such as CLOCKSS or Portico, and commercial products such as Preservica is the cost of marketing and sales. For example, TRAC certification is a marketing check-off item. The cost of the process CLOCKSS underwent to obtain this check-off item was well in excess of 10% of its annual budget.

Making the tradeoff of preserving more stuff using "worse preservation" would need a mutual non-aggression marketing pact. Unfortunately, the pact would be unstable. The first product to defect and sell itself as "better preservation than those other inferior systems" would win. Thus private interests work against the public interest in preserving more content.

To sum up, we need to talk about major cost reductions. The basis for this conversation must be more and better cost data. I'm on the advisory board for the EU's 4C project, the Collaboration to Clarify the Costs of Curation. They are addressing the need for more and better cost data by setting up the Curation Cost Exchange. Please go there and submit whatever cost data you can come up with for your own curation operations.


* But notice the current stand-off between Dutch libraries and Elsevier.
** Bill Arms intervened to point out that IDC's 60% growth rate is ridiculous, and thus the graph is ridiculous. He is of course correct, but the point is that unless your archive is growing less than the Kryder rate, your annual storage cost is increasing. The Kryder rate may well be as low as 10%/yr, and very few digital preservation systems are growing at less than 10%/yr.
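The footnote's arithmetic is simple compounding: if the archive grows by g per year while cost per byte falls at the Kryder rate k, next year's storage bill is roughly (1 + g)/(1 + k) times this year's, so it rises whenever g exceeds k. A quick sketch using the rates mentioned in the footnote (treating the Kryder rate as a per-year discount on cost per byte is itself a simplification):

```python
# Quick check of the footnote's arithmetic: with this simplified model,
# next year's storage bill is this year's bytes times (1 + growth rate),
# priced at cost-per-byte divided by (1 + Kryder rate). The rates below are
# the ones mentioned in the footnote, not measurements.
def cost_multiplier(growth_rate, kryder_rate):
    """Factor by which next year's storage bill exceeds this year's."""
    return (1 + growth_rate) / (1 + kryder_rate)

# Archive growing 10%/yr against a 10%/yr Kryder rate: the bill is flat.
print(cost_multiplier(0.10, 0.10))                 # 1.0

# Archive growing 60%/yr (IDC's figure) against 10%/yr: ~45% more each year.
print(round(cost_multiplier(0.60, 0.10), 2))       # 1.45
```
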
04 Aug 21:03

GRAND OPENING of the Binghamton Brewing Co.!

by admin
Edward M. Corrado

New brew pub

Sat, 08/16/2014 - 2:00pm

2:00 - 10:00 PM

The Grand Opening of Binghamton Brewing Co. will be an event to remember! This is a Vaudevillian Carnival, so bring your bowler caps and suspenders, and grease up your moustache tips.

Your ticket includes:

  • Admission to the Event
  • BingBrew beer samplings
  • A special edition grand opening day tasting glass
  • A behind-the-scenes tour of the brewery and taproom
  • Beer passport and event guide
  • Other cool schwag

The Carnival will have:

read more

09 May 00:58

Corrado, Ed: New Book: Digital Preservation for Libraries, Archives, and Museums

by ecorrado

A few weeks ago a new book I co-authored with Heather Lea Moulaison was published by Rowman and Littlefield. The book is titled Digital Preservation for Libraries, Archives, and Museums. Initial reaction has been extremely positive. It is available through all of the major booksellers, such as Amazon, where at one point it was #7 in one of its categories! If you are interested in digital preservation, please consider purchasing the book or borrowing it from your local library. Below is the publisher’s description of the book:

Digital Preservation in Libraries, Archives, and Museums represents a new approach to getting started with digital preservation: that of what cultural heritage professionals need to know as they begin their work. For administrators and practitioners alike, the information in this book is presented readably, focusing on management issues and best practices. Although this book addresses technology, it is not solely focused on technology. After all, technology changes and digital preservation is aimed for the long term. This is not a how-to book giving step-by-step processes for certain materials in a given kind of system. Instead, it addresses a broad group of resources that could be housed in any number of digital preservation systems. Finally, this book is about “things (not technology; not how-to; not theory) I wish I knew before I got started.”

Digital preservation is concerned with the life cycle of the digital object in a robust and all-inclusive way. Many Europeans and some North Americans may refer to digital curation to mean the same thing, taking digital preservation to be the very limited steps and processes needed to insure access over the long term. The authors take digital preservation in the broadest sense of the term: looking at all aspects of curating and preserving digital content for long term access.
The book is divided into four parts based on the Digital Preservation Triad:

  1. Situating Digital Preservation,
  2. Management Aspects,
  3. Technology Aspects, and
  4. Content-Related Aspects.

The book includes a foreword by Michael Lesk, eminent scholar and forerunner in digital librarianship and preservation. The book features an appendix providing additional information and resources for digital preservationists. Finally, there is a glossary to support a clear understanding of the terms presented in the book.

Digital Preservation will answer questions that you might not have even known you had, leading to more successful digital preservation initiatives.

02 Apr 23:20

03/31/14 PHD comic: 'Check it'

Piled Higher & Deeper by Jorge Cham
www.phdcomics.com
[Comic: "Check it", originally published 3/31/2014]

28 Jun 15:37

The Real World

by bikeyface

Every day there’s more and more technology to keep us more connected… at work:

[Comic panel: The Real World]

…at home:

[Comic panel: The Real World]

…even during our commute:

[Comic panel: The Real World]

With “social media” connecting us, there’s something missing. The real world. And real people.

You may not be able to change whether you need to work. But you can change how you get to work. Just changing a commute from four to two wheels makes a big difference.

[Comic panel: Social Media]

Not only that, once you’re outside the box, you’ll find yourself running into real people again. Lots of them. Doing interesting things, going interesting places. And there’s always time to stop and talk.

[Comic panel: The Real World]

You can use all the social media you want, but it turns out bicycles are a better way to actually connect.