2007-08-10

Redefining "Document", Part 1: Information Sharing Problems

Recently I've been working to establish some very basic architectural concepts for large-scale, multiparticipant sharing of data, possibly including restricted data. It's been interesting and a little unsettling to realize just how poorly surveyed this problem space really is. I've increasingly come to believe that any large-scale information sharing effort is terribly exposed to two serious problems:
  • Loss of policy control over shared information;

  • (Probably nonlinear) degradation of information quality.

It's taken me a long time to get to this point, but I think I can state with reasonable confidence that the crux of the problem lies in unexamined concepts of what a "document" is. Fortunately, some conceptual investigation of the nature of documents yields a possible solution, and it's happily a solution that seems to be technically feasible.

To begin with, some definitions:
  • Managed data is persistent structured information whose maintenance and dissemination are at least nominally governed by a single authority.

  • Such a managing authority is called a primary source, or, especially in contexts related to policy, a primary authority.

  • Information sharing refers to all transmissions of managed data between primary sources and/or secondary consumers. (Note that in some contexts, "information sharing" has a narrower meaning, referring only to information transmissions whose destination is human eyes (as opposed to an automated data consumption process). That limitation does not apply here.)

  • A Document Management Application is a software application used by the primary source to manage the information for which it's responsible.


Primary Sourcing and the mindset of inevitability

As stated, the two major dangers involved in information sharing are loss of policy control and degradation of data quality. Explanations of both are necessary before discussing the essential problem of the nature of documents.

Both of these problems are founded on the concept that a primary authority really is responsible for managing its information, and is to be considered the authoritative source for that information. This primary sourcing concept, often reduced to the somewhat simpler formulation of "data ownership", has actually been abandoned by a lot of policy thinkers in the rush to share information: I've heard more than once that "Data ownership is obsolete". I was troubled by that remark the first time I heard it, and I never got more comfortable with the notion. This post covers a lot of territory, but in large part is a rebuttal to that whole idea.

It's instructive, and mildly distressing, to see how quick policy thinkers have been to embrace a kind of technological determinism. This is probably something of a conditioned response to the expensive and monolithic information technologies of previous decades, and the consequent pervasive insertion of IT professionals (like me) into business, social, and political decision processes, where we serve as gatekeepers, telling people what can and cannot be done.

I think, too, that it reflects something of the ruthless, bottom-line-driven, hypercompetitive nature of today's corporate culture. That culture has pressured society into an unfortunate habit of thought, in which any attempt to assess benefit, risk, or cost according to criteria other than the corporate balance sheet is subject to derision and, if persisted in, retaliation.

These two dynamics — IT people telling policymakers "You can't do that, there's no way to do that", and corporate people telling policymakers "Nice little policymaking job you got there, be a shame if something happened to it" — probably explain the widespread readiness to think of technological progress as both deterministic and sovereign: a great big train barreling down a single line of track. You can't stop it, changing its direction is fundamentally impossible, and you'd best not be in the way.

Fortunately, both of these dynamics seem to be increasingly subject to challenge. Current information technologies are rapidly eroding the concept that IT is rigidly constrained in its capabilities; and the remarkable criminality and corruption of the Republican Party since its ascendancy to power in the Reagan years seems finally to be undermining America's unthinking habit of deference to business priorities. I'm hopeful that it's becoming possible to have design discussions that do not begin with pernicious assumptions of inevitability.

Because seriously: loss of policy control and degradation of information quality are the kinds of problems that are immensely more difficult and expensive to remedy after the fact. To the extent that we can work out concepts, designs, and practices that prevent those problems from developing in the first place, we will be in an immensely better position.

I believe it's urgent to work very seriously at the problem. Information sharing as a general project is gaining traction, and all of its manifestations appear to envision a default behavior that is very detrimental.

The default model of information sharing is to make copies of documents and transmit them to sharing recipients. For present purposes, this model will be dubbed Massive Duplication, because its net result is of course many copies everywhere. The reason for this being the default proceeds directly from the fundamental nature of our concept of a Document.

Massive Duplication disastrously amplifies the two central problems addressed here.


Loss of Policy Control

Losing policy control over shared information is defined as the condition in which the primary source lacks authority over, and in some cases even knowledge of, what the recipient is doing with the data. Is it being copied to a data warehouse or archive? Is it being secondarily disseminated to other parties? Is it being merged into other records owned by another primary source altogether?

Maintaining policy control is, of course, important only if the information being shared is in some way sensitive: if its disclosure to an inappropriate recipient could cause harm of some sort. But information sharing projects are, implicitly, all about trafficking in such sensitive information. After all, if it's not sensitive, if it's truly public data, it's cheap and easy to just post it on a website and fuggedaboudit.

The problems of policy control are all problems of disclosure policy. Control over other kinds of information operations (e.g., creating, deleting, and modifying records) is not affected by the existence of alternate copies floating around out in the world. It's disclosure that's the issue.

When information is shared, the governance of subsequent disclosure and usage can never be absolute. (If nothing else, a bad actor viewing shared information could always transcribe it and do something nefarious with it.) However, there's an enormous difference between an information sharing architecture which is vulnerable to bad-faith abuse, and a policy-hostile architecture that makes it impossible to maintain policy control even when all participants are well-behaved.

Massive Duplication is of course such a policy-hostile architecture.


Degradation of Information Quality

"Information Quality" is a broad and not entirely tightly defined term that covers a lot of issues. In general, it speaks to the question, "How trustworthy is my data?"

Information Quality topics include issues like:
  • How reliably sourced/observed is the content of the information?

  • How good are the data sanitation practices of the entities that have had possession of the data?

  • What confidence level has been assigned to the data? What are the criteria used for that assignment? How trustworthy are the entities contributing to that assignment?

  • How well-understood are the transformations and rationalizations (if any) applied to the data? Are they clean, correct mappings, or susceptible to semantic error?

  • Is the data normalized or does it reiterate any part of its content?

  • What is the observed or reported incidence of data error in information from that source? Are internal inconsistencies observable within the data?

  • ...and so on.

Information Quality is an acknowledged problem in all information management activity. Information Sharing just happens to magnify the problem's every aspect and manifestation.

As is the case for policy control, the cause of good information quality is hurt badly by adopting the Massive Duplication model of information sharing. Its worst effects are:
  • The n-generation effect, in which a document is subject to a given probability of replication or transcription error for every sequential event of transmittal, and every event of persistence, outside the primary source. This is very similar to the progressive degradation of image quality as successive photocopies are made, each from the last photocopy. The term "n-generation" is derived from the description of how many successive copies-of-copies resulted in a given print.

  • The divergence effect, in which copies transmitted to multiple recipients create a greater likelihood of a "fork" in the data, causing the content to diverge. This is a topologically different problem than the n-generation effect, and its primary risk is the creation of separate version chains of the document: it's difficult and expensive to reconcile inconsistencies in such an information structure.

  • The loss of provenance for all or part of the information in a document. This is the condition in which knowledge of the authoritative source of the information in the document has been lost or erroneously recorded. The most important consequence of this loss is the marked increase in difficulty — and expense — when trying to resolve any questions about the document's content or validity.

  • The patchwork quilt effect, in which successive generations of copies of data have content added, modified, and removed along the way to suit the purposes of the holder of the moment. In this fashion, as the document is forwarded, it becomes increasingly a composite of data from heterogeneous sources. Accuracy and provenance of any given patch in the quilt become more difficult to persist and establish; in some situations, essential components of the tracking and identifying information may be among the discards.

  • And finally, there's the devastatingly simple problem of the reverse axis. As hard as it can be, in a massively duplicated information sharing environment, to track a piece of information back to its source, attempts on the part of the primary source to issue corrections, retractions, redactions, etc. forward seem almost certain to fail to reach some of the disseminated copies.

I'm in no position to make authoritative mathematical statements; I haven't constructed a math model for the probability of error in any of these scenarios. And math can surprise you.

But my intuition is strong enough to bet anybody a beer that any rigorous probabilistic error-rate predictive analysis would contain significant terms that would be worse than linear. (I speak here of linearity with respect either to the number of information transmissions, or to the number of participants in the exchange.)
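To make the shape of that bet a little more concrete, here is a toy calculation in Python. It is emphatically not the rigorous model mentioned above, only a sketch under assumptions that are mine and not established anywhere in this post: every hop in the n-generation chain introduces an error independently with the same probability, and every pair of copy holders is a potential place where copies can disagree. The function names `p_corrupted` and `reconciliation_pairs` are hypothetical, invented for this illustration.

```python
def p_corrupted(p_error: float, generations: int) -> float:
    """Chance that a copy n generations from the primary source carries
    at least one error. ASSUMPTION: each hop errs independently with the
    same probability p_error."""
    return 1.0 - (1.0 - p_error) ** generations


def reconciliation_pairs(holders: int) -> int:
    """Number of distinct pairs of holders whose copies could disagree.
    ASSUMPTION: any two holders may need to be reconciled with each other."""
    return holders * (holders - 1) // 2


if __name__ == "__main__":
    # Per-copy corruption chance along a chain of copies-of-copies.
    for n in (1, 5, 10, 20):
        print(n, "generations:", round(p_corrupted(0.01, n), 4))
    # Potential points of disagreement as participants are added.
    for k in (2, 10, 100):
        print(k, "holders:", reconciliation_pairs(k), "pairs")
```

In this toy, the per-copy corruption chance climbs with each generation but eventually levels off, so it is not where the trouble compounds. The pair count is the term that grows as the square of the number of holders, which is at least consistent with the intuition that something in the divergence effect scales worse than linearly with the number of participants. A real analysis would need measured error rates and the actual topology of who forwards to whom.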

The consequences of embarking on massively duplicated information sharing are potentially quite grave. And the consequences would grow over time, as the common slush-pile of copies of copies would grow deeper and higher.

2007-07-27

Remarks

This is a spot to leave any comment that seems good to you.

Sort of like "Open Thread" on blogs with a lot of posting. :-)

2007-07-10

Thank you Mr. Durrell

Lawrence Durrell is one of those few poets whose gifts migrate well to prose.

In my Favorite Books in the profile, I have listed The Alexandria Quartet, Durrell's enduring masterpiece. It is four books: Justine, Balthazar, Mountolive, and Clea.

I have spent most of my life under the complete spell of Fantasy and Science Fiction. Couldn't really say why, but not because it's unclear to me; it's not unclear at all. I couldn't say simply because it's axiomatic to my mind and heart. How could I not? More on that in another post perhaps.

But The Alexandria Quartet is by no means SF, nor Fantasy. Why and how it gives me what I need, that usually I only find in stories of elsewhen, I can't say either. This really is a mystery, and it pleases me to be mystified and gratified all at once.

I'll let the author help explain, if explanations are called for. This is the beginning: the opening of Justine.
The sea is high again today, with a thrilling flush of wind. In the midst of winter you can feel the inventions of Spring. A sky of hot nude pearl until midday, crickets in sheltered places, and now the wind unpacking the great planes, ransacking the great planes...

I have escaped to this island with a few books and the child—Melissa's child. I do not know why I use the word "escape". The villagers say jokingly that only a sick man would choose such a remote place to rebuild. Well, then, I have come here to heal myself, if you like to put it that way...

At night when the wind roars and the child sleeps quietly in its wooden cot by the echoing chimney-piece I light a lamp and limp about, thinking of my friends—of Justine and Nessim, of Melissa and Balthazar. I return link by link along the iron chains of memory to the city which we inhabited so briefly together: the city which used us as its flora—precipitated in us conflicts which were hers and which we mistook for our own: beloved Alexandria!

I have had to come so far away from it in order to understand it all! Living on this bare promontory, snatched every night from darkness by Arcturus, far from the lime-laden dust of those summer afternoons, I see at last that none of us is properly to be judged for what happened in the past. It is the city which should be judged though we, its children, must pay the price.
I should stop there. But because I'm self-indulgent and this blog, as Red Green would say, "is mine, and I can do what I want with it", more from a few pages on:
For us artists there waits the joyous compromise through art with all that wounded or defeated us in daily life; in this way, not to evade destiny, as the ordinary people try to do, but to fulfil it in its true potential—the imagination. Otherwise why should we hurt one another?
And:
Today the child and I finished the hearth-stone of the house together, quietly talking as we worked. I talk to her as I would talk to myself if I were alone; she answers in an heroic language of her own invention. We buried the rings Cohen had bought for Melissa under the hearth-stone, according to the custom of this island. This will ensure good luck to the inmates of the house.

2007-07-06

N-Word

The always interesting tidalgrrrl has just posted John Lennon's Woman Is The Nigger Of The World as the first installment of Friday Feminism. She also posted the lyrics, but bowdlerized "nigger" to read "n****r" because she dislikes the word, and then wondered about the nature and implications of this self-censorship.

"Nigger" is a word I've thought about a lot. Setting aside its earlier usage by the British (who seemed to apply it, not just to Africans or their descendants, but to dark-skinned peoples generally), its primary existence in this world has been American. In America, Nigger has always been used as a peculiar kind of word: derogatory slang whose purpose is to scornfully disempower a particular segment of society. Americans of African descent have been systematically and unrelentingly disadvantaged since the first slaves were brought across the Atlantic, and that evil tradition continues. America is a profoundly racist society.

(Some years back, I worked in a shipyard. The level of dumbfuck blue-collar racism was pretty high, and I couldn't resist schooling my coworkers from time to time. At one point they were dumbfounded when I said patiently that really there were hardly any "Niggers" in Africa. "'Nigger' is a putdown used on a group of Americans. A Nigger is an American. By definition," I explained. For a while after that I would catch a few of them staring off into space, frowning and moving their lips a little.)

When society decides to squash a group of people, a simple well-known derogatory term is enormously effective and efficient. It's code for "You have no hope of fully belonging, no hope of fair treatment. We will arbitrarily attack or imprison or impoverish you, and there is no recourse. The worst of us will be given preference over the best of you. You're not really fully human, and you have a place among us -- you have a life -- by our sufferance only."

That's what Nigger means. Fantastic semantic compression, really, when you think about it: the great big poison grenade of sneering arrogant disdain, and even really stupid and inarticulate people Get It.

When there's a word like that, one of the most effective and indispensable strategies for the oppressed is to take it as your own, own it proudly, own it defiantly, shove it in their face, so it means something you want it to mean; until one day that entire freight of compressed hatred is just so last year.

Those of us who are dykes, fags, queers, nerds, geeks, freaks, bitches and even Liberals have all benefited from such reclamations (I leave speculation about my membership in those individual categories as an exercise for the reader; on the Internets nobody knows you're a ________). But Nigger, despite a lot of smart, sincere effort by a lot of smart, sincere people over the years, still resists efforts to make it all right. No less stalwart a personage than Richard Pryor publicly renounced the use of Nigger in his stand-up routines, saying (IIRC, ain't got no citation handy) that "yeah, you tell yourself a lot of lies" -- lies about how "Nigger" was an empowering, rather than hurtful, word.

Nigger is an ugly word with an ugly history, and apparently lots of ugly staying power. It's both sad and fascinating that tidalgrrrl is so repulsed and conflicted by it that she first censors herself, then calls attention to the censorship. This from a forthright and expressive woman who doesn't flinch from fuckshitpisscuntcocksuckermotherfuckerandtits when they are called for.

I don't censor my writing, except for concern about whether a term or concept will work for specific audiences. When I'm writing analytically, or when I'm expressing an imagined rightwing opinion (e.g., "Yeah Earl, I know, the war on drugs is a pain in the ass, but it helps keep the niggers down so that's good") I won't hold back.

But I do not and will not use Nigger as a pejorative. And I don't trust it as irony, because there is too much chance of perpetuating memes that I want to see extinguished, or else behaving like a fucking idiot (no, Jonah didn't say that. Tbogg put those words in his mouth, but damn how plausible).

I sincerely hope that one day Nigger will be reclaimed for real. That one day everybody will snort with amused derision whenever someone tries to resurrect the old meanhearted power of Nigger as a putdown, kind of the way people today stare in puzzled semi-bafflement when someone tries to use "fairy" as a homophobic insult.

But I think that day is a long way off. As long as African Americans are singled out, with unrelenting stupid persistence, as permanent inferiors, and as long as there are strident Confederate motherfuckers out there with a desperate need to know they can reliably look down on somebody no matter what, Nigger will remain a cruel word.

Loaded words are like loaded guns. Don't play with them in the livingroom.

2007-07-04

No, it's not Freudian

One of my favorite books is The Worm Ouroboros, by E.R. Eddison. It's a glorious early 20th-century fantasy, in which the dialogue is Elizabethan and the narrative prose a modified Jacobean. Some of the best English outside of Shakespeare, if you have the patience for it; which I sadly observe hardly anyone does these days.

In the chapter Conjuring in the Iron Tower, Eddison paints a climactic scene of alchemical sorcery:

Therewith the King unlocked the greatest of those books that lay by on the massive table, saying in Gro's ear, as one who would not be overheard, "This is that awful book of grammarie wherewith in this same chamber, on such a night, Gorice VII. stirred the vasty deep. And know that from this circumstance alone ensued the ruin of King Gorice VII., in that, having by his hellish science conjured up somewhat from the primaeval dark, and being utterly fordone with the sweat and stress of his conjuring, his mind was clouded for a moment, in such sort that either he forgot the words writ in this grammarie, or the page whereon they were writ, or speech failed him to speak those words that must be spoken, or might to do those things which must be done to complete the charm. Wherefore he kept not his power over that which he had called out of the deep, but it turned upon him and tare him limb from limb. Such like doom will I avoid, renewing in these latter days those self-same spells, if thou durst stand by me undismayed the while I utter my incantations. And shouldst thou mark me fail or waver ere all be accomplished, then shalt thyself lay hand on book and crucible and fulfill whatsoever is needful, as I shall first show thee."


The scene is grim and scary and beautifully described: nuts-and-bolts sorcery, played for keeps.

And now through every window came a light into the chamber as of skies paling to the dawn. Yet not wholly so; for never yet came dawn at midnight, nor from all four quarters of the sky at once, nor with such swift strides of increasing light, nor with a light so ghastly. The candle flames burned filmy as the glare waxed strong from without: an evil pallid light of bale and corruption, wherein the hands and faces of King Gorice and his disciple showed death-pale, and their lips black as the dark skin of a grape where the bloom has been rubbed off from it.


And matters go badly. The King does summon the demonic presence, and assigns it a task...

But now was the King's endurance clean spent, so that his knees failed him and he sank like a sick man into his mighty chair. But the room was filled with a tumult as of rushing waters, and a laughter above the tumult like to the laughter of souls condemned. And the King was reminded that he had left unspoken that word which should dismiss his sending.

...

Yet was Gro mindful, even in that hideous storm of terror, of the ninety-seventh page whereon the King had shown him the word of dismissal, and he wrenched the book from the king's palsied grasp and turned to the page. Scarce had his eye found the word, when a whirlwind of hail and sleet swept into the chamber, and the candles were blown out and the tables overset. And in the plunging darkness beneath the crashing of the thunder Gro pitching headlong felt claws clasp his head and body. He cried in his agony the word, that was the word TRIPSARECOPSEM, and so fell a-swooning.


Damn kids nowadays, with their World Of Warcraft and their Hogwarts Academy, they just don't understand good old-fashioned magic, dammit.