December 10, 2018

My Talk of Bitcoin at Reading Uni

Filed under: TMSR — Diana Coman @ 11:01 p.m.

As mentioned already in the logs1, today I gave a talk for students at Reading University (UK) about Bitcoin. Following the very useful advice from the forum, I centred my talk on "What Is Bitcoin?" and then I built on that with a discussion of actual innovation in Bitcoin (WoT, V, Deedbot) as opposed to the attempts at subversion of Bitcoin ("blockchain technologiezzz") that are usually touted about as "innovative!!" by all those "involved" through mere talking and repeated mentioning of all the buzz words in whatever happens to be the current consensus-approved sequence. Since there is no recording of the talk2 I decided it's best to do this write-up now, while everything is still quite clear in my memory. Who knows - it might help someone else who is looking to actually understand this Bitcoin thing rather than just copy/paste the most commonly copy/pasted stuff.

Initially the University had this bright approach to schedulling of talks and rooms so that they could not even guarantee that I'd have some sort (any sort!) of board on which to write during the talk and so I decided to make the most of it and make some supporting slides, despite the fact that I rather hate doing those things. Nevertheless, I've made them now so you can download the slides for this talk - just be aware that they are meant precisely to support the talk not to replace it so they don't actually contain even 10% of the talk, they just help *illustrate* what I say. So no, here as everywhere else, you can't really replace the person doing the talk and not see any difference, such is life, totally unfair. I'll try though to summarize in here the main ideas I covered - keep in mind that the whole talk was 90 minutes, of which I'd say about 30 minutes were Q&A throughout (i.e. not all at the end, as I took questions as and when they were able to actually ask something).


After an initial brief presentation of myself and Minigame establishing some basis on which I am actually giving this talk, I collected the existing "definitions" that students had about Bitcoin. As expected, they were not much in terms of meaning but surprisingly enough they were also rather difficult to extract - perhaps shyness or in a more optimist interpretation some well-placed notion that no, they don't actually have any idea what the thing is. Anyway, for the record, I got only two definitions: "a very volatile currency" and "an immutable ledger of transactions." So I noted that those touch a teeny tiny bit on what Bitcoin has in common with "currency" and "ledger of transactions" respectively while missing at the same time the much wider part of what is actually new, different and essentially disruptive about it. And I noted that the only way to actually get an idea of a complex notion (such as Bitcoin) as a whole, is by starting from its basic principles and following their characteristics and implications. So without further digression, I moved on to enumerating the 4 basic principles of Bitcoin (asymmetric cryptography, databases, peer-to-peer (p2p) and anonymity/pseudonimity) and then presenting each of them in turn.

For explaining asymmetric cryptography, I used Mircea Popescu's excellent approach of starting with the simple example of obtaining the 2 prime factors of a given number vs obtaining the number itself from those 2 prime factors when they are known. I took though rather bigger numbers because my audience was - it is to be hoped at least - made of people able to use their computers for such a task at short notice (I hope!). Based on this, I introduced the notion of public and secret (private) key, using the slides to graphically build up the whole thing piece by piece, aka keys, arrows, locks and all that bling. Basically I showed them a mini animated cartoon, what! And I'd say it worked - I could clearly see some ears pricking up!

Following on from the above, I discussed how that makes it possible to send messages without divulging the content but also without divulging the intended target either! Cue more animated cartoons with people simply trying to decrypt messages from a pile until they find the one (or at least one) that they *can* decrypt with their private key - hence, if they can decrypt it, then (and only then), it is meant for them! At this point they actually started to sit straighter and even nod from time to time as it - apparently - made sense. I finished this first part with a summary of what I had said so far, essentially the slide behind me in that picture.

The database and p2p principles I covered much faster as it was really just the main ideas that were of most interest to me for the task at hand: a database is essentially structure applied to data and that means any structure; p2p allows participants to find and obtain all the different parts of information out of a given set, without having to rely on any central authority /middle man but simply by communicating and interacting with the other participants. Nevertheless, p2p was also good for another mini-cartoon to illustrate that one needs to be able to assess what part of the whole they might have at any given time, what parts they are missing and to what degree a proposed part is what they are looking for.

After this brief interlude of databases+p2p I got back to building on the asymmetric cryptography introduced earlier aka introducing anonymity and pseudonimity. As the students were by now rather intently listening, I asked them what they could tell about those 3 messages on slide 30: there are 3 signed messages (encrypted with secret keys) of which 2 verify (decrypt correctly) and one doesn't verify with one given public key. After a few hesitations and false starts, they actually got it - yes, one can say that both that decrypt correctly were actually encrypted with the *same* private key! So I took this further and pointed out that this implies authorship (the one controlling the pair of this public key is the author of the first 2 messages), continuity (whoever signed the first message, also signed the second) and repudiability - one can easily prove that they did NOT sign the 3rd message! All very useful things since together they allow one to obtain pseudonimity - in other words, to have a way of gaining authorship and continuity without having to give up privacy and mix up in there personal information such as date of birth or where they hang out with friends on a Saturday night. So no more of "verify your identity by giving us the keys to your house and generally whatever else we might think of asking". Instead, pseudonimity by means of public/private keypair - one neat and simple way of enjoying the results of your work (aka building your reputation) without having to give up more private information than you choose to. This was the first time I mentioned reputation and I took the time to give a few examples since I was preparing the ground for the WoT later on. Judging by the large amount of questions that the WoT sparked, I'd say it hit home.

Putting everything together, I showed this boring slide 36 with classic bullet points. To make up for it and to check to some extent how much of the head-nodding actually meant some sort of understanding, I asked the students to identify in that definition what exactly "caused" each of those main characteristics listed there. They were fine with p2p -> no middle/central authority and they worked their way on pseudonimity. I explained the other two (reputation vs blind trust; irreversibility) in more detail since there were some important aspects there that weren't apparently all that obvious - including the "trust" in banks and governments and parents and what-not. Basically when many still trust without even realising that they really just ..."believe" and nothing more.

For the next step, I pointed out that innovation with/in Bitcoin is simply something that *maintains* all those main characteristics of Bitcoin while attempting otherwise to adapt the world ("how things are done") to it and NOT the other way around. By contrast, anything that tries to chip away even in the slightest at *any* of those main characteristics is nothing more but an attempt at subversion. As soon as one tries to change Bitcoin to fit the world ("what people expect/need/want"), one is - whether they admit it or not, whether they know it or not - trying to subvert Bitcoin, not innovate with it. So the difference being clear and easy enough to make hopefully, one can tell for anything whether it's innovation or subversion and they can then *also* tell whether the author is at least honest about it! At which point of course, they can therefore update the reputation of this author... even quite publicly if they are using the WoT!

From this point on I went online to show the WoT and V directly, exploring it a bit and discussing the actual meaning and use of the WoT and then linking it to deedbot and the voice system + wallet in #trilema. By this point there were several hands raised at the same time so questions started to really pickup. Quite a good number of questions focused on the "but it's not fair" aspect of an existing WoT + a signed-patches-only V:

  • So a newcomer needs to trust implicitly the deedbot owner in order to get into the WoT?
    • No, a newcomer needs first to find at least one person in the WoT who is willing to give them a rating. Unless and before they do that, they are just not part of the WoT and therefore there isn't much that they can do really. This is NOT meant to be (and nothing can be) a replacement for talking to people or getting to know people!
  • But then it's not fair for those that are not in the WoT!
    • Sure, life is not fair, nor will it ever be fair. Yes, it can be hard to get in the WoT if nobody knows you. So make yourself a key and start building up your reputation so that people CAN get to know you!
  • How do I know you did not rate yourself?
    • You talk to those people who rated me. If you can't talk to them, their rating is meaningless for you anyway. And yes, one can make as many "identities" (aka private keys) as they want to but remember that building up each of them takes time and effort. So why exactly would I waste this time and effort in bogus identities instead of increasing my real reputation?
  • But what if everyone negrates me for a tiny mistake I make in a patch and I can't participate to the community anymore3?
    • What of it? You TALK to them and see what you need to do to get them to change their rating; and if they are indeed as unreasonable as you assume them to be then why would you actually want to work with them?
  • Why isn't the WoT stored in the blockchain so that there isn't a "single point of failure" in the person owning deedbot?
    • You are confusing there the WoT with its *representation*! Deedbot simply stores a representation of the WoT but it is NOT the WoT, nor does it own the WoT. It is participating people who actually make and own the WoT. So if deedbot goes away or its owner turns rogue -as it actually happened with assbot before4 - the WoT simply moves/splits/migrates following the people. The ratings I give are neither fixed in stone nor anyone else's property and it is always and everywhere the people that matter, not the bots. And for that matter there can be *any number of WoTs*, what.
  • But what if the owner of deedbot cheats with the ratings?
    • Well, how exactly? People who gave those ratings know what ratings they gave and they will shout/negrate as soon as they notice any foul play. As for "bots" - what of them, their ratings are worthless since nobody rates *them* (and pure circular ratings you to bot and bot to you won't get you anywhere really, at least not with any sort of thinking people, no).
  • Why is the WoT implemented through deedbot, with a central model? Wouldn't it be better5 to rely on the blockchain only for the WoT so you don't have to trust a single person?
    • Why would the WoT need the blockchain? Ratings are NOT immutable. And at any rate, I'd much rather trust a person (who I can negrate if things go wrong!) than a... network as a whole (which for that matter is not even all that trustful, read about mining cartels to start getting some idea on this matter). Once again: there is no substitute to talking to, interacting with and ultimately trusting (or NOT trusting) individual people!
  • But what stops deedbot's owner from running with all the money?
    • Nothing can ever stop anyone from doing what they want to do. But the incentives here are stacked *against* this: his considerable reputation is on the line; and he can much more easily earn money AND increase his reputation by simply charging a fee for services rather than running away with the piggy banks. It's the same with any sort of contracts too, really: what stops the other party from breaking their contract? "The law!" No, the law doesn't and can't *stop* anything. You may trust some particular state that it will enforce that law and therefore give you some compensation for the breach of contract but it is just that - trust, blind-faith.

There were in fact many other questions but those are the ones that I think are rather usual so the answers can help others perhaps as well as the students who asked them this time. Obviously, the answers given above are those given at a talk - without the time to go deeper into details and without the luxury of linking related logs/posts from the abundant existing materials. I still did point out to students that there is much more to read on the subject both in the logs (which I demonstrated a bit) and on the TMSR blogs but whether they will do any reading of this sort or not is entirely up to them - since the benefits to reap for doing so as well as the losses to enjoy for failing to do so will also be for each of them personally and only for them.

All I can say is that I'm genuinely rather curious whether any of them gets to actually register a key and do something with it, now that they nodded at the talk, they asked the questions, they came at the end to me to say that they liked the talk and now that they have in front of them this opening, this way to get into it all6 - for as long as they actually have it, of course, since nothing is for ever!

  1. Yes, that's where pretty much everything worth the mention is... mentioned. Where else? 

  2. And barely any photos as the prof I asked to take some photos followed the talk so intently that he...forgot to take any! So he had to take a couple at the very end, while students were still asking questions, what can I do. 

  3. I let pass the whole community participation as there were bigger fish to fry this time. 

  4. Yes, I did give them the summary of the #bitcoin-assets debacle. 

  5. I honestly think he just-about-said "fairer" 

  6. Yes, I did tell them about #eulora and #ossasepia and even that I will rate them if they show up and convince me that they were at the lecture. Not that it makes in itself a difference - those who would come would come and find their way anyway and those who won't still won't, I know that. 

November 13, 2018

V with VTools, Keccak Hashes and Its Own Tree

Filed under: Coding, TMSR — Diana Coman @ 10:49 p.m.

The republican versioning control system, V for Victory, is very much used, very much needed but nevertheless not yet versioned itself. As this has caused already way more talk in the logs than it's worth it, I promised I'll do a write-up of my own V setup and publish it, with a proper versioning for V itself included. So I've dug up old versions as well as my current setup and packaged everything in 2 different ways:

  1. A V-tree (using Keccak hashes) that captures the changes to v.pl code1 from the first version that I ever used, namely 999942. To use this, simply download the .vpatches, the .sig files and my signature, check them and then press the tree with either a different V that you might have or otherwise semi-manually with phf's vtools (vpatch more precisely). Note that you WILL need vtools (or equivalent) at any rate! Once you pressed V itself successfuly, you should have a v.pl that you can run - it will check its dependencies and complain if it doesn't find something (most notably the vtools parts namely vpatch and ksum).
  2. A signed .zip meant as a starter package for someone who hears of V for the first time in their life. Download starter_v.zip and starter_v.zip.diana_coman.sig. Check the signature! If and ONLY IF the check passes, unzip and then read the scripts in there. The build.sh script will simply build3 the vtools that are included in this starter pack and it will copy to the starter_v directory all the executables that are needed (v.pl renamed as vk.pl to make it clear it uses Keccak hashes, vdiff, vpatch and ksum). The included vtools are the code obtained from pressing current vtools tree up to and including the ksum .vpatch.

My changes to Mod6's v.pl simply replace older sha-based dependencies and calls with the vtools-based ones. Note that you'll need to have ksum and vpatch in your PATH or otherwise ready and accessible as v.pl will simply try to call them when it needs them.

For potential reference, here's my usual workflow to make a .vpatch:
mkdir a
mkdir b
cp -r old_stuff a/stuff
cp -r new_sutff b/stuff
vdiff a b > newpatch.vpatch
gpg --armor --output newpatch.vpatch.diana_coman.sig --detach-sig newpatch.vpatch

To check / press a V tree:
mkdir patches
mkdir .wot
mkdir .seals
cp some_patches patches/
cp corresponding_sig_files .seals
cp corresponding_trusted_pubkeys .wot
vk4 f
vk l
vk p v testdir chosen_patch.vpatch
cd testdir
read, compile, run etc

The .vpatches and .sig files:

The .zip file and corresponding .sig file:

For something to test your new shiny V on, head over to my Reference Code Shelf and take your pick. For trouble and questions, use the comments box below.

  1. Note that you are warmly invited to implement your own V! This version here is mod6's V implementation that was much discussed and iterated upon in the early days. 

  2. Note that V's versions DECREASE rather than increase, as per the explanation

  3. It requires GNAT. If you have no idea what that is, dig around, read the logs, ask humbly. 

  4. As I have all sorts of V implementations living side by side, I tend to give them different names - this is the vk for V-Keccak! 

July 17, 2018

Discriminatory Code Sharing

Filed under: TMSR — Diana Coman @ 2:10 p.m.

While the world at large is making itself busy with the current fashion of discrimination hunting and public pillorying of any offenders it can get its public hands on, TMSR is peacefully and earnestly discussing in the forum the introduction of a new code release paradigm that is quite as discriminatory as it can be and as a result to rather significant benefit to all. The initial proposal as stated by Mircea Popescu in #trilema has the following parts (split and formatted from the logs in a way that I find easier to read):

the following code release paradigm :
client (code) author
a) releases code encrypted to l1, signed and deeded (so basically, gpg -aer asciilifeform -r ave1 -r etc) ;

b) releases precompiled binaries for allcomers.

1. permits us to control binaries, which means stuff like http://btcbase.org/log/2018-07-16#1834888 (which i'm very much impressed with, btw) ;

2. permits to reserve some interest for the author, because the strategic thinking over at minigame is that we'll want client competition (from skinning all the way to all the way) and remuneration by installs (hence all that hash dance in the new c-s protocol) ;

3. very clearly quashes the idiocy of rms-ism AND ers-ism ("open source" bla bla), and makes the strong political statement that indeed there is a difference between nose breathers and mouthbreathers and so on.

this only works if we can rely on l1 to keep a secret ; which means things (such as, that it can't be as big, for instance).

The discussion can be found in the logs but it can be a bit difficult to follow as it spills over into next day and into other topics on the way. The initial focus was on the issue of "keep a secret" and then on that of "controlling binaries". While both those aspects are worth discussing and are certainly covered to some extent in the log throughout yesterday, they are actually NOT at all central to the proposal as I came to understand it at length. And the discussion perhaps focused on those at first mainly because the speakers - both I and Stanislav - have more practice with the technical perspective and so we read the proposal first through that lens. However, as I kept prodding the issue with questions, various bits and pieces fell into place for me and the whole thing started making more sense. Specifically, this is my current understanding of this proposal:

Discriminatory Code Sharing

The proposal is simply a clear and pointy (i.e. with actual practical power and means to use this power) discrimination between:

a. the general "public" who has access to binaries and nothing else.

b. qualified individuals (l1) who have access to sources.

Note the mass noun in a. and the distinct persons in b.

Note also that the a/b distinction above is a political issue first and foremost. It *does not matter* nor it could possibly matter if some non-l1 somewhere gets at some point his hands on some code or the other. So it sees it. So what? For as long as the "seeing" happens outside the walls of TMSR or otherwise put outside the structure of authority, there is no meaning to it. In practical terms, they can of course see the code, come within the walls and contribute as a result and then what's the problem or the loss? Or they can herp and derp outside and be ignored by TMSR just as they were before they found that code in the woods, so again what's the problem or the loss?

Essentially, code is to be shared but not with anyone able to push some keys. Code is to be shared with and even offered to those who can do something meaningful with it - and only to those. What they decide to do with it, if anything, is of course their own call entirely.

There are significant advantages to this approach:

1. It makes explicit and it gives more meaning to an existing and unalterable difference between "users of software": some can and will read source code, others will just execute whatever they download. Those who consider themselves in the first category but possibly unjustly lumped at the moment with the second, have the option of doing some work and getting into l1.

2. It offers quite a few things to those who actually write useful code:

  • a way of getting help from those most able to give it;
  • as much protection as there currently is anywhere to be found against the significant and eternal pressures of the mindless horde1 as well as against the very real monkeys who are always looking to pick up the fruit of someone else's labour when it's ready;
  • a clearer and arguably easier avenue to making a name for themselves and in the process finding their own place, be it in l1, l2, lx or outside the walls entirely.

3. It adds more meaning (power and responsibility, what else) to the l1 status.

4. It puts more pressure on the need for reproducible builds since the practical and technical aspects of most of the above relies to some extent on those and the actual exercise of the new powers will inevitably run into the issue of non-reproducible builds (as well as any other relevant technical issues that are perhaps yet to be revealed as people stumble upon them).

The only disadvantage stated from the beginning was the fact that the approach is unlikely to scale very well as the size of l1 increases - there needs to be a rather close agreement within l1 at the very least on the core aspect here: code is not secret but sharing it is a responsibility and choosing the recipients is a matter not to be taken lightly.

I can perhaps see a potentially different issue with submitted code that keeps growing in volume. However, I'd expect that it is a bit too early to worry about that and the solution is more likely to be naturally found - if nobody actually reads it, there is no effect. For the code's author it's just as if the code wasn't even submitted in the first place if not even worse since he might easily land in the soup for being an idiot who can't read the log and doesn't understand at all how lines of code are weighed in the first place.

Based on my above understanding of this proposal, I must say that I'm all for it. From all I see, it's a rather significant improvement for everyone even remotely touched by it and at relatively little real cost to anyone involved.

It might be of course that I misunderstood the proposal in parts or entirely in which case I very much want to hear in the comments below where I'm mistaken.

  1. Also known as the horde of idiots, mountain of idiots, sewer rats and so on. 

March 28, 2016

When the Messenger Shoots Back

Filed under: TMSR — Diana Coman @ 11:56 p.m.

I could title this: the post I did not want to write. There has been a lot written already on the BitBet issue and the #b-a logs have frothed over it more than enough. Still, seeing how after all this time nobody in the midst of it all seems to either see what I see or otherwise care enough to state it, I have no choice but to write it anyway, because the alternative is that this view is never even put out there at all, for better or for worse. And to make it clear: I do not write this for whatever may come out of it (there's nothing positive I can really see coming either). I write it because this perspective is somehow entirely absent from any public discussion that I can see and therefore I can no longer keep quiet on it.

Let me state from the start that I have no stake in BitBet at all. For full disclosure: I had a few shares bought in the very beginning and I sold those quite some time ago. I bought them because I saw (and still do) huge potential in the underlying idea of BitBet. I sold them when I realised that the infrastructure that BitBet needs to thrive is simply not there1.

The perspective I have to write here as best I can will not go into technical details at all. First, such technical details have been discussed to death in the #b-a logs by people more knowledgeable on this matter than I am. Second, I truly do not consider that I know enough of these technical matters to discuss them at this stage. On such matters, I specifically defer to people such as asciilifeform and mod6. Third, I don't think that they are truly relevant to what I have to say, seeing how the discussion really focused in the end on Mircea Popescu's call on the matter rather than on any of the technical issues involved.

A very short history of the issue here: the betting site owned and run by Matic Kocevar (Kakobrekla) and Mircea Popescu entered into receivership as a result of a irreconcilable difference between the two owners. This difference became apparent over the handling of an incident that started off as a significant delay in the processing of one of BitBet's payment transactions. Mircea Popescu detailed his interpretation and handling of this incident in A Miner Problem. Both his interpretation and his handling have then been discussed in the comments section in Qntra and in the #b-a logs, with people mainly disagreeing on his interpretation of the result as evidence of a miner cartel. After he published the BitBet statement, the discussion focused almost exclusively on the 17BTC lost as a result of the incident and included in the statement as BitBet's loss. Essentially, on one side Mircea Popescu stated that the funds were lost by BitBet and therefore rightfully a business loss, while Kakobrekla stated that they were lost as a result of a mistake made by Mircea Popescu and therefore his own personal loss (or a loss that is to be covered by him). This difference of perspective proved deep enough to cause BitBet to go into receivership and to cause subsequently what seems like a split of people previously in #b-a (known as members of tmsr) and currently in #b-a and/or #trilema.

It's this last split that brought to light very clearly the fact that the issue is truly about the people involved and not at all about any of the technical issues or even the BitBet incident in itself. The BitBet incident was the trigger only. A trigger that proved to be attached to quite a bigger gun than initially thought perhaps, but what difference does that make anyway. In all this however, some misunderstandings seem to persist or are allowed to persist. Compare and contrast those two snippets from the #b-a logs:

On 2 March 2016, Mircea Popescu gives a brief statement of his reasons for his handling of the BitBet incident:

17:16:59 mircea_popescu: so that the problem can be fully exposed, in detailed, solid fact, so as to be handwaved by people.

17:17:04 mircea_popescu: i'm a masochist like that.

On 28 March 2016 phf and kakobrekla frame the discussion again as one of handling competency, while making reference directly to the statement above:

19:14:13 phf: but more importantly to a hypothetical court trial is how much knowledge mp had about this topography, so that way we can say whether or not his call was competent or not

19:14:42 kakobrekla: phf by his own admission hi call was 'masochistic' (but later billed sadistically)

This last part continues into a discussion of what Mircea Popescu actually meant by that statement that he was "a masochist like that." While each of those involved has his own interpretation of it, none of those interpretations seems to me to actually hit the nail on the head2. And it's a rather important nail seeing how all the discussion in #b-a keeps coming back to it.

In my semi-detached, silent-observer view of the whole matter, the BitBet incident was essentially a case of shooting the messenger for bringing up the unpleasant news in such a terrible, hurting manner. And at this stage one can say that both Mircea Popescu and BitBet were unwanted messengers, except that the first is way more difficult to shoot and he clearly shoots back too. The initial incident exposed a significant problem for BitBet first of all and as such one for BitBet to deal with and solve. The masochistic trait of Mircea Popescu in this has nothing to do with losing some BTCs or the like: it has to do with his deliberate choice to bring the bad news in such a way so that people won't ignore it although he quite knows beforehand that they will still do all they can to actually wave it away. In his own words: "so that the problem can be fully exposed, in detailed, solid fact, so as to be handwaved by people." So yes, he expected the double payment to happen, but that was at the same time the only opportunity to get full evidence of a significant problem for BitBet.

The 17BTC in this context was the price BitBet paid to ascertain the extent of the problem and to obtain clear and unavoidable proof of it, forcing it to light in a way which can't be denied in any form (and indeed, post-incident, there hasn't been any denial of the fact that yes, BitBet has a problem). However, despite the acknowledgement of this problem and of its importance for BitBet, the discussion solely focused on the 17BTC in the way of: oh, but they needn't have been lost by BitBet! One has to stop first and consider: by whom should those BTC be lost then exactly to still have the problem exposed? The answer apparently given so far in the #b-a logs is: by Mircea Popescu! Presumably because he insists on exposing the problem - the messenger deserves at least a few lashes for insisting to make the bad news heard, doesn't he?

There is also the opinion that the 17BTC shouldn't have been paid a second time, given the clearly obvious and highly probable result of a double payment and therefore a loss. The question never asked on this is: how clear would the problem have been then? What proof would there be and of what exactly? What measures would have been taken and what value would they have on such shaky grounds?

At the end of the day, I see this as a clash of two approaches that are indeed irreconcilable: either expose rot as early and clearly as possible, at all costs and settling for nothing less than full eradication or otherwise mend and make do, working around the issues as best one can, minimising costs. I must say that I don't really condemn either - people afford what they afford and make their choices accordingly.

I want however to make it as clear as I can that this is the choice being made, the choice that killed BitBet, the choice that split tmsr. Your choice to make at every turn, too.

  1. I include people in "infrastructure" - call me names for this if you need another reason to do it anyway. 

  2. Nobody goes to just ask Mircea Popescu what he meant by that, either. 

July 17, 2014

"Get one just like bitcoin people"

Filed under: TMSR — Diana Coman @ 1:49 p.m.

When too much text is too much text, what do I do to get to read it? Why, get a dump of all data1, throw some automated analysis at it and have the lulz quite guaranteed2. No better test to see text mining fail, it seems, than applying it to irc logs on bitcoin-assets: a careful calibration of state of the art tools3 yielded only a clear case of "by the time you figure out and implement everything needed to obtain even reasonable results, you surely did the "automated" work at least 5 times if not 10, if not 100." Not that it was totally unexpected, of course, but still, given the enthusiasm of text mining people (or possibly just that of text mining people I know), I'd have expected at the very minimum some more robust convo splitting and/or term extraction, with a bit of help4. Not a chance: the results are better even if I split for convos based on the delay between lines (and that's one rough way to do it for sure).

As for extracting key terms, the main result that can be offered is that text mining can find by itself only terms that one has no interest in, or at least not on btc-assets: it did manage to find "BTC" as an important term (go figure) and that was about it all. How terribly useful and incredibly surprising, isn't it? Still, after a bit more fiddling around, it turns out that there is a bit of fun to get out of it. Here's a pretty picture with main "key words" for the logs of May 2014. It makes for good candidate captions such as "never really need to tell," "get one just like bitcoin people" or "mircea_popescu can like just bitcoin people." Real bits of wisdom there, aren't they?

Wordcloud for bitcoin-assets logs from May 2014 Wordcloud for bitcoin-assets logs from May 2014

Still, data is data and text is no exception, even if spewed forth at incredible rates day and night by a bunch of bitcoiners (and the occasionally lost newbie) on an irc log. Hence, back to more basic tools and trusted numbers, via R. And at least I got some pretty pictures!

Easiest thing to find out: who's most active? Top 10 contributers (as number of lines rather than number of words) seem to be quite the same, whether it's the whole period considered or just a month. However, the contributions follow (of course) a power law distribution, meaning that there are a few users who contribute a lot to the discussion and many users who contribute very little5 There is also quite a sharp decline at the top, with mircea_popescu contributing around 20% of the discussion and the next (ThickAsThieves overall or fluffypony in May 2014) barely contributing around 8% and 10% respectively. Here are some charts and lists (I excluded assbot, gribble and ozbot):

Percentage of lines contributed by distinct nicknames on bitcoin-assets logs between 26 March 2013 and 12 June 2014. Percentage of lines contributed by distinct nicknames on bitcoin-assets logs between 26 March 2013 and 12 June 2014.


Percentage of lines contributed by individual nicknames on bitcoin-assets in May 2014. Percentage of lines contributed by individual nicknames on bitcoin-assets in May 2014.


Top 10 contributers on bitcoin-assets between April 2013 and June 2014. Top 10 contributers on bitcoin-assets between 26 April 2013 and 12 June 2014.

Top 10 contributers on bitcoin-assets in May 2014.
Top 10 contributers on bitcoin-assets in May 2014.


Top 10 contributors overall (total number of words) Top 10 contributors overall (total number of words)

Top 10 contributors in May 2014 (total number of words) Top 10 contributors in May 2014 (total number of words)

Mean number of words per line for top 10 contributors overall Mean number of words per line for top 10 contributors overall

Mean number of words per line for top 10 contributors overall Mean number of words per line for top 10 contributors in May

Top 10 overall (mean number of words per line) Top 10 overall (mean number of words per line) Top 10 in May 2014 (mean number of words per line) Top 10 in May 2014 (mean number of words per line)
Mean number of words per line for top 100 contributors (up to at least 12 words per line) Mean number of words per line for top 100 contributors (up to at least 12 words per line) Mean number of words per line for top 100 contributors (limited to those with at least 12 words per line) Mean number of words per line for top 100 contributors in May 2014 (limited to those with at least 12 words per line)


Why is the above interesting? Mainly because it gives the newcomer one reasonable way to start figuring out the whole mess that is otherwise dumped on her head if taking the logs as a whole. Instead of trying to go through all the logs, set a threshold and start by filtering the logs to show first only the contributions of top people, as they are most likely to actually lead the discussions anyway. Considering how fast the percentage of contribution decays, I'd say that taking the first 10 contributors is quite a reasonable option, but for those in a hurry, it will probably do to select even just the first 5. This would reduce the logs effectively by more than half, with minimum chances of truly missing anything really important.

Then again, you could also just hang around in the chan there and the important will not miss you I guess.

  1. Thanks to kakobrekla

  2. And as a side result, I also get to actually read the logs, which was the point in the first place anyway, as I'd much rather read them for the purpose of designing some kind of tool to extract info out of them in the future than just...you know, read them. But that's just me. 

  3. think GATE plus all the cool plugins that can be used with it, as well as some custom-made JAPE grammars for the task at hand. 

  4. To be fair, it probably can be done, but with a TON of help rather than a bit and it kind of defeats the purpose from my point of view right now. Sure, after knowing the logs inside-out and building a good ontology for them and then defining and testing and polishing the rules until they shine, you might be able to get something kind of reasonable from the machine too, but by that time you'd probably get something kind of reasonable on the matter even from your dog. 

  5. or nothing at all, but I did not count those users here. 

Theme and content by Diana Coman