I have always loved programming - it's like Lego without gravity.

I started with BASIC on my ZX81, graduating to assembler and Turbo Pascal during my teens.

I developed phone OS software as an engineer, architect and product manager, but got made irrelevant by the iPhone and redundant by Android.

These days I mostly work with data, big data and fitting big data onto small boxes.

Ruby's Principle of Too Much Power

Previously: Why is Ruby bad security?

TL;DR

  • Ruby libraries need to stop being too powerful.
  • Programmers need to understand the libraries they are using.

So, let's recap on the recent Ruby on Rails security woes.  For some non-obvious reason, Ruby on Rails supported YAML form parameters.

And it turned out that the Ruby YAML parser can … instantiate complex objects.

Why can the Ruby YAML parser instantiate complex objects, and why did the YAML author not see this as a problem?
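To make the risk concrete, here is a minimal sketch of the difference between plain data and a tagged object in YAML. It assumes a current Ruby whose bundled Psych library provides YAML.safe_load; the Widget class and both documents are hypothetical and only stand in for whatever an application happens to have loaded.

```ruby
require "yaml"

# Hypothetical class, standing in for anything the application has loaded.
class Widget
  attr_accessor :name
end

# A document that asks the parser to build a Widget rather than a plain hash.
tagged = "--- !ruby/object:Widget\nname: surprise\n"

# safe_load only builds basic data types unless classes are explicitly
# permitted, so the tagged document is refused.
begin
  YAML.safe_load(tagged)
  refused = false
rescue Psych::DisallowedClass
  refused = true
end

# The same content without the tag is just data.
plain = YAML.safe_load("name: surprise\n")

puts "tagged document refused: #{refused}"
puts "plain document class: #{plain.class}"
```

The point of the sketch is that a form parameter only ever needs the second kind of document; a parser that honours the first kind by default is doing more than the job requires.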

So we swatted that bug; we stopped Rails allowing YAML.

Only it then turns out that Rails also supported XML form parameters.

Now there are lots of nasty attacks against XML parsers, like recursive entities, external entities and other ways of DoSing the server; but that’s no different from attacking the hash-table used to store form parameters, I guess.

The nasty thing about the Ruby XML parser is that it can embed YAML.

Why can the XML parser embed YAML?

Let's swat that vulnerability too.  Let's make sure that Rails only supports HTML form parameters.  Oh, and JSON.

Only it then turns out that the Ruby JSON parser … yes, you’ve guessed it … the Ruby JSON parser can instantiate complex objects too.

I mean, come on, JavaScript Object Notation?  Not in Ruby it ain’t.
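As a sketch of what plain JSON parsing looks like in practice: the json library that ships with Ruby has a create_additions option on JSON.parse, which controls whether a "json_class" key is treated as a request to build an object. Passing false explicitly, as assumed below, keeps the result to hashes, arrays, strings, numbers, booleans and nil. The payload is made up for illustration.

```ruby
require "json"

# Made-up payload that uses the "json_class" convention to name a Ruby class.
payload = '{"json_class":"String","raw":[104,105]}'

# With create_additions switched off the key is just another key,
# and the result is an ordinary Hash.
data = JSON.parse(payload, create_additions: false)

puts data.class
puts data["json_class"]
```

Spelling the option out costs a few characters and means the behaviour does not depend on which default the installed version of the library happens to use.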

The idea that it’s a Rails-only problem really falls down at this point.  That Rails supported YAML and XML forms is, well, obviously crazy; that Rails supports JSON forms is only to be expected.  It’s really a bigger Ruby problem: the engineering approach across the Ruby community differs from the more classic approaches seen even in similarly placed languages like Python.

Lisp’s boast is that Code is Data.  Ruby accidentally and regrettably allowed Data to become Code.  Oops.

The principle of least power

There’s a good, old-fashioned engineering principle called the principle of least power.  As I see it, it’s not just for data formats (like JSON, XML and YAML) but also a mindset for developing things.

Don’t take a detour to facilitate future expansion.  Don’t build special-case logic into a general-purpose library.  Compose complex systems from simple libraries instead of large, opaque libraries bristling with special-case support.

I’ve said it’s a Ruby psyche thing.  Others blame Rubyist inexperience:

[Will] says that “The aim is to be so declarative, so high-level as to no longer see nor understand what is happening beneath and before.” What really happens is that most people that work with ruby are simply not educated enough to understand what is happening “beneath and before.” Ruby tries to go the extra step to make programming seem closer to a normal language. This means that more people will be able to become sufficiently proficient in it to think they actually understand what they are actually doing, and will proceed to try their hand at writing complex systems. Unfortunately without some very solid grounding in much lower level languages, and language design in general, that apparent proficiency is just an illusion.

In other words I wouldn’t necessarily blame ruby for bad code, any more than I would blame potassium nitrate for facilitating gun crime. Both are powerful tools that are useful in the right situations, and extremely dangerous in others. People simply need to learn not to take this power for granted. This is particularly true in the rails ecosystem, which is based around using gem upon gem upon gem without really understanding what most of them do

Perhaps it is inexperience?  You and I might never dream of building ways to escape from a format like XML in order to embed YAML, or of making the JSON library serialise and deserialise complex objects by default, and we’d say that was because we were experienced.

But we will agree the whole Rubyist lets-bang-gems-you-couldn’t-write-together mindset has to go.  Programmers should know what the libraries they use do.  And those libraries should do what you expect them to do and nothing more.  There’s a whole principle of least surprise in library functionality.

And always pay double-attention to deserialising data into code; never build that into the deserialisation library, always build that into a library that sits on top of the deserialisation library.
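One way that layering might look, as a rough sketch rather than a recommendation of any particular gem: the JSON library is asked for plain data only, and a thin application-level layer decides which types may be rebuilt from it. Point, BUILDERS, decode and the "type" key are all hypothetical names invented for this example.

```ruby
require "json"

# Hypothetical value type the application is willing to rebuild from data.
Point = Struct.new(:x, :y)

# Explicit allowlist: type name => builder. Nothing outside it can be built.
BUILDERS = {
  "point" => ->(h) { Point.new(Integer(h.fetch("x")), Integer(h.fetch("y"))) }
}.freeze

# Parse to plain data first, then decide in application code what it becomes.
def decode(text)
  data = JSON.parse(text, create_additions: false)
  raise ArgumentError, "expected a JSON object" unless data.is_a?(Hash)

  builder = BUILDERS.fetch(data["type"]) do
    raise ArgumentError, "unknown type: #{data['type'].inspect}"
  end
  builder.call(data)
end

point = decode('{"type":"point","x":1,"y":2}')
puts point.inspect
```

The incoming data never names a Ruby class; it names an entry in a table the application wrote, and anything not in that table is rejected before any object is constructed.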

I think the whole XML-embedded YAML - and JSON - vulnerability is under-explored.  These are data interchange formats and supporting arbitrary object instantiation in them is mind-bendingly wrong.  There must be a whole host of applications that process externally-received XML and JSON and, just like the rubygems YAML exploit, some Ruby app somewhere is deserialising XML or JSON that’s been stored in a database right now and … bang!
