Is success of Goldman Sachs in HFT and quant development with Slang/SecDB no different than Erlang/O

Bryan Downing
Jan 12, 2012

Is the success of Goldman Sachs in HFT and quant development with Slang/SecDB no different than with Erlang/Oracle Berkeley DB? Just wondering…

What are the best descriptions of Goldman Sachs's Slang and SecDB as a proprietary quant development programming language?

Note my thoughts WAAAAAAAAY at the bottom of this.

http://stackoverflow.com/questions/3392636/slang-goldman-sachs-proprietary-programming-language

“At the risk of getting sued, let me throw you geeks a bone and part the Goldman veil a bit. The Goldman Sachs risk system is called SecDB (securities database), and everything at Goldman that matters is run out of it. The GUI itself looks like a settings screen from DOS 3.0, but no one cares about UI cosmetics on the Street. The language itself was called SLANG (securities language) and was a Python/Perl-like thing, with OOP and the ORM layer baked in. Database replication was near-instant, and pushing to production was two keystrokes. You pushed, and London and Tokyo saw the change as fast as your neighbor on the desk did (and yes, if you fucked things up, you got 4AM phone calls from some British dude telling you to fix it). Regtests ran nightly, and no one could trade a model without thorough testing (that might sound like standard practice, but you have no idea how primitive the development culture is on the Street). The whole thing was so good, I didn’t even know what an ORM really was until I started using Rails and had to wrestle with ActiveRecord. The codebase was roughly 15MM lines when I left, and growing. I suspect my retinas are still scarred by the weird color blue SecDB was by default.”

Hacker News also had this commentary:

“I’ve had a number of people tell me this system is why GS won the financial crisis. During the financial crisis, GS knew their positions and their risks. They could also calculate the side effects of proposed trades as quickly as their computers could calculate it. This meant the people at the top could actively plan what to do next during the day. In contrast, MS and JPM can only get information like this a few hours after the end of the day, and supposedly Citi just can’t calculate such things without massive effort.” (yummyfajitas)
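To make "calculate the side effects of proposed trades" a bit more concrete, here is a minimal Python sketch of a what-if check on a position book. It is only an illustration and says nothing about how SecDB or Slang actually work: the `Book` class, its `what_if` method, the tickers, and the prices are all hypothetical, and "risk" is reduced to quantity times price.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Book:
    """Hypothetical position book: instrument -> signed quantity."""
    positions: Dict[str, float] = field(default_factory=dict)

    def exposure(self, prices: Dict[str, float]) -> float:
        # Net market value of all positions at the given prices.
        return sum(qty * prices[sym] for sym, qty in self.positions.items())

    def what_if(self, symbol: str, qty: float, prices: Dict[str, float]) -> float:
        # Apply the proposed trade to a copy, so the live book is untouched,
        # and return the change in net exposure it would cause.
        trial = dict(self.positions)
        trial[symbol] = trial.get(symbol, 0.0) + qty
        return Book(trial).exposure(prices) - self.exposure(prices)


# Made-up tickers and prices, purely for illustration.
book = Book({"AAA": 100.0, "BBB": -50.0})
prices = {"AAA": 10.0, "BBB": 20.0}
change = book.what_if("BBB", -10.0, prices)
print(change)
```

The point of the sketch is the shape of the operation: the proposed trade is evaluated against current positions without being booked, which is what lets someone plan intraday instead of waiting for an end-of-day batch.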

A good one from an ex-Goldman Sachs employee:

http://adgrok.com/why-founding-a-three-person-startup-with-zero-revenue-is-better-than-working-for-goldman-sachs/

Read the first two paragraphs here

http://news.ycombinator.com/item?id=1690001

Where did these ones go at Wilmott?

http://www.wilmott.com/messageview.cfm?catid=16&threadid=34038

http://www.wilmott.com/messageview.cfm?catid=16&threadid=59857

Major news http://www.cnbc.com/id/38584613

http://zerohedge.blogspot.com/2009/07/is-case-of-quant-trading-industrial.html

This could be the best description yet of Slang, from the Zerohedge.com link:

Lead development of a distributed real-time co-located high-frequency trading (HFT) platform. The main objective was to engineer a very low latency (microseconds) event-driven market data processing, strategy, and order submission engine. The system was obtaining multicast market data from Nasdaq, Arca/NYSE, CME and running trading algorithms with low latency requirements responsive to changes in market conditions.
• Implemented a real-time monitoring solution for the distributed trading system using a combination of technologies (SNMP, Erlang/OTP, boost, ACE, TibcoRV, real-time distributed replicated database, etc) to monitor load and health of trading processes in the mother-ship and co-located sites so that trading decisions can be prioritized based on congestion and queuing delays.
• Responsible for development of real-time market feed handlers, order processing engines and trading tools at a Quantitative Equity Trading revenue-making HFT desk.
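As a rough illustration of what "prioritized based on congestion and queuing delays" could mean in practice, here is a small Python sketch. Everything in it is an assumption made for the example: the `SiteHealth` record, the site names, and the delay estimate (queue depth divided by drain rate) are hypothetical and are not taken from the résumé or from any real system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SiteHealth:
    """Hypothetical health snapshot reported by a monitoring agent."""
    name: str
    queue_depth: int      # messages waiting to be processed
    drain_rate: float     # messages processed per second
    healthy: bool = True

    @property
    def queuing_delay(self) -> float:
        # Estimated seconds a new message would wait behind the queue.
        if self.drain_rate <= 0:
            return float("inf")
        return self.queue_depth / self.drain_rate


def rank_sites(sites: List[SiteHealth]) -> List[SiteHealth]:
    # Drop unhealthy sites, then order by estimated queuing delay.
    return sorted((s for s in sites if s.healthy), key=lambda s: s.queuing_delay)


def pick_site(sites: List[SiteHealth]) -> Optional[SiteHealth]:
    ranked = rank_sites(sites)
    return ranked[0] if ranked else None


# Made-up snapshots for three sites.
sites = [
    SiteHealth("colo-a", queue_depth=500, drain_rate=100_000.0),
    SiteHealth("colo-b", queue_depth=50, drain_rate=50_000.0),
    SiteHealth("colo-c", queue_depth=0, drain_rate=1_000.0, healthy=False),
]
best = pick_site(sites)
print(best.name if best else "no healthy site")
```

A real monitor would feed these numbers from live telemetry (the résumé mentions SNMP and Erlang/OTP); the sketch only shows the decision step that sits on top of that data.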

Also http://www.zerohedge.com/article/goldman-sachs-principal-transactions-update-1-billion-shares-0

And of course, PDFs on the internals at ZDNet:

http://www.secdb.com/

Risk Analysis and Stress Testing: SecDB, an enterprisewide database and pricing system created by Goldman Sachs.

Load Balancing: GridServer 4.0, Data Synapse. Manages risk calculations over server far

…

y. It used computer models of its own creation and built sophisticated databases to follow the money at risk and the organizations behind the entities they did business with. It invested in the human capital to analyze the data, communicate the risks, and act accordingly. And when applying extreme scenarios to analyze risks that might face its investments in housing-related securities

Trading desk managers are the first line of defense, responsible for acting within prescribed limits set by the committees in the Securities and Investment Management Divisions. These limits are set by committees after using various risk analysis techniques.

These techniques include stress tests and scenario analyses that try to assess up front what could go wrong with any significant position – such as putting billions of dollars into housing stock derivatives. The divisional committees also set limits on how much “value” the company can put at risk each trading day…

The trick is to throw variables at the system that aren’t highly likely, but could have devastating impact if they occur – and do it before your competition does…

SecDB, which is used to evaluate risks in everything from currency and commodities trading to stocks and bonds, does not ignore potential catastrophes nor is it the only tool Goldman uses. The company also has built up relational databases that help it assess who it is doing business with and allow it to act on dangers quickly. The company also maintains systematic sets of checks and balances in its own organizational structure to limit risks (see “Checks, balances and building lines of defense,” page 7).

“…Executives rely on three databases to help identify where risks might lie with its counterparties. The Product Master database keeps track of every security sold; the Account Master keeps track of each individual or corporate customer served; and an Entity Master database ties the two together in a search for potentially hidden risks.

You don’t want to get involved with parties whose strength you can’t judge, Sungard’s Rowe asserts.

The Entity Master, developed in the 1990s, was designed to keep track of who owns what. Goldman Sachs, in a bid to break into United Kingdom markets, had picked up, as a breakthrough client, of sorts, the British newspaper mogul, Robert Maxwell.

PAGE 10 REVEALS THE STRATEGY USED FOR CDO ASSESSMENT
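The three-database idea described above (Product Master, Account Master, and an Entity Master that ties them together) can be sketched with plain Python dictionaries. This is a toy model under stated assumptions, not Goldman's schema: every identifier, name, amount, and the exposure limit below is invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical "masters": all ids, names and amounts are made up.
product_master: Dict[str, str] = {"P1": "Bond A", "P2": "Swap B"}
account_master: Dict[str, str] = {
    "ACC-1": "Alpha Holdings Ltd",
    "ACC-2": "Beta Press plc",
    "ACC-3": "Gamma Fund",
}
# Entity master: which controlling entity stands behind each account.
entity_master: Dict[str, str] = {"ACC-1": "ENT-X", "ACC-2": "ENT-X", "ACC-3": "ENT-Y"}

# (account id, product id, exposure in millions)
exposures: List[Tuple[str, str, float]] = [
    ("ACC-1", "P1", 40.0),
    ("ACC-2", "P2", 35.0),
    ("ACC-3", "P1", 20.0),
]


def exposure_by_entity(rows: List[Tuple[str, str, float]]) -> Dict[str, float]:
    # Roll account-level exposure up to the controlling entity.
    totals: Dict[str, float] = defaultdict(float)
    for account, product, amount in rows:
        if account not in account_master or product not in product_master:
            raise KeyError(f"unknown account or product: {account}, {product}")
        totals[entity_master[account]] += amount
    return dict(totals)


def over_limit(totals: Dict[str, float], limit: float) -> List[str]:
    # Entities whose combined exposure exceeds the (hypothetical) limit.
    return sorted(e for e, total in totals.items() if total > limit)


totals = exposure_by_entity(exposures)
flagged = over_limit(totals, limit=50.0)
print(totals, flagged)
```

In this made-up data no single account breaches the limit, but two accounts roll up to the same entity and together they do. That is the kind of hidden concentration the quoted passage says the Entity Master was built to surface.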

From http://news.ycombinator.com/item?id=1690001

forreca: Yes. Old = good. New = bad. Tech in finance is about efficiency. Everything else does not matter. The GUIs are usually atrocious. Traders don’t care, as long as they make money. Excel is abused beyond belief and there is a whole cottage industry around Excel plug-ins. A good VBA programmer can command a salary as large as a C++ hot-shot. Sad.

Back to http://zerohedge.blogspot.com/2009/07/is-case-of-quant-trading-industrial.html

Thoughts from a software engineer and high frequency alg trader (self employed – first time poster):

1) Unless someone who works at GS can shed light on the system controls in place – such as whether there is physical access to any USB ports on machines, or CD-RW drives – it does not strike me that his method of stealing the source code was particularly bad. Yes – he got caught, but if he didn’t have physical access to write the data to any form of physical storage he could smuggle out of the building, encrypting the data and uploading it via HTTPS (FTP was blocked according to the report) seems like a reasonable route. I find it odd only that GS just recently started logging large HTTPS transfers (both in that this seems like an obvious avenue of theft, as well as the coincidental nature of the timing).

2) He knew enough to delete his .bash_history file, but not enough about any audit trail left behind. If the folks at GS are sneaky (well, we know that they are sneaky in some respects), they could have modified their own version of bash so that a history file was stashed someplace not accessible to the end user, and not disclosed via the HISTFILE variable. I’ve done this once before – it doesn’t take more than a few hours of programming work. If GS was not that smart and they were using a stock version of bash – well, then he just doesn’t know as much as he needed to in order to get away with this.

3) 34MB is a hell of a lot of source code. Just for the back of the envelope, let’s say that the compression (he probably created a tar file and then gzip’ed it) and the encryption offset each other (encrypted data inflates in size), and that the size of the data in question was the compressed and then encrypted tarball… I just looked at some of my own code, and if my lines of software are about as long as the average GS line of software (and if my estimate that compression and encryption cancel out in size is right – probably way, way too conservative), then we are looking at 850K lines of code (and potentially much more depending on the encryption re-inflation rate). Either there were other things in there besides source code (object code, libraries already compiled or executables), or we are talking about a massive massive massive theft. 850K lines of software, if it were that much, could constitute the entire mother lode. This would be far from just a single trading strategy (again if it were just source code). It’s hard to imagine that GS would keep all of their software projects together in such a way that 1 software engineer could get at the other projects he was not working on, but it is certainly possible. I think a more likely scenario is that he got some non-source code files in there as well.

4) There is nothing sinister about placing oneself at the closest point to the trading venues as one can. Literally anyone with enough money to pay for the rental space can do so (it ain’t cheap). There are brokers that allow one to place machines in their rack space co-located with the exchanges. So, nothing sinister about that (plenty in general about GS perhaps, but nothing about their co-location), and at the same time – someone with good co-location space and their software could eat part of their lunch.

5) I don’t think anyone else stealing GS code would be able to understand it well enough in 1 month to integrate it with their own setup and get it working. Even if it were only a few thousand lines of software, it is likely dense enough to require months of work to understand it, and to then adapt it. Off the shelf it would not likely work. Each setup is different enough to require some reengineering. Different vendors are used to get different data feeds, different types of data encapsulation, different servers, and server tuning would need to be done. I don’t think that, even with an all-hands-on-deck approach (which would not be likely given the risk of being caught), another firm could get the software by June 1 and be running it effectively in July.

6) Suppose that I am wrong about #5 – it would not explain a lack of volume in the last week+. Rather, if someone had already deployed their stolen GS trading software, one would expect to see 2 entities trying to create the same trades at roughly the same time, such that trade volume would not change, but the quote data coming from the various exchanges would actually increase above and beyond baseline (since now there would be 2 firms trying to place the same trades, and assuming they were equally fast – they would place their limit orders at the same time, hit the exchange at the same time, and then we’d see double updates to the bids and asks on the various ECNs as one of the 2 just edged out the other by a microsecond). In practice, we have seen not only a drop in trading volume, but a drop in the volume of bid/ask updates hitting the various ECNs.

7) If anything, I would guess that GS would stop their trading so that they could see if anyone was placing the same trades that they would if they were not cowering in the back office. Were I in their shoes, I would run the software to output to a log the trades that I would have wanted to place, and I then would look for a high correlation of someone else placing them. This would be easier done by not actually placing the trades, since if GS assumes that they are generally among the fastest, their systems running would potentially interfere with their ability to find someone else trying to run their strategies.

#5 – Once a strategy is compromised, it immediately stops being useful to its owners. Although it may still work, you will never know when it stops being an asset and starts becoming a liability.
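The back-of-envelope estimate in point 3 can be reproduced in a few lines of Python. Two things here are assumptions rather than facts from the comment: the commenter never states an average line length, so the 40 bytes per line used below is simply the value implied by 34MB and 850K lines (taking 34MB as 34,000,000 bytes), and the 4x compression ratio in the second call is a hypothetical figure included only to show how quickly the estimate grows if compression is not cancelled out by encryption overhead.

```python
def estimate_lines(archive_bytes: int,
                   avg_bytes_per_line: float,
                   compression_ratio: float = 1.0) -> int:
    """Estimate how many source lines fit in an archive.

    compression_ratio is uncompressed size / archive size. The default of
    1.0 mirrors the commenter's simplification that compression and
    encryption overhead cancel each other out.
    """
    if avg_bytes_per_line <= 0 or compression_ratio <= 0:
        raise ValueError("line length and compression ratio must be positive")
    return round(archive_bytes * compression_ratio / avg_bytes_per_line)


ARCHIVE_BYTES = 34_000_000      # "34MB", read as decimal megabytes (assumption)
IMPLIED_BYTES_PER_LINE = 40     # back-solved from 34MB / 850K lines (assumption)

baseline = estimate_lines(ARCHIVE_BYTES, IMPLIED_BYTES_PER_LINE)
# Hypothetical: if the tarball really compressed 4x and encryption added nothing.
with_compression = estimate_lines(ARCHIVE_BYTES, IMPLIED_BYTES_PER_LINE,
                                  compression_ratio=4.0)
print(baseline, with_compression)
```

The baseline call gives the commenter's 850K figure, and the second call shows why he calls that estimate conservative: any real net compression multiplies the line count accordingly.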

—

I think you are right about #5. This codebase is largely useless, except for some signals/strategies, which could potentially save time, given that the new users understand the code structure perfectly. These signals/strategies are not worth much. The strategy without highly skilled quants becomes useless in a matter of weeks. Quant code takes forever to adopt before you can trade with it. I also believe he is a patsy, and may be guilty of some unethical and stupid conduct at worst. The fact that the domestic equity market is an insider game manipulated and dominated by 5 or so players is the problem. There is too much Goldman lately and they’re setting this poor schmuck up to cover up for something they did.

If he was the main algorithm architect, he would not need or want it saved or transmitted in such a complicated, detailed form as a combination of C++ source and [absent] supporting system components. That would be very distracting, hard to ‘read’, and even incomplete.

FYI I worked at GS as a dev for a couple years…

1) Many devs don’t have local admin access for their XP desktops. My guess is that he didn’t bother to try to get it and instead was lazy and just uploaded through HTTPS thinking that would be enough since it was encrypted.

2) It could have just been that they restored a snapshot of his home dir from the NetApp filers. Possibly they have more advanced logging tools now in their standard Linux desktops.

3) GS is a big company. Most of the groups have their own version control systems and the average sysadmin has no idea what is sensitive or is not.

5) GS trading and settlement platform and ecosystem is enormous. Moreover there are lots of dependencies on systems that are basically totally proprietary. Recreating and understanding that is not trivial.

I would guess that the dev just wanted to get the source so that he could refer to it if he wanted to write a new system for (or be the lead designer for) his new employer. Dropping the code in directly isn’t too feasible anyway, since his other employer has their own infrastructure etc….

My .02 cents,

Good analysis. I just read the attached document and this looks to me like an inexperienced programmer trying to take his code with him when he left the company. In my opinion, the website in question is not part of an international conspiracy, but rather a file sharing service like Rapidshare which fits the bill (it’s a German site… not sure about the owner).

Everything this guy did tells me he didn’t in fact plan a major espionage but rather he’s dumb, naive and not very good with computers: packing 34MB of source, attempting to delete the bash history file (there are much better ways to erase your traces and this isn’t one), using a commonly known and usually monitored protocol like HTTPS to send the data to the server – he could have easily used a UDP-based protocol and masked this transfer as video or online radio traffic. And finally, he waived his Miranda rights and talked to the investigators without a lawyer (as far as I can tell).

So move on guys, there’s nothing to see here.

BTW, I would LOVE to know which brain-dead company agreed to pay this guy over a million bucks a year to work for them. I would also LOVE to short that company. Their hiring/talent evaluation process is pathetic!

—

Obviously, the real-time component(s) couldn’t have been written in Ruby or whatever other JIT paradigm is out there.

However, I can’t exclude the possibility of other, non-RT-critical parts being written in such a language.

—

I think I saw the code on bit torrent. Just google “GS_supersecret_HFT_code_that_I_stole.tor” by “l33t haXOr”.

I worked on gas pipeline scheduling and billing systems, which are just barely real-time. Most of the guts of the system live in batch processes that proceed according to NAESB rules and guidelines. You can build a decent system in VB forms for data entry, with some backend TSQL/PLSQL for bulk stuff.

In contrast, a fast real-time trading system can gain a significant edge by trading the wrinkles. The expertise in writing reliable code in parallel/functional languages such as Erlang is not common and does not come cheap. I believe ZH ran a piece where it was noted that even saving a couple of network hops by colocating the HFT boxes in the NYSE could give an edge to GS.

A memory based Erlang trading engine is probably worth a bundle.

My thoughts:

So is it safe to say that Slang is the equivalent of an in-memory Erlang system as mentioned above, and that SecDB is something like an in-memory Berkeley DB as well? Hmmmmm…….

