Is Goldman Sachs' success in HFT and quant development with Slang/SecDB really any different from what you could do with Erlang/Oracle Berkeley DB? Just wondering…
What are the best descriptions of Goldman Sachs' Slang and SecDB, its proprietary quant development language and database?
Note my thoughts WAAAAAAAAY at the bottom of this.
“At the risk of getting sued, let me throw you geeks a bone and part the Goldman veil a bit. The Goldman Sachs risk system is called SecDB (securities database), and everything at Goldman that matters is run out of it. The GUI itself looks like a settings screen from DOS 3.0, but no one cares about UI cosmetics on the Street. The language itself was called SLANG (securities language) and was a Python/Perl-like thing, with OOP and the ORM layer baked in. Database replication was near-instant, and pushing to production was two keystrokes. You pushed, and London and Tokyo saw the change as fast as your neighbor on the desk did (and yes, if you fucked things up, you got 4AM phone calls from some British dude telling you to fix it). Regtests ran nightly, and no one could trade a model without thorough testing (that might sound like standard practice, but you have no idea how primitive the development culture is on the Street). The whole thing was so good, I didn’t even know what an ORM really was until I started using Rails and had to wrestle with ActiveRecord. The codebase was roughly 15MM lines when I left, and growing. I suspect my retinas are still scarred by the weird color blue SecDB was by default.”
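For readers who, like the poster, only met ORMs through Rails: "OOP and the ORM layer baked in" plus near-instant replication roughly means that objects persist and propagate themselves without a separate save step. Below is a toy Python sketch of that idea only. It is not how SecDB or Slang actually work (the quote says nothing about their internals); `Site`, `PersistentObject`, the `UST_10Y` key and the synchronous write-through to every site are all hypothetical.

```python
from typing import Any, Dict, List


class Site:
    """A hypothetical regional replica (e.g. NY, London, Tokyo)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, Dict[str, Any]] = {}

    def apply(self, key: str, field: str, value: Any) -> None:
        # Record the write in this site's copy of the object.
        self.store.setdefault(key, {})[field] = value


class PersistentObject:
    """Toy object whose attribute writes are persisted and replicated.

    Every attribute assignment is written through to all sites, so there
    is no explicit "save" call: persistence is part of the object model.
    """

    def __init__(self, key: str, sites: List[Site]) -> None:
        # object.__setattr__ bypasses our own replication hook below.
        object.__setattr__(self, "_key", key)
        object.__setattr__(self, "_sites", sites)

    def __setattr__(self, field: str, value: Any) -> None:
        object.__setattr__(self, field, value)
        for site in self._sites:
            site.apply(self._key, field, value)


sites = [Site("NY"), Site("London"), Site("Tokyo")]
bond = PersistentObject("UST_10Y", sites)
bond.coupon = 0.0425
print(sites[1].store)  # {'UST_10Y': {'coupon': 0.0425}}
```

A real system would replicate asynchronously, handle conflicts and failures, and persist to disk; the sketch skips all of that to show only the "no explicit save" shape.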
Hacker News also had this commentary:
“I’ve had a number of people tell me this system is why GS won the financial crisis. During the financial crisis, GS knew their positions and their risks. They could also calculate the side effects of proposed trades as quickly as their computers could calculate it. This meant the people at the top could actively plan what to do next during the day. In contrast, MS and JPM can only get information like this a few hours after the end of the day, and supposedly Citi just can’t calculate such things without massive effort.” (yummyfajitas)
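"Calculating the side effects of proposed trades" is essentially a what-if: re-aggregate the firm's risk as if the candidate trade were already on the books, then compare the result against limits. A minimal Python sketch, assuming risk is a linear sum of per-factor sensitivities; the factor names, numbers and limits are invented for illustration and say nothing about how SecDB actually computes risk.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (risk_factor, sensitivity): hypothetical P&L per unit move in that factor.
Position = Tuple[str, float]


def exposures(positions: List[Position]) -> Dict[str, float]:
    """Sum sensitivities per risk factor."""
    totals: Dict[str, float] = defaultdict(float)
    for factor, sensitivity in positions:
        totals[factor] += sensitivity
    return dict(totals)


def check_trade(book: List[Position], proposed: List[Position],
                limits: Dict[str, float]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Re-aggregate risk as if the proposed trade were done; report breaches."""
    after = exposures(book + proposed)
    breaches = {f: v for f, v in after.items()
                if abs(v) > limits.get(f, float("inf"))}
    return after, breaches


book = [("USD_rates", -120_000.0), ("EUR_rates", 40_000.0), ("USD_rates", 30_000.0)]
limits = {"USD_rates": 100_000.0, "EUR_rates": 100_000.0}

print(check_trade(book, [], limits)[1])                          # {}
print(check_trade(book, [("USD_rates", -50_000.0)], limits)[1])  # {'USD_rates': -140000.0}
```

The existing book is within limits, but adding the hypothetical trade pushes USD rates exposure past its limit. Being able to run this intraday in seconds, rather than in an overnight batch, is the advantage the comment describes.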
A good one from an ex-Goldman Sachs employee:
Read the first two paragraphs here
Where did these ones go at Wilmott?
Major news http://www.cnbc.com/id/38584613
This could be the best description yet of Slang, from the Zerohedge.com link:
Led development of a distributed, real-time, co-located high-frequency trading (HFT) platform. The main objective was to engineer a very low latency (microseconds) event-driven market data processing, strategy, and order submission engine. The system obtained multicast market data from Nasdaq, Arca/NYSE, and CME and ran trading algorithms with low-latency requirements, responsive to changes in market conditions.
• Implemented a real-time monitoring solution for the distributed trading system using a combination of technologies (SNMP, Erlang/OTP, Boost, ACE, TibcoRV, a real-time distributed replicated database, etc.) to monitor the load and health of trading processes in the mothership and co-located sites, so that trading decisions can be prioritized based on congestion and queuing delays.
• Responsible for development of real-time market feed handlers, order processing engines, and trading tools at a revenue-generating Quantitative Equity Trading HFT desk.
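The "event-driven market data processing, strategy, and order submission engine" boils down to a loop where each incoming quote is pushed through a strategy that may emit an order. A minimal single-threaded Python sketch of that shape only; a microsecond-latency engine like the one described would not be written in Python (the Boost and ACE in the list point to C++). `Quote`, `Order`, `Engine` and the tight-spread rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Quote:
    symbol: str
    bid: float
    ask: float


@dataclass
class Order:
    symbol: str
    side: str  # "BUY" or "SELL"
    price: float
    qty: int


class Engine:
    """Minimal event loop: market data events in, orders out."""

    def __init__(self, strategy: Callable[[Quote], Optional[Order]],
                 send: Callable[[Order], None]) -> None:
        self.strategy = strategy
        self.send = send

    def on_quote(self, quote: Quote) -> None:
        # React to each market data event as it arrives; no polling.
        order = self.strategy(quote)
        if order is not None:
            self.send(order)


def tight_spread_strategy(max_spread: float) -> Callable[[Quote], Optional[Order]]:
    """Hypothetical rule: buy 100 at the ask whenever the spread is tight."""
    def decide(q: Quote) -> Optional[Order]:
        if q.ask - q.bid <= max_spread:
            return Order(q.symbol, "BUY", q.ask, 100)
        return None
    return decide


sent: List[Order] = []
engine = Engine(tight_spread_strategy(0.02), sent.append)
engine.on_quote(Quote("XYZ", 10.00, 10.05))  # spread too wide: no order
engine.on_quote(Quote("XYZ", 10.00, 10.01))  # tight spread: one order
print(sent)
```

Passing `send` in as a function keeps the strategy logic independent of how orders actually reach an exchange, which also makes it easy to test against a plain list as above.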
Also http://www.zerohedge.com/article/goldman-sachs-principal-transactions-update-1-billion-shares-0
And of course PDFs of internal at ZDNET
Risk Analysis and Stress Testing: SecDB, an enterprise-wide database and pricing system created by Goldman Sachs.
Load Balancing: GridServer 4.0, DataSynapse. Manages risk calculations over server far
…
y. It used computer models of its own creation and built sophisticated databases to follow the money at risk and the organizations behind the entities they did business with. It invested in the human capital to analyze the data, communicate the risks, and act accordingly. And when applying extreme scenarios to analyze risks that might face its investments in housing-related securities
..
Trading desk managers are the first line of defense, responsible for acting within prescribed limits set by the committees in the Securities and Investment Management Divisions. These limits are set by committees after using various risk analysis techniques.
These techniques include stress tests and scenario analyses that try to assess up front what could go wrong with any significant position – such as putting billions of dollars into housing stock derivatives. The divisional committees also set limits on how much “value” the company can put at risk each trading day…
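The excerpt does not say how the daily "value at risk" or the stress tests are computed. As an illustration only, here is a Python sketch using two textbook techniques swapped in for whatever Goldman actually runs: one-day historical-simulation VaR and linear scenario shocks. All data, factor names and limits are hypothetical.

```python
from typing import Dict, List


def historical_var(pnl_history: List[float], tail: float = 0.01) -> float:
    """One-day historical-simulation VaR, returned as a positive loss.

    Takes the k-th worst past daily P&L, where k = floor(n * tail), at least 1.
    With tail=0.01 this is roughly a 99% one-day VaR.
    """
    ordered = sorted(pnl_history)  # worst days first
    k = max(1, int(len(ordered) * tail))
    return -ordered[k - 1]


def stress_pnl(exposures: Dict[str, float], shocks: Dict[str, float]) -> float:
    """P&L of a linear book under one scenario (negative means a loss)."""
    return sum(size * shocks.get(factor, 0.0) for factor, size in exposures.items())


def within_limits(pnl_history: List[float], exposures: Dict[str, float],
                  scenarios: Dict[str, Dict[str, float]],
                  var_limit: float, stress_limit: float) -> bool:
    """True only if both the daily VaR and the worst scenario loss fit the limits."""
    var = historical_var(pnl_history)
    worst = min(stress_pnl(exposures, s) for s in scenarios.values())
    return var <= var_limit and -worst <= stress_limit


# Hypothetical data: 100 days of P&L, two risk factors, two extreme scenarios.
history = [float(x) for x in range(-50, 50)]
factor_book = {"housing": 2.0, "rates": -1.0}
scenarios = {"housing_crash": {"housing": -30.0}, "rate_spike": {"rates": 3.0}}

print(historical_var(history))                                     # 50.0
print(within_limits(history, factor_book, scenarios, 100.0, 50.0))  # False
```

Ordinary days keep VaR comfortably inside its limit, while the unlikely housing scenario breaches the stress limit. That is the point of stress testing: catching losses that a history-based VaR never sees.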
The trick is to throw variables at the system that aren’t highly likely, but could have devastating impact if they occur – and do it before your competition does…
SecDB, which is used to evaluate risks in everything from currency and commodities trading to stocks and bonds, does not ignore potential catastrophes, nor is it the only tool Goldman uses. The company also has built up relational databases that help it assess who it is doing business with and allow it to act on dangers quickly. The company also maintains systematic sets of checks and balances in its own organizational structure to limit risks (see “Checks, balances and building lines of defense,” page 7).
…Executives rely on three databases to help identify where risks might lie with its counterparties. The Product Master database keeps track of every security sold; the Account Master keeps track of each individual or corporate customer served; and an Entity Master database ties the two together in a search for potentially hidden risks. You don’t want to get involved with parties whose strength you can’t judge, Sungard’s Rowe asserts.
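The article gives no schema for these masters, but the Entity Master idea (tying accounts to the parties that actually own them) can be sketched as a roll-up of exposure to each account's ultimate parent. A minimal Python sketch; all tables, names and amounts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical reference data, loosely modelled on the three masters.
product_master: Dict[str, str] = {"P1": "CDS on XYZ", "P2": "Corporate bond XYZ", "P3": "FX forward"}
account_master: Dict[str, str] = {"A1": "SubCo 1", "A2": "SubCo 2", "A3": "OtherCo"}  # account -> legal entity
entity_parent: Dict[str, Optional[str]] = {  # "who owns what" links
    "SubCo 1": "HoldCo", "SubCo 2": "HoldCo", "HoldCo": None, "OtherCo": None,
}

# Trades: (account, product, exposure in $m), also invented.
trades: List[Tuple[str, str, float]] = [("A1", "P1", 40.0), ("A2", "P2", 35.0), ("A3", "P3", 20.0)]


def ultimate_parent(entity: str) -> str:
    """Walk up the ownership chain; assumes it contains no cycles."""
    parent = entity_parent.get(entity)
    while parent is not None:
        entity = parent
        parent = entity_parent.get(entity)
    return entity


def exposure_by_parent(trades: List[Tuple[str, str, float]]
                       ) -> Tuple[Dict[str, float], Dict[str, List[str]]]:
    """Total exposure and product list per ultimate owner."""
    totals: Dict[str, float] = defaultdict(float)
    products: Dict[str, List[str]] = defaultdict(list)
    for account, product, exposure in trades:
        owner = ultimate_parent(account_master[account])
        totals[owner] += exposure
        products[owner].append(product_master[product])
    return dict(totals), dict(products)


totals, products = exposure_by_parent(trades)
print(totals)  # {'HoldCo': 75.0, 'OtherCo': 20.0}
```

Accounts A1 and A2 look like separate exposures of 40 and 35, but both roll up to HoldCo for 75: the kind of hidden concentration the article says the Entity Master was built to surface.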
The Entity Master, developed in the 1990s, was designed to keep track of who owns what. Goldman Sachs, in a bid to break into United Kingdom markets, had picked up, as a breakthrough client of sorts, the British newspaper mogul Robert Maxwell.
PAGE 10 REVEALS THE STRATEGY USED FOR CDO ASSESSMENT
Yes. Old = good. New = bad. Tech in finance is about efficiency. Everything else does not matter. The GUIs are usually atrocious. Traders don’t care, as long as they make money. Excel is abused beyond belief and there is a whole cottage industry around Excel plug-ins. A good VBA programmer can command a salary as large as a C++ hot-shot. Sad. (forreca)
Thoughts from a software engineer and high-frequency algo trader (self-employed, first-time poster):
1) Unless someone who works at GS can shed light on the system controls in place, such as whether there is physical access to any USB ports on machines or CD-RW drives, his method of stealing the source code does not strike me as particularly bad. Yes, he got caught, but if he didn’t have physical access to write the data to some form of physical storage he could smuggle out of the building, encrypting the data and uploading it via HTTPS (FTP was blocked according to the report) seems like a reasonable route. I find it odd only that GS just recently started logging large HTTPS transfers (both because this seems like an obvious avenue of theft and because of the coincidental timing).
2) He knew enough to delete his .bash_history file, but not enough about the audit trail he left behind. If the folks at GS are sneaky (well, we know that they are sneaky in some respects), they could have modified their own version of bash so that a history file was stashed someplace not accessible to the end user and not disclosed via the HISTFILE variable. I’ve done this once before; it doesn’t take more than a few hours of programming work. If GS was not that smart and they were using a stock version of bash, then he just didn’t know as much as he needed to in order to get away with this.
3) 34MB is a hell of a lot of source code. As a back-of-the-envelope estimate, let’s say that the compression (he probably created a tar file and then gzipped it) and the encryption offset each other (encrypted data inflates in size), and that the data in question was the compressed-then-encrypted tarball… I just looked at some of my own code, and if my lines are about as long as the average GS line (and if my estimate that compression and encryption cancel out is right, which is probably way, way too conservative), then we are looking at 850K lines of code (and potentially much more, depending on the encryption re-inflation rate). Either there were other things in there besides source code (object code, already compiled libraries, or executables), or we are talking about a massive, massive, massive theft. 850K lines of software, if it were that much, could constitute the entire mother lode. This would be far from just a single trading strategy (again, if it were just source code). It’s hard to imagine that GS would keep all of their software projects together in such a way that one software engineer could get at projects he was not working on, but it is certainly possible. I think a more likely scenario is that he got some non-source-code files in there as well.
4) There is nothing sinister about placing oneself as close to the trading venues as one can. Literally anyone with enough money to pay for the rental space can do so (it ain’t cheap). There are brokers that allow one to place machines in their rack space co-located with the exchanges. So, nothing sinister about that (plenty in general about GS perhaps, but nothing about their co-location), and at the same time, someone with good co-location space and their software could eat part of their lunch.
5) I don’t think anyone else stealing GS code would be able to understand it well enough within 1 month to integrate it with their own setup and get it working. Even if it were only a few thousand lines of software, it is likely dense enough to require months of work to understand and then adapt. Off the shelf, it would not likely work. Each setup is different enough to require some re-engineering: different vendors supply the data feeds, the data encapsulation differs, the servers differ, and server tuning would need to be done. I don’t think that even with an all-hands-on-deck approach (which would not be likely given the risk of being caught) another firm could get the software by June 1 and be running it effectively in July.
6) Suppose that I am wrong about #5: it would still not explain a lack of volume in the last week+. Rather, if someone had already deployed their stolen GS trading software, one would expect to see 2 entities trying to make the same trades at roughly the same time, such that trade volume would not change, but the quote data coming from the various exchanges would actually increase above baseline (since there would now be 2 firms trying to place the same trades and, assuming they were equally fast, they would place their limit orders at the same time, hit the exchange at the same time, and we’d see double updates to the bids and asks on the various ECNs as one of the 2 just edged out the other by a microsecond). In practice, we have seen not only a drop in trading volume, but a drop in the volume of bid/ask updates hitting the various ECNs.
7) If anything, I would guess that GS would stop their trading so that they could see if anyone was placing the same trades that they would if they were not cowering in the back office. Were I in their shoes, I would run the software to output to a log the trades that I would have wanted to place, and then look for someone else placing them with high correlation. This would be easier done by not actually placing the trades: if GS assumes that they are generally among the fastest, their own running systems could interfere with their ability to find someone else trying to run their strategies.
#5: Once a strategy is compromised, it immediately stops being useful to its owners. Although it may still work, you will never know when it stops being an asset and starts becoming a liability.
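The 850K figure in point 3 implies an average of about 40 bytes per line (34,000,000 / 850,000 = 40). The same arithmetic as a tiny Python sketch; the 40-byte line length and the 4x ratio in the second call are illustrative assumptions, not measurements of the actual archive.

```python
def estimate_lines(archive_bytes: float, bytes_per_line: float,
                   compression_ratio: float = 1.0) -> float:
    """Rough line count hidden in a compressed, encrypted archive.

    compression_ratio is original size / archive size. 1.0 encodes the
    commenter's assumption that gzip savings and encryption overhead cancel out.
    """
    return archive_bytes * compression_ratio / bytes_per_line


ARCHIVE_BYTES = 34e6  # 34MB, taking 1MB = 10**6 bytes
BYTES_PER_LINE = 40   # implied by 34MB -> 850K lines; an assumption, not measured

print(estimate_lines(ARCHIVE_BYTES, BYTES_PER_LINE))       # 850000.0
print(estimate_lines(ARCHIVE_BYTES, BYTES_PER_LINE, 4.0))  # 3400000.0
```

Any ratio above 1 (gzip shrinking the source more than encryption inflates it) scales the estimate up proportionally, which is why the commenter calls 850K conservative.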
—
I think you are right about #5. This codebase is largely useless, except for some signals/strategies, which could potentially save time, provided that the new users understand the code structure perfectly. These signals/strategies are not worth much. Without highly skilled quants, a strategy becomes useless in a matter of weeks. Quant code takes forever to adapt before you can trade with it. I also believe he is a patsy, guilty at worst of some unethical and stupid conduct. The problem is that the domestic equity market is an insider game manipulated and dominated by 5 or so players. There is too much Goldman lately, and they’re setting this poor schmuck up to cover up for something they did.
..
If he was the main algorithm architect, he would not need or want it saved or transmitted in such a complicated, detailed form as a combination of C++ source and [absent] supporting system components. That would be very distracting, hard to ‘read’, and even incomplete.
FYI, I worked at GS as a dev for a couple of years…
1) Many devs don’t have local admin access for their XP desktops. My guess is that he didn’t bother to try to get it and instead was lazy and just uploaded through HTTPS thinking that would be enough since it was encrypted.
2) It could simply be that they restored a snapshot of his home dir from the NetApp filers. Possibly they have more advanced logging tools now in their standard Linux desktops.
3) GS is a big company. Most of the groups have their own version control systems and the average sysadmin has no idea what is sensitive or is not.
5) The GS trading and settlement platform and ecosystem are enormous. Moreover, there are lots of dependencies on systems that are basically totally proprietary. Recreating and understanding that is not trivial.
I would guess that the dev just wanted the source so that he could refer to it if he wanted to write a new system for (or be the lead designer for) his new employer. Dropping the code in directly isn’t too feasible anyway, since his new employer has its own infrastructure etc….
My 2 cents,
M
Good analysis. I just read the attached document and this looks to me like an inexperienced programmer trying to take his code with him when he left the company. In my opinion, the website in question is not part of an international conspiracy, but rather a file-sharing service like Rapidshare, which fits the bill (it’s a German site… not sure about the owner).
Everything this guy did tells me he didn’t in fact plan major espionage; rather, he’s dumb, naive, and not very good with computers: packing 34MB of source, attempting to delete the bash history file (there are much better ways to erase your traces and this isn’t one), and using a commonly known and usually monitored protocol like HTTPS to send the data to the server (he could easily have used a UDP-based protocol and masked the transfer as video or online radio traffic). And finally, he waived his Miranda rights and talked to the investigators without a lawyer (as far as I can tell).
So move on guys, there’s nothing to see here.
BTW, I would LOVE to know which brain-dead company agreed to pay this guy over a million bucks a year to work for them. I would also LOVE to short that company. Their hiring/talent evaluation process is pathetic!
—
Obviously, the real-time component(s) couldn’t have been written in Ruby or whatever other interpreted or JIT-compiled language is out there.
However, I can’t exclude the possibility of other, non-RT critical parts being written in such language.
—
I think I saw the code on BitTorrent. Just google “GS_supersecret_HFT_code_that_I_stole.tor” by “l33t haXOr”.
**
I worked on gas pipeline scheduling and billing systems, which are just barely real-time. Most of the guts of the system live in batch processes that proceed according to NAESB rules and guidelines. You can build a decent system in VB forms for data entry, with some backend TSQL/PLSQL for bulk stuff.
In contrast, a fast real-time trading system can gain a significant edge by trading the wrinkles. Expertise in writing reliable code in parallel/functional languages such as Erlang is not common and does not come cheap. I believe ZH ran a piece noting that even saving a couple of network hops by colocating the HFT boxes at the NYSE could give GS an edge.
A memory-based Erlang trading engine is probably worth a bundle.
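The appeal of Erlang here is its model of many isolated processes that share no memory and talk only by sending messages. A minimal Python stand-in using `threading` and `queue.Queue`: each actor owns its state and handles one message at a time, so the handler needs no locks. This is not Erlang (no supervision trees, hot code loading or distribution), and `Actor`, `apply_fill` and the fill format are hypothetical.

```python
import queue
import threading
from typing import Any, Callable, Dict, Tuple


class Actor:
    """Python stand-in for an Erlang process: private state plus a mailbox."""

    def __init__(self, handler: Callable[[Any, Any], Any], state: Any) -> None:
        self._mailbox: queue.Queue = queue.Queue()
        self._handler = handler
        self.state = state
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message: Any) -> None:
        # Asynchronous: the caller never touches the actor's state directly.
        self._mailbox.put(message)

    def stop(self) -> None:
        # None acts as a "poison pill" that ends the loop after queued messages.
        self._mailbox.put(None)
        self._thread.join()

    def _run(self) -> None:
        while True:
            message = self._mailbox.get()
            if message is None:
                break
            self.state = self._handler(self.state, message)


def apply_fill(position: Dict[str, int], fill: Tuple[str, int]) -> Dict[str, int]:
    """Return a new position map with one (symbol, signed qty) fill applied."""
    symbol, qty = fill
    updated = dict(position)
    updated[symbol] = updated.get(symbol, 0) + qty
    return updated


book = Actor(apply_fill, {})
for fill in [("XYZ", 100), ("XYZ", -40), ("ABC", 10)]:
    book.send(fill)
book.stop()
print(book.state)  # {'XYZ': 60, 'ABC': 10}
```

Because messages are processed strictly in order by one thread, the position map never sees concurrent writes; Erlang gives the same guarantee per process, but with far lighter processes than OS threads.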
My thoughts:
So is it safe to say that Slang is the equivalent of the in-memory Erlang system mentioned above, and that SecDB is something like an in-memory Berkeley DB as well? Hmmmmm…….