* [gentoo-user] PostgreSQL Vs MySQL @Uber @ 2016-07-29 20:58 Mick 2016-07-29 22:24 ` Alan McKinnon 2016-08-02 17:49 ` Rich Freeman 0 siblings, 2 replies; 27+ messages in thread From: Mick @ 2016-07-29 20:58 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 295 bytes --] Interesting article explaining why Uber are moving away from PostgreSQL. I am running both DBs on different desktop PCs for akonadi and I'm also running MySQL on a number of websites. Let's see which one goes sideways first. :p https://eng.uber.com/mysql-migration/ -- Regards, Mick [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-07-29 20:58 [gentoo-user] PostgreSQL Vs MySQL @Uber Mick @ 2016-07-29 22:24 ` Alan McKinnon 2016-07-29 22:38 ` Rich Freeman 2016-08-02 17:49 ` Rich Freeman 1 sibling, 1 reply; 27+ messages in thread From: Alan McKinnon @ 2016-07-29 22:24 UTC (permalink / raw To: gentoo-user On 29/07/2016 22:58, Mick wrote: > Interesting article explaining why Uber are moving away from PostgreSQL. I am > running both DBs on different desktop PCs for akonadi and I'm also running > MySQL on a number of websites. Let's which one goes sideways first. :p > > https://eng.uber.com/mysql-migration/ > I don't think your akonadi and some web sites compare in any way to Uber and what they do. FWIW, my Dev colleagues support an entire large corporate ISP's operational and customer data on PostgreSQL-9.3. With clustering. With no db-related issues :-) ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-07-29 22:24 ` Alan McKinnon @ 2016-07-29 22:38 ` Rich Freeman 2016-07-29 23:01 ` Mick 2016-08-01 7:16 ` J. Roeleveld 0 siblings, 2 replies; 27+ messages in thread From: Rich Freeman @ 2016-07-29 22:38 UTC (permalink / raw To: gentoo-user On Sat, Jul 30, 2016 at 6:24 AM, Alan McKinnon <alan.mckinnon@gmail.com> wrote: > On 29/07/2016 22:58, Mick wrote: >> >> Interesting article explaining why Uber are moving away from PostgreSQL. >> I am >> running both DBs on different desktop PCs for akonadi and I'm also running >> MySQL on a number of websites. Let's which one goes sideways first. :p >> >> https://eng.uber.com/mysql-migration/ >> > > > I don't think your akonadi and some web sites compares in any way to Uber > and what they do. > > FWIW, my Dev colleagues support and entire large corporate ISP's operational > and customer data on PostgreSQL-9.3. With clustering. With no db-related > issues :-) > Agree, you'd need to be fairly large-scale to have their issues, but I think the article was something anybody interested in databases should read. If nothing else it is a really easy-to-follow explanation of the underlying architectures. I'll probably post this to my LUG mailing list. I think one of the Postgres devs lurks there so I'm curious about his impressions. I was a bit surprised to hear about the data corruption bug. I've always considered Postgres to have a better reputation for data integrity. And of course almost any FOSS project could have a bug. I don't know if either project does the kind of regression testing to reliably detect this sort of issue. I'd think that it is more likely that the likes of Oracle would (for their flagship DB, not for MySQL; and they'd probably be more likely to send out an engineer to beg forgiveness while they fix your database). Of course, if you're Uber the hit you'd take from downtime/etc isn't made up for entirely by having somebody take a few days to get everything fixed. 
-- Rich ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-07-29 22:38 ` Rich Freeman @ 2016-07-29 23:01 ` Mick 2016-08-01 1:48 ` Douglas J Hunley 2016-08-01 7:16 ` J. Roeleveld 1 sibling, 1 reply; 27+ messages in thread From: Mick @ 2016-07-29 23:01 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 1452 bytes --] On Saturday 30 Jul 2016 06:38:01 Rich Freeman wrote: > On Sat, Jul 30, 2016 at 6:24 AM, Alan McKinnon <alan.mckinnon@gmail.com> wrote: > > On 29/07/2016 22:58, Mick wrote: > >> Interesting article explaining why Uber are moving away from PostgreSQL. > >> I am > >> running both DBs on different desktop PCs for akonadi and I'm also > >> running > >> MySQL on a number of websites. Let's which one goes sideways first. :p > >> > >> https://eng.uber.com/mysql-migration/ > > > > I don't think your akonadi and some web sites compares in any way to Uber > > and what they do. > > > > FWIW, my Dev colleagues support and entire large corporate ISP's > > operational and customer data on PostgreSQL-9.3. With clustering. With no > > db-related issues :-) > > Agree, you'd need to be fairly large-scale to have their issues, but I > think the article was something anybody interested in databases should > read. If nothing else it is a really easy to follow explanation of > the underlying architectures. > > I'll probably post this to my LUG mailing list. I think one of the > Postgres devs lurks there so I'm curious to his impressions. > > I was a bit surprised to hear about the data corruption bug. I've > always considered Postgres to have a better reputation for data > integrity. Yes, same here, I would be interested to hear what the Postgres dev says, should he respond to it. -- Regards, Mick [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-07-29 23:01 ` Mick @ 2016-08-01 1:48 ` Douglas J Hunley 0 siblings, 0 replies; 27+ messages in thread From: Douglas J Hunley @ 2016-08-01 1:48 UTC (permalink / raw To: Gentoo [-- Attachment #1: Type: text/plain, Size: 420 bytes --] On Fri, Jul 29, 2016 at 7:01 PM, Mick <michaelkintzios@gmail.com> wrote: > Yes, same here, I would be interested to hear what the Postgres dev says, > should he respond to it. > One PostgreSQL dev's response - https://t.co/LfPlIPWulc -- { "name": "douglas j hunley", "email": "doug.hunley@gmail.com", "social": [ { "blog": "https://hunleyd.github.io/", "twitter": "@hunleyd" } ] } [-- Attachment #2: Type: text/html, Size: 1182 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-07-29 22:38 ` Rich Freeman 2016-07-29 23:01 ` Mick @ 2016-08-01 7:16 ` J. Roeleveld 2016-08-01 13:43 ` james 2016-08-01 15:01 ` Rich Freeman 1 sibling, 2 replies; 27+ messages in thread From: J. Roeleveld @ 2016-08-01 7:16 UTC (permalink / raw To: gentoo-user On Saturday, July 30, 2016 06:38:01 AM Rich Freeman wrote: > On Sat, Jul 30, 2016 at 6:24 AM, Alan McKinnon <alan.mckinnon@gmail.com> wrote: > > On 29/07/2016 22:58, Mick wrote: > >> Interesting article explaining why Uber are moving away from PostgreSQL. > >> I am > >> running both DBs on different desktop PCs for akonadi and I'm also > >> running > >> MySQL on a number of websites. Let's which one goes sideways first. :p > >> > >> https://eng.uber.com/mysql-migration/ > > > > I don't think your akonadi and some web sites compares in any way to Uber > > and what they do. > > > > FWIW, my Dev colleagues support and entire large corporate ISP's > > operational and customer data on PostgreSQL-9.3. With clustering. With no > > db-related issues :-) > > Agree, you'd need to be fairly large-scale to have their issues, And you'd also have to have had your database designed by people who think MySQL actually follows common SQL standards. > but I > think the article was something anybody interested in databases should > read. If nothing else it is a really easy to follow explanation of > the underlying architectures. Check the link posted by Douglas. Uber's article has some misunderstandings about the architecture, with conclusions that are at least partly caused by their own database design and usage. > I'll probably post this to my LUG mailing list. I think one of the > Postgres devs lurks there so I'm curious to his impressions. > > I was a bit surprised to hear about the data corruption bug. I've > always considered Postgres to have a better reputation for data > integrity. They do. > And of course almost any FOSS project could have a bug. 
I > don't know if either project does the kind of regression testing to > reliably detect this sort of issue. Not sure either, though I do think PostgreSQL does a lot of regression testing. > I'd think that it is more likely > that the likes of Oracle would (for their flagship DB (not for MySQL), Never worked with Oracle (or other big software vendors), have you? :) > and they'd probably be more likely to send out an engineer to beg > forgiveness while they fix your database). Only if you're a big (as in, spend a lot of money with them) customer. > Of course, if you're Uber > the hit you'd take from downtime/etc isn't made up for entirely by > having somebody take a few days to get everything fixed. -- Joost ^ permalink raw reply [flat|nested] 27+ messages in thread
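[Editor's note: for concreteness, the kind of regression test being discussed can be as simple as "commit known data, simulate a restart, assert nothing was lost". A toy sketch using Python's bundled sqlite3, not PostgreSQL's actual test suite:]

```python
# Toy durability regression test (illustrative only -- a real database
# test suite is far more thorough): write and commit rows, reopen the
# database file as if the server had restarted, verify nothing was lost.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "regress.db")

db = sqlite3.connect(path)
db.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT)")
db.executemany("INSERT INTO t VALUES (?, ?)",
               [(i, f"row{i}") for i in range(100)])
db.commit()
db.close()  # simulated restart

db = sqlite3.connect(path)
rows = db.execute("SELECT k, v FROM t ORDER BY k").fetchall()
assert rows == [(i, f"row{i}") for i in range(100)]  # nothing lost or garbled
db.close()
```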
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-01 7:16 ` J. Roeleveld @ 2016-08-01 13:43 ` james 2016-08-01 16:49 ` J. Roeleveld 2016-08-11 12:43 ` Douglas J Hunley 2016-08-01 15:01 ` Rich Freeman 1 sibling, 2 replies; 27+ messages in thread From: james @ 2016-08-01 13:43 UTC (permalink / raw To: gentoo-user On 08/01/2016 02:16 AM, J. Roeleveld wrote: > On Saturday, July 30, 2016 06:38:01 AM Rich Freeman wrote: >> On Sat, Jul 30, 2016 at 6:24 AM, Alan McKinnon <alan.mckinnon@gmail.com> > wrote: >>> On 29/07/2016 22:58, Mick wrote: >>>> Interesting article explaining why Uber are moving away from PostgreSQL. >>>> I am >>>> running both DBs on different desktop PCs for akonadi and I'm also >>>> running >>>> MySQL on a number of websites. Let's which one goes sideways first. :p >>>> >>>> https://eng.uber.com/mysql-migration/ >>> >>> I don't think your akonadi and some web sites compares in any way to Uber >>> and what they do. >>> >>> FWIW, my Dev colleagues support and entire large corporate ISP's >>> operational and customer data on PostgreSQL-9.3. With clustering. With no >>> db-related issues :-) >> >> Agree, you'd need to be fairly large-scale to have their issues, > > And also have to design your database by people who think MySQL actually > follows common SQL standards. > >> but I >> think the article was something anybody interested in databases should >> read. If nothing else it is a really easy to follow explanation of >> the underlying architectures. > > Check the link posted by Douglas. > Ubers article has some misunderstandings about the architecture with > conclusions drawn that are, at least also, caused by their database design and > usage. > >> I'll probably post this to my LUG mailing list. I think one of the >> Postgres devs lurks there so I'm curious to his impressions. >> >> I was a bit surprised to hear about the data corruption bug. I've >> always considered Postgres to have a better reputation for data >> integrity. > > They do. 
> >> And of course almost any FOSS project could have a bug. I >> don't know if either project does the kind of regression testing to >> reliably detect this sort of issue. > > Not sure either, I do think PostgreSQL does a lot with regression tests. > >> I'd think that it is more likely >> that the likes of Oracle would (for their flagship DB (not for MySQL), > > Never worked with Oracle (or other big software vendors), have you? :) > >> and they'd probably be more likely to send out an engineer to beg >> forgiveness while they fix your database). > > Only if you're a big (as in, spend a lot of money with them) customer. > >> Of course, if you're Uber >> the hit you'd take from downtime/etc isn't made up for entirely by >> having somebody take a few days to get everything fixed. > > -- > Joost > > I certainly respect your skills and posts on Databases, Joost, as everything you have posted in the past is 'spot on'. Granted, I'm no database expert, far from it. But I want to share a few things with you, and hope you (and others) will 'chime in' on these comments. Way back, when the earth was cooling and we all had dinosaurs for pets, some of us hacked on AT&T "3B2" unix systems. They were known for their 'roll back and recovery', triplicated (or more) transaction processes and 'voters' system to ferret out if a transaction was complete and correct. There was no ACID, the current 'gold standard' if you believe what Douglas and others write about concerning databases. In essence (from crusted-up memories), a basic (SS7) transaction related to the local telephone switch was run on 3 machines. The results were compared. If they matched, the transaction went forward as valid. If 2/3 matched, and the switch was so configured, then the code would essentially 'vote' and majority ruled. 
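[Editor's note: the 2-of-3 voting scheme described above can be sketched in a few lines. This is a toy illustration in plain Python, not anything resembling actual 3B2 code, but it shows why an odd replica count matters: a strict majority can mask a single bad result.]

```python
# Toy sketch of "run the same transaction on 3 machines and vote".
from collections import Counter

def vote(results):
    """Accept a result only if a strict majority of replicas agree."""
    winner, count = Counter(results).most_common(1)[0]
    if count * 2 > len(results):
        return winner
    raise RuntimeError("no majority -- transaction rejected")

def run_replicated(txn, replicas=3):
    """Run the same transaction logic on N independent 'machines'."""
    return vote([txn() for _ in range(replicas)])

# One flaky 'machine' returns a bad balance; the voter masks it (2/3 agree).
outcomes = iter([150, 999, 150])
assert run_replicated(lambda: next(outcomes)) == 150

# If all replicas disagree, the transaction is rejected rather than trusted.
outcomes = iter([1, 2, 3])
try:
    run_replicated(lambda: next(outcomes))
except RuntimeError:
    pass  # no majority
```

Note that with an even replica count a 2-2 split has no majority, which is one reason the voter counts stay odd.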
This is what led to phone calls (switched phone calls) having variable delays, often in the order of seconds, mis-connections and other problems we all encountered during periods of excessive demand. That scenario was at the heart of how old, crappy AT&T unix (SVR?) could perform so well and therefore established the gold standard for RT transaction processing, aka the "five 9s" 99.999% of up-time (about 5 minutes per year of downtime). Sure this part is only related to transaction processing as there was much more to the "five 9s" legacy, but imho, that is the heart of what was the precursor to ACID properties now so greatly espoused in SQL codes that Douglas refers to. Do folks concur or disagree at this point? The reason this is important to me (and others?), is that, if this idea (granted there is much more detail to it) is still valid, then it can form the basis for building up superior-ACID processes that meet or exceed the properties of an expensive (think Oracle) transaction process on distributed (parallel) or clustered systems, to a degree of accuracy limited only by the number of odd-numbered voter codes involved in the distributed and replicated parts of the transaction. I even added some code where replicated routines were written in different languages, and the results compared to add an additional layer of verification before the voter step. (gotta love assembler?). I guess my point is 'Douglas' is full of stuffing, OR that is what folks are doing when they 'roll their own solution specifically customized to their specific needs' as he alludes to near the end of his commentary? (I'd like your opinion of this and maybe some links to current schemes for how to have ACID/99.999% accurate transactions on clusters of various architectures.) Douglas, like yourself, writes of these things in a very lucid fashion, so that is why I'm asking you for your thoughts. 
Robustness of transactions in a distributed (clustered) environment is fundamental to the usefulness of most codes that are trying to migrate to cluster-based processes in (VM/container/HPC) environments. I do not have the old articles handy but I'm sure that many/most of those types of inherent processes can be formulated in the algebraic domain, normalized and used to solve decisions where other forms of advanced logic failed (not that I'm taking a cheap shot at modern programming languages) (wink wink nudge nudge); or at least that's how we did it.... as young whippersnappers, back in the day... --an_old_farts_logic curiously, James ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-01 13:43 ` james @ 2016-08-01 16:49 ` J. Roeleveld 2016-08-01 18:03 ` Rich Freeman 2016-08-02 5:16 ` james 2016-08-11 12:43 ` Douglas J Hunley 1 sibling, 2 replies; 27+ messages in thread From: J. Roeleveld @ 2016-08-01 16:49 UTC (permalink / raw To: gentoo-user On Monday, August 01, 2016 08:43:49 AM james wrote: > On 08/01/2016 02:16 AM, J. Roeleveld wrote: > > On Saturday, July 30, 2016 06:38:01 AM Rich Freeman wrote: > >> On Sat, Jul 30, 2016 at 6:24 AM, Alan McKinnon <alan.mckinnon@gmail.com> > > > > wrote: > >>> On 29/07/2016 22:58, Mick wrote: > >>>> Interesting article explaining why Uber are moving away from > >>>> PostgreSQL. > >>>> I am > >>>> running both DBs on different desktop PCs for akonadi and I'm also > >>>> running > >>>> MySQL on a number of websites. Let's which one goes sideways first. > >>>> :p > >>>> > >>>> https://eng.uber.com/mysql-migration/ > >>> > >>> I don't think your akonadi and some web sites compares in any way to > >>> Uber > >>> and what they do. > >>> > >>> FWIW, my Dev colleagues support and entire large corporate ISP's > >>> operational and customer data on PostgreSQL-9.3. With clustering. With > >>> no > >>> db-related issues :-) > >> > >> Agree, you'd need to be fairly large-scale to have their issues, > > > > And also have to design your database by people who think MySQL actually > > follows common SQL standards. > > > >> but I > >> think the article was something anybody interested in databases should > >> read. If nothing else it is a really easy to follow explanation of > >> the underlying architectures. > > > > Check the link posted by Douglas. > > Ubers article has some misunderstandings about the architecture with > > conclusions drawn that are, at least also, caused by their database design > > and usage. > > > >> I'll probably post this to my LUG mailing list. I think one of the > >> Postgres devs lurks there so I'm curious to his impressions. 
> >> > >> I was a bit surprised to hear about the data corruption bug. I've > >> always considered Postgres to have a better reputation for data > >> integrity. > > > > They do. > > > >> And of course almost any FOSS project could have a bug. I > >> don't know if either project does the kind of regression testing to > >> reliably detect this sort of issue. > > > > Not sure either, I do think PostgreSQL does a lot with regression tests. > > > >> I'd think that it is more likely > >> that the likes of Oracle would (for their flagship DB (not for MySQL), > > > > Never worked with Oracle (or other big software vendors), have you? :) > > > >> and they'd probably be more likely to send out an engineer to beg > >> forgiveness while they fix your database). > > > > Only if you're a big (as in, spend a lot of money with them) customer. > > > >> Of course, if you're Uber > >> the hit you'd take from downtime/etc isn't made up for entirely by > >> having somebody take a few days to get everything fixed. > > > > -- > > Joost > > I certainly respect your skills and posts on Databases, Joost, as > everything you have posted, in the past is 'spot on'. Comes with a keen interest and long-term (think decades) of working with different databases. > Granted, I'm no database expert, far from it. Not many people are, nor do they need to be. > But I want to share a few thing with you, > and hope you (and others) will 'chime in' on these comments. > > Way back, when the earth was cooling and we all had dinosaurs for pets, > some of us hacked on AT&T "3B2" unix systems. They were know for their > 'roll back and recovery', triplicated (or more) transaction processes > and 'voters' system to ferret out if a transaction was complete and > correct. There was no ACID, the current 'gold standard' if you believe > what Douglas and other write about concerning databases. 
> > In essence, (from crusted up memories) a basic (SS7) transaction related > to the local telephone switch, was ran on 3 machines. The results were > compared. If they matched, the transaction went forward as valid. If 2/3 > matched, And what about the likely case where only 1 was correct? Have you seen the movie "Minority Report"? If yes, think back to why Tom Cruise was found 'guilty' when he wasn't, and how often that actually occurred. > and the switch was was configured, then the code would > essentially 'vote' and majority ruled. This is what led to phone calls > (switched phone calls) having variable delays, often in the order of > seconds, mis-connections and other problems we all encountered during > periods of excessive demand. Not sure if that was the cause in the past, but these days it can also still take a few seconds before the other end rings. This is due to the phone-system (all PBXs in the path) needing to set up the routing between both end-points prior to the ring-tone actually starting. When the system is busy, these lookups will take time and can even time-out. (Try wishing everyone you know a happy new year using a wired phone and you'll see what I mean. Mobile phones have a separate problem at that time) > That scenario was at the heart of how old, crappy AT&T unix (SVR?) could > perform so well and therefore established the gold standard for RT > transaction processing, aka the "five 9s" 99.999% of up-time (about 5 > minutes per year of downtime). "Unscheduled" downtime. Regular maintenance will require more than 5 minutes per year. > Sure this part is only related to > transaction processing as there was much more to the "five 9s" legacy, > but imho, that is the heart of what was the precursor to ACID property's > now so greatly espoused in SQL codes that Douglas refers to. > > Do folks concur or disagree at this point? ACID is about data integrity. The "best 2 out of 3" voting was, in my opinion, a work-around for unreliable hardware. 
It is based on a clever idea, but when 2 computers having the same data and logic come up with 2 different answers, I wouldn't trust either of them. > The reason this is important to me (and others?), is that, if this idea > (granted there is much more detail to it) is still valid, then it can > form the basis for building up superior-ACID processes, that meet or > exceed, the properties of an expensive (think Oracle) transaction > process on distributed (parallel) or clustered systems, to a degree of > accuracy only limited by the limit of the number of odd numbered voter > codes involve in the distributed and replicated parts of the > transaction. I even added some code where replicated routines were > written in different languages, and the results compared to add an > additional layer of verification before the voter step. (gotta love > assembler?). You have seen how "democracies" work, right? :) The more voters involved, the longer it takes for all the votes to be counted. With a small number, it might actually still scale, but when you pass a magic number (no clue what this would be), the counting time starts to exceed any time you might have gained by adding more voters. Also, this, to me, seems to counteract the whole reason for using clusters: Have different nodes handle a different part of the problem. Clusters of multiple compute-nodes is a quick and "simple" way of increasing the amount of computational cores to throw at problems that can be broken down in a lot of individual steps with minimal inter-dependencies. I say "simple" because I think designing a 1,000 core chip is more difficult than building a 1,000-node cluster using single-core, single cpu boxes. I would still consider the cluster to be a single "machine". > I guess my point is 'Douglas' is full of stuffing, OR that is what folks > are doing when they 'role their own solution specifically customized to > their specific needs' as he alludes to near the end of his commentary? 
The response Douglas linked to is closer to what seems to work when dealing with large amounts of data. > (I'd like your opinion of this and maybe some links to current schemes > how to have ACID/99.999% accurate transactions on clusters of various > architectures.) Douglas, like yourself, writes of these things in a > very lucid fashion, so that is why I'm asking you for your thoughts. The way Uber created the cluster is useful when having 1 node handle all the updates and multiple nodes providing read-only access while also providing failover functionality. > Robustness of transactions, in a distributed (clustered) environment is > fundamental to the usefulness of most codes that are trying to migrate > to a cluster based processes in (VM/container/HPC) environments. Whereas I do consider clusters to be very useful, not all work-loads can be redesigned to scale properly. > I do > not have the old articles handy but, I'm sure that many/most of those > types of inherent processes can be formulated in the algebraic domain, > normalized and used to solve decisions often where other forms of > advanced logic failed (not that I'm taking a cheap shot at modern > programming languages) (wink wink nudge nudge); or at least that's how > we did it.... as young whipper_snappers bask in the day... If you know what you are doing, the language is just a tool. Sometimes a hammer is sufficient, other times one might need to use a screwdriver. > --an_old_farts_logic Thinking back on how long I've been playing with computers, I wonder how long it will be until I am in the "old fart" category? -- Joost ^ permalink raw reply [flat|nested] 27+ messages in thread
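[Editor's note: Joost's point that ACID is about data integrity can be made concrete. Atomicity, the "A" in ACID, means a transaction that fails mid-way leaves no partial writes behind. A minimal sketch using Python's bundled sqlite3, standing in for any ACID database rather than PostgreSQL specifically:]

```python
# Minimal illustration of atomicity: a transfer that "crashes" between
# the debit and the credit is rolled back in full, never half-applied.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
db.commit()

try:
    with db:  # opens a transaction; rolls back if the block raises
        db.execute("UPDATE accounts SET balance = balance - 70 "
                   "WHERE name = 'alice'")
        raise RuntimeError("crash before the matching credit to bob")
except RuntimeError:
    pass

# The debit was rolled back with the failed transaction: no half-transfer.
(balance,) = db.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
assert balance == 100
```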
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-01 16:49 ` J. Roeleveld @ 2016-08-01 18:03 ` Rich Freeman 2016-08-02 5:51 ` james 2016-08-02 5:16 ` james 1 sibling, 1 reply; 27+ messages in thread From: Rich Freeman @ 2016-08-01 18:03 UTC (permalink / raw To: gentoo-user On Mon, Aug 1, 2016 at 12:49 PM, J. Roeleveld <joost@antarean.org> wrote: > On Monday, August 01, 2016 08:43:49 AM james wrote: > >> Sure this part is only related to >> transaction processing as there was much more to the "five 9s" legacy, >> but imho, that is the heart of what was the precursor to ACID property's >> now so greatly espoused in SQL codes that Douglas refers to. >> >> Do folks concur or disagree at this point? > > ACID is about data integrity. The "best 2 out of 3" voting was, in my opinion, > a work-around for unreliable hardware. It is based on a clever idea, but when > 2 computers having the same data and logic come up with 2 different answers, I > wouldn't trust either of them. I agree, this was a solution for hardware issues. However, hardware issues can STILL happen today, so there is an argument for it. There are really two ways to get to robustness: clever hardware, and clever software. The old way was to do it in hardware, the newer way is to do it in software (see Google with their racks of cheap motherboards). I suspect software will always be the better way, but you can't just write a check to get better software the way you can with hardware. Doing it right with software means hiring really good people, which is something a LOT of companies don't want to do (well, they think they're doing it, but they're not). Basically I believe the concept with the mainframe was that you could probably open the thing up, break one random board with a hammer, and the application would still keep running just fine. IBM would then magically show up the next day and replace the board without anybody doing anything. 
All the hardware had redundancy, so you can run your application for a decade or two without fear of a hardware failure. However, you pay a small fortune for all of this. The other trend as I understand it in mainframes is renting your own hardware to you. That is, you buy a box, and you can just pay to turn on extra CPUs/etc. You can imagine what the margins are like for that to be practical, but for non-trendy businesses that don't want to offer free ice cream and pay Silicon Valley wages I guess it is an alternative to building good software. > > You have seen how "democracies" work, right? :) > The more voters involved, the longer it takes for all the votes to be counted. > With a small number, it might actually still scale, but when you pass a magic > number (no clue what this would be), the counting time starts to exceed any > time you might have gained by adding more voters. > > Also, this, to me, seems to counteract the whole reason for using clusters: > Have different nodes handle a different part of the problem. I agree. The old mainframe way of doing things isn't going to make anything faster. I don't think it will necessarily make things much slower as long as all the hardware is in the same box. However, if you want to start doing this at a cluster scale with offsite replicas I imagine the latencies would kill just about anything. That was one of the arguments against the Postgres vacuum approach where replicas could end up having in-use records deleted. The solutions are to delay the replicas (not great), or synchronize back to the master (also not great). The MySQL approach apparently lets all the replicas do their own vacuuming, which does neatly solve that particular problem (presumably at the cost of more work for the replicas, and of course they're no longer binary replicas). 
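[Editor's note: the vacuum problem Rich describes can be illustrated with a toy MVCC sketch. This is a deliberate over-simplification, nothing like the real Postgres storage layer: updates append row versions, a reader sees the newest version visible to its snapshot, and vacuum may only reclaim versions no active snapshot still needs.]

```python
# Toy MVCC version chain: (txid, value) pairs, oldest first.
versions = []

def write(txid, value):
    versions.append((txid, value))

def read(snapshot_txid):
    """A snapshot sees the newest version created at or before it."""
    visible = [v for t, v in versions if t <= snapshot_txid]
    return visible[-1] if visible else None

def vacuum(oldest_active_snapshot):
    """Drop versions older than the newest one the oldest snapshot sees."""
    keep_from = max((i for i, (t, _) in enumerate(versions)
                     if t <= oldest_active_snapshot), default=0)
    del versions[:keep_from]

write(1, "v1")
write(5, "v2")
assert read(3) == "v1"             # an old snapshot still sees the old version

vacuum(oldest_active_snapshot=3)   # safe: snapshot 3 is known to be active
assert read(3) == "v1"

vacuum(oldest_active_snapshot=10)  # too eager -- like a replica that doesn't
assert read(3) is None             # know snapshot 3 exists; its row vanished
```

This is the bind described above: only the node that knows which snapshots are active can vacuum safely, so a replica must either lag behind or report its snapshots back.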
> > The way Uber created the cluster is useful when having 1 node handle all the > updates and multiple nodes providing read-only access while also providing > failover functionality. I agree. I do remember listening to a Postgres talk by one of the devs and while everybody's holy grail is the magical replica where you just have a bunch of replicas and you do any operation on any replica and everything is up to date, in reality that is almost impossible to achieve with any solution. -- Rich ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-01 18:03 ` Rich Freeman @ 2016-08-02 5:51 ` james 2016-08-11 12:48 ` Douglas J Hunley 0 siblings, 1 reply; 27+ messages in thread From: james @ 2016-08-02 5:51 UTC (permalink / raw To: gentoo-user On 08/01/2016 01:03 PM, Rich Freeman wrote: > On Mon, Aug 1, 2016 at 12:49 PM, J. Roeleveld <joost@antarean.org> wrote: >> On Monday, August 01, 2016 08:43:49 AM james wrote: >> >>> Sure this part is only related to >>> transaction processing as there was much more to the "five 9s" legacy, >>> but imho, that is the heart of what was the precursor to ACID property's >>> now so greatly espoused in SQL codes that Douglas refers to. >>> >>> Do folks concur or disagree at this point? >> >> ACID is about data integrity. The "best 2 out of 3" voting was, in my opinion, >> a work-around for unreliable hardware. It is based on a clever idea, but when >> 2 computers having the same data and logic come up with 2 different answers, I >> wouldn't trust either of them. > > I agree, this was a solution for hardware issues. However, hardware > issues can STILL happen today, so there is an argument for it. There > are really two ways to get to robustness: clever hardware, and clever > software. The old way was to do it in hardware, the newer way is to > do it in software (see Google with their racks of cheap motherboards). > I suspect software will always be the better way, but you can't just > write a check to get better software the way you can with hardware. > Doing it right with software means hiring really good people, which is > something a LOT of companies don't want to do (well, they think > they're doing it, but they're not). > > Basically I believe the concept with the mainframe was that you could > probably open the thing up, break one random board with a hammer, and > the application would still keep running just fine. IBM would then > magically show up the next day and replace the board without anybody > doing anything. 
All the hardware had redundancy, so you can run your > application for a decade or two without fear of a hardware failure. Not with today's clusters and cheap hardware. As you pointed out, expertise (and common sense) are the quintessential qualities for staff and managers..... > > However, you pay a small fortune for all of this. Not today; those exorbitant prices were back then. Sequoia made so much money, I'm pretty sure that's how they ultimately became a VC firm? > The other trend as > I understand it in mainframes is renting your own hardware to you. Yes, find a CPA that spent 10 years or so inside the IRS and you get even more aggressive profitability vectors. Some accountants move hardware, assets and corporations around and about the world in a shell game and never pay taxes, just recycling assets among billionaires. It's pretty sickening, if you really learn the details of what goes on. > That is, you buy a box, and you can just pay to turn on extra > CPUs/etc. You can imagine what the margins are like for that to be > practical, but for non-trendy businesses that don't want to offer free > ice cream and pay Silicon Valley wages I guess it is an alternative to > building good software. Investment credits, sell/rent hardware to an overseas division, then move them to another country that pays you to relocate and bring a few jobs. Heck, even the US states play that stupid game with recruiting corporations. Get an IRS career agent drunk some time and pull a few stories out of them..... >> You have seen how "democracies" work, right? :) >> The more voters involved, the longer it takes for all the votes to be counted. >> With a small number, it might actually still scale, but when you pass a magic >> number (no clue what this would be), the counting time starts to exceed any >> time you might have gained by adding more voters. >> >> Also, this, to me, seems to counteract the whole reason for using clusters: >> Have different nodes handle a different part of the problem. 
> > I agree. The old mainframe way of doing things isn't going to make > anything faster. I don't think it will necessarily make things much > slower as long as all the hardware is in the same box. However, if > you want to start doing this at a cluster scale with offsite replicas > I imagine the latencies would kill just about anything. That was one > of the arguments against the Postgres vacuum approach where replicas > could end up having in-use records deleted. The solutions are to > delay the replicas (not great), or synchronize back to the master > (also not great). The MySQL approach apparently lets all the replicas > do their own vacuuming, which does neatly solve that particular > problem (presumably at the cost of more work for the replicas, and of > course they're no longer binary replicas). Why Rich, using common sense? What's wrong with you? I thought you were a good corporate lackey? Bob from accounting has already presented to the BOD and got approval. Rich, can you be a team player (silent idiot) just once for the team? > >> >> The way Uber created the cluster is useful when having 1 node handle all the >> updates and multiple nodes providing read-only access while also providing >> failover functionality. > > I agree. I do remember listening to a Postgres talk by one of the > devs and while everybody's holy grail is the magical replica where you > just have a bunch of replicas and you do any operation on any replica > and everything is up to date, in reality that is almost impossible to > achieve with any solution. Yep, NoSQL is floundering mightily when requirements are stringent and other extreme QA issues are fine-grained, from what I read. Sadly, like yourself, I like to put on my 'common sense' glasses after an architectural solution is presented, and I've seen mountains of bad ideas; like BP running Prudhoe Bay (N. America's largest oil field) in the Arctic. Bad, bad idea, if you are an engineer and hang out with those 'tards' for a few days. 
They collected data in the Arctic, microwaved it to a mainframe in Anchorage, ran software, and then microwaved control signals back to the field controllers. Beyond stupid. They were an embarrassment to the entire petroleum industry back in the 70s, when I did some automation (RF to RF) to mainframe work in the Arctic. Likewise, the solution to all of the drilling disasters, world wide, is for each country to require real-time data at a government monitoring station, reporting things like the condition of the safety and backup safety systems (in real time), to keep mid managers from making gargantuan, stupid decisions. There is more than this amount of stupidity in how many cluster (cloud) companies think large amounts of critical data will be 'outsourced'. Bean counters scare me the most. Sales-lizards are rarely trusted, unless they listen to me and do exactly what I tell them to do. It seems that there are many, many tards in the cluster (cloud) space lacking common sense. So that (cluster/cloud) industry is going to implode, just like the "dot-com" bubble of the 90s. Not because there are not lots of valid projects and good ideas, but because many tards are managing, and they lack the common sense to pour piss out of a boot, let alone discern valid solutions for specific industries. Like a 'blind hog':: though they will find an acorn or two. A historical CS class or two on what has been tried, what works and does not work and why, along with a few (real) hardware architecture classes, and there would not be so many ridiculous (doomed to fail before getting started) cluster (cloud) companies out there. Developing unknown but old ideas in Java is still going to fail. Many are the BP of the cloud:: a disaster just waiting to fail.... ymmv. Many folks in the petroleum industry warned Alaskan government officials that BP was incompetent, back in the 70s. 
They still are, mostly because the executives would not know how to calculate the weight of a drill-stem column of fluid and match it up with the expected subsurface pressures to be encountered. It's a simple 'material balance equation' you could teach in a HS physics class. Likewise, there is a rich history (graveyard) of distributed processing, and that body of knowledge is being ignored, mostly because it is getting in the way of vendor hyperbole...... Douglas did manage to pull his own bacon from the fire at the end of his article, but it reeks of vendor hyperbole, imho. thanks for the comments, James ^ permalink raw reply [flat|nested] 27+ messages in thread
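The 'simple material balance' check james alludes to can be sketched with the standard oilfield rule of thumb that a fluid column exerts about 0.052 psi per foot of depth per lb/gal of mud weight. The function names and well numbers below are my own illustration, not anything from the thread:

```python
# Hypothetical sketch of the drill-fluid hydrostatic check described above.
# Standard oilfield approximation: P(psi) = 0.052 * mud weight (lb/gal) * TVD (ft).
def hydrostatic_psi(mud_weight_ppg, depth_ft):
    """Pressure exerted by a column of drilling fluid at a given depth."""
    return 0.052 * mud_weight_ppg * depth_ft

def overbalanced(mud_weight_ppg, depth_ft, formation_psi):
    """True if the fluid column holds back the expected formation pressure."""
    return hydrostatic_psi(mud_weight_ppg, depth_ft) >= formation_psi

# 10 ppg mud at 10,000 ft exerts 5,200 psi:
assert hydrostatic_psi(10, 10_000) == 5200.0
assert overbalanced(10, 10_000, 5000)      # formation pressure is held back
assert not overbalanced(10, 10_000, 6000)  # underbalanced: risk of a kick
```

It really is HS-physics-level arithmetic, which is james's point about the executives.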
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-02 5:51 ` james @ 2016-08-11 12:48 ` Douglas J Hunley 2016-08-12 13:00 ` james 0 siblings, 1 reply; 27+ messages in thread From: Douglas J Hunley @ 2016-08-11 12:48 UTC (permalink / raw To: Gentoo [-- Attachment #1: Type: text/plain, Size: 404 bytes --] On Tue, Aug 2, 2016 at 1:51 AM, james <garftd@verizon.net> wrote: > Douglas did manage to pull his own bacon from the fire, in the end of his > article, but it wreaks of vendor hyperbole, imho. > Again, not the author -- { "name": "douglas j hunley", "email": "doug.hunley@gmail.com", "social": [ { "blog": "https://hunleyd.github.io/", "twitter": "@hunleyd" } ] } [-- Attachment #2: Type: text/html, Size: 1106 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-11 12:48 ` Douglas J Hunley @ 2016-08-12 13:00 ` james 2016-08-12 14:13 ` R0b0t1 0 siblings, 1 reply; 27+ messages in thread From: james @ 2016-08-12 13:00 UTC (permalink / raw To: gentoo-user On 08/11/2016 07:48 AM, Douglas J Hunley wrote: > > On Tue, Aug 2, 2016 at 1:51 AM, james <garftd@verizon.net > <mailto:garftd@verizon.net>> wrote: > > Douglas did manage to pull his own bacon from the fire, in the end > of his article, but it wreaks of vendor hyperbole, imho. > > > Again, not the author > > > -- > { > "name": "douglas j hunley", > "email": "doug.hunley@gmail.com <mailto:doug.hunley@gmail.com>", > "social": [ > { > "blog": "https://hunleyd.github.io/", > "twitter": "@hunleyd" > } > ] > } IFF I made a logical sequence attachment error {a boo_boo}:: 1K apologies IFF I bruised your ego:: 1M apologies IFF I insulted your pride:: 1G apologies IFelse My goal was to clear up common ignorance of where the ACID properties came from:: Mathematics ==>Electro-Mechanical Engineering ==>Electronics Engineering ==>DataBase Weenies ==>(accounting)Codes. OK? That's my thesis and conclusion:: sprinkle with apologies as necessary. I do acknowledge that DataBase (weeny) vendors are the Microsoft of Robustness and Reliability, espoused by the current state of affairs in transaction processing, which is now a staple of modern computation, much like MicroSoft made computers so idiots can participate too. No arguments therein. BUT, I take the time to 'educate' folks for a very important reason:: Distributed and parallel processing, now entering its fourth/fifth/sixth/<whatever> rendition, offers up fundamental mathematically based constructs, that can be realized in (electronic) hardware or software or both, to build 'systems' that far exceed the robustness of the ACID properties currently found in a database scheme. 
Furthermore, whores like Oracle need to be retired from the computational landscape, as they are the robber barons of yore and we just do not need them any more. I.e., learn the basics and implement new constructs in distributed and parallel schemes (aka the cluster). Fundamental, sound and proven principles of mathematics and EE provide many 'degrees of freedom' for more robust solutions than the vendor hyperbole of database vendors. And yes, your favorite University, and Wiki*, have failed to accurately document this; nothing I can do about that but share, as I am doing here. OK? So, the interested can do their own research, and others can trudge along their merry way. (The apologies are sincere, but, I am a bit crass:: no apologies on that note). hth, James ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-12 13:00 ` james @ 2016-08-12 14:13 ` R0b0t1 2016-08-12 14:15 ` R0b0t1 0 siblings, 1 reply; 27+ messages in thread From: R0b0t1 @ 2016-08-12 14:13 UTC (permalink / raw To: gentoo-user On Fri, Aug 12, 2016 at 8:00 AM, james <garftd@verizon.net> wrote: > Mathematics ==>Electro-Mechanical Engineering ==>Electronics Engineering > ==>DataBase Weenies ==>(accounting)Codes. The study of anything is really the study of war. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-12 14:13 ` R0b0t1 @ 2016-08-12 14:15 ` R0b0t1 2016-08-12 18:01 ` james 0 siblings, 1 reply; 27+ messages in thread From: R0b0t1 @ 2016-08-12 14:15 UTC (permalink / raw To: gentoo-user On Fri, Aug 12, 2016 at 9:13 AM, R0b0t1 <r030t1@gmail.com> wrote: > On Fri, Aug 12, 2016 at 8:00 AM, james <garftd@verizon.net> wrote: >> Mathematics ==>Electro-Mechanical Engineering ==>Electronics Engineering >> ==>DataBase Weenies ==>(accounting)Codes. > > The study of anything is really the study of war. Readers will find it amusing that Machiavelli's writings included convenient descriptions of pike-and-shot formations as "ASCII" art. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-12 14:15 ` R0b0t1 @ 2016-08-12 18:01 ` james 0 siblings, 0 replies; 27+ messages in thread From: james @ 2016-08-12 18:01 UTC (permalink / raw To: gentoo-user On 08/12/2016 09:15 AM, R0b0t1 wrote: > On Fri, Aug 12, 2016 at 9:13 AM, R0b0t1 <r030t1@gmail.com> wrote: >> On Fri, Aug 12, 2016 at 8:00 AM, james <garftd@verizon.net> wrote: >>> Mathematics ==>Electro-Mechanical Engineering ==>Electronics Engineering >>> ==>DataBase Weenies ==>(accounting)Codes. >> >> The study of anything is really the study of war. > > Readers will find it amusing that Machiavelli's writings included > convenient descriptions of pike-and-shot formations as "ASCII" art. > > Plausible, but consider some perspective on Mac:: A medical professor once asked her class to submit a one-paragraph thesis on how Machiavelli's works affected modern medicine. The youngest member of the class (quite young, actually) noted that most acknowledge that Mac was very ill later in life; his thesis was that that (catastrophic) illness had actually consumed Mac much earlier in life, and that therefore the study and reading of Mac was more attributable to a manifestation of 'societal sickness' than to a learned pursuit of that which is worthy of pursuit. He received a low mark in that class, but truth is truth, especially in the eyes of the author; a brilliant truth, most often. There is an ironic posting on Hacker News about the lack of credibility, amongst their customers, of modern psychiatry; you just might find it interesting. All other forms of modern medicine receive quite high marks from their customers. caveat emptor, James ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-01 16:49 ` J. Roeleveld 2016-08-01 18:03 ` Rich Freeman @ 2016-08-02 5:16 ` james 2016-08-04 10:09 ` J. Roeleveld 1 sibling, 1 reply; 27+ messages in thread From: james @ 2016-08-02 5:16 UTC (permalink / raw To: gentoo-user On 08/01/2016 11:49 AM, J. Roeleveld wrote: > On Monday, August 01, 2016 08:43:49 AM james wrote: >> On 08/01/2016 02:16 AM, J. Roeleveld wrote: >>> On Saturday, July 30, 2016 06:38:01 AM Rich Freeman wrote: >>>> On Sat, Jul 30, 2016 at 6:24 AM, Alan McKinnon <alan.mckinnon@gmail.com> >>> >>> wrote: >>>>> On 29/07/2016 22:58, Mick wrote: >>>>>> Interesting article explaining why Uber are moving away from >>>>>> PostgreSQL. >>>>>> I am >>>>>> running both DBs on different desktop PCs for akonadi and I'm also >>>>>> running >>>>>> MySQL on a number of websites. Let's which one goes sideways first. >>>>>> :p >>>>>> >>>>>> https://eng.uber.com/mysql-migration/ >>>>> >>>>> I don't think your akonadi and some web sites compares in any way to >>>>> Uber >>>>> and what they do. >>>>> >>>>> FWIW, my Dev colleagues support and entire large corporate ISP's >>>>> operational and customer data on PostgreSQL-9.3. With clustering. With >>>>> no >>>>> db-related issues :-) >>>> >>>> Agree, you'd need to be fairly large-scale to have their issues, >>> >>> And also have to design your database by people who think MySQL actually >>> follows common SQL standards. >>> >>>> but I >>>> think the article was something anybody interested in databases should >>>> read. If nothing else it is a really easy to follow explanation of >>>> the underlying architectures. >>> >>> Check the link posted by Douglas. >>> Ubers article has some misunderstandings about the architecture with >>> conclusions drawn that are, at least also, caused by their database design >>> and usage. >>> >>>> I'll probably post this to my LUG mailing list. I think one of the >>>> Postgres devs lurks there so I'm curious to his impressions. 
>>>> >>>> I was a bit surprised to hear about the data corruption bug. I've >>>> always considered Postgres to have a better reputation for data >>>> integrity. >>> >>> They do. >>> >>>> And of course almost any FOSS project could have a bug. I >>>> don't know if either project does the kind of regression testing to >>>> reliably detect this sort of issue. >>> >>> Not sure either, I do think PostgreSQL does a lot with regression tests. >>> >>>> I'd think that it is more likely >>>> that the likes of Oracle would (for their flagship DB (not for MySQL), >>> >>> Never worked with Oracle (or other big software vendors), have you? :) >>> >>>> and they'd probably be more likely to send out an engineer to beg >>>> forgiveness while they fix your database). >>> >>> Only if you're a big (as in, spend a lot of money with them) customer. >>> >>>> Of course, if you're Uber >>>> the hit you'd take from downtime/etc isn't made up for entirely by >>>> having somebody take a few days to get everything fixed. >>> >>> -- >>> Joost >> >> I certainly respect your skills and posts on Databases, Joost, as >> everything you have posted, in the past is 'spot on'. > > Comes with a keen interest and long-term (think decades) of working with > different databases. > >> Granted, I'm no database expert, far from it. > > Not many people are, nor do they need to be. > >> But I want to share a few thing with you, >> and hope you (and others) will 'chime in' on these comments. >> >> Way back, when the earth was cooling and we all had dinosaurs for pets, >> some of us hacked on AT&T "3B2" unix systems. They were know for their >> 'roll back and recovery', triplicated (or more) transaction processes >> and 'voters' system to ferret out if a transaction was complete and >> correct. There was no ACID, the current 'gold standard' if you believe >> what Douglas and other write about concerning databases. 
>> In essence, (from crusted up memories) a basic (SS7) transaction related >> to the local telephone switch, was ran on 3 machines. The results were >> compared. If they matched, the transaction went forward as valid. If 2/3 >> matched, > > And what in the likely case when only 1 was correct? 1/3 was a failure; in fact X<1 could be defined (parameter setting) as a failure depending on the need. > Have you seen the movie "minority report"? > If yes, think back to why Tom Cruise was found 'guilty' when he wasn't and how > often this actually occured. Apples to oranges. The (3) "pre-cons" were not equal; albeit they voted, most of the time all three in agreement, the dominant pre-con was always on the correct side of the issue. But that is make-believe. Comparing the results of codes run on 3 different processors or separate machines for agreement within tolerances is quite different. The very essence of voting is that a result less than 1.0 (that is, (n-1)/n or (n-x)/n) was requisite on identical (replicated) processes all returning the same result (expecting either a 0 or a 1), results being logical or within rounding error of acceptance. Surely we need not split hairs. I was merely pointing out that those basic telecom systems formed the early basis of the widespread transaction processing industry and are the granddaddy of the ACID model/norms/constructs of modern transaction processing. And Douglas is dead wrong that those sorts of (ACID) transactions cannot be made to fly on clusters versus a single machine. For massively parallel needs, distributed processing rules, but it is not trivial, and hence Uber, with mostly a bunch of kids, seems to be struggling and to have made bad decisions. Probably their mid managers and software architects are the weak link, or they got expert guidance that was not in-house, or made poor decisions to get some code running quickly, etc. etc. I do not really care about UBER. 
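The N-of-M 'voter' scheme described above (run the same transaction on an odd number of machines, accept the majority result, reject and resubmit on disagreement) can be sketched as follows. The names and structure are a hypothetical illustration of the idea, not actual telecom code:

```python
# Sketch of majority ("2-out-of-3") voting over replicated transaction results.
# A transaction is accepted only if a strict majority of replicas agree;
# otherwise it is rejected and the caller resubmits it.
from collections import Counter

def vote(results):
    """Return the strict-majority result, or None if there is no majority."""
    winner, count = Counter(results).most_common(1)[0]
    return winner if count > len(results) // 2 else None

# All three replicas agree: the transaction goes forward as valid.
assert vote([1, 1, 1]) == 1
# 2/3 agreement still commits; the dissenting replica is outvoted.
assert vote([1, 1, 0]) == 1
# No strict majority: reject and resubmit.
assert vote([1, 0]) is None
```

Note this matches Joost's objection too: if two faulty replicas happen to agree, the majority answer is still wrong, which is why voting masks hardware faults rather than proving correctness.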
My singular issue is that Douglas was completely dead wrong (while nicely promoting himself as a Postgres expert and his business credentials), and just barely saved his credibility by stating what UBER is now doing that is superior to a grade-A ACID dB solution. Another point: there are single big GPUs that can be run as thousands of different processors, on either FPGA or GPU, granted using SIMD/MIMD style processors and things like 'systolic algorithms', but that sort of thing is out of scope here. (Vulkan might change that, in an open source kind of way, maybe.) Furthermore, GPU resources combined with DDR-5 can blur the line and may actually be more cost effective for many forms of transaction processing, but clusters, in their current forms, are very much general purpose machines. My point:: Douglas is dead wrong about ACID being dominated by databases, for technical reasons, particularly for advanced teams of experts. Surely most MBA, HR and Finance types of idiots running these new startups would not know a coder from an architect, and that is very sad, because a good consultant could have probably designed several robust systems in a week or two. Granted, few consultants have that sort of unbiased integrity, because we all have bills to pay and much is getting outsourced... Integrity has always been the rarest of qualities, particularly with humanoids...... > >> and the switch was was configured, then the code would >> essentially 'vote' and majority ruled. This is what led to phone calls >> (switched phone calls) having variable delays, often in the order of >> seconds, mis-connections and other problems we all encountered during >> periods of excessive demand. > > Not sure if that was the cause in the past, but these days it can also still > take a few seconds before the other end rings. This is due to the phone-system > (all PBXs in the path) needing to setup the routing between both end-points > prior to the ring-tone actually starting. 
> When the system is busy, these lookups will take time and can even time-out. > (Try wishing everyone you know a happy new year using a wired phone and you'll > see what I mean. Mobile phones have a seperate problem at that time) I did not intend to argue about the minutiae of how a particular Baby Bell implemented their SS7 switching systems on unix systems. My point was that 'transaction processing' grew out of the early telephone network, the way I remember it:: ymmv. Banks did double-entry accounting by hand and had clerks manually load data sets; then double-entry accounting became automated, and ACID-style transaction processing was added later. So what SQL folks refer to as ACID properties comes from the North American switching heritage, and eventually the world's telecom networks, eons ago. >> That scenario was at the heart of how old, crappy AT&T unix (SVR?) could >> perform so well and therefore established the gold standard for RT >> transaction processing, aka the "five 9s" 99.999% of up-time (about 5 >> minutes per year of downtime). > > "Unscheduled" downtime. Regular maintenance will require more than 5 minutes > per year. Yes, but the redundancy of the 3B2 and other computers (Sequent, Sequoia and Tandem to name a few) meant that the "phone switching" fabric, at any given Central Office (the local building where the copper, RF and fiber lines are muxed), was, on average, up and available 99.999% of the time. Ironically, gentoo now has a sys-fabric group :: /usr/portage/sys-fabric, thanks to some forward-thinking cluster folk. > >> Sure this part is only related to >> transaction processing as there was much more to the "five 9s" legacy, >> but imho, that is the heart of what was the precursor to ACID property's >> now so greatly espoused in SQL codes that Douglas refers to. >> >> Do folks concur or disagree at this point? > > ACID is about data integrity. The "best 2 out of 3" voting was, in my opinion, > a work-around for unreliable hardware. Absolutely true. 
But the fact that high-reliability computer processing (including the billing) could be replicated, performed elsewhere and then 'recombined' proves that any ACID function can be split up, run on clusters, and achieve ACID standards or even better. So my point is that the cluster, if used wisely, will beat the 'dog shit' out of any Oracle fancy-pants database maneuvers. Evidence:: Snoracle is now snapping up billion dollar companies in the cluster space, because their days of extortion are winding down rather rapidly, imho. Also, just because the kids writing the codes have not figured all of this out does not mean that SQL, or any abstraction, is better than parallel processing. No way in hell. Cheaper and quicker to set up, surely true, but never superior to a well-designed, properly coded distributed solution. That's my point. Hence, Douglas is full of stuffing, except that he alludes to the fact that UBER is doing something much better, beyond what Oracle has an interest in doing, at the last possible moment in his critique. This is backed up by Oracle's lethargic reaction to the data processing market, just leaving Oracle to become the next IBM.... (ymmv). > It is based on a clever idea, but when > 2 computers having the same data and logic come up with 2 different answers, I > wouldn't trust either of them. Yep. That a transaction failing QA is rejected and must be resubmitted, modified, or subjected to any number of remedies is quite common in many forms of software. Voting does not correct errors, except maybe a fractional rounding up to 1 (pass) or down to 0 (failure). It does help to achieve the ACI of ACID. Since billions and billions of these (complex) transactions are occurring, a failed one is usually just repeated. If it keeps failing, then engineers/coders take a deeper look. Rare statistical anomalies are auto-scrutinized (that would be replications and voting) and then pushed to a logical zero or logical one. 
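The pass-or-fail, resubmit-on-failure behaviour being discussed is essentially the atomicity half of ACID. A minimal sketch using sqlite3 from the Python standard library (the ledger table and amounts are made up for the example) shows a transaction that fails midway leaving no partial state behind:

```python
# Minimal illustration of the "A" in ACID (atomicity): either both legs
# of a double-entry transfer are recorded, or neither is.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (account TEXT, amount INTEGER)")
conn.execute("INSERT INTO ledger VALUES ('a', 100), ('b', 0)")
conn.commit()

try:
    with conn:  # the context manager wraps one atomic transaction
        conn.execute("UPDATE ledger SET amount = amount - 50 WHERE account = 'a'")
        raise RuntimeError("crash mid-transfer")  # simulate a failure
        conn.execute("UPDATE ledger SET amount = amount + 50 WHERE account = 'b'")
except RuntimeError:
    pass  # the caller would resubmit the whole transaction

# The partial debit was rolled back: balances are unchanged.
rows = dict(conn.execute("SELECT account, amount FROM ledger"))
assert rows == {"a": 100, "b": 0}
```

The `with conn:` block commits on success and rolls back on any exception, which is exactly the pass-or-fail-and-redo discipline described for the old switches.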
> >> The reason this is important to me (and others?), is that, if this idea >> (granted there is much more detail to it) is still valid, then it can >> form the basis for building up superior-ACID processes, that meet or >> exceed, the properties of an expensive (think Oracle) transaction >> process on distributed (parallel) or clustered systems, to a degree of >> accuracy only limited by the limit of the number of odd numbered voter >> codes involve in the distributed and replicated parts of the >> transaction. I even added some code where replicated routines were >> written in different languages, and the results compared to add an >> additional layer of verification before the voter step. (gotta love >> assembler?). > > You have seen how "democracies" work, right? :) Yes, I need to shed some light on telecom processing. I never intended to suggest that voting corrected errors, although error-correction codes are usually part of the overall stack. I tried to suggest that all transactions on phone switches are already Atomic (pass, or fail and redo), Consistent (replications on different hardware pathways pass satisfaction metrics), Isolated (via multiple hardware pathways), and Durable (passing a voter check scheme); and five nines is still the gold standard for a system (even mil-spec). So the old telecom systems are indeed and in fact the heritage of modern ACID transactions. > The more voters involved, the longer it takes for all the votes to be counted. Wrong! Voters are all run in parallel. For this level of redundancy (to achieve a QA result of a 99.999% pristine system), it is more expensive, analogous to encryption versus clear text. Nobody but a business major would use an excessive number of voters in their switching fabric. Telecom incompetence, in my experience, has been the domain of mid managers too weak to educate upper management on the poor ideas many of them have had and continue to have (Verizon comes to mind, too often). 
> With a small number, it might actually still scale, but when you pass a magic > number (no clue what this would be), the counting time starts to exceed any > time you might have gained by adding more voters. Nope, the larger the number, the more expensive. The number of voters rarely goes above 5, but it could for some sorts of physics problems (think quantum mechanics, and logic not bound to [0 1] whole numbers). Often logic circuits (constructs, for programmers) have "don't care" states that can be handled in a variety of ways (filters, transforms, counters etc etc). > Also, this, to me, seems to counteract the whole reason for using clusters: > Have different nodes handle a different part of the problem. That also occurs. But my point is that properly designed code for the cluster can replace the ACID functions offered by Oracle and other overpriced solutions, on standard cluster hardware. The problem with today's clusters is that the vendors that employ the kid-coders are making things far more complicated than necessary, so the average linux hacker just outsources via the cloud. DUMB, insecure, and not a wise choice for many industries. And sooner or later folks are going to get wise and build their own clusters that just solve the problems they have. Surely hybrid clusters will dominate, where the owner of the codes does outsource peak loads and mundane collection of ordinary (non-critical) data. Vendors know this and have started another 'smoke and mirrors' campaign called (brace yourself) 'Unikernels'..... The problem with that approach is they should just be using minimized (focused) gentoo on stripped and optimized linux kernels; but that is another lost art from the linux collection. > > Clusters of multiple compute-nodes is a quick and "simple" way of increasing > the amount of computational cores to throw at problems that can be broken down > in a lot of individual steps with minimal inter-dependencies. 
And surpass the ACID features of either PostgreSQL or Oracle, and spend less money (maybe not with you and PostgreSQL on their team)! > I say "simple" because I think designing a 1,000 core chip is more difficult > than building a 1,000-node cluster using single-core, single cpu boxes. Today, you are correct. Tomorrow you will be wrong. [1]. Besides, once that chip or VHDL code or whatever is designed, it can be replicated and reused endlessly. Think ASIC designers, folks who take an FPGA project to completion. An EE can code on large arrays of DSPs, or a GPU (think Khronos Group) using Vulkan. > > I would still consider the cluster to be a single "machine". That's the goal. > >> I guess my point is 'Douglas' is full of stuffing, OR that is what folks >> are doing when they 'role their own solution specifically customized to >> their specific needs' as he alludes to near the end of his commentary? > > The response Douglas linked to is closer to what seems to work when dealing > with large amounts of data. > >> (I'd like your opinion of this and maybe some links to current schemes >> how to have ACID/99.999% accurate transactions on clusters of various >> architectures.) Douglas, like yourself, writes of these things in a >> very lucid fashion, so that is why I'm asking you for your thoughts. > > The way Uber created the cluster is useful when having 1 node handle all the > updates and multiple nodes providing read-only access while also providing > failover functionality. SIMD solution, mimicked on a cluster? Cool. > >> Robustness of transactions, in a distributed (clustered) environment is >> fundamental to the usefulness of most codes that are trying to migrate >> to a cluster based processes in (VM/container/HPC) environments. > > Whereas I do consider clusters to be very useful, not all work-loads can be > redesigned to scale properly. Today, correct. Tomorrow, I think you are going to be wrong. It's like the single core, multicore. 
Granted, many old decrepit codes had to be redesigned and coded anew with threads and other modern constructs to take advantage of newer processing platforms. Sure, the same is true with distributed, but it's far closer than ever. The largest problem with clusters is that vendors with agendas are making things more complicated than necessary and completely ignoring many fundamental issues, like kernel stripping and optimizations, under the bloated OS they are using. > >> I do >> not have the old articles handy but, I'm sure that many/most of those >> types of inherent processes can be formulated in the algebraic domain, >> normalized and used to solve decisions often where other forms of >> advanced logic failed (not that I'm taking a cheap shot at modern >> programming languages) (wink wink nudge nudge); or at least that's how >> we did it.... as young whipper_snappers bask in the day... > > If you know what you are doing, the language is just a tool. Sometimes a > hammer is sufficient, other times one might need to use a screwdriver. > >> --an_old_farts_logic > > Thinking back on how long I've been playing with computers, I wonder how long > it will be until I am in the "old fart" category? Stay young! I run full-court hoops all the time with young college punks; it's one of my greatest joys in life, to run with the young stallions, hacking, pushing, shoving, slicing and taunting other athletes. An old farts club is not something to be proud of; I just like to share too much...... > Joost Thanks ! James ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-02 5:16 ` james @ 2016-08-04 10:09 ` J. Roeleveld 2016-08-04 17:08 ` james 0 siblings, 1 reply; 27+ messages in thread From: J. Roeleveld @ 2016-08-04 10:09 UTC (permalink / raw To: gentoo-user On Tuesday, August 02, 2016 12:16:32 AM james wrote: > On 08/01/2016 11:49 AM, J. Roeleveld wrote: > > On Monday, August 01, 2016 08:43:49 AM james wrote: <snipped> > >> Way back, when the earth was cooling and we all had dinosaurs for pets, > >> some of us hacked on AT&T "3B2" unix systems. They were know for their > >> 'roll back and recovery', triplicated (or more) transaction processes > >> and 'voters' system to ferret out if a transaction was complete and > >> correct. There was no ACID, the current 'gold standard' if you believe > >> what Douglas and other write about concerning databases. > >> > >> In essence, (from crusted up memories) a basic (SS7) transaction related > >> to the local telephone switch, was ran on 3 machines. The results were > >> compared. If they matched, the transaction went forward as valid. If 2/3 > >> matched, > > > > And what in the likely case when only 1 was correct? > > 1/3 was a failure, in fact X<1 could be defined (parameter setting) as a > failure depending on the need. I actually meant: system A says true; systems B and C say false. And "true" was correct. (Being devil's advocate here) > > Have you seen the movie "minority report"? > > If yes, think back to why Tom Cruise was found 'guilty' when he wasn't and > > how often this actually occured. > > Apples to Oranges. The (3) "pre-cons" were not equal, ableit the voted, > most of the time all three in agreement, but the dominant pre-con was > always on the correct side of the issue. But that is make-believe. Of course, but it was the first example that I could come up with. > Comparing results of codes run on 3 different processors or separate > machines for agreement withing tolerances, is quite different. 
> The very
> essence of using voting, where a result less than 1.0 (that is,
> n-1/n or n-x/n) was acceptable, was requisite on identical (replicated)
> processes all returning the same result (expecting either a 0 or 1).
> Results being logical or within rounding error of acceptance. Surely we
> need not split hairs. I was merely pointing out that the basic telecom
> systems formed the early basis of the widespread transaction processing
> industries and are the granddaddy of the ACID model/norms/constructs of
> modern transaction processing.

Hmm... I am having difficulty following how ACID and ensuring results are correct by double or triple checking are related.

> And Douglas is

Which Douglas are you referring to? The one in this thread didn't actually write the article he linked to. (Unless he has 2 different identities)

> dead wrong that those sorts of (ACID) transactions cannot be made to fly
> on clusters versus a single machine.

It depends on how you define a cluster. I tend to view a cluster as a single system that just happens to be spread over multiple physical boxes.

> For massively parallel needs,
> distributed processing rules, but it is not trivial

Agreed.

> and hence Uber, with
> mostly a bunch of kids, seems to be struggling and to have made bad
> decisions.

Let's ignore whether the decisions are good or bad. The only thing we can be certain of, without seeing their code and environment, is that it doesn't scale the way they need it to.

> Probably, their mid-level managers and software architects are the
> weak link, or they got expert guidance that was not in-house, or made poor
> decisions to get some code running quickly, etc. etc. I do not really care
> about UBER.

Neither do I. And decisions are usually made by a single architect or developer who starts the project. His/her manager usually just accepts his/her word on this and all future decisions. Up until the moment the manager gets replaced. Then it depends on how much the manager trusts the original developer.
Other developers (internal or external) usually have a hard time pointing out potential issues if the first developer doesn't agree and/or understand.

> My singular issue is that Douglas was completely dead wrong
> (which nicely promoted himself as a Postgres expert and his business
> credentials), and he just barely saved his credibility by stating what UBER
> is now doing that is superior to a grade-A ACID DB solution.

I didn't see that in the article. Must have missed that part.

> Another point: there are single big GPUs that can be run as thousands of
> different processors, on either FPGA or GPU, granted using SIMD/MIMD
> style processors and things like 'systolic algorithms', but that sort of
> thing is out of scope here. (Vulkan might change that, in an open-source
> kind of way, maybe). Furthermore, GPU resources combined with DDR-5 can
> blur the line and may actually be more cost-effective for many forms of
> transaction processing, but clusters, in their current forms, are very
> much general-purpose machines.

I don't really agree here. For most software, having a really fast CPU helps. Having a lot of mediocre CPUs means the vast majority isn't doing anything useful.

Software running on clusters needs to be written with massive parallel processing in mind. Most developers don't understand this part.

> My point:: Douglas is dead wrong about
> ACID being dominated by databases, for technical reasons, particularly
> for advanced teams of experts.

Wikipedia actually disagrees with you:
https://en.wikipedia.org/wiki/ACID
"In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of database transactions."

In other words, it's related to databases.

> Surely most MBA, HR and Finance types of
> idiots running these new startups wouldn't know a coder from an
> architect, and that is very sad, because a good consultant could have
> probably designed several robust systems in a week or two.
> Granted, few
> consultants have that sort of unbiased integrity, because we all have
> bills to pay and much is getting outsourced... Integrity has always been
> the rarest of qualities, particularly with humanoids......

The software Uber uses for their business had to be developed in-house as there, at least at the time, was nothing available they could use ready-made. This usually means they start with something simple they can get running quickly. If they wanted to fully design the whole system first, they would never get anything done.

Where these projects usually go wrong is that they wait too long for a good robust design, leading to a near impossibility of actually fixing all the, in hindsight obvious, design mistakes.
(NOTE: In hindsight, as most of the actual requirements would not be clear on day 1)

> >> and the switch was so configured, then the code would
> >> essentially 'vote' and majority ruled. This is what led to phone calls
> >> (switched phone calls) having variable delays, often on the order of
> >> seconds, mis-connections and other problems we all encountered during
> >> periods of excessive demand.
> >
> > Not sure if that was the cause in the past, but these days it can also
> > still take a few seconds before the other end rings. This is due to the
> > phone system (all PBXs in the path) needing to set up the routing between
> > both end-points prior to the ring-tone actually starting.
> > When the system is busy, these lookups will take time and can even
> > time-out. (Try wishing everyone you know a happy new year using a wired
> > phone and you'll see what I mean. Mobile phones have a separate problem
> > at that time)
>
> I did not intend to argue about the minutiae of how a particular Baby
> Bell implemented their SS7 switching systems on unix systems. My point
> was that 'transaction processing' grew out of the early telephone network,
> the way I remember it:: ymmv.
> Banks did double-entry accounting by hand
> and had clerks manually load data sets; then double-entry accounting
> became automated, and ACID-style transaction processing was added later. So
> what SQL folks refer to as ACID properties comes from the North
> American switching heritage and eventually the world's telecom networks,
> eons ago.

There is a similarity, but where ACID is a way of guaranteeing data integrity, a phone switch does not need this. It simply needs to do the routing correctly.

Finance departments still do double-entry accounting and there still is a lot of manual writing/typing going on.

> >> That scenario was at the heart of how old, crappy AT&T unix (SVR?) could
> >> perform so well and therefore established the gold standard for RT
> >> transaction processing, aka the "five 9s": 99.999% up-time (about 5
> >> minutes per year of downtime).
> >
> > "Unscheduled" downtime. Regular maintenance will require more than 5
> > minutes per year.
>
> Yes, but the redundancy of the 3B2 and other computers (Sequent, Sequoia and
> Tandem, to name a few) meant that the "phone switching" fabric, at any
> given Central Office (the local building where the copper, RF and fiber
> lines are muxed), was, on average, up and available 99.999% of the time.
> Ironically, gentoo now has a sys-fabric group ::
> /usr/portage/sys-fabric, thanks to some forward-thinking cluster folk.
>
> >> Sure this part is only related to
> >> transaction processing, as there was much more to the "five 9s" legacy,
> >> but imho that is the heart of what was the precursor to the ACID
> >> properties now so greatly espoused in the SQL codes that Douglas refers to.
> >>
> >> Do folks concur or disagree at this point?
> >
> > ACID is about data integrity. The "best 2 out of 3" voting was, in my
> > opinion, a work-around for unreliable hardware.
>
> Absolutely true.
> But the fact that High Reliability in computer
> processing (including the billing) could be replicated, performed
> elsewhere and then 'recombined', proves that the work of any ACID
> function can be split up and run on clusters and achieve ACID standards
> or even better. So my point is that the cluster, if used wisely,
> will beat the 'dog shit' out of any Oracle fancy-pants database
> maneuvers. Evidence:: Snoracle is now snapping up billion-dollar
> companies in the cluster space, 'cause their days of extortion are
> winding down rather rapidly, imho.

I disagree here. For some workloads, clusters are really great. But SQL databases will remain.

> Also, just because the kids who are writing the codes have not figured all
> of this out does not mean that SQL and any abstraction is better than
> parallel processing. No way in hell. Cheaper and quicker to set up,
> surely true, but never superior to a well-designed, properly coded
> distributed solution. That's my point.

Workloads where you can split the whole processing into small chunks, where the same steps can be performed over a random-sized chunk and merging at a later stage will lead to correct results? Then yes.

However, I deal with processes and reports where the number of possible chunks is definitely limited and any theoretical benefit of splitting it over multiple nodes will be lost when having to build a very fancy and complex algorithm to merge all the separate results back together. This algorithm then also needs to be extensively tested, analysed and understood by future developers. The additional cost involved will be prohibitive.

> Hence, Douglas is full of
> stuffing, except that he alludes to the fact that UBER is doing something
> much better, beyond what Oracle has an interest in doing, at the last
> possible moment in his critique. This is backed up by Oracle's lethargic
> reaction to the data processing market, just leaving Oracle to become the
> next IBM.... (ymmv).
I disagree. UBER is still using a relational database as the storage layer, with something custom put over it to make it simpler for the developers. Any abstraction layer will have a negative performance impact.

> > It is based on a clever idea, but when
> > 2 computers having the same data and logic come up with 2 different
> > answers, I wouldn't trust either of them.
>
> Yep. That the QA of transactions is rejected and must be resubmitted,
> modified, or any number of remedies, is quite common in many forms of
> software. Voting does not correct errors, except maybe a fractional
> rounding up to 1 (pass) or down to zero (failure). It does help to
> achieve the ACI of ACID.

It's one way of doing it. But it can also cause extra delays due to having to wait for separate nodes to finish and then to check if they all agree.

> Since billions and billions of these (complex) transactions are
> occurring, a failed one is usually just repeated. If it keeps failing, then
> engineers/coders take a deeper look. Rare statistical anomalies are
> auto-scrutinized (that would be replication and voting) and then pushed
> to a logical zero or logical one.

The complexity comes from having to mould the algorithm into that structure. And additional complexity also makes it more fault-prone.

> >> The reason this is important to me (and others?), is that, if this idea
> >> (granted there is much more detail to it) is still valid, then it can
> >> form the basis for building up superior-ACID processes that meet or
> >> exceed the properties of an expensive (think Oracle) transaction
> >> process on distributed (parallel) or clustered systems, to a degree of
> >> accuracy only limited by the limit of the number of odd-numbered voter
> >> codes involved in the distributed and replicated parts of the
> >> transaction. I even added some code where replicated routines were
> >> written in different languages, and the results compared to add an
> >> additional layer of verification before the voter step.
> >> (gotta love
> >> assembler?).
> >
> > You have seen how "democracies" work, right? :)
>
> Yes. I need to shed some light on telecom processing. I never intended to
> suggest that voting corrected errors, although error-correction codes
> are usually part of the overall stack. I tried to suggest that all
> transactions on phone switches are already Atomic (pass or fail-redo),
> Consistent (replications pass on different hardware pathways to
> satisfaction metrics), Isolated (via multiple hardware pathways), and
> Durable (passing a voter check scheme); and five nines is still the gold
> standard for a system (even mil-spec).
>
> So the old telecom systems are indeed and in fact the heritage of
> modern ACID transactions.

A lot can be described using 'modern' designs. However, the fact remains that ACID was worked out for databases and not for phone systems.

Any sane system will have some form of consistency checks, but the extent to which this is done for a data storage layer, like a database, will be different from the extent to which this is done for a switching layer, like a router or phone switch. Modern phone switches will not implement a redo.

> > The more voters involved, the longer it takes for all the votes to be
> > counted.
>
> Wrong! Voters are all run in parallel. For this level of redundancy (to
> achieve a QA result of 99.999% system pristine), it is more expensive,
> analogous to encryption versus clear text. Nobody but a business major
> would use an excessive number of voters in their switching fabric.
> Telecom incompetence, in my experience, has been the domain of mid-level
> managers too weak to educate upper management on the poor ideas many of
> them have had and continue to have (Verizon comes to mind, too often).

Those incompetencies are usually in the domain of finances and services provided. The basic service of a telecoms company is pretty simple: "Pass data/voice between A and B". There are plenty of proven systems available that can do this.
The mistakes are usually of the kind: the system that we bought does not handle the load the salesperson promised.

> > With a small number, it might actually still scale, but when you pass a
> > magic number (no clue what this would be), the counting time starts to
> > exceed any time you might have gained by adding more voters.
>
> Nope; the larger the number, the more expensive. The number of voters
> rarely goes above 5, but it could for some sorts of physics problems
> (think quantum mechanics and logic not bound to [0 1] whole numbers).
> Often logic circuits (constructs, for programmers) have "don't care"
> states that can be handled in a variety of ways (filters, transforms,
> counters etc etc).

"don't care" values should always be ignored. Never actually used. (Except for randomizer functionality)

> > Also, this, to me, seems to counteract the whole reason for using
> > clusters:
> > Have different nodes handle a different part of the problem.
>
> That also occurs. But my point is that properly designed code for the
> cluster can replace the ACID functions offered by Oracle and other
> overpriced solutions, on standard cluster hardware.

All commonly used relational databases have ACID functionality as long as they support transactions. There is no need to choose a commercial version just for that.

> The problem with today's
> clusters is that the vendors that employ the kid-coders are making things
> far more complicated than necessary, so the average linux hacker just
> outsources via the cloud. DUMB, insecure and not a wise choice for many
> industries.

Moving your entire business into the cloud often is.

> And sooner or later folks are going to get wise and build
> their own clusters that just solve the problems they have. Surely hybrid
> clusters will dominate, where the owner of the codes does outsource peak
> loads and mundane collection of ordinary (non-critical) data.

Eg. hybrid solutions...
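Joost's point above, that any relational database supporting transactions gives you ACID behaviour, can be sketched with SQLite from Python's standard library. This is a hypothetical two-row transfer of my own invention, not code from the thread; the table and function names are made up for illustration:

```python
import sqlite3

# In-memory database; any transactional RDBMS behaves the same way here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` atomically: either both updates land, or neither does."""
    try:
        with conn:  # begins a transaction; commits on success, rolls back on error
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
            cur = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")  # triggers the rollback
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except ValueError:
        return False

print(transfer(conn, "alice", "bob", 60))  # True: both rows updated together
print(transfer(conn, "alice", "bob", 60))  # False: rolled back, nothing changed
print(dict(conn.execute("SELECT * FROM accounts")))  # {'alice': 40, 'bob': 60}
```

The second call is the interesting one: the debit has already been executed inside the transaction, yet after the rollback it leaves no trace, which is the atomicity the thread keeps circling around.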
> Vendors
> know this and have started another 'smoke and mirrors' campaign called
> (brace yourself) 'Unikernels'.....

"Unikernels" are something a small group came up with... I see no practical benefit in that approach.

> The problem with that approach is that they
> should just be using minimized (focused) gentoo on stripped and optimized
> linux kernels; but that is another lost art from the linux collection

I see "unikernels" as basically running the application directly on top of a hypervisor. I fail to see how this makes more sense than starting an application directly on top of an OS. The whole reason we have an OS is to avoid having to reinvent the wheel (networking, storage, memory handling, ....) for every single program.

> > Clusters of multiple compute-nodes are a quick and "simple" way of
> > increasing the number of computational cores to throw at problems that
> > can be broken down into a lot of individual steps with minimal
> > inter-dependencies.
>
> And surpass the ACID features of either postgresql or Oracle, and spend
> less money (maybe not with you and postgresql on their team)!

Large clusters are useful when doing Hadoop ("big data") style things (I mostly work with financial systems and the corresponding data). Storing the entire data warehouse inside a cluster doesn't work with all the additional requirements. Reports still need to be displayed quickly, and a decently configured database is usually more beneficial.

Where systems like Exadata really help here is by integrating the underlying storage (SAN) with the actual database servers and doing most of the processing in-memory. Eg. it works like a dedicated and custom-built cluster environment specifically for a relational database.

> > I say "simple" because I think designing a 1,000-core chip is more
> > difficult than building a 1,000-node cluster using single-core,
> > single-cpu boxes.
>
> Today, you are correct. Tomorrow you will be wrong.

In that case, clusters will be obsolete tomorrow.

> [1].
> Besides, once
> that chip or VHDL code or whatever is designed, it can be replicated and
> reused endlessly. Think ASIC designers, folks who take an FPGA project to
> completion. An EE can code on large arrays of DSPs, or a GPU
> (think Khronos group) using Vulkan.
>
> > I would still consider the cluster to be a single "machine".
>
> That's the goal.

That goal, in my opinion, has already been achieved. Unless you want ALL machines to be part of the same cluster and all machines to be able to push work to the entire cluster... In that case, good luck in achieving this, as you then also need to handle "randomly disappearing nodes".

> >> I guess my point is 'Douglas' is full of stuffing, OR that is what folks
> >> are doing when they 'roll their own solution specifically customized to
> >> their specific needs' as he alludes to near the end of his commentary?
> >
> > The response Douglas linked to is closer to what seems to work when
> > dealing with large amounts of data.
> >
> >> (I'd like your opinion of this and maybe some links to current schemes
> >> how to have ACID/99.999% accurate transactions on clusters of various
> >> architectures.) Douglas, like yourself, writes of these things in a
> >> very lucid fashion, so that is why I'm asking you for your thoughts.
> >
> > The way Uber created the cluster is useful when having 1 node handle all
> > the updates and multiple nodes providing read-only access while also
> > providing failover functionality.
>
> SIMD solution, mimic on a cluster? Cool.

Hmm.... no. This is load balancing on the data-retrieval side.

> >> Robustness of transactions, in a distributed (clustered) environment is
> >> fundamental to the usefulness of most codes that are trying to migrate
> >> to cluster-based processes in (VM/container/HPC) environments.
> >
> > Whereas I do consider clusters to be very useful, not all work-loads can
> > be redesigned to scale properly.
>
> Today, correct.
> Tomorrow, I think you are going to be wrong. It's like
> the single-core to multicore transition.

And 90+% of developers still don't understand how to properly code for multi-threading. Just look at how most applications work on your desktop. They all tend to max out a single core while the other x-1 cores tend to idle...

> Granted, many old decrepit codes had to be
> redesigned and coded anew with threads and other modern constructs to
> take advantage of newer processing platforms.

Intel came with Hyper-Threading back in 2002 (or even before). We are now in 2016 and the majority of code is still single-threaded. The problem is, the algorithms that are being used need to be converted to parallel methods.

> Sure, the same is true with
> distributed, but it's far closer than ever. The largest problem with
> clusters is vendors with agendas making things more complicated
> than necessary and completely ignoring many fundamental issues, like
> kernel stripping and optimizations under the bloated OS they are using.

I still want a graphical desktop with full multimedia support. I still want to easily plug in a USB device or SD card and use it immediately..... That requirement is incompatible with stripping the OS.

> >> I do
> >> not have the old articles handy but, I'm sure that many/most of those
> >> types of inherent processes can be formulated in the algebraic domain,
> >> normalized and used to solve decisions often where other forms of
> >> advanced logic failed (not that I'm taking a cheap shot at modern
> >> programming languages) (wink wink nudge nudge); or at least that's how
> >> we did it.... as young whipper_snappers bask in the day...
> >
> > If you know what you are doing, the language is just a tool. Sometimes a
> > hammer is sufficient, other times one might need to use a screwdriver.
> >
> >> --an_old_farts_logic
> >
> > Thinking back on how long I've been playing with computers, I wonder how
> > long it will be until I am in the "old fart" category?
>
> Stay young! I run full-court hoops all the time with young college
> punks; it's one of my greatest joys in life: run with the young
> stallions, hacking, pushing, shoving, slicing and taunting other
> athletes. Old farts' clubs are not something to be proud of; I just like
> to share too much......

Hehe.... One is only as old as he/she feels.

--
Joost

^ permalink raw reply	[flat|nested] 27+ messages in thread
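The 2-out-of-3 'voter' scheme james describes upthread can be sketched in a few lines of Python. This is a hypothetical illustration of N-modular redundancy; the function name and `quorum` parameter are my own, not from any telecom system:

```python
from collections import Counter

def vote(replica_results, quorum=2):
    """N-modular redundancy: accept a result only if at least `quorum`
    of the replicated computations agree; otherwise signal a redo."""
    winner, count = Counter(replica_results).most_common(1)[0]
    return winner if count >= quorum else None  # None => resubmit transaction

# Three replicas of the same (hypothetical) switching computation:
print(vote([1, 1, 1]))  # 1: all agree, transaction goes forward
print(vote([1, 0, 1]))  # 1: 2/3 majority rules
print(vote([1, 0, 2]))  # None: no quorum, redo the transaction
```

Note that this also illustrates Joost's devil's-advocate objection: if replicas B and C agree on a *wrong* answer, the vote still passes. Voting masks independent random faults (bad hardware), not systematic logic errors shared by all replicas.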
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber
  2016-08-04 10:09 ` J. Roeleveld
@ 2016-08-04 17:08   ` james
  2016-08-04 19:19   ` R0b0t1
  0 siblings, 1 reply; 27+ messages in thread
From: james @ 2016-08-04 17:08 UTC (permalink / raw)
  To: gentoo-user

On 08/04/2016 05:09 AM, J. Roeleveld wrote:
> On Tuesday, August 02, 2016 12:16:32 AM james wrote:
>> On 08/01/2016 11:49 AM, J. Roeleveld wrote:
>>> On Monday, August 01, 2016 08:43:49 AM james wrote:
>
> <snipped>
>
>>>> Way back, when the earth was cooling and we all had dinosaurs for pets,
>>>> some of us hacked on AT&T "3B2" unix systems. They were known for their
>>>> 'roll back and recovery', triplicated (or more) transaction processes
>>>> and 'voters' system to ferret out whether a transaction was complete and
>>>> correct. There was no ACID, the current 'gold standard' if you believe
>>>> what Douglas and others write about concerning databases.

<snip>

>> Comparing results of codes run on 3 different processors or separate
>> machines for agreement within tolerances, is quite different. The very
>> essence of using voting, where a result less than 1.0 (that is,
>> n-1/n or n-x/n) was acceptable, was requisite on identical (replicated)
>> processes all returning the same result (expecting either a 0 or 1).
>> Results being logical or within rounding error of acceptance. Surely we
>> need not split hairs. I was merely pointing out that the basic telecom
>> systems formed the early basis of the widespread transaction processing
>> industries and are the granddaddy of the ACID model/norms/constructs of
>> modern transaction processing.
>
> Hmm... I am having difficulty following how ACID and ensuring results are
> correct by double or triple checking are related.

Atomicity, Consistency, Isolation, Durability == ACID (so we are all on the same page). Not my thesis. My thesis, inspired by these threads, is that all of these (4) properties of ACID originated in the telephone networks, as separate issues.
When telephonic switching moved from electro-mechanical systems to computers, each of these properties was developed by the telephonic software and equipment providers. Banks followed the switching systems, and these (4) ACID properties were realized to be universally useful, instituted, and rebranded as 'transactions'.

Database vendors, IBM and others, quickly realized the value of ACID properties in all sorts of forms of data movement and modification (ie the transaction). Database developers and vendors did not invent ACID properties. Indeed and in fact, those properties were first used collectively in the legacy telephonic systems, best described by SS7. Earlier versions are a case study in the redundancy and reliability of those early telecom systems. Granted, latency was a big problem, which moving from electric circuits to digital circuits fixed; yet still there was the wonderful five-nines (99.999%) of quality.

>> For massively parallel needs,
>> distributed processing rules, but it is not trivial
>
> Agreed.

<snip>

>> Another point: there are single big GPUs that can be run as thousands of
>> different processors, on either FPGA or GPU, granted using SIMD/MIMD
>> style processors and things like 'systolic algorithms', but that sort of
>> thing is out of scope here. (Vulkan might change that, in an open-source
>> kind of way, maybe). Furthermore, GPU resources combined with DDR-5 can
>> blur the line and may actually be more cost-effective for many forms of
>> transaction processing, but clusters, in their current forms, are very
>> much general-purpose machines.
>
> I don't really agree here. For most software, having a really fast CPU
> helps. Having a lot of mediocre CPUs means the vast majority isn't doing
> anything useful.
> Software running on clusters needs to be written with massive parallel
> processing in mind. Most developers don't understand this part.
Where did you get the idea that folks building clusters are not interested in using the fastest processors possible? Dude, that's just failed (non-sequitur) logic. Well, this premise of yours is a corollary to my thesis; the early telecom systems developers were historically 'bad ass' and highly intelligent. It has taken the software development world decades to catch up to key systems attributes of hardware design (redundancy and roll-back and recovery). Now that things are digital, you can run codes on a variety of different hardware to abstract the properties of ACID and supersede ACID with yet more properties of robust hardware design. (Sadly, even most EE professors are severely lacking in this knowledge.) Modern EE experts have most of their magic attributed to European mathematicians, but that's another issue, too complex for the average java* coder. Curiously, you can read all about Hilbert, should you need to scratch that itch....

>> My point:: Douglas is dead wrong about ACID being dominated by databases,
>> for technical reasons, particularly for advanced teams of experts.
>
> Wikipedia actually disagrees with you:
> https://en.wikipedia.org/wiki/ACID
> "In computer science, ACID (Atomicity, Consistency, Isolation, Durability)
> is a set of properties of database transactions."

Exactly. Database vendors got the ideas and components (literals and abstractions) from the telephonics industries to get a leg up, moving from electronic switching (which already had those key components now referred to as ACID) in hardware. When those electro-mechanical systems moved to digital circuits, Bell Labs ensured those properties were a closely held secret wrapped up in the 'unix OS'. They did promote ACID in their software, and the banks and other customers were likewise saying YES YES YES, we want telecom ACID levels of performance in our (developing) computer software too.
But the migration to digital let the 'cat out of the bag' on the wonders of ACID (long before Timothy Leary, just so the Californians among us can keep up!).

> In other words, it's related to databases

They (vendors) copied it from telecom, and wildly promoted it, very successfully. Combine this with the fact that most US EE programs are abysmally weak (always have been), so now we indeed and in fact have this severe lapse in robust and fault-tolerant systems. WHY? Nothing (industrial or commercial) had the "five-nines" of reliability but those electro-mechanical telephonic systems. *nothing* Everybody wanted it; hence those (4) components were harvested from telephonics and used as a model for all transactions.

Take "atomicity" for example. It has its roots in "call setup". Dialogic is a PC board vendor (from decades ago) that followed those early systems. Here is a document (from the 70s/80s/?) where they have "40 Atomic Functions" that they use in software to control the hardware for 'call setup and management'. Sure, many more documents exist, but they may not be publicly available in electronic form. All of this occurred before the folks that write for Wikipedia were ever born, so they could not possibly be aware of these issues and this historical precedence.

[1] https://www.dialogic.com/webhelp/MSP1010/10.4.0/WebHelp/ppl_dg/l3p_cic.htm

One can research each of those four properties and discover how telecom integrated them into the phone system of North America (Europe evolved almost simultaneously). Bell Labs is "the daddy of ACID"; and it was a tightly held secret as long as possible, to delay the expansion of usage and the eventual break-up of that legacy monopoly. There are many things in the (legacy) communications world that have not accurately made their way to digital in a form freely available on the internet (like signal intercept). Think of all of those hidden antenna arrays in the UK when microwave telecom was all the rage.
MCI was a key player in exploiting microwave (another tenet of EE).

>> Surely most MBA, HR and Finance types of
>> idiots running these new startups wouldn't know a coder from an
>> architect, and that is very sad, because a good consultant could have
>> probably designed several robust systems in a week or two. Granted, few
>> consultants have that sort of unbiased integrity, because we all have
>> bills to pay and much is getting outsourced... Integrity has always been
>> the rarest of qualities, particularly with humanoids......
>
> The software Uber uses for their business had to be developed in-house as
> there, at least at the time, was nothing available they could use
> ready-made. This usually means they start with something simple they can
> get running quickly. If they wanted to fully design the whole system
> first, they would never get anything done.
>
> Where these projects usually go wrong is that they wait too long for a
> good robust design, leading to a near impossibility of actually fixing all
> the, in hindsight obvious, design mistakes.
> (NOTE: In hindsight, as most of the actual requirements would not be clear
> on day 1)

I could not agree with you more. The more processors readily available to codes that know how to use them, in parallel, the faster and better and more reliable the systems developed (including the software) will be. Some are working on extremely low-latency systems where FPGAs are embedded in general-purpose processors (Intel is leading on this). The DoD has been using these systems for decades.

Clusters are superior to single (or multicore) systems, if only these kids knew anything about redundancy and fault tolerance, both of which originate in hardware and which the telecom industry perfected to the 99.999% robustness level (while IBM drooled over their punch cards; I know, I was there......).
And in my opinion, that was the most important of the collective of reasons why AT&T, its 10,000+ lawyers and the assholes in our government fought so hard to keep early unix expansion out of the hands of the masses. At one point it was easier to get a top-secret clearance than it was to code on those early telecom systems.

>>>> and the switch was so configured, then the code would
>>>> essentially 'vote' and majority ruled. This is what led to phone calls
>>>> (switched phone calls) having variable delays, often on the order of
>>>> seconds, mis-connections and other problems we all encountered during
>>>> periods of excessive demand.
>>>
>>> Not sure if that was the cause in the past, but these days it can also
>>> still take a few seconds before the other end rings. This is due to the
>>> phone system (all PBXs in the path) needing to set up the routing between
>>> both end-points prior to the ring-tone actually starting.
>>> When the system is busy, these lookups will take time and can even
>>> time-out. (Try wishing everyone you know a happy new year using a wired
>>> phone and you'll see what I mean. Mobile phones have a separate problem
>>> at that time)
>>
>> I did not intend to argue about the minutiae of how a particular Baby
>> Bell implemented their SS7 switching systems on unix systems. My point
>> was that 'transaction processing' grew out of the early telephone network,
>> the way I remember it:: ymmv. Banks did double-entry accounting by hand
>> and had clerks manually load data sets; then double-entry accounting
>> became automated, and ACID-style transaction processing was added later.
>> So what SQL folks refer to as ACID properties comes from the North
>> American switching heritage and eventually the world's telecom networks,
>> eons ago.
>
> There is a similarity, but where ACID is a way of guaranteeing data
> integrity, a phone-switch does not need this. It simply needs to do the
> routing correctly.
Have you ever talked to an old military officer who worked in Intelligence? Like the spy plane incident, circa 1960 [2]? https://en.wikipedia.org/wiki/1960_U-2_incident Data integrity almost caused WW3. WRONG. The five-nines was so coveted by everyone else that there was a feeding frenzy on just how these folks at Bell Labs pulled it off. Early (1950-1970s) computational systems were abysmal to own or operate, and yet the sorry-ass phone company had 99.999% perfection (thanks to Bell Labs)? They provided the T1 and T3 lines in/out of the Pentagon. Jealousy was outrageous. Database vendors were struggling with assembler and 'board changeouts', as Rich alluded to. <snip> >>> ACID is about data integrity. The "best 2 out of 3" voting was, in my >>> opinion, a work-around for unreliable hardware. Correct. Voting was used as the precursor technology to distributed systems (today it's the cluster). It added to the reliability and robustness. It provided consistency. It demonstrated that the entire string of what was needed for SS7, including call setup, could be replicated and run on a cluster (oops, another hardware set).... >> Absolutely true. But the fact that high-reliability computer >> processing (including the billing) could be replicated, performed >> elsewhere and then 'recombined', proves that any ACID >> function can be split up and run on clusters and achieve ACID standards >> or even better. So my point is that the cluster, if used wisely, >> will beat the 'dog shit' out of any Oracle fancy-pants database >> maneuvers. Evidence: Snoracle is now snapping up billion-dollar >> companies in the cluster space, because their days of extortion are >> winding down rather rapidly, imho. > I disagree here. For some workloads, clusters are really great. But SQL > databases will remain. As a subset of distributed processing.
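Since the thread keeps returning to the "best 2 out of 3" scheme, here is a minimal sketch of majority voting (triple modular redundancy) in Python. The trunk names and the replicated results are invented for illustration, not taken from any real SS7 implementation:

```python
from collections import Counter

def majority_vote(results):
    """Return the value a strict majority of replicas agree on, or None.

    With 3 replicas, any 2 matching answers win ("best 2 out of 3").
    Total disagreement yields None: the work must be resubmitted,
    which matches the "repeat it until it passes" behaviour described
    in the thread. Note voting masks faults; it does not correct them.
    """
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None

# Three replicas compute the same routing decision; one is faulty.
assert majority_vote(["trunk-7", "trunk-7", "trunk-3"]) == "trunk-7"

# All three disagree: no majority, so the transaction is redone.
assert majority_vote(["a", "b", "c"]) is None
```

This also illustrates Joost's cost objection: the vote cannot be taken until the slowest replica has finished, so the scheme buys robustness at the price of latency.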
Oracle (the champion of databases) is going to atrophy and slip into irrelevance, once kids learn how to supersede ACID with judicious cluster hardware and code on top of heterogeneous clusters..... Granted, any corp with billions and billions and deep (illegal?) relationships with government officials will eventually prosper again.... Once again, EE will light the forward path. >> Also, just because the kids writing the codes have not figured all >> of this out does not mean that SQL and any abstraction is better than >> parallel processing. No way in hell. Cheaper and quicker to set up, >> surely true, but never superior to a well-designed, properly coded >> distributed solution. That's my point. > > Workloads where you can split the whole processing into small chunks where the > same steps can be performed over a random-sized chunk and merging at a later > stage will lead to correct results. Then yes. True, but it's not quite as restrictive as you think. Large systems, with even just a small bit of parallelism integrated into the overall architecture, benefit. How much depends on the designers. We do need more EE coders leading on cluster designs, but the universities (worldwide) have let everyone down. > However, I deal with processes and reports where the amount of possible chunks > is definitely limited and any theoretical benefit of splitting it over multiple > nodes will be lost when having to build a very fancy and complex algorithm to > merge all the separate results back together. NoSQL is an abysmal failure. SQL needs to be a small subset of robust parallel systems design and implementation. The latest venue is 'unikernels'. Clusters will dominate because deep pockets can have the latest and fastest and cheapest hardware, in massive quantities, before the commoners even learn how it works. Arm64/ARMv8 is a prime and current example. Its heat load per unit of processing blows away CISC-based systems.
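Joost's criterion above, the same steps performed over arbitrarily sized chunks with the results merged at a later stage, is the classic map-and-merge shape. A minimal sketch, with a per-chunk partial sum standing in (purely for illustration) for the real work:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # The "same steps performed over a random-sized chunk":
    # a partial sum stands in for the real per-chunk work.
    return sum(chunk)

def parallel_sum(data, n_chunks=4):
    # Split into chunks, process each independently, then merge.
    # The merge is trivial here only because addition is associative;
    # Joost's point is that many workloads have no such cheap merge,
    # and the merge algorithm itself becomes the expensive part.
    chunks = [data[i::n_chunks] for i in range(n_chunks)]
    with ThreadPoolExecutor() as pool:
        partials = pool.map(process_chunk, chunks)
    return sum(partials)

data = list(range(1000))
assert parallel_sum(data) == sum(data) == 499500
```

When the merge step stops being a one-liner, the cost of designing, testing, and maintaining it is exactly the "prohibitive additional cost" argued below.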
FPGAs can implement any processor or memory structure, and can do so in microseconds. But these are areas where attorneys, via the patent system, abuse lightweight competition. > This algorithm then also needs to be extensively tested, analysed and > understood by future developers. The additional cost involved will be > prohibitive. Don't we need more jobs? Are you kidding me? That's why large corporations are so vehemently aggressive in these spaces. We have all kinds of 'STEM graduates' here in the US that cannot get a STEM job. (Hence Trump's appeal to the middle class: tariffs and promoting competition at home.) > I disagree, UBER is still using a relational database as the storage layer > with something custom put over it to make it simpler for the developers. > Any abstraction layer will have a negative performance impact. Wanna bet that UBER and like-minded companies change again and again and again, until they start studying what mathematicians and EEs have been doing for a very long time? >>> It is based on a clever idea, but when >>> 2 computers having the same data and logic come up with 2 different >>> answers, I wouldn't trust either of them. This is a rare occurrence in digital systems. However, when you look at other forms of computational mathematics, tolerances have to be used to get consistency (oops, another property of ACID showing up in legacy literature). I could not care less about UBER's problems, unless they send some funds my way. BUT, I am willing to share knowledge, so they 'wise up', because fundamentally, I love disruption of the status quo. >> >> Yep. That transactions rejected by QA must be resubmitted, >> modified, or remedied in any number of other ways, is quite common in many forms of >> software. Voting does not correct errors, except maybe a fractional >> rounding up to 1 (pass) or down to zero (failure). It does help to >> achieve the ACI of ACID > > It's one way of doing it.
But it can also cause extra delays due to having to > wait for separate nodes to finish and then to check if they all agree. Once clusters are prototyped on CISC systems, those codes will rapidly move to DSPs, GPUs, FPGAs and DDR5+. Those with deep pockets will 'smoke' the competition, and idiots like Verizon will be trying to make more stupid acquisitions. Folks do know that Verizon sold off billions in data centers close to the fiber highway, to buy Yahoo, right? (It "pays out" because they are actually dumping hundreds of thousands of legacy employees (trump voters); that's what that transaction is all about.) They are still doomed to fail, because the software idiots advising Verizon have no clue about the fundamentals and mathematics of communications. (A very sad state of affairs for Verizon.) >> Since billions and billions of these (complex) transactions are >> occurring, it is usually just repeated. If it keeps failing then >> engineers/coders take a deeper look. Rare statistical anomalies are >> auto-scrutinized (that would be replication and voting) and then pushed >> to a logical zero or logical one. > > The complexity comes from having to mould the algorithm into that structure. > And additional complexity also makes it more fault-prone. Only during development and beta tests. After a while it will become 'rock solid' and be pushed down into the lowest levels of hardware, so it is hidden from the average coder. Here is a billionaire, who is quite stealthy, that has done this exact thing most recently. [3] https://www.deshawresearch.com/ [4] https://www.quora.com/unanswered/Computer-Architecture-How-its-like-working-for-DESHAW-RESEARCH-as-an-ASIC-designer-architect <snip> > A lot can be described using 'modern' designs. However, the fact remains that > ACID was worked out for databases and not for phone systems.
Any sane system > will have some form of consistency checks, but the extent to which this is done > for a data storage layer, like a database, will be different from the extent > to which this is done for a switching layer, like a router or phone switch. Please reread my previous posts. You, or anyone, can do the individual (and robust) research on the ACID components and the history of telecom. Wikipedia and many other sites have failed you here; sorry. <snip> > Those incompetencies are usually in the domain of finances and services > provided. The basic service of a telecoms company is pretty simple: "Pass > data/voice between A and B". > There are plenty of proven systems available that can do this. The mistakes > are usually of the kind: the system that we bought does not handle the load > the salesperson promised. On the surface, you are absolutely correct. Mass education is severely thwarted by the entire patent system, grotesque lawyers and legal semantics, and the 'bought and sold politicians' from around the globe (the same folks that brought us globalism). So folks are merely "uneducated" in these matters. Yes, these globalists continue to conspire against commoners, around the globe. Education and sharing of hardware and software and mathematics and physics will set the captives free (eventually). This is the essence of WW3, imho. The fact that the masses, and even most coders, are blissfully unaware of where ACID came from is a testament to the failure of globalism that provides the protection to the billionaire class of manipulators, imho. > >>> With a small number, it might actually still scale, but when you pass a >>> magic number (no clue what this would be), the counting time starts to >>> exceed any time you might have gained by adding more voters. >> >> Nope, the larger the number, the more expensive.
The number of voters >> rarely goes above 5, but it could for some sorts of physics problems >> (think quantum mechanics and logic not bound to [0 1] whole numbers). >> Often logic circuits (constructs, for programmers) have "don't care" >> states that can be handled in a variety of ways (filters, transforms, >> counters etc etc). > > "don't care" values should always be ignored. Never actually used. (Except for > randomizer functionality) Dude, you need to find some RF/analog folks and learn about what's going on around "noise" in systems. Once thought to be useless, or a hindrance, it is a fertile ground for innovation that, again, the masses are blissfully unaware of. Much is termed "classified", just so you know. > >>> Also, this, to me, seems to counteract the whole reason for using >>> clusters: >>> Have different nodes handle a different part of the problem. >> >> That also occurs. But my point is that properly designed code for the cluster >> can replace ACID functions, offered by Oracle and other overpriced >> solutions, on standard cluster hardware. > > All commonly used relational databases have ACID functionality as long as they > support transactions. There is no need to only choose a commercial version for > that. Like the Chinese, they are brilliant copycats: nothing wrong with that (see my take on 100% abolition of all patents, globally). > >> The problem with today's >> clusters is that the vendors that employ the kid-coders are making things >> far more complicated than necessary, so the average linux hacker just >> outsources via the cloud. DUMB, insecure and not a wise choice for many >> industries. > > Moving your entire business into the cloud often is. I could not agree more. HYBRID systems, where the chief architect/designer works exclusively for the cluster, are where the future will shake out. All of this idiocy of the masses on the web: who cares where it is processed? The closer to the node-idiot-user-consumer, the better, mathematically.
> >> And sooner or later folks are going to get wise and build >> their own clusters that just solve the problems they have. Surely hybrid >> clusters will dominate, where the owner of the codes does outsource peak >> loads and mundane collection of ordinary (non-critical) data. > > Eg. hybrid solutions... Yes, yes and HELL YES! In fact gentoo stands out as the quintessential 'unikernel' for distributed processing! >> Vendors know this and have started another 'smoke and mirrors' campaign called >> (brace yourself) 'Unikernels'..... > > "unikernels" is something a small group came up with... I see no practical > benefit for that approach. A minimized gentoo system and an optimized, severely stripped Linux kernel is pretty much a unikernel. Docker, the leader in commercialization of containers, knows this and has subsumed Alpine linux. Patience, my friend; it will become very clear over time, but not exactly the way the current vendors are portraying unikernels. > >> Problem with that approach is they >> should just be using minimized (focused) gentoo on stripped and optimized >> linux kernels; but that is another lost art from the linux collection > > I see "unikernels" as basically running the applications directly on top of a > hypervisor. I fail to see how this makes more sense than starting an > application directly on top of an OS. The whole reason we have an OS is to > avoid having to reinvent the wheel (networking, storage, memory handling,....) > for every single program. (See above response.) For the last few years, I have run into an astounding number of brilliant folks who have mastered and use gentoo on a daily basis. The more I learn about clusters, the more I realize why this mass of gentoo folks are so silent on these matters. Strategic business plans, brah. Gentoo is the world's best-kept secret.
> >>> Clusters of multiple compute-nodes are a quick and "simple" way of >>> increasing the amount of computational cores to throw at problems that >>> can be broken down in a lot of individual steps with minimal >>> inter-dependencies. >> >> And surpass the ACID features of either postgresql or Oracle, and spend >> less money (maybe not with you and postgresql on their team)! > > Large clusters are useful when doing Hadoop ("big data") style things (I > mostly work with financial systems and the corresponding data). > Storing the entire datawarehouse inside a cluster doesn't work with all the > additional requirements. Reports still need to be displayed quickly and a > decently configured database is usually more beneficial. Where systems like > Exadata really help here is by integrating the underlying storage (SAN) with > the actual database servers and doing most of the processing in-memory. > Eg. it works like a dedicated and custom-built cluster environment specifically > for a relational database. There is a revolution in hardware memory technologies. In a few more years, massive RAM will be an integral part of the computational hardware (think DDR5 and GPUs, currently). Most massive systems can be split up into small systems too. Database vendors have little incentive to do this for customers. The art of the design and implementation of 'transaction processing' needs to return to hardware concepts during this transition. > > >>> I say "simple" because I think designing a 1,000 core chip is more >>> difficult than building a 1,000-node cluster using single-core, single >>> cpu boxes. >> Today, you are correct. Tomorrow you will be wrong. > > In that case, clusters will be obsolete tomorrow. No, the chips and the cluster will be one and the same. Real-time sequence stepping in problem->solution domains for things like flight simulation and subsurface fluid management are still grand challenges that are a ways off.
The average database solution, even for large commercial/global operations, is going to migrate to clusters. Clusters and storage will continue to migrate to silicon. The biggest problem is the patent system and artificial constructs more commonly known in the business world as "cost barrier to entry" economics. These mostly result from the way the local/state/federal/global laws are implemented and enforced. > >> [1]. Besides, once >> that chip or VHDL code or whatever is designed, it can be replicated and >> reused endlessly. Think ASIC designers, folks who take an FPGA project to >> completion. An EE can code on large arrays of DSPs, or a GPU >> (think Khronos group) using Vulkan. >> >>> I would still consider the cluster to be a single "machine". >> >> That's the goal. > > That goal, in my opinion, has already been achieved. Unless you want ALL > machines to be part of the same cluster and all machines being able to push > work to the entire cluster... > In that case, good luck in achieving this as you then also need to handle > "randomly disappearing nodes" I think Brexit and Trump will replace globalism with localism and tariffs. Governments will fight over the spoils of tariffs to finance their gluttony, and locals will figure out how to build and operate everything, locally. So you are correct. I actually am promoting hybrid clusters, so the commoners can 'suck the brain-marrow' out of Wall Street, politicians and the globalists. Once groups of locals learn to be self-sufficient (think of them as digital Amish), the only function governments and globalists provide is national security. Folks that like war can join up and kill folks from other like-minded collectives. Most will be extraordinarily happy to provide 100% of what they need, locally. There will be some exchange of material, and those less innovative will lag a bit, but that is what globalists should concentrate on: how to teach those less fortunate how to become self-sufficient, locally.
> And 90+% of developers still don't understand how to properly code for > multi-threading. Just look at how most applications work on your desktop. They all > tend to max out a single core and the other x-1 cores tend to idle... Wonder why Bill Gates (in his tax-dodging world charities) is not teaching this stuff? Rupert Murdoch? Rich Arabs? Chinese? The elites of the world are 'selfish bastards' and use the good work that comes from their ranks to further screw up localism (self-sufficiency on a local basis). Sooner or later these globalists will have to answer to the masses of local citizens, wherever they are hiding. We have seen the purging of the Republican party. The Democratic elites are currently undergoing a purging. After Brexit, it will rapidly expand in Europe. Saudis are running scared. A pandemic of locals that want to be self-sufficient. Folks are tired of listening to some (asshole) expert that does not live down the street from them. Globalism flies in the face of common sense, and computational competence is not exempt. There is latency and much deception in the world of computations, but that too will fall (eventually). > >> Granted many old decrepit codes had to be >> redesigned and coded anew with threads and other modern constructs to >> take advantage of newer processing platforms. > > Intel came with Hyperthreading back in 2005 (or even before). We are now in > 2016 and the majority of code is still single-threaded. > The problem is, the algorithms that are being used need to be converted to > parallel methods. > >> Sure the same is true with >> distributed, but it's far closer than ever. The largest problem with >> clusters is vendors with agendas, making things more complicated >> than necessary and completely ignoring many fundamental issues, like >> kernel stripping and optimizations under the bloated OS they are using. > > I still want a graphical desktop with full multimedia support.
I still want > to easily plug in a USB device or SD-card and use it immediately,..... > That requirement is incompatible with stripping the OS. Agreed. And I want to build the hardware on my own 3D printer. I am flexible to try out many offerings when 3D printing loses those patents on using metals and semiconductor materials...... This too will come, hopefully sooner rather than later, and without the shedding of blood.... > >>>> I do >>>> not have the old articles handy but, I'm sure that many/most of those >>>> types of inherent processes can be formulated in the algebraic domain, >>>> normalized and used to solve decisions often where other forms of >>>> advanced logic failed (not that I'm taking a cheap shot at modern >>>> programming languages) (wink wink nudge nudge); or at least that's how >>>> we did it.... as young whipper-snappers back in the day... >>> >>> If you know what you are doing, the language is just a tool. Sometimes a >>> hammer is sufficient, other times one might need to use a screwdriver. >>> >>>> --an_old_farts_logic >>> >>> Thinking back on how long I've been playing with computers, I wonder how >>> long it will be until I am in the "old fart" category? >> >> Stay young! I run full-court hoops all the time with young college >> punks; it's one of my greatest joys in life, running with the young >> stallions, hacking, pushing, shoving, slicing and taunting other >> athletes. An old farts club is not something to be proud of; I just like >> to share too much...... > > Hehe.... One is only as old as he/she feels. > > -- > Joost Young kids often show amazing wisdom. The educational process beats this out of kids. Isolation and localism (aka home schooling) does allow kids to explode in both technical competence and creativity. But this flies in the face of the goals of globalism. When I was young, there was a kid that was brilliant and 100% home-schooled by mostly uneducated parents. They lived in the bush of Alaska, hundreds of miles from anyone.
Brilliance and innovation are the province of the youth; just look at all of those young, brilliant minds from post-medieval Europe. Mass education just beat those traits right out of all children. Communications and localism will yield many, many brilliant folks, and that is the greatest fear of the globalists, who want to remain in power and have dominion over the masses. It's the classic struggle. The path to a better future is espoused in parallel and distributed and local decision/control, from politics to hardware to software. hth, James ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-04 17:08 ` james @ 2016-08-04 19:19 ` R0b0t1 0 siblings, 0 replies; 27+ messages in thread From: R0b0t1 @ 2016-08-04 19:19 UTC (permalink / raw To: gentoo-user On Thu, Aug 4, 2016 at 12:08 PM, james <garftd@verizon.net> wrote: > > Atomicity, Consistency, Isolation, Durability == ACID (so we are all on the > same page). > > Not my thesis. My thesis, inspired by these threads, is that all of these > (4) properties of ACID originated in the telephone networks, as separate > issues. https://en.wikipedia.org/wiki/Two_Generals'_Problem http://tvtropes.org/pmwiki/pmwiki.php/Main/OlderThanDirt ^ permalink raw reply [flat|nested] 27+ messages in thread
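For anyone wanting the four letters above made concrete rather than historical, atomicity is the easiest one to demonstrate, and Python's built-in sqlite3 module is enough to do it. The `calls` table and its rows are invented purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (id INTEGER PRIMARY KEY, trunk TEXT)")

# Atomicity: both inserts commit together or neither does.
try:
    # As a context manager, the connection opens a transaction,
    # commits on success, and rolls back on any exception.
    with conn:
        conn.execute("INSERT INTO calls VALUES (1, 'trunk-7')")
        conn.execute("INSERT INTO calls VALUES (1, 'trunk-3')")  # PK violation
except sqlite3.IntegrityError:
    pass

# The first, valid insert was rolled back along with the failing one.
assert conn.execute("SELECT COUNT(*) FROM calls").fetchone()[0] == 0
```

Consistency (the PRIMARY KEY constraint that triggered the rollback), isolation, and durability round out the same transaction machinery; this is the "ACID functionality as long as they support transactions" point made earlier in the thread.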
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-01 13:43 ` james 2016-08-01 16:49 ` J. Roeleveld @ 2016-08-11 12:43 ` Douglas J Hunley 1 sibling, 0 replies; 27+ messages in thread From: Douglas J Hunley @ 2016-08-11 12:43 UTC (permalink / raw To: Gentoo [-- Attachment #1: Type: text/plain, Size: 859 bytes --] On Mon, Aug 1, 2016 at 9:43 AM, james <garftd@verizon.net> wrote: > I guess my point is 'Douglas' is full of stuffing, OR that is what folks > are doing when they 'roll their own' solution specifically customized to > their specific needs, as he alludes to near the end of his commentary? (I'd > like your opinion of this and maybe some links to current schemes for how to > have ACID/99.999% accurate transactions on clusters of various > architectures.) Douglas, like yourself, writes of these things in a very > lucid fashion, so that is why I'm asking you for your thoughts. Douglas didn't write the damn thing, merely added it to the discussion here. Thank you very much -- { "name": "douglas j hunley", "email": "doug.hunley@gmail.com", "social": [ { "blog": "https://hunleyd.github.io/", "twitter": "@hunleyd" } ] } [-- Attachment #2: Type: text/html, Size: 1520 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-01 7:16 ` J. Roeleveld 2016-08-01 13:43 ` james @ 2016-08-01 15:01 ` Rich Freeman 2016-08-01 17:31 ` J. Roeleveld 2016-08-01 23:18 ` Alan McKinnon 1 sibling, 2 replies; 27+ messages in thread From: Rich Freeman @ 2016-08-01 15:01 UTC (permalink / raw To: gentoo-user On Mon, Aug 1, 2016 at 3:16 AM, J. Roeleveld <joost@antarean.org> wrote: > > Check the link posted by Douglas. > Uber's article has some misunderstandings about the architecture, with > conclusions drawn that are, at least also, caused by their database design and > usage. I've read it. I don't think it actually alleges any misunderstandings about the Postgres architecture, but rather that it doesn't perform as well in Uber's design. I don't think it actually alleges that Uber's design is a bad one in any way. But, I'm certainly interested in anything else that develops here... > >> And of course almost any FOSS project could have a bug. I >> don't know if either project does the kind of regression testing to >> reliably detect this sort of issue. > > Not sure either, I do think PostgreSQL does a lot with regression tests. > Obviously they missed that bug. Of course, so did Uber in their internal testing. I've seen a DB bug in production (granted, only one so far) and they aren't pretty. A big issue for Uber is that their transaction rate and DB size are such that they really don't have a practical option of restoring backups. Obviously they'd do that in a complete disaster, but short of that they can't really afford to do so. By the time a backup is recorded it would be incredibly out of date. They have the same issue with the lack of online upgrades (which the responding article doesn't really talk about). They really need it to just work all the time. >> I'd think that it is more likely >> that the likes of Oracle would (for their flagship DB (not for MySQL), > > Never worked with Oracle (or other big software vendors), have you?
:) Actually, I almost exclusively work with them. Some are better than others. I don't work directly with Oracle, but I can say that the two times I've worked with an Oracle consultant they've been worth their weight in gold, and cost about as much. The one was fixing some kind of RDB data corruption on a VAX that was easily a decade out of date at the time; I was shocked that they could find somebody who knew how to fix it. Interestingly, it looks like they only abandoned RDB recently. They do tend to be a solution that involves throwing money at problems. My employer was having issues with a database from another big software vendor, which I'm sure was the result of bad application design, but throwing Exadata at it did solve the problem, at an astonishing price. Neither my employer nor the big software provider in question is likely to attract top-notch DB talent (indeed, mine has steadily gotten rid of anybody who knows how to do anything in Oracle beyond creating schemas, it seems, though I can only imagine how much they pay annually in their license fees; and yes, I'm sure 99.9% of what they use Oracle (or SQL Server) for would work just fine in Postgres). > > Only if you're a big (as in, spend a lot of money with them) customer. > So, we are that (and I think a few of our IT execs used to be Oracle employees, which I'm sure isn't hurting their business). I'll admit that Uber might not get the same attention. Seems like Oracle is the solution at work for everything, from software that runs the entire company to software that hosts one table for 10 employees (well, when somebody notices and gets it out of Access). Well, unless it involves an MS-oriented dev or Sharepoint, in which case somebody inevitably wants it on SQL Server. I did mention that we're not a world-class IT shop, didn't I? -- Rich ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-01 15:01 ` Rich Freeman @ 2016-08-01 17:31 ` J. Roeleveld 2016-08-02 1:07 ` Rich Freeman 2016-08-01 23:18 ` Alan McKinnon 1 sibling, 1 reply; 27+ messages in thread From: J. Roeleveld @ 2016-08-01 17:31 UTC (permalink / raw To: gentoo-user On Monday, August 01, 2016 11:01:28 AM Rich Freeman wrote: > On Mon, Aug 1, 2016 at 3:16 AM, J. Roeleveld <joost@antarean.org> wrote: > > Check the link posted by Douglas. > > Uber's article has some misunderstandings about the architecture with > > conclusions drawn that are, at least also, caused by their database design > > and usage. > > I've read it. I don't think it actually alleges any misunderstandings > about the Postgres architecture, but rather that it doesn't perform as > well in Uber's design. I don't think it actually alleges that Uber's > design is a bad one in any way. It was written quite diplomatically. Seeing the create table for the sample tables already makes me wonder how they designed their database schema, especially from a performance point of view. But that is a separate discussion :) > But, I'm certainly interested in anything else that develops here... Same here, and I am hoping some others will also come up with some interesting bits. > >> And of course almost any FOSS project could have a bug. I > >> don't know if either project does the kind of regression testing to > >> reliably detect this sort of issue. > > > > Not sure either, I do think PostgreSQL does a lot with regression tests. > > Obviously they missed that bug. Of course, so did Uber in their > internal testing. I've seen a DB bug in production (granted, only one > so far) and they aren't pretty. A big issue for Uber is that their > transaction rate and DB size is such that they really don't have a > practical option of restoring backups. From the slides on their migration from MySQL to PostgreSQL in 2013, I see it took them 45 minutes to migrate 50GB of data.
To me, that seems like a very bad transfer-rate for, what I would consider, a dev environment. It's only about 20MB/s. I've seen "bad performing" ETL processes reading from 300GB of XML files and loading that into 3 DB-tables within 1.5 hours. That's about 57MB/s, with the XML-engine using up nearly 98% of the total CPU-load. If the data had been supplied in CSV files, it would have been roughly 100GB of data. This could be easily loaded within 20 minutes, equalling 85MB/s (filling up the network bandwidth). I think their database design and infrastructure isn't optimized for their specific work-load. Which is, unfortunately, quite common. > Obviously they'd do that in a > complete disaster, but short of that they can't really afford to do > so. By the time a backup is recorded it would be incredibly out of > date. They have the same issue with the lack of online upgrades > (which the responding article doesn't really talk about). They really > need it to just work all the time. When I migrate PostgreSQL to a new major version, I migrate 1 database at a time to minimize downtime. This is done by piping the output of the backup process straight into a restore process connected to the new server. If it were even more time-critical, I would develop a migration process that would: 1) copy all the current (as in, needed today) data to the new database 2) disable the application 3) copy all the latest changes for today to the new database 4) re-enable the application (pointing to the new database) 5) copy all the historical data I might need I would add a note on the website and send out an email first, informing the customers that the data is being migrated and historical data might be incomplete during this process. > >> I'd think that it is more likely > >> that the likes of Oracle would (for their flagship DB (not for MySQL), > > > > Never worked with Oracle (or other big software vendors), have you? :) > > Actually, I almost exclusively work with them.
Some are better than > others. I don't work directly with Oracle, but I can say that the two > times I've worked with an Oracle consultant they've been worth their > weight in gold, and cost about as much. They do have some good ones... > The one was fixing some kind > of RDB data corruption on a VAX that was easily a decade out of date > at the time; I was shocked that they could find somebody who knew how > to fix it. Interestingly, it looks like they only abandoned RDB > recently. Probably one of the few people in the world. And he/she might have been hired in by Oracle for this particular issue. > They do tend to be a solution that involves throwing money at > problems. My employer was having issues with a database from another > big software vendor which I'm sure was the result of bad application > design, but throwing Exadata at it did solve the problem, at an > astonishing price. I was at Collaborate last year and spoke to some of the guys from Oracle. (Not going into specifics to protect their jobs.) When asked if one of my customers should be using Oracle RAC or Exadata, the answer came down to: "If you think RAC might be sufficient, it usually is". Exadata, however, is a really nice design. But throwing faster machines at a problem should only be part of the solution. I know someone who claims he can make a "standard" Oracle database outperform an Exadata database. That claim is based on the (usually true) assumption that databases are not designed for performance. Mind, if the same tricks were done in an Exadata environment, you'd see phenomenal performance. > Neither my employer nor the big software provider > in question is likely to attract top-notch DB talent (indeed, mine has > steadily gotten rid of anybody who knows how to do anything in Oracle > beyond creating schemas it seems, Actively? Or by simply letting the good ones go while replacing them with someone less clued up?
> though I can only imagine how much they pay annually in their license fees; and yes, I'm sure 99.9% of what they use Oracle (or SQL Server) for would work just fine in Postgres).

That is my feeling as well. The problem is that the likes of Informatica (one of the leading ETL software vendors) don't actually support PostgreSQL. That is a bit of a downside. I'd need to use ODBC (yes, that also works on non-MS Windows) to connect.

>> Only if you're a big (as in, spend a lot of money with them) customer.
>
> So, we are that (and I think a few of our IT execs used to be Oracle employees, which I'm sure isn't hurting their business).

I actually didn't join Oracle. I did, however, work for one of the companies Oracle bought, and decided not to wait for the inevitable job cuts. In hindsight, that one wasn't too bad, as they actually kept that part for nearly 8 years.

> I'll admit that Uber might not get the same attention. Seems like Oracle is the solution at work for everything from software that runs the entire company to software that hosts one table for 10 employees (well, when somebody notices and gets it out of Access).

Don't forget the Finance departments. They tend to use Excel files for everything.

> Well, unless it involves an MS-oriented dev or Sharepoint, in which case somebody inevitably wants it on SQL Server. I did mention that we're not a world-class IT shop, didn't I?

I won't actually name companies, but I've seen plenty of big ones that would fit your description. So I'm not sure what a "world-class" IT shop would look like when it has to deal with the internal politics, bureaucracy and procedures that come as standard with big companies.

--
Joost
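Joost's "pipe the backup straight into the restore" approach from earlier in this message can be sketched in a few lines. This is a minimal illustration, not his actual tooling: the host names are hypothetical placeholders, and a real migration would add authentication, error handling, and whatever `pg_dump` format flags fit the setup.

```python
# Minimal sketch of migrating one database at a time by piping pg_dump
# on the old server straight into psql on the new one, so no
# intermediate dump file ever touches the disk.
# Host names below are hypothetical placeholders.
import subprocess

def pipeline_commands(dbname, old_host, new_host):
    """Build the dump and restore command lines for one database."""
    dump = ["pg_dump", "-h", old_host, "--no-owner", dbname]
    restore = ["psql", "-h", new_host, "-d", dbname]
    return dump, restore

def migrate(dbname, old_host="old-db.example", new_host="new-db.example"):
    dump_cmd, restore_cmd = pipeline_commands(dbname, old_host, new_host)
    # Equivalent of: pg_dump -h old-db dbname | psql -h new-db -d dbname
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    subprocess.run(restore_cmd, stdin=dump.stdout, check=True)
    dump.stdout.close()
    return dump.wait()
```

Run per database, this keeps each database's downtime limited to its own dump/restore window rather than taking the whole cluster offline at once.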
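The transfer-rate figures quoted at the top of this message can be sanity-checked with a few lines of arithmetic (assuming 1024-based units, i.e. 1GB = 1024MB):

```python
# Verify the back-of-the-envelope rates quoted in the message above
# (assuming 1024-based units: 1GB = 1024MB).

def rate_mb_per_s(size_gb, duration_s):
    """Average throughput in MB/s for a given volume and duration."""
    return size_gb * 1024 / duration_s

# 300GB of XML loaded into 3 DB tables within 1.5 hours:
print(round(rate_mb_per_s(300, 1.5 * 3600), 1))  # -> "about 57MB/s"

# The same data as ~100GB of CSV, loaded within 20 minutes:
print(round(rate_mb_per_s(100, 20 * 60), 1))     # -> "about 85MB/s"
```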
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber
From: Rich Freeman @ 2016-08-02  1:07 UTC
To: gentoo-user

On Mon, Aug 1, 2016 at 1:31 PM, J. Roeleveld <joost@antarean.org> wrote:
> On Monday, August 01, 2016 11:01:28 AM Rich Freeman wrote:
>> Neither my employer nor the big software provider in question is likely to attract top-notch DB talent (indeed, mine has steadily gotten rid of anybody who knows how to do anything in Oracle beyond creating schemas it seems,
>
> Actively? Or by simply letting the good ones go while replacing them with someone less clued up?

A bit of both. A big part of it was probably sacking anybody doing anything other than creating tables (since you can't keep operating without that), and outsourcing to 3rd parties while wanting bottom-dollar prices.

There are accidentally some reasonably competent people in IT at my company, but I don't think it is because we really are good at targeting world-class talent.

> The problem is that the likes of Informatica (one of the leading ETL software vendors) don't actually support PostgreSQL.

Please tell me that it actually does support XML in a sane way, and it is only our incompetent developers who seem to be hand-generating XML files by printing strings?

I have an integration that involves Informatica, and another solution that just synchronizes files from an SMB share to a foreign FTP site. Of course I don't have access to the share that lies in between, so when the interface breaks I get to play with two different groups to try to figure out where the process died. Informatica appears to be running on Unix, and I get helpful questions from the maintainers about what path the files are on, as if I'd have any idea where some SMB share (whose path I am not told) is mounted on some Unix server I have no access to.
Gotta love division of labor. Heaven forbid anybody have visibility to the full picture so that the right group can be engaged on the first try...

--
Rich
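Rich's aside about developers "hand-generating XML files by printing strings" is worth a concrete illustration. The usual failure mode is unescaped content: any value containing markup characters silently produces malformed XML. A small sketch (purely illustrative, unrelated to Informatica's own engine; the element names are made up):

```python
# Why building XML by concatenating strings is fragile: values containing
# markup characters (&, <, >) must be escaped, which string concatenation
# forgets. A real XML writer handles this automatically.
import xml.etree.ElementTree as ET

def to_xml(vendor):
    """Serialize a record via a proper XML API, with escaping."""
    root = ET.Element("record")
    ET.SubElement(root, "vendor").text = vendor
    return ET.tostring(root, encoding="unicode")

# The hand-rolled string version produces malformed XML for this input,
# because the raw "&" is never escaped:
broken = "<record><vendor>" + "Smith & Sons" + "</vendor></record>"

# The ElementTree version escapes the ampersand as &amp;:
print(to_xml("Smith & Sons"))
```

Any parser on the receiving end will reject the `broken` variant, which is typically how these hand-built feeds fail in production.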
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber
From: J. Roeleveld @ 2016-08-02  7:03 UTC
To: gentoo-user

On Monday, August 01, 2016 09:07:05 PM Rich Freeman wrote:
> On Mon, Aug 1, 2016 at 1:31 PM, J. Roeleveld <joost@antarean.org> wrote:
>> On Monday, August 01, 2016 11:01:28 AM Rich Freeman wrote:
>>> Neither my employer nor the big software provider in question is likely to attract top-notch DB talent (indeed, mine has steadily gotten rid of anybody who knows how to do anything in Oracle beyond creating schemas it seems,
>>
>> Actively? Or by simply letting the good ones go while replacing them with someone less clued up?
>
> A bit of both. A big part of it was probably sacking anybody doing anything other than creating tables (since you can't keep operating without that), and outsourcing to 3rd parties while wanting bottom-dollar prices.

Yes, one of the more common decisions. Often because the person hired to run the department comes from an outsourcing company, or because they happened to meet at the golf course.

> There are accidentally some reasonably competent people in IT at my company, but I don't think it is because we really are good at targeting world-class talent.

I wonder which companies actually are good at that?

>> The problem is that the likes of Informatica (one of the leading ETL software vendors) don't actually support PostgreSQL.
>
> Please tell me that it actually does support XML in a sane way, and it is only our incompetent developers who seem to be hand-generating XML files by printing strings?

<OT>
There are actually 2 supported methods (not counting randomly sticking strings together):

1) The default XML handling (source/target and transformation). This sort-of works for "simple" XML files. The definition of "simple" is in the sales contract: no more than ??
levels deep, XSD less than ???MB and XML file less than ???MB. I don't remember the actual numbers, but check with whoever has the actual contract in your company. It should be listed there, or you can call Informatica support.

2) B2B / UDO. UDO stands for Unstructured Data Option. A bit strange, but that's where it lives. It's a proper XML handling engine that should be able to handle any XML you care to throw at it, as well as documents with a standardised layout. It's the preferred method of handling XML files with Informatica. (Do use at least 9.6.1 for this. 9.5 has a very annoying feature...)
</OT>

> I have an integration that involves Informatica, and another solution that just synchronizes files from an smb share to a foreign FTP site. Of course I don't have access to the share that lies in-between, so when the interface breaks I get to play with two different groups to try to figure out where the process died. Informatica appears to be running on Unix and I get helpful questions from the maintainers about what path the files are on, as if I'd have any idea where some SMB share (whose path I am not told) is mounted on some Unix server I have no access to.

Check the session log (from Informatica); that should contain the actual path Informatica uses to write the file to.

> Gotta love division of labor. Heaven forbid anybody have visibility to the full picture so that the right group can be engaged on the first try...

I see this all too often. They usually claim it's because of security. Not understanding that, by obscuring all the details, the first person to get the full picture is the one who is going to cause havoc, and the people who are then tasked with fixing it don't know enough to do it right in a reasonable time-frame.

--
Joost
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber
From: Alan McKinnon @ 2016-08-01 23:18 UTC
To: gentoo-user

On 01/08/2016 17:01, Rich Freeman wrote:
> On Mon, Aug 1, 2016 at 3:16 AM, J. Roeleveld <joost@antarean.org> wrote:
>> Check the link posted by Douglas. Uber's article has some misunderstandings about the architecture, with conclusions drawn that are, at least in part, also caused by their database design and usage.
>
> I've read it. I don't think it actually alleges any misunderstandings about the Postgres architecture, but rather that it doesn't perform as well in Uber's design. I don't think it actually alleges that Uber's design is a bad one in any way.

He does also deliver the stinger at the end: in 2013 Uber migrated FROM MySQL TO Postgres, and now in 2016 they have migrated FROM Postgres TO Schemaless (which just happens to have InnoDB as its backend).

So the original article very much seems to have been written with a skewed bias and the wrong focus. That's bias as in "shifted to one side, as used in math", not bias as in "opinionated asshat beating some special drum".
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber
From: Rich Freeman @ 2016-08-02  0:55 UTC
To: gentoo-user

On Mon, Aug 1, 2016 at 7:18 PM, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
> So the original article very much seems to have been written with a skewed bias and wrong focus. That's bias as in "shifted to one side as used in math" not bias as in "opinionated asshat beating some special drum"

Well, I wouldn't say "wrong focus" so much as "particular focus." The original article doesn't really purport to be a holistic comparison of the two systems, just an explanation of why they're migrating. I think people are reading a bit too much into it.

However, the original article would probably benefit from a few caveats thrown in.

--
Rich
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber
From: Rich Freeman @ 2016-08-02 17:49 UTC
To: gentoo-user

On Fri, Jul 29, 2016 at 4:58 PM, Mick <michaelkintzios@gmail.com> wrote:
> Interesting article explaining why Uber are moving away from PostgreSQL. I am running both DBs on different desktop PCs for akonadi, and I'm also running MySQL on a number of websites. Let's see which one goes sideways first. :p
>
> https://eng.uber.com/mysql-migration/

There is a thread on this on the Postgres lists as well (unsurprisingly):

https://www.postgresql.org/message-id/flat/579795DF.10502%40commandprompt.com#579795DF.10502@commandprompt.com

I'm only halfway through it, but the Postgres devs strike me as being very levelheaded and competent. They seem to acknowledge the genuine issues and point out some of the tradeoffs that Uber is making, without pointing fingers.

One thing I really did like about the Uber post was that even if it isn't a complete or fair comparison, it is really informative as an introduction to how some of the architecture works. The same applies to much of the Postgres thread. I found it really useful for understanding how both indexing/replication solutions work under the hood.

--
Rich