* [gentoo-user] PostgreSQL Vs MySQL @Uber @ 2016-07-29 20:58 Mick 2016-07-29 22:24 ` Alan McKinnon 2016-08-02 17:49 ` Rich Freeman 0 siblings, 2 replies; 27+ messages in thread From: Mick @ 2016-07-29 20:58 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 295 bytes --] Interesting article explaining why Uber are moving away from PostgreSQL. I am running both DBs on different desktop PCs for akonadi and I'm also running MySQL on a number of websites. Let's see which one goes sideways first. :p https://eng.uber.com/mysql-migration/ -- Regards, Mick [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-07-29 20:58 [gentoo-user] PostgreSQL Vs MySQL @Uber Mick @ 2016-07-29 22:24 ` Alan McKinnon 2016-07-29 22:38 ` Rich Freeman 2016-08-02 17:49 ` Rich Freeman 1 sibling, 1 reply; 27+ messages in thread From: Alan McKinnon @ 2016-07-29 22:24 UTC (permalink / raw To: gentoo-user On 29/07/2016 22:58, Mick wrote: > Interesting article explaining why Uber are moving away from PostgreSQL. I am > running both DBs on different desktop PCs for akonadi and I'm also running > MySQL on a number of websites. Let's which one goes sideways first. :p > > https://eng.uber.com/mysql-migration/ > I don't think your akonadi and some web sites compare in any way to Uber and what they do. FWIW, my Dev colleagues support an entire large corporate ISP's operational and customer data on PostgreSQL-9.3. With clustering. With no db-related issues :-) ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-07-29 22:24 ` Alan McKinnon @ 2016-07-29 22:38 ` Rich Freeman 2016-07-29 23:01 ` Mick 2016-08-01 7:16 ` J. Roeleveld 0 siblings, 2 replies; 27+ messages in thread From: Rich Freeman @ 2016-07-29 22:38 UTC (permalink / raw To: gentoo-user On Sat, Jul 30, 2016 at 6:24 AM, Alan McKinnon <alan.mckinnon@gmail.com> wrote: > On 29/07/2016 22:58, Mick wrote: >> >> Interesting article explaining why Uber are moving away from PostgreSQL. >> I am >> running both DBs on different desktop PCs for akonadi and I'm also running >> MySQL on a number of websites. Let's which one goes sideways first. :p >> >> https://eng.uber.com/mysql-migration/ >> > > > I don't think your akonadi and some web sites compares in any way to Uber > and what they do. > > FWIW, my Dev colleagues support and entire large corporate ISP's operational > and customer data on PostgreSQL-9.3. With clustering. With no db-related > issues :-) > Agree, you'd need to be fairly large-scale to have their issues, but I think the article was something anybody interested in databases should read. If nothing else it is a really easy-to-follow explanation of the underlying architectures. I'll probably post this to my LUG mailing list. I think one of the Postgres devs lurks there so I'm curious about his impressions. I was a bit surprised to hear about the data corruption bug. I've always considered Postgres to have a better reputation for data integrity. And of course almost any FOSS project could have a bug. I don't know if either project does the kind of regression testing to reliably detect this sort of issue. I'd think that it is more likely that the likes of Oracle would (for their flagship DB, not for MySQL; and they'd probably be more likely to send out an engineer to beg forgiveness while they fix your database). Of course, if you're Uber the hit you'd take from downtime/etc isn't made up for entirely by having somebody take a few days to get everything fixed. 
-- Rich ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-07-29 22:38 ` Rich Freeman @ 2016-07-29 23:01 ` Mick 2016-08-01 1:48 ` Douglas J Hunley 2016-08-01 7:16 ` J. Roeleveld 1 sibling, 1 reply; 27+ messages in thread From: Mick @ 2016-07-29 23:01 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 1452 bytes --] On Saturday 30 Jul 2016 06:38:01 Rich Freeman wrote: > On Sat, Jul 30, 2016 at 6:24 AM, Alan McKinnon <alan.mckinnon@gmail.com> wrote: > > On 29/07/2016 22:58, Mick wrote: > >> Interesting article explaining why Uber are moving away from PostgreSQL. > >> I am > >> running both DBs on different desktop PCs for akonadi and I'm also > >> running > >> MySQL on a number of websites. Let's which one goes sideways first. :p > >> > >> https://eng.uber.com/mysql-migration/ > > > > I don't think your akonadi and some web sites compares in any way to Uber > > and what they do. > > > > FWIW, my Dev colleagues support and entire large corporate ISP's > > operational and customer data on PostgreSQL-9.3. With clustering. With no > > db-related issues :-) > > Agree, you'd need to be fairly large-scale to have their issues, but I > think the article was something anybody interested in databases should > read. If nothing else it is a really easy to follow explanation of > the underlying architectures. > > I'll probably post this to my LUG mailing list. I think one of the > Postgres devs lurks there so I'm curious to his impressions. > > I was a bit surprised to hear about the data corruption bug. I've > always considered Postgres to have a better reputation for data > integrity. Yes, same here, I would be interested to hear what the Postgres dev says, should he respond to it. -- Regards, Mick [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-07-29 23:01 ` Mick @ 2016-08-01 1:48 ` Douglas J Hunley 0 siblings, 0 replies; 27+ messages in thread From: Douglas J Hunley @ 2016-08-01 1:48 UTC (permalink / raw To: Gentoo [-- Attachment #1: Type: text/plain, Size: 420 bytes --] On Fri, Jul 29, 2016 at 7:01 PM, Mick <michaelkintzios@gmail.com> wrote: > Yes, same here, I would be interested to hear what the Postgres dev says, > should he respond to it. > One PostgreSQL dev's response - https://t.co/LfPlIPWulc -- { "name": "douglas j hunley", "email": "doug.hunley@gmail.com", "social": [ { "blog": "https://hunleyd.github.io/", "twitter": "@hunleyd" } ] } [-- Attachment #2: Type: text/html, Size: 1182 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-07-29 22:38 ` Rich Freeman 2016-07-29 23:01 ` Mick @ 2016-08-01 7:16 ` J. Roeleveld 2016-08-01 13:43 ` james 2016-08-01 15:01 ` Rich Freeman 1 sibling, 2 replies; 27+ messages in thread From: J. Roeleveld @ 2016-08-01 7:16 UTC (permalink / raw To: gentoo-user On Saturday, July 30, 2016 06:38:01 AM Rich Freeman wrote: > On Sat, Jul 30, 2016 at 6:24 AM, Alan McKinnon <alan.mckinnon@gmail.com> wrote: > > On 29/07/2016 22:58, Mick wrote: > >> Interesting article explaining why Uber are moving away from PostgreSQL. > >> I am > >> running both DBs on different desktop PCs for akonadi and I'm also > >> running > >> MySQL on a number of websites. Let's which one goes sideways first. :p > >> > >> https://eng.uber.com/mysql-migration/ > > > > I don't think your akonadi and some web sites compares in any way to Uber > > and what they do. > > > > FWIW, my Dev colleagues support and entire large corporate ISP's > > operational and customer data on PostgreSQL-9.3. With clustering. With no > > db-related issues :-) > > Agree, you'd need to be fairly large-scale to have their issues, And you'd also have to have had your database designed by people who think MySQL actually follows common SQL standards. > but I > think the article was something anybody interested in databases should > read. If nothing else it is a really easy to follow explanation of > the underlying architectures. Check the link posted by Douglas. Uber's article has some misunderstandings about the architecture, with conclusions that are at least partly caused by their own database design and usage. > I'll probably post this to my LUG mailing list. I think one of the > Postgres devs lurks there so I'm curious to his impressions. > > I was a bit surprised to hear about the data corruption bug. I've > always considered Postgres to have a better reputation for data > integrity. They do. > And of course almost any FOSS project could have a bug. 
I > don't know if either project does the kind of regression testing to > reliably detect this sort of issue. Not sure either, though I do think PostgreSQL does a lot of regression testing. > I'd think that it is more likely > that the likes of Oracle would (for their flagship DB (not for MySQL), Never worked with Oracle (or other big software vendors), have you? :) > and they'd probably be more likely to send out an engineer to beg > forgiveness while they fix your database). Only if you're a big (as in, spend a lot of money with them) customer. > Of course, if you're Uber > the hit you'd take from downtime/etc isn't made up for entirely by > having somebody take a few days to get everything fixed. -- Joost ^ permalink raw reply [flat|nested] 27+ messages in thread
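[Editor's note: for concreteness, the kind of regression test being discussed can be as simple as "commit known data, simulate a restart, assert nothing was lost". A toy sketch using Python's bundled sqlite3, not PostgreSQL's actual test suite:]

```python
# Toy durability regression test (illustrative only -- a real database
# test suite is far more thorough): write and commit rows, reopen the
# database file as if the server had restarted, verify nothing was lost.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "regress.db")

db = sqlite3.connect(path)
db.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT)")
db.executemany("INSERT INTO t VALUES (?, ?)",
               [(i, f"row{i}") for i in range(100)])
db.commit()
db.close()  # simulated restart

db = sqlite3.connect(path)
rows = db.execute("SELECT k, v FROM t ORDER BY k").fetchall()
assert rows == [(i, f"row{i}") for i in range(100)]  # nothing lost or garbled
db.close()
```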
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-01 7:16 ` J. Roeleveld @ 2016-08-01 13:43 ` james 2016-08-01 16:49 ` J. Roeleveld 2016-08-11 12:43 ` Douglas J Hunley 2016-08-01 15:01 ` Rich Freeman 1 sibling, 2 replies; 27+ messages in thread From: james @ 2016-08-01 13:43 UTC (permalink / raw To: gentoo-user On 08/01/2016 02:16 AM, J. Roeleveld wrote: > On Saturday, July 30, 2016 06:38:01 AM Rich Freeman wrote: >> On Sat, Jul 30, 2016 at 6:24 AM, Alan McKinnon <alan.mckinnon@gmail.com> > wrote: >>> On 29/07/2016 22:58, Mick wrote: >>>> Interesting article explaining why Uber are moving away from PostgreSQL. >>>> I am >>>> running both DBs on different desktop PCs for akonadi and I'm also >>>> running >>>> MySQL on a number of websites. Let's which one goes sideways first. :p >>>> >>>> https://eng.uber.com/mysql-migration/ >>> >>> I don't think your akonadi and some web sites compares in any way to Uber >>> and what they do. >>> >>> FWIW, my Dev colleagues support and entire large corporate ISP's >>> operational and customer data on PostgreSQL-9.3. With clustering. With no >>> db-related issues :-) >> >> Agree, you'd need to be fairly large-scale to have their issues, > > And also have to design your database by people who think MySQL actually > follows common SQL standards. > >> but I >> think the article was something anybody interested in databases should >> read. If nothing else it is a really easy to follow explanation of >> the underlying architectures. > > Check the link posted by Douglas. > Ubers article has some misunderstandings about the architecture with > conclusions drawn that are, at least also, caused by their database design and > usage. > >> I'll probably post this to my LUG mailing list. I think one of the >> Postgres devs lurks there so I'm curious to his impressions. >> >> I was a bit surprised to hear about the data corruption bug. I've >> always considered Postgres to have a better reputation for data >> integrity. > > They do. 
> >> And of course almost any FOSS project could have a bug. I >> don't know if either project does the kind of regression testing to >> reliably detect this sort of issue. > > Not sure either, I do think PostgreSQL does a lot with regression tests. > >> I'd think that it is more likely >> that the likes of Oracle would (for their flagship DB (not for MySQL), > > Never worked with Oracle (or other big software vendors), have you? :) > >> and they'd probably be more likely to send out an engineer to beg >> forgiveness while they fix your database). > > Only if you're a big (as in, spend a lot of money with them) customer. > >> Of course, if you're Uber >> the hit you'd take from downtime/etc isn't made up for entirely by >> having somebody take a few days to get everything fixed. > > -- > Joost > > I certainly respect your skills and posts on Databases, Joost, as everything you have posted in the past is 'spot on'. Granted, I'm no database expert, far from it. But I want to share a few things with you, and hope you (and others) will 'chime in' on these comments. Way back, when the earth was cooling and we all had dinosaurs for pets, some of us hacked on AT&T "3B2" unix systems. They were known for their 'roll back and recovery', triplicated (or more) transaction processes and 'voters' system to ferret out if a transaction was complete and correct. There was no ACID, the current 'gold standard' if you believe what Douglas and others write about concerning databases. In essence (from crusted-up memories), a basic (SS7) transaction related to the local telephone switch was run on 3 machines. The results were compared. If they matched, the transaction went forward as valid. If 2/3 matched, and the switch was so configured, then the code would essentially 'vote' and majority ruled. 
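[Editor's note: the 2-of-3 voting scheme described above can be sketched in a few lines. This is a toy illustration in plain Python, not anything resembling actual 3B2 code, but it shows why an odd replica count matters: a strict majority can mask a single bad result.]

```python
# Toy sketch of "run the same transaction on 3 machines and vote".
from collections import Counter

def vote(results):
    """Accept a result only if a strict majority of replicas agree."""
    winner, count = Counter(results).most_common(1)[0]
    if count * 2 > len(results):
        return winner
    raise RuntimeError("no majority -- transaction rejected")

def run_replicated(txn, replicas=3):
    """Run the same transaction logic on N independent 'machines'."""
    return vote([txn() for _ in range(replicas)])

# One flaky 'machine' returns a bad balance; the voter masks it (2/3 agree).
outcomes = iter([150, 999, 150])
assert run_replicated(lambda: next(outcomes)) == 150

# If all replicas disagree, the transaction is rejected rather than trusted.
outcomes = iter([1, 2, 3])
try:
    run_replicated(lambda: next(outcomes))
except RuntimeError:
    pass  # no majority
```

Note that with an even replica count a 2-2 split has no majority, which is one reason the voter counts stay odd.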
This is what led to phone calls (switched phone calls) having variable delays, often in the order of seconds, mis-connections and other problems we all encountered during periods of excessive demand. That scenario was at the heart of how old, crappy AT&T unix (SVR?) could perform so well and therefore established the gold standard for RT transaction processing, aka the "five 9s" 99.999% of up-time (about 5 minutes per year of downtime). Sure this part is only related to transaction processing as there was much more to the "five 9s" legacy, but imho, that is the heart of what was the precursor to ACID properties now so greatly espoused in SQL codes that Douglas refers to. Do folks concur or disagree at this point? The reason this is important to me (and others?), is that, if this idea (granted there is much more detail to it) is still valid, then it can form the basis for building up superior-ACID processes that meet or exceed the properties of an expensive (think Oracle) transaction process on distributed (parallel) or clustered systems, to a degree of accuracy limited only by the number of odd-numbered voter codes involved in the distributed and replicated parts of the transaction. I even added some code where replicated routines were written in different languages, and the results compared to add an additional layer of verification before the voter step. (gotta love assembler?). I guess my point is 'Douglas' is full of stuffing, OR that is what folks are doing when they 'roll their own solution specifically customized to their specific needs' as he alludes to near the end of his commentary? (I'd like your opinion of this and maybe some links to current schemes for how to have ACID/99.999% accurate transactions on clusters of various architectures.) Douglas, like yourself, writes of these things in a very lucid fashion, so that is why I'm asking you for your thoughts. 
Robustness of transactions in a distributed (clustered) environment is fundamental to the usefulness of most codes that are trying to migrate to cluster-based processes in (VM/container/HPC) environments. I do not have the old articles handy but I'm sure that many/most of those types of inherent processes can be formulated in the algebraic domain, normalized and used to solve decisions where other forms of advanced logic failed (not that I'm taking a cheap shot at modern programming languages) (wink wink nudge nudge); or at least that's how we did it.... as young whippersnappers, back in the day... --an_old_farts_logic curiously, James ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-01 13:43 ` james @ 2016-08-01 16:49 ` J. Roeleveld 2016-08-01 18:03 ` Rich Freeman 2016-08-02 5:16 ` james 2016-08-11 12:43 ` Douglas J Hunley 1 sibling, 2 replies; 27+ messages in thread From: J. Roeleveld @ 2016-08-01 16:49 UTC (permalink / raw To: gentoo-user On Monday, August 01, 2016 08:43:49 AM james wrote: > On 08/01/2016 02:16 AM, J. Roeleveld wrote: > > On Saturday, July 30, 2016 06:38:01 AM Rich Freeman wrote: > >> On Sat, Jul 30, 2016 at 6:24 AM, Alan McKinnon <alan.mckinnon@gmail.com> > > > > wrote: > >>> On 29/07/2016 22:58, Mick wrote: > >>>> Interesting article explaining why Uber are moving away from > >>>> PostgreSQL. > >>>> I am > >>>> running both DBs on different desktop PCs for akonadi and I'm also > >>>> running > >>>> MySQL on a number of websites. Let's which one goes sideways first. > >>>> :p > >>>> > >>>> https://eng.uber.com/mysql-migration/ > >>> > >>> I don't think your akonadi and some web sites compares in any way to > >>> Uber > >>> and what they do. > >>> > >>> FWIW, my Dev colleagues support and entire large corporate ISP's > >>> operational and customer data on PostgreSQL-9.3. With clustering. With > >>> no > >>> db-related issues :-) > >> > >> Agree, you'd need to be fairly large-scale to have their issues, > > > > And also have to design your database by people who think MySQL actually > > follows common SQL standards. > > > >> but I > >> think the article was something anybody interested in databases should > >> read. If nothing else it is a really easy to follow explanation of > >> the underlying architectures. > > > > Check the link posted by Douglas. > > Ubers article has some misunderstandings about the architecture with > > conclusions drawn that are, at least also, caused by their database design > > and usage. > > > >> I'll probably post this to my LUG mailing list. I think one of the > >> Postgres devs lurks there so I'm curious to his impressions. 
> >> > >> I was a bit surprised to hear about the data corruption bug. I've > >> always considered Postgres to have a better reputation for data > >> integrity. > > > > They do. > > > >> And of course almost any FOSS project could have a bug. I > >> don't know if either project does the kind of regression testing to > >> reliably detect this sort of issue. > > > > Not sure either, I do think PostgreSQL does a lot with regression tests. > > > >> I'd think that it is more likely > >> that the likes of Oracle would (for their flagship DB (not for MySQL), > > > > Never worked with Oracle (or other big software vendors), have you? :) > > > >> and they'd probably be more likely to send out an engineer to beg > >> forgiveness while they fix your database). > > > > Only if you're a big (as in, spend a lot of money with them) customer. > > > >> Of course, if you're Uber > >> the hit you'd take from downtime/etc isn't made up for entirely by > >> having somebody take a few days to get everything fixed. > > > > -- > > Joost > > I certainly respect your skills and posts on Databases, Joost, as > everything you have posted, in the past is 'spot on'. Comes with a keen interest and long-term (think decades) of working with different databases. > Granted, I'm no database expert, far from it. Not many people are, nor do they need to be. > But I want to share a few thing with you, > and hope you (and others) will 'chime in' on these comments. > > Way back, when the earth was cooling and we all had dinosaurs for pets, > some of us hacked on AT&T "3B2" unix systems. They were know for their > 'roll back and recovery', triplicated (or more) transaction processes > and 'voters' system to ferret out if a transaction was complete and > correct. There was no ACID, the current 'gold standard' if you believe > what Douglas and other write about concerning databases. 
> > In essence, (from crusted up memories) a basic (SS7) transaction related > to the local telephone switch, was ran on 3 machines. The results were > compared. If they matched, the transaction went forward as valid. If 2/3 > matched, And what about the likely case where only 1 was correct? Have you seen the movie "Minority Report"? If yes, think back to why Tom Cruise was found 'guilty' when he wasn't, and how often that actually occurred. > and the switch was was configured, then the code would > essentially 'vote' and majority ruled. This is what led to phone calls > (switched phone calls) having variable delays, often in the order of > seconds, mis-connections and other problems we all encountered during > periods of excessive demand. Not sure if that was the cause in the past, but these days it can also still take a few seconds before the other end rings. This is due to the phone-system (all PBXs in the path) needing to set up the routing between both end-points prior to the ring-tone actually starting. When the system is busy, these lookups will take time and can even time-out. (Try wishing everyone you know a happy new year using a wired phone and you'll see what I mean. Mobile phones have a separate problem at that time) > That scenario was at the heart of how old, crappy AT&T unix (SVR?) could > perform so well and therefore established the gold standard for RT > transaction processing, aka the "five 9s" 99.999% of up-time (about 5 > minutes per year of downtime). "Unscheduled" downtime. Regular maintenance will require more than 5 minutes per year. > Sure this part is only related to > transaction processing as there was much more to the "five 9s" legacy, > but imho, that is the heart of what was the precursor to ACID property's > now so greatly espoused in SQL codes that Douglas refers to. > > Do folks concur or disagree at this point? ACID is about data integrity. The "best 2 out of 3" voting was, in my opinion, a work-around for unreliable hardware. 
It is based on a clever idea, but when 2 computers having the same data and logic come up with 2 different answers, I wouldn't trust either of them. > The reason this is important to me (and others?), is that, if this idea > (granted there is much more detail to it) is still valid, then it can > form the basis for building up superior-ACID processes, that meet or > exceed, the properties of an expensive (think Oracle) transaction > process on distributed (parallel) or clustered systems, to a degree of > accuracy only limited by the limit of the number of odd numbered voter > codes involve in the distributed and replicated parts of the > transaction. I even added some code where replicated routines were > written in different languages, and the results compared to add an > additional layer of verification before the voter step. (gotta love > assembler?). You have seen how "democracies" work, right? :) The more voters involved, the longer it takes for all the votes to be counted. With a small number, it might actually still scale, but when you pass a magic number (no clue what this would be), the counting time starts to exceed any time you might have gained by adding more voters. Also, this, to me, seems to counteract the whole reason for using clusters: Have different nodes handle a different part of the problem. Clusters of multiple compute-nodes is a quick and "simple" way of increasing the amount of computational cores to throw at problems that can be broken down in a lot of individual steps with minimal inter-dependencies. I say "simple" because I think designing a 1,000 core chip is more difficult than building a 1,000-node cluster using single-core, single cpu boxes. I would still consider the cluster to be a single "machine". > I guess my point is 'Douglas' is full of stuffing, OR that is what folks > are doing when they 'role their own solution specifically customized to > their specific needs' as he alludes to near the end of his commentary? 
The response Douglas linked to is closer to what seems to work when dealing with large amounts of data. > (I'd like your opinion of this and maybe some links to current schemes > how to have ACID/99.999% accurate transactions on clusters of various > architectures.) Douglas, like yourself, writes of these things in a > very lucid fashion, so that is why I'm asking you for your thoughts. The way Uber created the cluster is useful when having 1 node handle all the updates and multiple nodes providing read-only access while also providing failover functionality. > Robustness of transactions, in a distributed (clustered) environment is > fundamental to the usefulness of most codes that are trying to migrate > to a cluster based processes in (VM/container/HPC) environments. Whereas I do consider clusters to be very useful, not all work-loads can be redesigned to scale properly. > I do > not have the old articles handy but, I'm sure that many/most of those > types of inherent processes can be formulated in the algebraic domain, > normalized and used to solve decisions often where other forms of > advanced logic failed (not that I'm taking a cheap shot at modern > programming languages) (wink wink nudge nudge); or at least that's how > we did it.... as young whipper_snappers bask in the day... If you know what you are doing, the language is just a tool. Sometimes a hammer is sufficient, other times one might need to use a screwdriver. > --an_old_farts_logic Thinking back on how long I've been playing with computers, I wonder how long it will be until I am in the "old fart" category? -- Joost ^ permalink raw reply [flat|nested] 27+ messages in thread
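[Editor's note: Joost's point that ACID is about data integrity can be made concrete. Atomicity, the "A" in ACID, means a transaction that fails mid-way leaves no partial writes behind. A minimal sketch using Python's bundled sqlite3, standing in for any ACID database rather than PostgreSQL specifically:]

```python
# Minimal illustration of atomicity: a transfer that "crashes" between
# the debit and the credit is rolled back in full, never half-applied.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
db.commit()

try:
    with db:  # opens a transaction; rolls back if the block raises
        db.execute("UPDATE accounts SET balance = balance - 70 "
                   "WHERE name = 'alice'")
        raise RuntimeError("crash before the matching credit to bob")
except RuntimeError:
    pass

# The debit was rolled back with the failed transaction: no half-transfer.
(balance,) = db.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
assert balance == 100
```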
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-01 16:49 ` J. Roeleveld @ 2016-08-01 18:03 ` Rich Freeman 2016-08-02 5:51 ` james 2016-08-02 5:16 ` james 1 sibling, 1 reply; 27+ messages in thread From: Rich Freeman @ 2016-08-01 18:03 UTC (permalink / raw To: gentoo-user On Mon, Aug 1, 2016 at 12:49 PM, J. Roeleveld <joost@antarean.org> wrote: > On Monday, August 01, 2016 08:43:49 AM james wrote: > >> Sure this part is only related to >> transaction processing as there was much more to the "five 9s" legacy, >> but imho, that is the heart of what was the precursor to ACID property's >> now so greatly espoused in SQL codes that Douglas refers to. >> >> Do folks concur or disagree at this point? > > ACID is about data integrity. The "best 2 out of 3" voting was, in my opinion, > a work-around for unreliable hardware. It is based on a clever idea, but when > 2 computers having the same data and logic come up with 2 different answers, I > wouldn't trust either of them. I agree, this was a solution for hardware issues. However, hardware issues can STILL happen today, so there is an argument for it. There are really two ways to get to robustness: clever hardware, and clever software. The old way was to do it in hardware, the newer way is to do it in software (see Google with their racks of cheap motherboards). I suspect software will always be the better way, but you can't just write a check to get better software the way you can with hardware. Doing it right with software means hiring really good people, which is something a LOT of companies don't want to do (well, they think they're doing it, but they're not). Basically I believe the concept with the mainframe was that you could probably open the thing up, break one random board with a hammer, and the application would still keep running just fine. IBM would then magically show up the next day and replace the board without anybody doing anything. 
All the hardware had redundancy, so you can run your application for a decade or two without fear of a hardware failure. However, you pay a small fortune for all of this. The other trend as I understand it in mainframes is renting your own hardware to you. That is, you buy a box, and you can just pay to turn on extra CPUs/etc. You can imagine what the margins are like for that to be practical, but for non-trendy businesses that don't want to offer free ice cream and pay Silicon Valley wages I guess it is an alternative to building good software. > > You have seen how "democracies" work, right? :) > The more voters involved, the longer it takes for all the votes to be counted. > With a small number, it might actually still scale, but when you pass a magic > number (no clue what this would be), the counting time starts to exceed any > time you might have gained by adding more voters. > > Also, this, to me, seems to counteract the whole reason for using clusters: > Have different nodes handle a different part of the problem. I agree. The old mainframe way of doing things isn't going to make anything faster. I don't think it will necessarily make things much slower as long as all the hardware is in the same box. However, if you want to start doing this at a cluster scale with offsite replicas I imagine the latencies would kill just about anything. That was one of the arguments against the Postgres vacuum approach where replicas could end up having in-use records deleted. The solutions are to delay the replicas (not great), or synchronize back to the master (also not great). The MySQL approach apparently lets all the replicas do their own vacuuming, which does neatly solve that particular problem (presumably at the cost of more work for the replicas, and of course they're no longer binary replicas). 
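[Editor's note: the vacuum problem Rich describes can be illustrated with a toy MVCC sketch. This is a deliberate over-simplification, nothing like the real Postgres storage layer: updates append row versions, a reader sees the newest version visible to its snapshot, and vacuum may only reclaim versions no active snapshot still needs.]

```python
# Toy MVCC version chain: (txid, value) pairs, oldest first.
versions = []

def write(txid, value):
    versions.append((txid, value))

def read(snapshot_txid):
    """A snapshot sees the newest version created at or before it."""
    visible = [v for t, v in versions if t <= snapshot_txid]
    return visible[-1] if visible else None

def vacuum(oldest_active_snapshot):
    """Drop versions older than the newest one the oldest snapshot sees."""
    keep_from = max((i for i, (t, _) in enumerate(versions)
                     if t <= oldest_active_snapshot), default=0)
    del versions[:keep_from]

write(1, "v1")
write(5, "v2")
assert read(3) == "v1"             # an old snapshot still sees the old version

vacuum(oldest_active_snapshot=3)   # safe: snapshot 3 is known to be active
assert read(3) == "v1"

vacuum(oldest_active_snapshot=10)  # too eager -- like a replica that doesn't
assert read(3) is None             # know snapshot 3 exists; its row vanished
```

This is the bind described above: only the node that knows which snapshots are active can vacuum safely, so a replica must either lag behind or report its snapshots back.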
> > The way Uber created the cluster is useful when having 1 node handle all the > updates and multiple nodes providing read-only access while also providing > failover functionality. I agree. I do remember listening to a Postgres talk by one of the devs and while everybody's holy grail is the magical replica where you just have a bunch of replicas and you do any operation on any replica and everything is up to date, in reality that is almost impossible to achieve with any solution. -- Rich ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-01 18:03 ` Rich Freeman @ 2016-08-02 5:51 ` james 2016-08-11 12:48 ` Douglas J Hunley 0 siblings, 1 reply; 27+ messages in thread From: james @ 2016-08-02 5:51 UTC (permalink / raw To: gentoo-user On 08/01/2016 01:03 PM, Rich Freeman wrote: > On Mon, Aug 1, 2016 at 12:49 PM, J. Roeleveld <joost@antarean.org> wrote: >> On Monday, August 01, 2016 08:43:49 AM james wrote: >> >>> Sure this part is only related to >>> transaction processing as there was much more to the "five 9s" legacy, >>> but imho, that is the heart of what was the precursor to ACID property's >>> now so greatly espoused in SQL codes that Douglas refers to. >>> >>> Do folks concur or disagree at this point? >> >> ACID is about data integrity. The "best 2 out of 3" voting was, in my opinion, >> a work-around for unreliable hardware. It is based on a clever idea, but when >> 2 computers having the same data and logic come up with 2 different answers, I >> wouldn't trust either of them. > > I agree, this was a solution for hardware issues. However, hardware > issues can STILL happen today, so there is an argument for it. There > are really two ways to get to robustness: clever hardware, and clever > software. The old way was to do it in hardware, the newer way is to > do it in software (see Google with their racks of cheap motherboards). > I suspect software will always be the better way, but you can't just > write a check to get better software the way you can with hardware. > Doing it right with software means hiring really good people, which is > something a LOT of companies don't want to do (well, they think > they're doing it, but they're not). > > Basically I believe the concept with the mainframe was that you could > probably open the thing up, break one random board with a hammer, and > the application would still keep running just fine. IBM would then > magically show up the next day and replace the board without anybody > doing anything. 
All the hardware had redundancy, so you can run your > application for a decade or two without fear of a hardware failure. Not with today's clusters and cheap hardware. As you pointed out, expertise (and common sense) are the quintessential qualities for staff and managers..... > > However, you pay a small fortune for all of this. Not today; those exorbitant prices were back then. Sequoia made so much money, I'm pretty sure that's how they ultimately became a VC firm? > The other trend as > I understand it in mainframes is renting your own hardware to you. Yes, find a CPA that spent 10 years or so inside the IRS and you get even more aggressive profitability vectors. Some accountants move hardware, assets and corporations around and about the world in a shell game and never pay taxes, just recycling assets among billionaires. It's pretty sickening, if you really learn the details of what goes on. > That is, you buy a box, and you can just pay to turn on extra > CPUs/etc. You can imagine what the margins are like for that to be > practical, but for non-trendy businesses that don't want to offer free > ice cream and pay Silicon Valley wages I guess it is an alternative to > building good software. Investment credits, sell/rent hardware to an overseas division, then move them to another country that pays you to relocate and bring a few jobs. Heck, even the US states play that stupid game with recruiting corporations. Get an IRS career agent drunk some time and pull a few stories out of them..... >> You have seen how "democracies" work, right? :) >> The more voters involved, the longer it takes for all the votes to be counted. >> With a small number, it might actually still scale, but when you pass a magic >> number (no clue what this would be), the counting time starts to exceed any >> time you might have gained by adding more voters. >> >> Also, this, to me, seems to counteract the whole reason for using clusters: >> Have different nodes handle a different part of the problem. 
> > I agree. The old mainframe way of doing things isn't going to make > anything faster. I don't think it will necessarily make things much > slower as long as all the hardware is in the same box. However, if > you want to start doing this at a cluster scale with offsite replicas > I imagine the latencies would kill just about anything. That was one > of the arguments against the Postgres vacuum approach where replicas > could end up having in-use records deleted. The solutions are to > delay the replicas (not great), or synchronize back to the master > (also not great). The MySQL approach apparently lets all the replicas > do their own vacuuming, which does neatly solve that particular > problem (presumably at the cost of more work for the replicas, and of > course they're no longer binary replicas). Why Rich, using common sense? What's wrong with you? I thought you were a good corporate lackey? Bob from accounting has already presented to the BOD and got approval. Rich, can you be a team player (silent idiot) just once for the team? > >> >> The way Uber created the cluster is useful when having 1 node handle all the >> updates and multiple nodes providing read-only access while also providing >> failover functionality. > > I agree. I do remember listening to a Postgres talk by one of the > devs and while everybody's holy grail is the magical replica where you > just have a bunch of replicas and you do any operation on any replica > and everything is up to date, in reality that is almost impossible to > achieve with any solution. Yep, NoSQL is floundering mightily when requirements are stringent and other extreme QA issues are fine-grained, from what I read. Sadly, like yourself, I like to put on my 'common sense' glasses after an architectural solution is presented, and I've seen mountains of bad ideas; like BP running Prudhoe Bay (N. America's largest oil field) in the Arctic. Bad, bad idea, if you are an engineer and hang out with those 'tards' for a few days. 
They collected data in the Arctic, microwaved it to a mainframe in Anchorage, ran software, and then microwaved control signals back to the field controllers. Beyond stupid. They were an embarrassment to the entire petroleum industry back in the 70s, when I did some automation (RF to RF) to mainframe work in the Arctic. Likewise, the solution to all of the drilling disasters, world wide, is for each country to require real-time data at a government monitoring station, reporting things like the condition of the safety and backup safety systems (in real time), to keep mid managers from making gargantuan, stupid decisions. There is more than this amount of stupidity in how many cluster (cloud) companies think large amounts of critical data will be 'outsourced'. Bean counters scare me the most. Sales-lizards are rarely trusted, unless they listen to me and do exactly what I tell them to do. It seems that there are many, many tards in the cluster (cloud) space lacking common sense. So that (cluster/cloud) industry is going to implode, just like the "dot-com" bubble of the 90s. Not because there are not lots of valid projects and good ideas, but because many tards are managing, and they lack the common sense to pour piss out of a boot, let alone discern valid solutions for specific industries. Like a 'blind hog':: though they will find an acorn or two. A historical CS class or two on what has been tried, what works and does not work and why, along with a few (real) hardware architecture classes, and there would not be so many ridiculous (doomed to fail before getting started) cluster (cloud) companies out there. Developing unknown but old ideas in Java is still going to fail. Many are the BP of the cloud:: a disaster just waiting to fail.... ymmv. Many folks in the petroleum industry warned Alaskan government officials that BP was incompetent, back in the 70s. 
They still are, mostly because the executives would not know how to calculate the weight of a drill-stem column of fluid and match it up with the expected subsurface pressures to be encountered. It's a simple 'material balance equation' you could teach in a HS physics class. Likewise, there is a rich history (graveyard) of distributed processing, and that body of knowledge is being ignored, mostly because it is getting in the way of vendor hyperbole...... Douglas did manage to pull his own bacon from the fire at the end of his article, but it reeks of vendor hyperbole, imho. thanks for the comments, James ^ permalink raw reply [flat|nested] 27+ messages in thread
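The 'simple material balance' check james alludes to can be sketched with the standard oilfield rule of thumb that a fluid column exerts about 0.052 psi per foot of depth per lb/gal of mud weight. The function names and well numbers below are my own illustration, not anything from the thread:

```python
# Hypothetical sketch of the drill-fluid hydrostatic check described above.
# Standard oilfield approximation: P(psi) = 0.052 * mud weight (lb/gal) * TVD (ft).
def hydrostatic_psi(mud_weight_ppg, depth_ft):
    """Pressure exerted by a column of drilling fluid at a given depth."""
    return 0.052 * mud_weight_ppg * depth_ft

def overbalanced(mud_weight_ppg, depth_ft, formation_psi):
    """True if the fluid column holds back the expected formation pressure."""
    return hydrostatic_psi(mud_weight_ppg, depth_ft) >= formation_psi

# 10 ppg mud at 10,000 ft exerts 5,200 psi:
assert hydrostatic_psi(10, 10_000) == 5200.0
assert overbalanced(10, 10_000, 5000)      # formation pressure is held back
assert not overbalanced(10, 10_000, 6000)  # underbalanced: risk of a kick
```

It really is HS-physics-level arithmetic, which is james's point about the executives.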
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-02 5:51 ` james @ 2016-08-11 12:48 ` Douglas J Hunley 2016-08-12 13:00 ` james 0 siblings, 1 reply; 27+ messages in thread From: Douglas J Hunley @ 2016-08-11 12:48 UTC (permalink / raw To: Gentoo [-- Attachment #1: Type: text/plain, Size: 404 bytes --] On Tue, Aug 2, 2016 at 1:51 AM, james <garftd@verizon.net> wrote: > Douglas did manage to pull his own bacon from the fire, in the end of his > article, but it wreaks of vendor hyperbole, imho. > Again, not the author -- { "name": "douglas j hunley", "email": "doug.hunley@gmail.com", "social": [ { "blog": "https://hunleyd.github.io/", "twitter": "@hunleyd" } ] } [-- Attachment #2: Type: text/html, Size: 1106 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-11 12:48 ` Douglas J Hunley @ 2016-08-12 13:00 ` james 2016-08-12 14:13 ` R0b0t1 0 siblings, 1 reply; 27+ messages in thread From: james @ 2016-08-12 13:00 UTC (permalink / raw To: gentoo-user On 08/11/2016 07:48 AM, Douglas J Hunley wrote: > > On Tue, Aug 2, 2016 at 1:51 AM, james <garftd@verizon.net > <mailto:garftd@verizon.net>> wrote: > > Douglas did manage to pull his own bacon from the fire, in the end > of his article, but it wreaks of vendor hyperbole, imho. > > > Again, not the author > > > -- > { > "name": "douglas j hunley", > "email": "doug.hunley@gmail.com <mailto:doug.hunley@gmail.com>", > "social": [ > { > "blog": "https://hunleyd.github.io/", > "twitter": "@hunleyd" > } > ] > } IFF I made a logical sequence attachment error {a boo_boo}:: 1K apologies IFF I bruised your ego:: 1M apologies IFF I insulted your pride:: 1G apologies IFelse My goal was to clear up common ignorance of where the ACID properties came from:: Mathematics ==>Electro-Mechanical Engineering ==>Electronics Engineering ==>DataBase Weenies ==>(accounting)Codes. OK? That's my thesis and conclusion:: sprinkle with apologies as necessary. I do acknowledge that DataBase (weeny) vendors are the Microsoft of Robustness and Reliability, espoused by the current state of affairs in transaction processing, which is now a staple of modern computation, much like MicroSoft made computers so idiots can participate too. No arguments therein. BUT, I take the time to 'educate' folks for a very important reason:: Distributed and parallel processing, now entering its fourth/fifth/sixth/<whatever> rendition, offers up fundamental mathematically based constructs, that can be realized in (electronic) hardware or software or both, to build 'systems' that far exceed the robustness of the ACID properties currently found in a database scheme. 
Furthermore, whores like Oracle need to be retired from the computational landscape, as they are the robber barons of yore and we just do not need them any more. I.e., learn the basics and implement new constructs in distributed and parallel schemes (aka the cluster). Fundamental, sound and proven principles of mathematics and EE provide many 'degrees of freedom' for more robust solutions than the vendor hyperbole of database vendors. And yes, your favorite University, and Wiki*, have failed to accurately document this; nothing I can do about that but share, as I am doing here. OK? So, the interested can do their own research, and others can trudge along their merry way. (The apologies are sincere, but, I am a bit crass:: no apologies on that note). hth, James ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-12 13:00 ` james @ 2016-08-12 14:13 ` R0b0t1 2016-08-12 14:15 ` R0b0t1 0 siblings, 1 reply; 27+ messages in thread From: R0b0t1 @ 2016-08-12 14:13 UTC (permalink / raw To: gentoo-user On Fri, Aug 12, 2016 at 8:00 AM, james <garftd@verizon.net> wrote: > Mathematics ==>Electro-Mechanical Engineering ==>Electronics Engineering > ==>DataBase Weenies ==>(accounting)Codes. The study of anything is really the study of war. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-12 14:13 ` R0b0t1 @ 2016-08-12 14:15 ` R0b0t1 2016-08-12 18:01 ` james 0 siblings, 1 reply; 27+ messages in thread From: R0b0t1 @ 2016-08-12 14:15 UTC (permalink / raw To: gentoo-user On Fri, Aug 12, 2016 at 9:13 AM, R0b0t1 <r030t1@gmail.com> wrote: > On Fri, Aug 12, 2016 at 8:00 AM, james <garftd@verizon.net> wrote: >> Mathematics ==>Electro-Mechanical Engineering ==>Electronics Engineering >> ==>DataBase Weenies ==>(accounting)Codes. > > The study of anything is really the study of war. Readers will find it amusing that Machiavelli's writings included convenient descriptions of pike-and-shot formations as "ASCII" art. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-12 14:15 ` R0b0t1 @ 2016-08-12 18:01 ` james 0 siblings, 0 replies; 27+ messages in thread From: james @ 2016-08-12 18:01 UTC (permalink / raw To: gentoo-user On 08/12/2016 09:15 AM, R0b0t1 wrote: > On Fri, Aug 12, 2016 at 9:13 AM, R0b0t1 <r030t1@gmail.com> wrote: >> On Fri, Aug 12, 2016 at 8:00 AM, james <garftd@verizon.net> wrote: >>> Mathematics ==>Electro-Mechanical Engineering ==>Electronics Engineering >>> ==>DataBase Weenies ==>(accounting)Codes. >> >> The study of anything is really the study of war. > > Readers will find it amusing that Machiavelli's writings included > convenient descriptions of pike-and-shot formations as "ASCII" art. > > Plausible, but consider some perspective on Mac:: A medical professor once asked her class to submit a one-paragraph thesis on how Machiavelli's works affected modern medicine. The youngest member of the class (quite young, actually) noted that most acknowledge that Mac was very ill later in life; his thesis was that that (catastrophic) illness had actually consumed Mac much earlier in life, and that therefore the study and reading of Mac was more attributable to a manifestation of 'societal sickness' than to a learned pursuit of that which is worthy of pursuit. He received a low mark in that class, but truth is truth, especially in the eyes of the author; a brilliant truth, most often. There is an ironic posting on Hacker News about the lack of credibility, amongst their customers, of modern psychiatry; you just might find it interesting. All other forms of modern medicine receive quite high marks from their customers. caveat emptor, James ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-01 16:49 ` J. Roeleveld 2016-08-01 18:03 ` Rich Freeman @ 2016-08-02 5:16 ` james 2016-08-04 10:09 ` J. Roeleveld 1 sibling, 1 reply; 27+ messages in thread From: james @ 2016-08-02 5:16 UTC (permalink / raw To: gentoo-user On 08/01/2016 11:49 AM, J. Roeleveld wrote: > On Monday, August 01, 2016 08:43:49 AM james wrote: >> On 08/01/2016 02:16 AM, J. Roeleveld wrote: >>> On Saturday, July 30, 2016 06:38:01 AM Rich Freeman wrote: >>>> On Sat, Jul 30, 2016 at 6:24 AM, Alan McKinnon <alan.mckinnon@gmail.com> >>> >>> wrote: >>>>> On 29/07/2016 22:58, Mick wrote: >>>>>> Interesting article explaining why Uber are moving away from >>>>>> PostgreSQL. >>>>>> I am >>>>>> running both DBs on different desktop PCs for akonadi and I'm also >>>>>> running >>>>>> MySQL on a number of websites. Let's which one goes sideways first. >>>>>> :p >>>>>> >>>>>> https://eng.uber.com/mysql-migration/ >>>>> >>>>> I don't think your akonadi and some web sites compares in any way to >>>>> Uber >>>>> and what they do. >>>>> >>>>> FWIW, my Dev colleagues support and entire large corporate ISP's >>>>> operational and customer data on PostgreSQL-9.3. With clustering. With >>>>> no >>>>> db-related issues :-) >>>> >>>> Agree, you'd need to be fairly large-scale to have their issues, >>> >>> And also have to design your database by people who think MySQL actually >>> follows common SQL standards. >>> >>>> but I >>>> think the article was something anybody interested in databases should >>>> read. If nothing else it is a really easy to follow explanation of >>>> the underlying architectures. >>> >>> Check the link posted by Douglas. >>> Ubers article has some misunderstandings about the architecture with >>> conclusions drawn that are, at least also, caused by their database design >>> and usage. >>> >>>> I'll probably post this to my LUG mailing list. I think one of the >>>> Postgres devs lurks there so I'm curious to his impressions. 
>>>> >>>> I was a bit surprised to hear about the data corruption bug. I've >>>> always considered Postgres to have a better reputation for data >>>> integrity. >>> >>> They do. >>> >>>> And of course almost any FOSS project could have a bug. I >>>> don't know if either project does the kind of regression testing to >>>> reliably detect this sort of issue. >>> >>> Not sure either, I do think PostgreSQL does a lot with regression tests. >>> >>>> I'd think that it is more likely >>>> that the likes of Oracle would (for their flagship DB (not for MySQL), >>> >>> Never worked with Oracle (or other big software vendors), have you? :) >>> >>>> and they'd probably be more likely to send out an engineer to beg >>>> forgiveness while they fix your database). >>> >>> Only if you're a big (as in, spend a lot of money with them) customer. >>> >>>> Of course, if you're Uber >>>> the hit you'd take from downtime/etc isn't made up for entirely by >>>> having somebody take a few days to get everything fixed. >>> >>> -- >>> Joost >> >> I certainly respect your skills and posts on Databases, Joost, as >> everything you have posted, in the past is 'spot on'. > > Comes with a keen interest and long-term (think decades) of working with > different databases. > >> Granted, I'm no database expert, far from it. > > Not many people are, nor do they need to be. > >> But I want to share a few thing with you, >> and hope you (and others) will 'chime in' on these comments. >> >> Way back, when the earth was cooling and we all had dinosaurs for pets, >> some of us hacked on AT&T "3B2" unix systems. They were know for their >> 'roll back and recovery', triplicated (or more) transaction processes >> and 'voters' system to ferret out if a transaction was complete and >> correct. There was no ACID, the current 'gold standard' if you believe >> what Douglas and other write about concerning databases. 
>> In essence, (from crusted up memories) a basic (SS7) transaction related >> to the local telephone switch, was ran on 3 machines. The results were >> compared. If they matched, the transaction went forward as valid. If 2/3 >> matched, > > And what in the likely case when only 1 was correct? 1/3 was a failure; in fact X<1 could be defined (parameter setting) as a failure depending on the need. > Have you seen the movie "minority report"? > If yes, think back to why Tom Cruise was found 'guilty' when he wasn't and how > often this actually occured. Apples to oranges. The (3) "pre-cons" were not equal; albeit they voted, most of the time all three in agreement, the dominant pre-con was always on the correct side of the issue. But that is make-believe. Comparing the results of codes run on 3 different processors or separate machines for agreement within tolerances is quite different. The very essence of voting is that a result less than 1.0 (that is, (n-1)/n or (n-x)/n) was requisite on identical (replicated) processes all returning the same result (expecting either a 0 or a 1), results being logical or within rounding error of acceptance. Surely we need not split hairs. I was merely pointing out that those basic telecom systems formed the early basis of the widespread transaction processing industry and are the granddaddy of the ACID model/norms/constructs of modern transaction processing. And Douglas is dead wrong that those sorts of (ACID) transactions cannot be made to fly on clusters versus a single machine. For massively parallel needs, distributed processing rules, but it is not trivial, and hence Uber, with mostly a bunch of kids, seems to be struggling and to have made bad decisions. Probably their mid managers and software architects are the weak link, or they got expert guidance that was not in-house, or made poor decisions to get some code running quickly, etc. etc. I do not really care about UBER. 
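The N-of-M 'voter' scheme described above (run the same transaction on an odd number of machines, accept the majority result, reject and resubmit on disagreement) can be sketched as follows. The names and structure are a hypothetical illustration of the idea, not actual telecom code:

```python
# Sketch of majority ("2-out-of-3") voting over replicated transaction results.
# A transaction is accepted only if a strict majority of replicas agree;
# otherwise it is rejected and the caller resubmits it.
from collections import Counter

def vote(results):
    """Return the strict-majority result, or None if there is no majority."""
    winner, count = Counter(results).most_common(1)[0]
    return winner if count > len(results) // 2 else None

# All three replicas agree: the transaction goes forward as valid.
assert vote([1, 1, 1]) == 1
# 2/3 agreement still commits; the dissenting replica is outvoted.
assert vote([1, 1, 0]) == 1
# No strict majority: reject and resubmit.
assert vote([1, 0]) is None
```

Note this matches Joost's objection too: if two faulty replicas happen to agree, the majority answer is still wrong, which is why voting masks hardware faults rather than proving correctness.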
My singular issue is that Douglas was completely dead wrong (while nicely promoting himself as a Postgres expert and his business credentials), and just barely saved his credibility by stating what UBER is now doing that is superior to a grade-A ACID dB solution. Another point: there are single big GPUs that can be run as thousands of different processors, on either FPGA or GPU, granted using SIMD/MIMD style processors and things like 'systolic algorithms', but that sort of thing is out of scope here. (Vulkan might change that, in an open source kind of way, maybe.) Furthermore, GPU resources combined with DDR-5 can blur the line and may actually be more cost effective for many forms of transaction processing, but clusters, in their current forms, are very much general purpose machines. My point:: Douglas is dead wrong about ACID being dominated by databases, for technical reasons, particularly for advanced teams of experts. Surely most MBA, HR and Finance types of idiots running these new startups would not know a coder from an architect, and that is very sad, because a good consultant could have probably designed several robust systems in a week or two. Granted, few consultants have that sort of unbiased integrity, because we all have bills to pay and much is getting outsourced... Integrity has always been the rarest of qualities, particularly with humanoids...... > >> and the switch was was configured, then the code would >> essentially 'vote' and majority ruled. This is what led to phone calls >> (switched phone calls) having variable delays, often in the order of >> seconds, mis-connections and other problems we all encountered during >> periods of excessive demand. > > Not sure if that was the cause in the past, but these days it can also still > take a few seconds before the other end rings. This is due to the phone-system > (all PBXs in the path) needing to setup the routing between both end-points > prior to the ring-tone actually starting. 
> When the system is busy, these lookups will take time and can even time-out. > (Try wishing everyone you know a happy new year using a wired phone and you'll > see what I mean. Mobile phones have a seperate problem at that time) I did not intend to argue about the minutiae of how a particular Baby Bell implemented their SS7 switching systems on unix systems. My point was that 'transaction processing' grew out of the early telephone network, the way I remember it:: ymmv. Banks did double-entry accounting by hand and had clerks manually load data sets; then double-entry accounting became automated, and ACID-style transaction processing was added later. So what SQL folks refer to as ACID properties comes from the North American switching heritage, and eventually the world's telecom networks, eons ago. >> That scenario was at the heart of how old, crappy AT&T unix (SVR?) could >> perform so well and therefore established the gold standard for RT >> transaction processing, aka the "five 9s" 99.999% of up-time (about 5 >> minutes per year of downtime). > > "Unscheduled" downtime. Regular maintenance will require more than 5 minutes > per year. Yes, but the redundancy of the 3B2 and other computers (Sequent, Sequoia and Tandem to name a few) meant that the "phone switching" fabric, at any given Central Office (the local building where the copper, RF and fiber lines are muxed), was, on average, up and available 99.999% of the time. Ironically, gentoo now has a sys-fabric group :: /usr/portage/sys-fabric, thanks to some forward-thinking cluster folk. > >> Sure this part is only related to >> transaction processing as there was much more to the "five 9s" legacy, >> but imho, that is the heart of what was the precursor to ACID property's >> now so greatly espoused in SQL codes that Douglas refers to. >> >> Do folks concur or disagree at this point? > > ACID is about data integrity. The "best 2 out of 3" voting was, in my opinion, > a work-around for unreliable hardware. Absolutely true. 
But the fact that high-reliability computer processing (including the billing) could be replicated, performed elsewhere and then 'recombined' proves that any ACID function can be split up, run on clusters, and achieve ACID standards or even better. So my point is that the cluster, if used wisely, will beat the 'dog shit' out of any Oracle fancy-pants database maneuvers. Evidence:: Snoracle is now snapping up billion dollar companies in the cluster space, because their days of extortion are winding down rather rapidly, imho. Also, just because the kids writing the codes have not figured all of this out does not mean that SQL, or any abstraction, is better than parallel processing. No way in hell. Cheaper and quicker to set up, surely true, but never superior to a well-designed, properly coded distributed solution. That's my point. Hence, Douglas is full of stuffing, except that he alludes to the fact that UBER is doing something much better, beyond what Oracle has an interest in doing, at the last possible moment in his critique. This is backed up by Oracle's lethargic reaction to the data processing market, just leaving Oracle to become the next IBM.... (ymmv). > It is based on a clever idea, but when > 2 computers having the same data and logic come up with 2 different answers, I > wouldn't trust either of them. Yep. That a transaction failing QA is rejected and must be resubmitted, modified, or subjected to any number of remedies is quite common in many forms of software. Voting does not correct errors, except maybe a fractional rounding up to 1 (pass) or down to 0 (failure). It does help to achieve the ACI of ACID. Since billions and billions of these (complex) transactions are occurring, a failed one is usually just repeated. If it keeps failing, then engineers/coders take a deeper look. Rare statistical anomalies are auto-scrutinized (that would be replications and voting) and then pushed to a logical zero or logical one. 
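The pass-or-fail, resubmit-on-failure behaviour being discussed is essentially the atomicity half of ACID. A minimal sketch using sqlite3 from the Python standard library (the ledger table and amounts are made up for the example) shows a transaction that fails midway leaving no partial state behind:

```python
# Minimal illustration of the "A" in ACID (atomicity): either both legs
# of a double-entry transfer are recorded, or neither is.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (account TEXT, amount INTEGER)")
conn.execute("INSERT INTO ledger VALUES ('a', 100), ('b', 0)")
conn.commit()

try:
    with conn:  # the context manager wraps one atomic transaction
        conn.execute("UPDATE ledger SET amount = amount - 50 WHERE account = 'a'")
        raise RuntimeError("crash mid-transfer")  # simulate a failure
        conn.execute("UPDATE ledger SET amount = amount + 50 WHERE account = 'b'")
except RuntimeError:
    pass  # the caller would resubmit the whole transaction

# The partial debit was rolled back: balances are unchanged.
rows = dict(conn.execute("SELECT account, amount FROM ledger"))
assert rows == {"a": 100, "b": 0}
```

The `with conn:` block commits on success and rolls back on any exception, which is exactly the pass-or-fail-and-redo discipline described for the old switches.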
> >> The reason this is important to me (and others?), is that, if this idea >> (granted there is much more detail to it) is still valid, then it can >> form the basis for building up superior-ACID processes, that meet or >> exceed, the properties of an expensive (think Oracle) transaction >> process on distributed (parallel) or clustered systems, to a degree of >> accuracy only limited by the limit of the number of odd numbered voter >> codes involve in the distributed and replicated parts of the >> transaction. I even added some code where replicated routines were >> written in different languages, and the results compared to add an >> additional layer of verification before the voter step. (gotta love >> assembler?). > > You have seen how "democracies" work, right? :) Yes, I need to shed some light on telecom processing. I never intended to suggest that voting corrected errors, although error-correction codes are usually part of the overall stack. I tried to suggest that all transactions on phone switches are already Atomic (pass, or fail and redo), Consistent (replications on different hardware pathways pass satisfaction metrics), Isolated (via multiple hardware pathways), and Durable (passing a voter check scheme); and five nines is still the gold standard for a system (even mil-spec). So the old telecom systems are indeed and in fact the heritage of modern ACID transactions. > The more voters involved, the longer it takes for all the votes to be counted. Wrong! Voters are all run in parallel. For this level of redundancy (to achieve a QA result of a 99.999% pristine system), it is more expensive, analogous to encryption versus clear text. Nobody but a business major would use an excessive number of voters in their switching fabric. Telecom incompetence, in my experience, has been the domain of mid managers too weak to educate upper management on the poor ideas many of them have had and continue to have (Verizon comes to mind, too often). 
> With a small number, it might actually still scale, but when you pass a magic > number (no clue what this would be), the counting time starts to exceed any > time you might have gained by adding more voters. Nope, the larger the number, the more expensive. The number of voters rarely goes above 5, but it could for some sorts of physics problems (think quantum mechanics, and logic not bound to [0 1] whole numbers). Often logic circuits (constructs, for programmers) have "don't care" states that can be handled in a variety of ways (filters, transforms, counters etc etc). > Also, this, to me, seems to counteract the whole reason for using clusters: > Have different nodes handle a different part of the problem. That also occurs. But my point is that properly designed code for the cluster can replace the ACID functions offered by Oracle and other overpriced solutions, on standard cluster hardware. The problem with today's clusters is that the vendors that employ the kid-coders are making things far more complicated than necessary, so the average linux hacker just outsources via the cloud. DUMB, insecure, and not a wise choice for many industries. And sooner or later folks are going to get wise and build their own clusters that just solve the problems they have. Surely hybrid clusters will dominate, where the owner of the codes does outsource peak loads and mundane collection of ordinary (non-critical) data. Vendors know this and have started another 'smoke and mirrors' campaign called (brace yourself) 'Unikernels'..... The problem with that approach is they should just be using minimized (focused) gentoo on stripped and optimized linux kernels; but that is another lost art from the linux collection. > > Clusters of multiple compute-nodes is a quick and "simple" way of increasing > the amount of computational cores to throw at problems that can be broken down > in a lot of individual steps with minimal inter-dependencies. 
And surpass the ACID features of either PostgreSQL or Oracle, and spend less money (maybe not with you and PostgreSQL on their team)! > I say "simple" because I think designing a 1,000 core chip is more difficult > than building a 1,000-node cluster using single-core, single cpu boxes. Today, you are correct. Tomorrow you will be wrong. [1]. Besides, once that chip or VHDL code or whatever is designed, it can be replicated and reused endlessly. Think ASIC designers, folks who take an FPGA project to completion. An EE can code on large arrays of DSPs, or a GPU (think Khronos Group) using Vulkan. > > I would still consider the cluster to be a single "machine". That's the goal. > >> I guess my point is 'Douglas' is full of stuffing, OR that is what folks >> are doing when they 'role their own solution specifically customized to >> their specific needs' as he alludes to near the end of his commentary? > > The response Douglas linked to is closer to what seems to work when dealing > with large amounts of data. > >> (I'd like your opinion of this and maybe some links to current schemes >> how to have ACID/99.999% accurate transactions on clusters of various >> architectures.) Douglas, like yourself, writes of these things in a >> very lucid fashion, so that is why I'm asking you for your thoughts. > > The way Uber created the cluster is useful when having 1 node handle all the > updates and multiple nodes providing read-only access while also providing > failover functionality. SIMD solution, mimicked on a cluster? Cool. > >> Robustness of transactions, in a distributed (clustered) environment is >> fundamental to the usefulness of most codes that are trying to migrate >> to a cluster based processes in (VM/container/HPC) environments. > > Whereas I do consider clusters to be very useful, not all work-loads can be > redesigned to scale properly. Today, correct. Tomorrow, I think you are going to be wrong. It's like the single core, multicore. 
Granted, many old decrepit codes had to be redesigned and coded anew with threads and other modern constructs to take advantage of newer processing platforms. Sure, the same is true with distributed, but it's far closer than ever. The largest problem with clusters is that vendors with agendas are making things more complicated than necessary and completely ignoring many fundamental issues, like kernel stripping and optimizations, under the bloated OS they are using. > >> I do >> not have the old articles handy but, I'm sure that many/most of those >> types of inherent processes can be formulated in the algebraic domain, >> normalized and used to solve decisions often where other forms of >> advanced logic failed (not that I'm taking a cheap shot at modern >> programming languages) (wink wink nudge nudge); or at least that's how >> we did it.... as young whipper_snappers bask in the day... > > If you know what you are doing, the language is just a tool. Sometimes a > hammer is sufficient, other times one might need to use a screwdriver. > >> --an_old_farts_logic > > Thinking back on how long I've been playing with computers, I wonder how long > it will be until I am in the "old fart" category? Stay young! I run full-court hoops all the time with young college punks; it's one of my greatest joys in life, to run with the young stallions, hacking, pushing, shoving, slicing and taunting other athletes. An old farts club is not something to be proud of; I just like to share too much...... > Joost Thanks ! James ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-02 5:16 ` james @ 2016-08-04 10:09 ` J. Roeleveld 2016-08-04 17:08 ` james 0 siblings, 1 reply; 27+ messages in thread From: J. Roeleveld @ 2016-08-04 10:09 UTC (permalink / raw To: gentoo-user On Tuesday, August 02, 2016 12:16:32 AM james wrote: > On 08/01/2016 11:49 AM, J. Roeleveld wrote: > > On Monday, August 01, 2016 08:43:49 AM james wrote: <snipped> > >> Way back, when the earth was cooling and we all had dinosaurs for pets, > >> some of us hacked on AT&T "3B2" unix systems. They were know for their > >> 'roll back and recovery', triplicated (or more) transaction processes > >> and 'voters' system to ferret out if a transaction was complete and > >> correct. There was no ACID, the current 'gold standard' if you believe > >> what Douglas and other write about concerning databases. > >> > >> In essence, (from crusted up memories) a basic (SS7) transaction related > >> to the local telephone switch, was ran on 3 machines. The results were > >> compared. If they matched, the transaction went forward as valid. If 2/3 > >> matched, > > > > And what in the likely case when only 1 was correct? > > 1/3 was a failure, in fact X<1 could be defined (parameter setting) as a > failure depending on the need. I actually meant: system A says true; systems B and C say false. And "true" was correct. (Being devil's advocate here) > > Have you seen the movie "minority report"? > > If yes, think back to why Tom Cruise was found 'guilty' when he wasn't and > > how often this actually occured. > > Apples to Oranges. The (3) "pre-cons" were not equal, ableit the voted, > most of the time all three in agreement, but the dominant pre-con was > always on the correct side of the issue. But that is make-believe. Of course, but it was the first example that I could come up with. > Comparing results of codes run on 3 different processors or separate > machines for agreement withing tolerances, is quite different. 
> The very
> essence of using voting, where a result less than 1.0 (that is,
> n-1/n or n-x/n) was acceptable, was requisite on identical (replicated)
> processes all returning the same result (expecting either a 0 or 1).
> Results being logical or within rounding error of acceptance. Surely we
> need not split hairs. I was merely pointing out that the basic telecom
> systems formed the early basis of the widespread transaction processing
> industries and are the granddaddy of the ACID model/norms/constructs of
> modern transaction processing.

Hmm... I am having difficulty following how ACID and ensuring results are correct by double or triple checking are related.

> And Douglas is

Which Douglas are you referring to? The one in this thread didn't actually write the article he linked to. (Unless he has 2 different identities)

> dead wrong that those sorts of (ACID) transactions cannot be made to fly
> on clusters versus a single machine.

It depends on how you define a cluster. I tend to view a cluster as a single system that just happens to be spread over multiple physical boxes.

> For massively parallel needs,
> distributed processing rules, but it is not trivial

Agreed.

> and hence Uber, with
> mostly a bunch of kids, seems to be struggling and to have made bad
> decisions.

Let's ignore whether the decisions are good or bad. The only thing we can be certain of, without seeing their code and environment, is that it doesn't scale the way they need it to.

> Probably, their mid-level managers and software architects are the
> weak link, or they got expert guidance that was not in-house, or made poor
> decisions to get some code running quickly, etc. etc. I do not really care
> about UBER.

Neither do I. And decisions are usually made by a single architect or developer who starts the project. His/her manager usually just accepts his/her word on this and all future decisions. Up until the moment the manager gets replaced. Then it depends on how much the manager trusts the original developer.
Other developers (internal or external) usually have a hard time pointing out potential issues if the first developer doesn't agree and/or understand.

> My singular issue is that Douglas was completely dead wrong
> (which nicely promoted himself as a Postgres expert and his business
> credentials), and he just barely saved his credibility by stating what UBER
> is now doing that is superior to a grade-A ACID DB solution.

I didn't see that in the article. Must have missed that part.

> Another point: there are single big GPUs that can be run as thousands of
> different processors, on either FPGA or GPU, granted using SIMD/MIMD
> style processors and things like 'systolic algorithms', but that sort of
> thing is out of scope here. (Vulkan might change that, in an open-source
> kind of way, maybe). Furthermore, GPU resources combined with DDR-5 can
> blur the line and may actually be more cost-effective for many forms of
> transaction processing, but clusters, in their current forms, are very
> much general-purpose machines.

I don't really agree here. For most software, having a really fast CPU helps. Having a lot of mediocre CPUs means the vast majority isn't doing anything useful.

Software running on clusters needs to be written with massive parallel processing in mind. Most developers don't understand this part.

> My point:: Douglas is dead wrong about
> ACID being dominated by databases, for technical reasons, particularly
> for advanced teams of experts.

Wikipedia actually disagrees with you:
https://en.wikipedia.org/wiki/ACID
"In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of database transactions."

In other words, it's related to databases.

> Surely most MBA, HR and Finance types of
> idiots running these new startups wouldn't know a coder from an
> architect, and that is very sad, because a good consultant could have
> probably designed several robust systems in a week or two.
> Granted, few
> consultants have that sort of unbiased integrity, because we all have
> bills to pay and much is getting outsourced... Integrity has always been
> the rarest of qualities, particularly with humanoids......

The software Uber uses for their business had to be developed in-house as there, at least at the time, was nothing available they could use ready-made. This usually means they start with something simple they can get running quickly. If they wanted to fully design the whole system first, they would never get anything done.

Where these projects usually go wrong is that they wait too long for a good robust design, leading to a near impossibility of actually fixing all the, in hindsight obvious, design mistakes.
(NOTE: In hindsight, as most of the actual requirements would not be clear on day 1)

> >> and the switch was so configured, then the code would
> >> essentially 'vote' and majority ruled. This is what led to phone calls
> >> (switched phone calls) having variable delays, often on the order of
> >> seconds, mis-connections and other problems we all encountered during
> >> periods of excessive demand.
> >
> > Not sure if that was the cause in the past, but these days it can also
> > still take a few seconds before the other end rings. This is due to the
> > phone system (all PBXs in the path) needing to set up the routing between
> > both end-points prior to the ring-tone actually starting.
> > When the system is busy, these lookups will take time and can even
> > time-out. (Try wishing everyone you know a happy new year using a wired
> > phone and you'll see what I mean. Mobile phones have a separate problem
> > at that time)
>
> I did not intend to argue about the minutiae of how a particular Baby
> Bell implemented their SS7 switching systems on unix systems. My point
> was that 'transaction processing' grew out of the early telephone network,
> the way I remember it:: ymmv.
> Banks did double-entry accounting by hand
> and had clerks manually load data sets; then double-entry accounting
> became automated, and ACID-style transaction processing was added later. So
> what SQL folks refer to as ACID properties comes from the North
> American switching heritage and eventually the world's telecom networks,
> eons ago.

There is a similarity, but where ACID is a way of guaranteeing data integrity, a phone switch does not need this. It simply needs to do the routing correctly.

Finance departments still do double-entry accounting and there still is a lot of manual writing/typing going on.

> >> That scenario was at the heart of how old, crappy AT&T unix (SVR?) could
> >> perform so well and therefore established the gold standard for RT
> >> transaction processing, aka the "five 9s": 99.999% up-time (about 5
> >> minutes per year of downtime).
> >
> > "Unscheduled" downtime. Regular maintenance will require more than 5
> > minutes per year.
>
> Yes, but the redundancy of the 3B2 and other computers (Sequent, Sequoia and
> Tandem, to name a few) meant that the "phone switching" fabric, at any
> given Central Office (the local building where the copper, RF and fiber
> lines are muxed), was, on average, up and available 99.999% of the time.
> Ironically, gentoo now has a sys-fabric group ::
> /usr/portage/sys-fabric, thanks to some forward-thinking cluster folk.
>
> >> Sure this part is only related to
> >> transaction processing, as there was much more to the "five 9s" legacy,
> >> but imho that is the heart of what was the precursor to the ACID
> >> properties now so greatly espoused in the SQL codes that Douglas refers to.
> >>
> >> Do folks concur or disagree at this point?
> >
> > ACID is about data integrity. The "best 2 out of 3" voting was, in my
> > opinion, a work-around for unreliable hardware.
>
> Absolutely true.
> But the fact that High Reliability in computer
> processing (including the billing) could be replicated, performed
> elsewhere and then 'recombined', proves that the work of any ACID
> function can be split up and run on clusters and achieve ACID standards
> or even better. So my point is that the cluster, if used wisely,
> will beat the 'dog shit' out of any Oracle fancy-pants database
> maneuvers. Evidence:: Snoracle is now snapping up billion-dollar
> companies in the cluster space, 'cause their days of extortion are
> winding down rather rapidly, imho.

I disagree here. For some workloads, clusters are really great. But SQL databases will remain.

> Also, just because the kids who are writing the codes have not figured all
> of this out does not mean that SQL and any abstraction is better than
> parallel processing. No way in hell. Cheaper and quicker to set up,
> surely true, but never superior to a well-designed, properly coded
> distributed solution. That's my point.

Workloads where you can split the whole processing into small chunks, where the same steps can be performed over a random-sized chunk and merging at a later stage will lead to correct results? Then yes.

However, I deal with processes and reports where the number of possible chunks is definitely limited and any theoretical benefit of splitting it over multiple nodes will be lost when having to build a very fancy and complex algorithm to merge all the separate results back together. This algorithm then also needs to be extensively tested, analysed and understood by future developers. The additional cost involved will be prohibitive.

> Hence, Douglas is full of
> stuffing, except that he alludes to the fact that UBER is doing something
> much better, beyond what Oracle has an interest in doing, at the last
> possible moment in his critique. This is backed up by Oracle's lethargic
> reaction to the data processing market, just leaving Oracle to become the
> next IBM.... (ymmv).
I disagree. UBER is still using a relational database as the storage layer, with something custom put over it to make it simpler for the developers. Any abstraction layer will have a negative performance impact.

> > It is based on a clever idea, but when
> > 2 computers having the same data and logic come up with 2 different
> > answers, I wouldn't trust either of them.
>
> Yep. That the QA of transactions is rejected and must be resubmitted,
> modified, or any number of remedies, is quite common in many forms of
> software. Voting does not correct errors, except maybe a fractional
> rounding up to 1 (pass) or down to zero (failure). It does help to
> achieve the ACI of ACID.

It's one way of doing it. But it can also cause extra delays due to having to wait for separate nodes to finish and then to check if they all agree.

> Since billions and billions of these (complex) transactions are
> occurring, a failed one is usually just repeated. If it keeps failing, then
> engineers/coders take a deeper look. Rare statistical anomalies are
> auto-scrutinized (that would be replication and voting) and then pushed
> to a logical zero or logical one.

The complexity comes from having to mould the algorithm into that structure. And additional complexity also makes it more fault-prone.

> >> The reason this is important to me (and others?), is that, if this idea
> >> (granted there is much more detail to it) is still valid, then it can
> >> form the basis for building up superior-ACID processes that meet or
> >> exceed the properties of an expensive (think Oracle) transaction
> >> process on distributed (parallel) or clustered systems, to a degree of
> >> accuracy only limited by the limit of the number of odd-numbered voter
> >> codes involved in the distributed and replicated parts of the
> >> transaction. I even added some code where replicated routines were
> >> written in different languages, and the results compared to add an
> >> additional layer of verification before the voter step.
> >> (gotta love
> >> assembler?).
> >
> > You have seen how "democracies" work, right? :)
>
> Yes. I need to shed some light on telecom processing. I never intended to
> suggest that voting corrected errors, although error-correction codes
> are usually part of the overall stack. I tried to suggest that all
> transactions on phone switches are already Atomic (pass or fail-redo),
> Consistent (replications pass on different hardware pathways to
> satisfaction metrics), Isolated (via multiple hardware pathways), and
> Durable (passing a voter check scheme); and five nines is still the gold
> standard for a system (even mil-spec).
>
> So the old telecom systems are indeed and in fact the heritage of
> modern ACID transactions.

A lot can be described using 'modern' designs. However, the fact remains that ACID was worked out for databases and not for phone systems.

Any sane system will have some form of consistency checks, but the extent to which this is done for a data storage layer, like a database, will be different from the extent to which this is done for a switching layer, like a router or phone switch. Modern phone switches will not implement a redo.

> > The more voters involved, the longer it takes for all the votes to be
> > counted.
>
> Wrong! Voters are all run in parallel. For this level of redundancy (to
> achieve a QA result of 99.999% system pristine), it is more expensive,
> analogous to encryption versus clear text. Nobody but a business major
> would use an excessive number of voters in their switching fabric.
> Telecom incompetence, in my experience, has been the domain of mid-level
> managers too weak to educate upper management on the poor ideas many of
> them have had and continue to have (Verizon comes to mind, too often).

Those incompetencies are usually in the domain of finances and services provided. The basic service of a telecoms company is pretty simple: "Pass data/voice between A and B". There are plenty of proven systems available that can do this.
The mistakes are usually of the kind: the system that we bought does not handle the load the salesperson promised.

> > With a small number, it might actually still scale, but when you pass a
> > magic number (no clue what this would be), the counting time starts to
> > exceed any time you might have gained by adding more voters.
>
> Nope; the larger the number, the more expensive. The number of voters
> rarely goes above 5, but it could for some sorts of physics problems
> (think quantum mechanics and logic not bound to [0 1] whole numbers).
> Often logic circuits (constructs, for programmers) have "don't care"
> states that can be handled in a variety of ways (filters, transforms,
> counters etc etc).

"don't care" values should always be ignored. Never actually used. (Except for randomizer functionality)

> > Also, this, to me, seems to counteract the whole reason for using
> > clusters:
> > Have different nodes handle a different part of the problem.
>
> That also occurs. But my point is that properly designed code for the
> cluster can replace the ACID functions offered by Oracle and other
> overpriced solutions, on standard cluster hardware.

All commonly used relational databases have ACID functionality as long as they support transactions. There is no need to choose a commercial version just for that.

> The problem with today's
> clusters is that the vendors that employ the kid-coders are making things
> far more complicated than necessary, so the average linux hacker just
> outsources via the cloud. DUMB, insecure and not a wise choice for many
> industries.

Moving your entire business into the cloud often is.

> And sooner or later folks are going to get wise and build
> their own clusters that just solve the problems they have. Surely hybrid
> clusters will dominate, where the owner of the codes does outsource peak
> loads and mundane collection of ordinary (non-critical) data.

Eg. hybrid solutions...
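Joost's point above, that any relational database supporting transactions gives you ACID behaviour, can be sketched with SQLite from Python's standard library. This is a hypothetical two-row transfer of my own invention, not code from the thread; the table and function names are made up for illustration:

```python
import sqlite3

# In-memory database; any transactional RDBMS behaves the same way here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` atomically: either both updates land, or neither does."""
    try:
        with conn:  # begins a transaction; commits on success, rolls back on error
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
            cur = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")  # triggers the rollback
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except ValueError:
        return False

print(transfer(conn, "alice", "bob", 60))  # True: both rows updated together
print(transfer(conn, "alice", "bob", 60))  # False: rolled back, nothing changed
print(dict(conn.execute("SELECT * FROM accounts")))  # {'alice': 40, 'bob': 60}
```

The second call is the interesting one: the debit has already been executed inside the transaction, yet after the rollback it leaves no trace, which is the atomicity the thread keeps circling around.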
> Vendors
> know this and have started another 'smoke and mirrors' campaign called
> (brace yourself) 'Unikernels'.....

"Unikernels" are something a small group came up with... I see no practical benefit in that approach.

> The problem with that approach is that they
> should just be using minimized (focused) gentoo on stripped and optimized
> linux kernels; but that is another lost art from the linux collection

I see "unikernels" as basically running the application directly on top of a hypervisor. I fail to see how this makes more sense than starting an application directly on top of an OS. The whole reason we have an OS is to avoid having to reinvent the wheel (networking, storage, memory handling, ....) for every single program.

> > Clusters of multiple compute-nodes are a quick and "simple" way of
> > increasing the number of computational cores to throw at problems that
> > can be broken down into a lot of individual steps with minimal
> > inter-dependencies.
>
> And surpass the ACID features of either postgresql or Oracle, and spend
> less money (maybe not with you and postgresql on their team)!

Large clusters are useful when doing Hadoop ("big data") style things (I mostly work with financial systems and the corresponding data). Storing the entire data warehouse inside a cluster doesn't work with all the additional requirements. Reports still need to be displayed quickly, and a decently configured database is usually more beneficial.

Where systems like Exadata really help here is by integrating the underlying storage (SAN) with the actual database servers and doing most of the processing in-memory. Eg. it works like a dedicated and custom-built cluster environment specifically for a relational database.

> > I say "simple" because I think designing a 1,000-core chip is more
> > difficult than building a 1,000-node cluster using single-core,
> > single-cpu boxes.
>
> Today, you are correct. Tomorrow you will be wrong.

In that case, clusters will be obsolete tomorrow.

> [1].
> Besides, once
> that chip or VHDL code or whatever is designed, it can be replicated and
> reused endlessly. Think ASIC designers, folks who take an FPGA project to
> completion. An EE can code on large arrays of DSPs, or a GPU
> (think Khronos group) using Vulkan.
>
> > I would still consider the cluster to be a single "machine".
>
> That's the goal.

That goal, in my opinion, has already been achieved. Unless you want ALL machines to be part of the same cluster and all machines to be able to push work to the entire cluster... In that case, good luck in achieving this, as you then also need to handle "randomly disappearing nodes".

> >> I guess my point is 'Douglas' is full of stuffing, OR that is what folks
> >> are doing when they 'roll their own solution specifically customized to
> >> their specific needs' as he alludes to near the end of his commentary?
> >
> > The response Douglas linked to is closer to what seems to work when
> > dealing with large amounts of data.
> >
> >> (I'd like your opinion of this and maybe some links to current schemes
> >> how to have ACID/99.999% accurate transactions on clusters of various
> >> architectures.) Douglas, like yourself, writes of these things in a
> >> very lucid fashion, so that is why I'm asking you for your thoughts.
> >
> > The way Uber created the cluster is useful when having 1 node handle all
> > the updates and multiple nodes providing read-only access while also
> > providing failover functionality.
>
> SIMD solution, mimic on a cluster? Cool.

Hmm.... no. This is load balancing on the data-retrieval side.

> >> Robustness of transactions, in a distributed (clustered) environment is
> >> fundamental to the usefulness of most codes that are trying to migrate
> >> to cluster-based processes in (VM/container/HPC) environments.
> >
> > Whereas I do consider clusters to be very useful, not all work-loads can
> > be redesigned to scale properly.
>
> Today, correct.
> Tomorrow, I think you are going to be wrong. It's like
> the single-core to multicore transition.

And 90+% of developers still don't understand how to properly code for multi-threading. Just look at how most applications work on your desktop. They all tend to max out a single core while the other x-1 cores tend to idle...

> Granted, many old decrepit codes had to be
> redesigned and coded anew with threads and other modern constructs to
> take advantage of newer processing platforms.

Intel came with Hyper-Threading back in 2002 (or even before). We are now in 2016 and the majority of code is still single-threaded. The problem is, the algorithms that are being used need to be converted to parallel methods.

> Sure, the same is true with
> distributed, but it's far closer than ever. The largest problem with
> clusters is vendors with agendas making things more complicated
> than necessary and completely ignoring many fundamental issues, like
> kernel stripping and optimizations under the bloated OS they are using.

I still want a graphical desktop with full multimedia support. I still want to easily plug in a USB device or SD card and use it immediately..... That requirement is incompatible with stripping the OS.

> >> I do
> >> not have the old articles handy but, I'm sure that many/most of those
> >> types of inherent processes can be formulated in the algebraic domain,
> >> normalized and used to solve decisions often where other forms of
> >> advanced logic failed (not that I'm taking a cheap shot at modern
> >> programming languages) (wink wink nudge nudge); or at least that's how
> >> we did it.... as young whipper_snappers bask in the day...
> >
> > If you know what you are doing, the language is just a tool. Sometimes a
> > hammer is sufficient, other times one might need to use a screwdriver.
> >
> >> --an_old_farts_logic
> >
> > Thinking back on how long I've been playing with computers, I wonder how
> > long it will be until I am in the "old fart" category?
>
> Stay young! I run full-court hoops all the time with young college
> punks; it's one of my greatest joys in life: run with the young
> stallions, hacking, pushing, shoving, slicing and taunting other
> athletes. Old farts' clubs are not something to be proud of; I just like
> to share too much......

Hehe.... One is only as old as he/she feels.

--
Joost

^ permalink raw reply	[flat|nested] 27+ messages in thread
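The 2-out-of-3 'voter' scheme james describes upthread can be sketched in a few lines of Python. This is a hypothetical illustration of N-modular redundancy; the function name and `quorum` parameter are my own, not from any telecom system:

```python
from collections import Counter

def vote(replica_results, quorum=2):
    """N-modular redundancy: accept a result only if at least `quorum`
    of the replicated computations agree; otherwise signal a redo."""
    winner, count = Counter(replica_results).most_common(1)[0]
    return winner if count >= quorum else None  # None => resubmit transaction

# Three replicas of the same (hypothetical) switching computation:
print(vote([1, 1, 1]))  # 1: all agree, transaction goes forward
print(vote([1, 0, 1]))  # 1: 2/3 majority rules
print(vote([1, 0, 2]))  # None: no quorum, redo the transaction
```

Note that this also illustrates Joost's devil's-advocate objection: if replicas B and C agree on a *wrong* answer, the vote still passes. Voting masks independent random faults (bad hardware), not systematic logic errors shared by all replicas.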
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber
  2016-08-04 10:09 ` J. Roeleveld
@ 2016-08-04 17:08   ` james
  2016-08-04 19:19   ` R0b0t1
  0 siblings, 1 reply; 27+ messages in thread
From: james @ 2016-08-04 17:08 UTC (permalink / raw)
  To: gentoo-user

On 08/04/2016 05:09 AM, J. Roeleveld wrote:
> On Tuesday, August 02, 2016 12:16:32 AM james wrote:
>> On 08/01/2016 11:49 AM, J. Roeleveld wrote:
>>> On Monday, August 01, 2016 08:43:49 AM james wrote:
>
> <snipped>
>
>>>> Way back, when the earth was cooling and we all had dinosaurs for pets,
>>>> some of us hacked on AT&T "3B2" unix systems. They were known for their
>>>> 'roll back and recovery', triplicated (or more) transaction processes
>>>> and 'voters' system to ferret out whether a transaction was complete and
>>>> correct. There was no ACID, the current 'gold standard' if you believe
>>>> what Douglas and others write about concerning databases.

<snip>

>> Comparing results of codes run on 3 different processors or separate
>> machines for agreement within tolerances, is quite different. The very
>> essence of using voting, where a result less than 1.0 (that is,
>> n-1/n or n-x/n) was acceptable, was requisite on identical (replicated)
>> processes all returning the same result (expecting either a 0 or 1).
>> Results being logical or within rounding error of acceptance. Surely we
>> need not split hairs. I was merely pointing out that the basic telecom
>> systems formed the early basis of the widespread transaction processing
>> industries and are the granddaddy of the ACID model/norms/constructs of
>> modern transaction processing.
>
> Hmm... I am having difficulty following how ACID and ensuring results are
> correct by double or triple checking are related.

Atomicity, Consistency, Isolation, Durability == ACID (so we are all on the same page). Not my thesis. My thesis, inspired by these threads, is that all of these (4) properties of ACID originated in the telephone networks, as separate issues.
When telephonic switching moved from electro-mechanical systems to computers, each of these properties was developed by the telephonic software and equipment providers. Banks followed the switching systems, and these (4) ACID properties were realized to be universally useful, instituted, and rebranded as 'transactions'.

Database vendors, IBM and others, quickly realized the value of ACID properties in all sorts of forms of data movement and modification (ie the transaction). Database developers and vendors did not invent ACID properties. Indeed and in fact, those properties were first used collectively in the legacy telephonic systems, best described by SS7. Earlier versions are a case study in the redundancy and reliability of those early telecom systems. Granted, latency was a big problem, which moving from electric circuits to digital circuits fixed; yet still there was the wonderful five-nines (99.999%) of quality.

>> For massively parallel needs,
>> distributed processing rules, but it is not trivial
>
> Agreed.

<snip>

>> Another point: there are single big GPUs that can be run as thousands of
>> different processors, on either FPGA or GPU, granted using SIMD/MIMD
>> style processors and things like 'systolic algorithms', but that sort of
>> thing is out of scope here. (Vulkan might change that, in an open-source
>> kind of way, maybe). Furthermore, GPU resources combined with DDR-5 can
>> blur the line and may actually be more cost-effective for many forms of
>> transaction processing, but clusters, in their current forms, are very
>> much general-purpose machines.
>
> I don't really agree here. For most software, having a really fast CPU
> helps. Having a lot of mediocre CPUs means the vast majority isn't doing
> anything useful.
> Software running on clusters needs to be written with massive parallel
> processing in mind. Most developers don't understand this part.
Where did you get the idea that folks building clusters are not interested in using the fastest processors possible? Dude, that's just failed (non-sequitur) logic. Well, this premise of yours is a corollary to my thesis; the early telecom systems developers were historically 'bad ass' and highly intelligent. It has taken the software development world decades to catch up to key systems attributes of hardware design (redundancy and roll-back and recovery). Now that things are digital, you can run codes on a variety of different hardware to abstract the properties of ACID and supersede ACID with yet more properties of robust hardware design. (Sadly, even most EE professors are severely lacking in this knowledge.) Modern EE experts have most of their magic attributed to European mathematicians, but that's another issue, too complex for the average java* coder. Curiously, you can read all about Hilbert, should you need to scratch that itch....

>> My point:: Douglas is dead wrong about ACID being dominated by databases,
>> for technical reasons, particularly for advanced teams of experts.
>
> Wikipedia actually disagrees with you:
> https://en.wikipedia.org/wiki/ACID
> "In computer science, ACID (Atomicity, Consistency, Isolation, Durability)
> is a set of properties of database transactions."

Exactly. Database vendors got the ideas and components (literals and abstractions) from the telephonics industries to get a leg up, moving from electronic switching (which already had those key components now referred to as ACID) in hardware. When those electro-mechanical systems moved to digital circuits, Bell Labs ensured those properties were a closely held secret wrapped up in the 'unix OS'. They did promote ACID in their software, and the banks and other customers were likewise saying YES YES YES, we want telecom ACID levels of performance in our (developing) computer software too.
But the migration to digital let the 'cat out of the bag' on the wonders of ACID (long before Timothy Leary, just so the Californians among us can keep up!).

> In other words, it's related to databases

They (vendors) copied it from telecom, and wildly promoted it, very successfully. Combine this with the fact that most US EE programs are abysmally weak (always have been), so now we indeed and in fact have this severe lapse in robust and fault-tolerant systems. WHY? Nothing (industrial or commercial) had the "five-nines" of reliability but those electro-mechanical telephonic systems. *nothing* Everybody wanted it; hence those (4) components were harvested from telephonics and used as a model for all transactions.

Take "atomicity" for example. It has its roots in "call setup". Dialogic is a PC board vendor (from decades ago) that followed those early systems. Here is a document (from the 70s/80s/?) where they have "40 Atomic Functions" that they use in software to control the hardware for 'call setup and management'. Sure, many more documents exist, but they may not be publicly available in electronic form. All of this occurred before the folks that write for Wikipedia were ever born, so they could not possibly be aware of these issues and this historical precedence.

[1] https://www.dialogic.com/webhelp/MSP1010/10.4.0/WebHelp/ppl_dg/l3p_cic.htm

One can research each of those four properties and discover how telecom integrated them into the phone system of North America (Europe evolved almost simultaneously). Bell Labs is "the daddy of ACID"; and it was a tightly held secret as long as possible, to delay the expansion of usage and the eventual break-up of that legacy monopoly. There are many things in the (legacy) communications world that have not accurately made their way to digital in a form freely available on the internet (like signal intercept). Think of all of those hidden antenna arrays in the UK when microwave telecom was all the rage.
MCI was a key player in exploiting microwave (another tenet of EE).

>> Surely most MBA, HR and Finance types of
>> idiots running these new startups wouldn't know a coder from an
>> architect, and that is very sad, because a good consultant could have
>> probably designed several robust systems in a week or two. Granted, few
>> consultants have that sort of unbiased integrity, because we all have
>> bills to pay and much is getting outsourced... Integrity has always been
>> the rarest of qualities, particularly with humanoids......
>
> The software Uber uses for their business had to be developed in-house as
> there, at least at the time, was nothing available they could use
> ready-made. This usually means they start with something simple they can
> get running quickly. If they wanted to fully design the whole system
> first, they would never get anything done.
>
> Where these projects usually go wrong is that they wait too long for a
> good robust design, leading to a near impossibility of actually fixing all
> the, in hindsight obvious, design mistakes.
> (NOTE: In hindsight, as most of the actual requirements would not be clear
> on day 1)

I could not agree with you more. The more processors readily available to codes that know how to use them, in parallel, the faster and better and more reliable the systems developed (including the software) will be. Some are working on extremely low-latency systems where FPGAs are embedded in general-purpose processors (Intel is leading on this). The DoD has been using these systems for decades.

Clusters are superior to single (or multicore) systems, if only these kids knew anything about redundancy and fault tolerance, both of which originate in hardware and which the telecom industry perfected to the 99.999% robustness level (while IBM drooled over their punch cards; I know, I was there......).
And in my opinion, that was the most important of the collective of reasons why AT&T, its 10,000+ lawyers and the assholes in our government fought so hard to keep early unix expansion out of the hands of the masses. At one point it was easier to get a top-secret clearance than it was to code on those early telecom systems.

>>>> and the switch was so configured, then the code would
>>>> essentially 'vote' and majority ruled. This is what led to phone calls
>>>> (switched phone calls) having variable delays, often on the order of
>>>> seconds, mis-connections and other problems we all encountered during
>>>> periods of excessive demand.
>>>
>>> Not sure if that was the cause in the past, but these days it can also
>>> still take a few seconds before the other end rings. This is due to the
>>> phone system (all PBXs in the path) needing to set up the routing between
>>> both end-points prior to the ring-tone actually starting.
>>> When the system is busy, these lookups will take time and can even
>>> time-out. (Try wishing everyone you know a happy new year using a wired
>>> phone and you'll see what I mean. Mobile phones have a separate problem
>>> at that time)
>>
>> I did not intend to argue about the minutiae of how a particular Baby
>> Bell implemented their SS7 switching systems on unix systems. My point
>> was that 'transaction processing' grew out of the early telephone network,
>> the way I remember it:: ymmv. Banks did double-entry accounting by hand
>> and had clerks manually load data sets; then double-entry accounting
>> became automated, and ACID-style transaction processing was added later.
>> So what SQL folks refer to as ACID properties comes from the North
>> American switching heritage and eventually the world's telecom networks,
>> eons ago.
>
> There is a similarity, but where ACID is a way of guaranteeing data
> integrity, a phone-switch does not need this. It simply needs to do the
> routing correctly.
Have you ever talked to an old military officer who worked in Intelligence? Like the spy plane incident, circa 1960 [2]? https://en.wikipedia.org/wiki/1960_U-2_incident Data integrity almost caused WW3. WRONG. The five-nines was so coveted by everyone else that there was a feeding frenzy on just how these folks at Bell Labs pulled it off. Early (1950-1970s) computational systems were abysmal to own or operate, and yet the sorry-ass phone company had 99.999% perfection (thanks to Bell Labs)? They provided the T1 and T3 lines in/out of the Pentagon. Jealousy was outrageous. Database vendors were struggling with assembler and 'board changeouts', as Rich alluded to. <snip> >>> ACID is about data integrity. The "best 2 out of 3" voting was, in my >>> opinion, a work-around for unreliable hardware. Correct. Voting was used as the precursor technology to distributed systems (today it's the cluster). It added to the reliability and robustness. It provided consistency. It demonstrated that the entire string of what was needed for SS7, including call setup, could be replicated and run on a cluster (oops, another hardware set).... >> Absolutely true. But the fact that high-reliability computer >> processing (including the billing) could be replicated, performed >> elsewhere and then 'recombined', proves that any ACID >> function can be split up and run on clusters and achieve ACID standards >> or even better. So my point is that the cluster, if used wisely, >> will beat the 'dog shit' out of any Oracle fancy-pants database >> maneuvers. Evidence: Snoracle is now snapping up billion-dollar >> companies in the cluster space, because their days of extortion are >> winding down rather rapidly, imho. > I disagree here. For some workloads, clusters are really great. But SQL > databases will remain. As a subset of distributed processing.
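Since the thread keeps returning to the "best 2 out of 3" scheme, here is a minimal sketch of majority voting (triple modular redundancy) in Python. The trunk names and the replicated results are invented for illustration, not taken from any real SS7 implementation:

```python
from collections import Counter

def majority_vote(results):
    """Return the value a strict majority of replicas agree on, or None.

    With 3 replicas, any 2 matching answers win ("best 2 out of 3").
    Total disagreement yields None: the work must be resubmitted,
    which matches the "repeat it until it passes" behaviour described
    in the thread. Note voting masks faults; it does not correct them.
    """
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None

# Three replicas compute the same routing decision; one is faulty.
assert majority_vote(["trunk-7", "trunk-7", "trunk-3"]) == "trunk-7"

# All three disagree: no majority, so the transaction is redone.
assert majority_vote(["a", "b", "c"]) is None
```

This also illustrates Joost's cost objection: the vote cannot be taken until the slowest replica has finished, so the scheme buys robustness at the price of latency.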
Oracle (the champion of databases) is going to atrophy and slip into irrelevance, once kids learn how to supersede ACID with judicious cluster hardware and code on top of heterogeneous clusters..... Granted, any corp with billions and billions and deep (illegal?) relationships with government officials will eventually prosper again.... Once again, EE will light the forward path. >> Also, just because the kids writing the codes have not figured all >> of this out does not mean that SQL and any abstraction is better than >> parallel processing. No way in hell. Cheaper and quicker to set up, >> surely true, but never superior to a well-designed, properly coded >> distributed solution. That's my point. > > Workloads where you can split the whole processing into small chunks where the > same steps can be performed over a random-sized chunk and merging at a later > stage will lead to correct results. Then yes. True, but it's not quite as restrictive as you think. Large systems, with even just a small bit of parallelism integrated into the overall architecture, benefit. How much depends on the designers. We do need more EE coders leading on cluster designs, but the universities (worldwide) have let everyone down. > However, I deal with processes and reports where the amount of possible chunks > is definitely limited and any theoretical benefit of splitting it over multiple > nodes will be lost when having to build a very fancy and complex algorithm to > merge all the separate results back together. NoSQL is an abysmal failure. SQL needs to be a small subset of robust parallel systems design and implementation. The latest venue is 'unikernels'. Clusters will dominate because deep pockets can have the latest and fastest and cheapest hardware, in massive quantities, before the commoners even learn how it works. Arm64/ARMv8 is a prime and current example. Its heat load per unit of processing blows away CISC-based systems.
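Joost's criterion above, the same steps performed over arbitrarily sized chunks with the results merged at a later stage, is the classic map-and-merge shape. A minimal sketch, with a per-chunk partial sum standing in (purely for illustration) for the real work:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # The "same steps performed over a random-sized chunk":
    # a partial sum stands in for the real per-chunk work.
    return sum(chunk)

def parallel_sum(data, n_chunks=4):
    # Split into chunks, process each independently, then merge.
    # The merge is trivial here only because addition is associative;
    # Joost's point is that many workloads have no such cheap merge,
    # and the merge algorithm itself becomes the expensive part.
    chunks = [data[i::n_chunks] for i in range(n_chunks)]
    with ThreadPoolExecutor() as pool:
        partials = pool.map(process_chunk, chunks)
    return sum(partials)

data = list(range(1000))
assert parallel_sum(data) == sum(data) == 499500
```

When the merge step stops being a one-liner, the cost of designing, testing, and maintaining it is exactly the "prohibitive additional cost" argued below.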
FPGAs can implement any processor or memory structure, and can do so in microseconds. But these are areas where attorneys, via the patent system, abuse lightweight competition. > This algorithm then also needs to be extensively tested, analysed and > understood by future developers. The additional cost involved will be > prohibitive. Don't we need more jobs? Are you kidding me? That's why large corporations are so vehemently aggressive in these spaces. We have all kinds of 'STEM graduates' here in the US that cannot get a STEM job. (Hence Trump's appeal to the middle class: tariffs and promoting competition at home.) > I disagree, UBER is still using a relational database as the storage layer > with something custom put over it to make it simpler for the developers. > Any abstraction layer will have a negative performance impact. Wanna bet that UBER and like-minded companies change again and again and again, until they start studying what mathematicians and EEs have been doing for a very long time? >>> It is based on a clever idea, but when >>> 2 computers having the same data and logic come up with 2 different >>> answers, I wouldn't trust either of them. This is a rare occurrence in digital systems. However, when you look at other forms of computational mathematics, tolerances have to be used to get consistency (oops, another property of ACID showing up in legacy literature). I could not care less about UBER's problems, unless they send some funds my way. BUT, I am willing to share knowledge, so they 'wise up', because fundamentally, I love disruption of the status quo. >> >> Yep. That transactions rejected by QA must be resubmitted, >> modified, or remedied in any number of other ways, is quite common in many forms of >> software. Voting does not correct errors, except maybe a fractional >> rounding up to 1 (pass) or down to zero (failure). It does help to >> achieve the ACI of ACID > > It's one way of doing it.
But it can also cause extra delays due to having to > wait for separate nodes to finish and then to check if they all agree. Once clusters are prototyped on CISC systems, those codes will rapidly move to DSPs, GPUs, FPGAs and DDR5+. Those with deep pockets will 'smoke' the competition, and idiots like Verizon will be trying to make more stupid acquisitions. Folks do know that Verizon sold off billions in data centers close to the fiber highway, to buy Yahoo, right? (It "pays out" because they are actually dumping hundreds of thousands of legacy employees (trump voters); that's what that transaction is all about.) They are still doomed to fail, because the software idiots advising Verizon have no clue about the fundamentals and mathematics of communications. (A very sad state of affairs for Verizon.) >> Since billions and billions of these (complex) transactions are >> occurring, it is usually just repeated. If it keeps failing then >> engineers/coders take a deeper look. Rare statistical anomalies are >> auto-scrutinized (that would be replication and voting) and then pushed >> to a logical zero or logical one. > > The complexity comes from having to mould the algorithm into that structure. > And additional complexity also makes it more fault-prone. Only during development and beta tests. After a while it will become 'rock solid' and be pushed down into the lowest levels of hardware, so it is hidden from the average coder. Here is a billionaire, who is quite stealthy, that has done this exact thing most recently. [3] https://www.deshawresearch.com/ [4] https://www.quora.com/unanswered/Computer-Architecture-How-its-like-working-for-DESHAW-RESEARCH-as-an-ASIC-designer-architect <snip> > A lot can be described using 'modern' designs. However, the fact remains that > ACID was worked out for databases and not for phone systems.
Any sane system > will have some form of consistency checks, but the extent to which this is done > for a data storage layer, like a database, will be different from the extent > to which this is done for a switching layer, like a router or phone switch. Please reread my previous posts. You, or anyone, can do the individual (and robust) research on the ACID components and the history of telecom. Wikipedia and many other sites have failed you here; sorry. <snip> > Those incompetencies are usually in the domain of finances and services > provided. The basic service of a telecoms company is pretty simple: "Pass > data/voice between A and B". > There are plenty of proven systems available that can do this. The mistakes > are usually of the kind: the system that we bought does not handle the load > the salesperson promised. On the surface, you are absolutely correct. Mass education is severely thwarted by the entire patent system, grotesque lawyers and legal semantics, and the 'bought and sold politicians' from around the globe (the same folks that brought us globalism). So folks are merely "uneducated" in these matters. Yes, these globalists continue to conspire against commoners, around the globe. Education and sharing of hardware and software and mathematics and physics will set the captives free (eventually). This is the essence of WW3, imho. The fact that the masses, and even most coders, are blissfully unaware of where ACID came from is a testament to the failure of globalism that provides the protection to the billionaire class of manipulators, imho. > >>> With a small number, it might actually still scale, but when you pass a >>> magic number (no clue what this would be), the counting time starts to >>> exceed any time you might have gained by adding more voters. >> >> Nope, the larger the number, the more expensive.
The number of voters >> rarely goes above 5, but it could for some sorts of physics problems >> (think quantum mechanics and logic not bound to [0 1] whole numbers). >> Often logic circuits (constructs, for programmers) have "don't care" >> states that can be handled in a variety of ways (filters, transforms, >> counters etc etc). > > "don't care" values should always be ignored. Never actually used. (Except for > randomizer functionality) Dude, you need to find some RF/analog folks and learn about what's going on around "noise" in systems. Once thought to be useless, or a hindrance, it is a fertile ground for innovation that, again, the masses are blissfully unaware of. Much is termed "classified", just so you know. > >>> Also, this, to me, seems to counteract the whole reason for using >>> clusters: >>> Have different nodes handle a different part of the problem. >> >> That also occurs. But my point is that properly designed code for the cluster >> can replace ACID functions, offered by Oracle and other overpriced >> solutions, on standard cluster hardware. > > All commonly used relational databases have ACID functionality as long as they > support transactions. There is no need to only choose a commercial version for > that. Like the Chinese, they are brilliant copycats: nothing wrong with that (see my take on 100% abolition of all patents, globally). > >> The problem with today's >> clusters is that the vendors that employ the kid-coders are making things >> far more complicated than necessary, so the average linux hacker just >> outsources via the cloud. DUMB, insecure and not a wise choice for many >> industries. > > Moving your entire business into the cloud often is. I could not agree more. HYBRID systems, where the chief architect/designer works exclusively for the cluster, are where the future will shake out. All of this idiocy of the masses on the web: who cares where it is processed? The closer to the node-idiot-user-consumer, the better, mathematically.
> >> And sooner or later folks are going to get wise and build >> their own clusters that just solve the problems they have. Surely hybrid >> clusters will dominate, where the owner of the codes does outsource peak >> loads and mundane collection of ordinary (non-critical) data. > > Eg. hybrid solutions... Yes, yes and HELL YES! In fact gentoo stands out as the quintessential 'unikernel' for distributed processing! >> Vendors know this and have started another 'smoke and mirrors' campaign called >> (brace yourself) 'Unikernels'..... > > "unikernels" is something a small group came up with... I see no practical > benefit for that approach. A minimized gentoo system and an optimized, severely stripped Linux kernel is pretty much a unikernel. Docker, the leader in commercialization of containers, knows this and has subsumed Alpine linux. Patience, my friend; it will become very clear over time, but not exactly the way the current vendors are portraying unikernels. > >> Problem with that approach is they >> should just be using minimized (focused) gentoo on stripped and optimized >> linux kernels; but that is another lost art from the linux collection > > I see "unikernels" as basically running the applications directly on top of a > hypervisor. I fail to see how this makes more sense than starting an > application directly on top of an OS. The whole reason we have an OS is to > avoid having to reinvent the wheel (networking, storage, memory handling,....) > for every single program. (See above response.) For the last few years, I have run into an astounding number of brilliant folks who have mastered and use gentoo on a daily basis. The more I learn about clusters, the more I realize why this mass of gentoo folks are so silent on these matters. Strategic business plans, brah. Gentoo is the world's best-kept secret.
> >>> Clusters of multiple compute-nodes are a quick and "simple" way of >>> increasing the amount of computational cores to throw at problems that >>> can be broken down in a lot of individual steps with minimal >>> inter-dependencies. >> >> And surpass the ACID features of either postgresql or Oracle, and spend >> less money (maybe not with you and postgresql on their team)! > > Large clusters are useful when doing Hadoop ("big data") style things (I > mostly work with financial systems and the corresponding data). > Storing the entire datawarehouse inside a cluster doesn't work with all the > additional requirements. Reports still need to be displayed quickly and a > decently configured database is usually more beneficial. Where systems like > Exadata really help here is by integrating the underlying storage (SAN) with > the actual database servers and doing most of the processing in-memory. > Eg. it works like a dedicated and custom-built cluster environment specifically > for a relational database. There is a revolution in hardware memory technologies. In a few more years, massive RAM will be an integral part of the computational hardware (think DDR5 and GPUs, currently). Most massive systems can be split up into small systems too. Database vendors have little incentive to do this for customers. The art of the design and implementation of 'transaction processing' needs to return to hardware concepts during this transition. > > >>> I say "simple" because I think designing a 1,000 core chip is more >>> difficult than building a 1,000-node cluster using single-core, single >>> cpu boxes. >> Today, you are correct. Tomorrow you will be wrong. > > In that case, clusters will be obsolete tomorrow. No, the chips and the cluster will be one and the same. Real-time sequence stepping in problem->solution domains for things like flight simulation and subsurface fluid management are still grand challenges that are a ways off.
The average database solution, even for large commercial/global operations, is going to migrate to clusters. Clusters and storage will continue to migrate to silicon. The biggest problem is the patent system and artificial constructs more commonly known in the business world as "cost barrier to entry" economics. These mostly result from the way the local/state/federal/global laws are implemented and enforced. > >> [1]. Besides, once >> that chip or VHDL code or whatever is designed, it can be replicated and >> reused endlessly. Think ASIC designers, folks who take an FPGA project to >> completion. An EE can code on large arrays of DSPs, or a GPU >> (think Khronos group) using Vulkan. >> >>> I would still consider the cluster to be a single "machine". >> >> That's the goal. > > That goal, in my opinion, has already been achieved. Unless you want ALL > machines to be part of the same cluster and all machines being able to push > work to the entire cluster... > In that case, good luck in achieving this as you then also need to handle > "randomly disappearing nodes" I think Brexit and Trump will replace globalism with localism and tariffs. Governments will fight over the spoils of tariffs to finance their gluttony, and locals will figure out how to build and operate everything, locally. So you are correct. I actually am promoting hybrid clusters, so the commoners can 'suck the brain-marrow' out of Wall Street, politicians and the globalists. Once groups of locals learn to be self-sufficient (think of them as digital Amish), the only function governments and globalists provide is national security. Folks that like war can join up and kill folks from other like-minded collectives. Most will be extraordinarily happy to provide 100% of what they need, locally. There will be some exchange of material, and those less innovative will lag a bit, but that is what globalists should concentrate on: how to teach those less fortunate how to become self-sufficient, locally.
> And 90+% of developers still don't understand how to properly code for > multi-threading. Just look at how most applications work on your desktop. They all > tend to max out a single core and the other x-1 cores tend to idle... Wonder why Bill Gates (in his tax-dodging world charities) is not teaching this stuff? Rupert Murdoch? Rich Arabs? Chinese? The elites of the world are 'selfish bastards' and use the good work that comes from their ranks to further screw up localism (self-sufficiency on a local basis). Sooner or later these globalists will have to answer to the masses of local citizens, wherever they are hiding. We have seen the purging of the Republican party. The Democratic elites are currently undergoing a purging. After Brexit, it will rapidly expand in Europe. Saudis are running scared. A pandemic of locals that want to be self-sufficient. Folks are tired of listening to some (asshole) expert that does not live down the street from them. Globalism flies in the face of common sense, and computational competence is not exempt. There is latency and much deception in the world of computations, but that too will fall (eventually). > >> Granted many old decrepit codes had to be >> redesigned and coded anew with threads and other modern constructs to >> take advantage of newer processing platforms. > > Intel came with Hyperthreading back in 2005 (or even before). We are now in > 2016 and the majority of code is still single-threaded. > The problem is, the algorithms that are being used need to be converted to > parallel methods. > >> Sure the same is true with >> distributed, but it's far closer than ever. The largest problem with >> clusters is vendors with agendas, making things more complicated >> than necessary and completely ignoring many fundamental issues, like >> kernel stripping and optimizations under the bloated OS they are using. > > I still want a graphical desktop with full multimedia support.
I still want > to easily plug in a USB device or SD-card and use it immediately,..... > That requirement is incompatible with stripping the OS. Agreed. And I want to build the hardware on my own 3D printer. I am flexible to try out many offerings when 3D printing loses those patents on using metals and semiconductor materials...... This too will come, hopefully sooner rather than later, and without the shedding of blood.... > >>>> I do >>>> not have the old articles handy but, I'm sure that many/most of those >>>> types of inherent processes can be formulated in the algebraic domain, >>>> normalized and used to solve decisions often where other forms of >>>> advanced logic failed (not that I'm taking a cheap shot at modern >>>> programming languages) (wink wink nudge nudge); or at least that's how >>>> we did it.... as young whipper-snappers back in the day... >>> >>> If you know what you are doing, the language is just a tool. Sometimes a >>> hammer is sufficient, other times one might need to use a screwdriver. >>> >>>> --an_old_farts_logic >>> >>> Thinking back on how long I've been playing with computers, I wonder how >>> long it will be until I am in the "old fart" category? >> >> Stay young! I run full-court hoops all the time with young college >> punks; it's one of my greatest joys in life, running with the young >> stallions, hacking, pushing, shoving, slicing and taunting other >> athletes. An old farts club is not something to be proud of; I just like >> to share too much...... > > Hehe.... One is only as old as he/she feels. > > -- > Joost Young kids often show amazing wisdom. The educational process beats this out of kids. Isolation and localism (aka home schooling) does allow kids to explode in both technical competence and creativity. But this flies in the face of the goals of globalism. When I was young, there was a kid that was brilliant and 100% home-schooled by mostly uneducated parents. They lived in the bush of Alaska, hundreds of miles from anyone.
Brilliance and innovation are the province of the youth; just look at all of those young, brilliant minds from post-medieval Europe. Mass education just beat those traits right out of all children. Communications and localism will yield many, many brilliant folks, and that is the greatest fear of the globalists, who want to remain in power and have dominion over the masses. It's the classic struggle. The path to a better future is espoused in parallel and distributed and local decision/control, from politics to hardware to software. hth, James ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-04 17:08 ` james @ 2016-08-04 19:19 ` R0b0t1 0 siblings, 0 replies; 27+ messages in thread From: R0b0t1 @ 2016-08-04 19:19 UTC (permalink / raw To: gentoo-user On Thu, Aug 4, 2016 at 12:08 PM, james <garftd@verizon.net> wrote: > > Atomicity, Consistency, Isolation, Durability == ACID (so we are all on the > same page). > > Not my thesis. My thesis, inspired by these threads, is that all of these > (4) properties of ACID originated in the telephone networks, as separate > issues. https://en.wikipedia.org/wiki/Two_Generals'_Problem http://tvtropes.org/pmwiki/pmwiki.php/Main/OlderThanDirt ^ permalink raw reply [flat|nested] 27+ messages in thread
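For anyone wanting the four letters above made concrete rather than historical, atomicity is the easiest one to demonstrate, and Python's built-in sqlite3 module is enough to do it. The `calls` table and its rows are invented purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (id INTEGER PRIMARY KEY, trunk TEXT)")

# Atomicity: both inserts commit together or neither does.
try:
    # As a context manager, the connection opens a transaction,
    # commits on success, and rolls back on any exception.
    with conn:
        conn.execute("INSERT INTO calls VALUES (1, 'trunk-7')")
        conn.execute("INSERT INTO calls VALUES (1, 'trunk-3')")  # PK violation
except sqlite3.IntegrityError:
    pass

# The first, valid insert was rolled back along with the failing one.
assert conn.execute("SELECT COUNT(*) FROM calls").fetchone()[0] == 0
```

Consistency (the PRIMARY KEY constraint that triggered the rollback), isolation, and durability round out the same transaction machinery; this is the "ACID functionality as long as they support transactions" point made earlier in the thread.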
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-01 13:43 ` james 2016-08-01 16:49 ` J. Roeleveld @ 2016-08-11 12:43 ` Douglas J Hunley 1 sibling, 0 replies; 27+ messages in thread From: Douglas J Hunley @ 2016-08-11 12:43 UTC (permalink / raw To: Gentoo [-- Attachment #1: Type: text/plain, Size: 859 bytes --] On Mon, Aug 1, 2016 at 9:43 AM, james <garftd@verizon.net> wrote: > I guess my point is 'Douglas' is full of stuffing, OR that is what folks > are doing when they 'roll their own' solution specifically customized to > their specific needs, as he alludes to near the end of his commentary? (I'd > like your opinion of this and maybe some links to current schemes for how to > have ACID/99.999% accurate transactions on clusters of various > architectures.) Douglas, like yourself, writes of these things in a very > lucid fashion, so that is why I'm asking you for your thoughts. Douglas didn't write the damn thing, merely added it to the discussion here. Thank you very much -- { "name": "douglas j hunley", "email": "doug.hunley@gmail.com", "social": [ { "blog": "https://hunleyd.github.io/", "twitter": "@hunleyd" } ] } [-- Attachment #2: Type: text/html, Size: 1520 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-01 7:16 ` J. Roeleveld 2016-08-01 13:43 ` james @ 2016-08-01 15:01 ` Rich Freeman 2016-08-01 17:31 ` J. Roeleveld 2016-08-01 23:18 ` Alan McKinnon 1 sibling, 2 replies; 27+ messages in thread From: Rich Freeman @ 2016-08-01 15:01 UTC (permalink / raw To: gentoo-user On Mon, Aug 1, 2016 at 3:16 AM, J. Roeleveld <joost@antarean.org> wrote: > > Check the link posted by Douglas. > Uber's article has some misunderstandings about the architecture, with > conclusions drawn that are, at least also, caused by their database design and > usage. I've read it. I don't think it actually alleges any misunderstandings about the Postgres architecture, but rather that it doesn't perform as well in Uber's design. I don't think it actually alleges that Uber's design is a bad one in any way. But, I'm certainly interested in anything else that develops here... > >> And of course almost any FOSS project could have a bug. I >> don't know if either project does the kind of regression testing to >> reliably detect this sort of issue. > > Not sure either, I do think PostgreSQL does a lot with regression tests. > Obviously they missed that bug. Of course, so did Uber in their internal testing. I've seen a DB bug in production (granted, only one so far) and they aren't pretty. A big issue for Uber is that their transaction rate and DB size are such that they really don't have a practical option of restoring backups. Obviously they'd do that in a complete disaster, but short of that they can't really afford to do so. By the time a backup is recorded it would be incredibly out of date. They have the same issue with the lack of online upgrades (which the responding article doesn't really talk about). They really need it to just work all the time. >> I'd think that it is more likely >> that the likes of Oracle would (for their flagship DB (not for MySQL), > > Never worked with Oracle (or other big software vendors), have you?
:) Actually, I almost exclusively work with them. Some are better than others. I don't work directly with Oracle, but I can say that the two times I've worked with an Oracle consultant they've been worth their weight in gold, and cost about as much. The one was fixing some kind of RDB data corruption on a VAX that was easily a decade out of date at the time; I was shocked that they could find somebody who knew how to fix it. Interestingly, it looks like they only abandoned RDB recently. They do tend to be a solution that involves throwing money at problems. My employer was having issues with a database from another big software vendor, which I'm sure was the result of bad application design, but throwing Exadata at it did solve the problem, at an astonishing price. Neither my employer nor the big software provider in question is likely to attract top-notch DB talent (indeed, mine has steadily gotten rid of anybody who knows how to do anything in Oracle beyond creating schemas, it seems, though I can only imagine how much they pay annually in their license fees; and yes, I'm sure 99.9% of what they use Oracle (or SQL Server) for would work just fine in Postgres). > > Only if you're a big (as in, spend a lot of money with them) customer. > So, we are that (and I think a few of our IT execs used to be Oracle employees, which I'm sure isn't hurting their business). I'll admit that Uber might not get the same attention. Seems like Oracle is the solution at work for everything, from software that runs the entire company to software that hosts one table for 10 employees (well, when somebody notices and gets it out of Access). Well, unless it involves an MS-oriented dev or Sharepoint, in which case somebody inevitably wants it on SQL Server. I did mention that we're not a world-class IT shop, didn't I? -- Rich ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber 2016-08-01 15:01 ` Rich Freeman @ 2016-08-01 17:31 ` J. Roeleveld 2016-08-02 1:07 ` Rich Freeman 2016-08-01 23:18 ` Alan McKinnon 1 sibling, 1 reply; 27+ messages in thread From: J. Roeleveld @ 2016-08-01 17:31 UTC (permalink / raw To: gentoo-user On Monday, August 01, 2016 11:01:28 AM Rich Freeman wrote: > On Mon, Aug 1, 2016 at 3:16 AM, J. Roeleveld <joost@antarean.org> wrote: > > Check the link posted by Douglas. > > Uber's article has some misunderstandings about the architecture with > > conclusions drawn that are, at least also, caused by their database design > > and usage. > > I've read it. I don't think it actually alleges any misunderstandings > about the Postgres architecture, but rather that it doesn't perform as > well in Uber's design. I don't think it actually alleges that Uber's > design is a bad one in any way. It was written quite diplomatically. Seeing the create table for the sample tables already makes me wonder how they designed their database schema, especially from a performance point of view. But that is a separate discussion :) > But, I'm certainly interested in anything else that develops here... Same here, and I am hoping some others will also come up with some interesting bits. > >> And of course almost any FOSS project could have a bug. I > >> don't know if either project does the kind of regression testing to > >> reliably detect this sort of issue. > > > > Not sure either, I do think PostgreSQL does a lot with regression tests. > > Obviously they missed that bug. Of course, so did Uber in their > internal testing. I've seen a DB bug in production (granted, only one > so far) and they aren't pretty. A big issue for Uber is that their > transaction rate and DB size is such that they really don't have a > practical option of restoring backups. From the slides on their migration from MySQL to PostgreSQL in 2013, I see it took them 45 minutes to migrate 50GB of data.
To me, that seems like a very bad transfer-rate for, what I would consider, a dev environment. It's only about 20MB/s. I've seen "bad performing" ETL processes reading from 300GB of XML files and loading that into 3 DB-tables within 1.5 hours. That's about 57MB/s, with the XML-engine using up nearly 98% of the total CPU-load. If the data had been supplied in CSV files, it would have been roughly 100GB of data. This could be easily loaded within 20 minutes, equalling 85MB/s (filling up the network bandwidth). I think their database design and infrastructure isn't optimized for their specific work-load. Which is, unfortunately, quite common. > Obviously they'd do that in a > complete disaster, but short of that they can't really afford to do > so. By the time a backup is recorded it would be incredibly out of > date. They have the same issue with the lack of online upgrades > (which the responding article doesn't really talk about). They really > need it to just work all the time. When I migrate PostgreSQL to a new major version, I migrate 1 database at a time to minimize downtime. This is done by piping the output of the backup process straight into a restore process connected to the new server. If it were even more time-critical, I would develop a migration process that would: 1) copy all the current (as in, needed today) data to the new database 2) disable the application 3) copy all the latest changes for today to the new database 4) re-enable the application (pointing to the new database) 5) copy all the historical data I might need I would add a note on the website and send out an email first, informing the customers that the data is being migrated and historical data might be incomplete during this process. > >> I'd think that it is more likely > >> that the likes of Oracle would (for their flagship DB (not for MySQL), > > > > Never worked with Oracle (or other big software vendors), have you? :) > > Actually, I almost exclusively work with them.
Some are better than > others. I don't work directly with Oracle, but I can say that the two > times I've worked with an Oracle consultant they've been worth their > weight in gold, and cost about as much. They do have some good ones... > The one was fixing some kind > of RDB data corruption on a VAX that was easily a decade out of date > at the time; I was shocked that they could find somebody who knew how > to fix it. Interestingly, it looks like they only abandoned RDB > recently. Probably one of the few people in the world. And he/she might have been hired in by Oracle for this particular issue. > They do tend to be a solution that involves throwing money at > problems. My employer was having issues with a database from another > big software vendor which I'm sure was the result of bad application > design, but throwing Exadata at it did solve the problem, at an > astonishing price. I was at Collaborate last year and spoke to some of the guys from Oracle. (Not going into specifics to protect their jobs.) When asked if one of my customers should be using Oracle RAC or Exadata, the answer came down to: "If you think RAC might be sufficient, it usually is". Exadata, however, is a really nice design. But throwing faster machines at a problem should only be part of the solution. I know someone who claims he can make a "standard" Oracle database outperform an Exadata database. That claim is based on the (usually true) assumption that databases are not designed for performance. Mind, if the same tricks were done in an Exadata environment, you'd see phenomenal performance. > Neither my employer nor the big software provider > in question is likely to attract top-notch DB talent (indeed, mine has > steadily gotten rid of anybody who knows how to do anything in Oracle > beyond creating schemas it seems, Actively? Or by simply letting the good ones go while replacing them with someone less clued up?
> though I can only imagine how much they pay annually in their license fees; and yes, I'm sure 99.9% of what they use Oracle (or SQL Server) for would work just fine in Postgres).

That is my feeling as well. The problem is that the likes of Informatica (one of the leading ETL software vendors) don't actually support PostgreSQL. That is a bit of a downside. I'd need to use ODBC (yes, that also works on non-MS Windows) to connect.

>> Only if you're a big (as in, spend a lot of money with them) customer.
>
> So, we are that (and I think a few of our IT execs used to be Oracle employees, which I'm sure isn't hurting their business).

I actually didn't join Oracle. I did, however, work for one of the companies Oracle bought, and decided not to wait for the inevitable job cuts. In hindsight, that one wasn't too bad, as they actually kept that part for nearly 8 years.

> I'll admit that Uber might not get the same attention. Seems like Oracle is the solution at work for everything from software that runs the entire company to software that hosts one table for 10 employees (well, when somebody notices and gets it out of Access).

Don't forget the Finance departments. They tend to use Excel files for everything.

> Well, unless it involves an MS-oriented dev or Sharepoint, in which case somebody inevitably wants it on SQL Server. I did mention that we're not a world-class IT shop, didn't I?

I won't actually name companies, but I've seen plenty of big ones that would fit your description. So I'm not sure what a "world-class" IT shop would look like when it has to deal with the internal politics, bureaucracy and procedures that come as standard with big companies.

--
Joost
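Joost's "pipe the backup straight into the restore" approach from earlier in this message can be sketched in a few lines. This is a minimal illustration, not his actual tooling: the host names are hypothetical placeholders, and a real migration would add authentication, error handling, and whatever `pg_dump` format flags fit the setup.

```python
# Minimal sketch of migrating one database at a time by piping pg_dump
# on the old server straight into psql on the new one, so no
# intermediate dump file ever touches the disk.
# Host names below are hypothetical placeholders.
import subprocess

def pipeline_commands(dbname, old_host, new_host):
    """Build the dump and restore command lines for one database."""
    dump = ["pg_dump", "-h", old_host, "--no-owner", dbname]
    restore = ["psql", "-h", new_host, "-d", dbname]
    return dump, restore

def migrate(dbname, old_host="old-db.example", new_host="new-db.example"):
    dump_cmd, restore_cmd = pipeline_commands(dbname, old_host, new_host)
    # Equivalent of: pg_dump -h old-db dbname | psql -h new-db -d dbname
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    subprocess.run(restore_cmd, stdin=dump.stdout, check=True)
    dump.stdout.close()
    return dump.wait()
```

Run per database, this keeps each database's downtime limited to its own dump/restore window rather than taking the whole cluster offline at once.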
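The transfer-rate figures quoted at the top of this message can be sanity-checked with a few lines of arithmetic (assuming 1024-based units, i.e. 1GB = 1024MB):

```python
# Verify the back-of-the-envelope rates quoted in the message above
# (assuming 1024-based units: 1GB = 1024MB).

def rate_mb_per_s(size_gb, duration_s):
    """Average throughput in MB/s for a given volume and duration."""
    return size_gb * 1024 / duration_s

# 300GB of XML loaded into 3 DB tables within 1.5 hours:
print(round(rate_mb_per_s(300, 1.5 * 3600), 1))  # -> "about 57MB/s"

# The same data as ~100GB of CSV, loaded within 20 minutes:
print(round(rate_mb_per_s(100, 20 * 60), 1))     # -> "about 85MB/s"
```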
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber
From: Rich Freeman @ 2016-08-02  1:07 UTC
To: gentoo-user

On Mon, Aug 1, 2016 at 1:31 PM, J. Roeleveld <joost@antarean.org> wrote:
> On Monday, August 01, 2016 11:01:28 AM Rich Freeman wrote:
>> Neither my employer nor the big software provider in question is likely to attract top-notch DB talent (indeed, mine has steadily gotten rid of anybody who knows how to do anything in Oracle beyond creating schemas it seems,
>
> Actively? Or by simply letting the good ones go while replacing them with someone less clued up?

A bit of both. A big part of it was probably sacking anybody doing anything other than creating tables (since you can't keep operating without that), and outsourcing to 3rd parties while wanting bottom-dollar prices.

There are accidentally some reasonably competent people in IT at my company, but I don't think it is because we really are good at targeting world-class talent.

> The problem is that the likes of Informatica (one of the leading ETL software vendors) don't actually support PostgreSQL.

Please tell me that it actually does support XML in a sane way, and it is only our incompetent developers who seem to be hand-generating XML files by printing strings?

I have an integration that involves Informatica, and another solution that just synchronizes files from an SMB share to a foreign FTP site. Of course I don't have access to the share that lies in between, so when the interface breaks I get to play with two different groups to try to figure out where the process died. Informatica appears to be running on Unix, and I get helpful questions from the maintainers about what path the files are on, as if I'd have any idea where some SMB share (whose path I am not told) is mounted on some Unix server I have no access to.
Gotta love division of labor. Heaven forbid anybody have visibility to the full picture so that the right group can be engaged on the first try...

--
Rich
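Rich's aside about developers "hand-generating XML files by printing strings" is worth a concrete illustration. The usual failure mode is unescaped content: any value containing markup characters silently produces malformed XML. A small sketch (purely illustrative, unrelated to Informatica's own engine; the element names are made up):

```python
# Why building XML by concatenating strings is fragile: values containing
# markup characters (&, <, >) must be escaped, which string concatenation
# forgets. A real XML writer handles this automatically.
import xml.etree.ElementTree as ET

def to_xml(vendor):
    """Serialize a record via a proper XML API, with escaping."""
    root = ET.Element("record")
    ET.SubElement(root, "vendor").text = vendor
    return ET.tostring(root, encoding="unicode")

# The hand-rolled string version produces malformed XML for this input,
# because the raw "&" is never escaped:
broken = "<record><vendor>" + "Smith & Sons" + "</vendor></record>"

# The ElementTree version escapes the ampersand as &amp;:
print(to_xml("Smith & Sons"))
```

Any parser on the receiving end will reject the `broken` variant, which is typically how these hand-built feeds fail in production.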
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber
From: J. Roeleveld @ 2016-08-02  7:03 UTC
To: gentoo-user

On Monday, August 01, 2016 09:07:05 PM Rich Freeman wrote:
> On Mon, Aug 1, 2016 at 1:31 PM, J. Roeleveld <joost@antarean.org> wrote:
>> On Monday, August 01, 2016 11:01:28 AM Rich Freeman wrote:
>>> Neither my employer nor the big software provider in question is likely to attract top-notch DB talent (indeed, mine has steadily gotten rid of anybody who knows how to do anything in Oracle beyond creating schemas it seems,
>>
>> Actively? Or by simply letting the good ones go while replacing them with someone less clued up?
>
> A bit of both. A big part of it was probably sacking anybody doing anything other than creating tables (since you can't keep operating without that), and outsourcing to 3rd parties while wanting bottom-dollar prices.

Yes, one of the more common decisions. Often because the person hired to run the department comes from an outsourcing company, or because they happened to meet at the golf course.

> There are accidentally some reasonably competent people in IT at my company, but I don't think it is because we really are good at targeting world-class talent.

I wonder which companies actually are good at that?

>> The problem is that the likes of Informatica (one of the leading ETL software vendors) don't actually support PostgreSQL.
>
> Please tell me that it actually does support XML in a sane way, and it is only our incompetent developers who seem to be hand-generating XML files by printing strings?

<OT>
There are actually 2 supported methods (not counting randomly sticking strings together):

1) The default XML handling (source/target and transformation). This sort-of works for "simple" XML files. The definition of "simple" is in the sales contract: no more than ??
levels deep, XSD less than ???MB and XML file less than ???MB. I don't remember the actual numbers, but check with whoever has the actual contract in your company. It should be listed there, or you can call Informatica support.

2) B2B / UDO. UDO stands for Unstructured Data Option. A bit strange, but that's where it lives. It's a proper XML handling engine that should be able to handle any XML you care to throw at it, as well as documents with a standardised layout. It's the preferred method of handling XML files with Informatica. (Do use at least 9.6.1 for this. 9.5 has a very annoying feature...)
</OT>

> I have an integration that involves Informatica, and another solution that just synchronizes files from an smb share to a foreign FTP site. Of course I don't have access to the share that lies in-between, so when the interface breaks I get to play with two different groups to try to figure out where the process died. Informatica appears to be running on Unix and I get helpful questions from the maintainers about what path the files are on, as if I'd have any idea where some SMB share (whose path I am not told) is mounted on some Unix server I have no access to.

Check the session log (from Informatica); that should contain the actual path Informatica uses to write the file to.

> Gotta love division of labor. Heaven forbid anybody have visibility to the full picture so that the right group can be engaged on the first try...

I see this all too often. They usually claim it's because of security. Not understanding that, by obscuring all the details, the first person to get the full picture is the one who is going to cause havoc, and the people who are then tasked with fixing it don't know enough to do it right in a reasonable time-frame.

--
Joost
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber
From: Alan McKinnon @ 2016-08-01 23:18 UTC
To: gentoo-user

On 01/08/2016 17:01, Rich Freeman wrote:
> On Mon, Aug 1, 2016 at 3:16 AM, J. Roeleveld <joost@antarean.org> wrote:
>> Check the link posted by Douglas. Uber's article has some misunderstandings about the architecture, with conclusions drawn that are, at least in part, also caused by their database design and usage.
>
> I've read it. I don't think it actually alleges any misunderstandings about the Postgres architecture, but rather that it doesn't perform as well in Uber's design. I don't think it actually alleges that Uber's design is a bad one in any way.

He does also deliver the stinger at the end: in 2013 Uber migrated FROM MySQL TO Postgres, and now in 2016 they have migrated FROM Postgres TO Schemaless (which just happens to have InnoDB as its backend).

So the original article very much seems to have been written with a skewed bias and the wrong focus. That's bias as in "shifted to one side, as used in math", not bias as in "opinionated asshat beating some special drum".
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber
From: Rich Freeman @ 2016-08-02  0:55 UTC
To: gentoo-user

On Mon, Aug 1, 2016 at 7:18 PM, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
> So the original article very much seems to have been written with a skewed bias and wrong focus. That's bias as in "shifted to one side as used in math" not bias as in "opinionated asshat beating some special drum"

Well, I wouldn't say "wrong focus" so much as "particular focus." The original article doesn't really purport to be a holistic comparison of the two systems, just an explanation of why they're migrating. I think people are reading a bit too much into it.

However, the original article would probably benefit from a few caveats thrown in.

--
Rich
* Re: [gentoo-user] PostgreSQL Vs MySQL @Uber
From: Rich Freeman @ 2016-08-02 17:49 UTC
To: gentoo-user

On Fri, Jul 29, 2016 at 4:58 PM, Mick <michaelkintzios@gmail.com> wrote:
> Interesting article explaining why Uber are moving away from PostgreSQL. I am running both DBs on different desktop PCs for akonadi, and I'm also running MySQL on a number of websites. Let's see which one goes sideways first. :p
>
> https://eng.uber.com/mysql-migration/

There is a thread on this on the Postgres lists as well (unsurprisingly):

https://www.postgresql.org/message-id/flat/579795DF.10502%40commandprompt.com#579795DF.10502@commandprompt.com

I'm only halfway through it, but the Postgres devs strike me as being very levelheaded and competent. They seem to acknowledge the genuine issues and point out some of the tradeoffs that Uber is making, without pointing fingers.

One thing I really did like about the Uber post was that even if it isn't a complete or fair comparison, it is really informative as an introduction to how some of the architecture works. The same applies to much of the Postgres thread. I found it really useful for understanding how both indexing/replication solutions work under the hood.

--
Rich