ZickZack

ZickZack@fedia.io · 3 months ago

Exactly: it’s “open source” like Android. The core of Android is open source (the “Android Open Source Project”), but that includes practically nothing that makes the system work for normal users: essentially, what is in there are the things that have to be open source (like the Linux kernel they use). However, if you want the system to be “practically usable” you need a lot more, which is usually the “Google Mobile Services” (GMS), which are proprietary. You are also generally required to install all items in the GMS, i.e. even if you only need the Play Store, you still have to install Google Chrome.

Further, the Android name and logo are trademarked by Google, so even if you want to roll your own Android, you would not be allowed to call it Android. WearOS is essentially the same thing: the Android subsystem is open, the actual thing you call WearOS (plus trademarks, etc.) is not.

ZickZack@fedia.io · 6 months ago

Here is the more burning question: What is worse? Case “It was not made to design standards”: Then Boeing might have a problem in their manufacturing processes, which is going to have ramifications for the entire fleet. This would be bad, but fixable.

Case “It was made to design standards”: In that case you only have a problem with this one type of jet, but you have a problem in your fundamental design, which might ground the entire fleet (again).

ZickZack@fedia.io · 6 months ago

And that would be completely legal, just like any random guy on DeviantArt can draw something in the style of e.g. Picasso without getting into trouble (unless of course they claim it was painted by Picasso, but that should be obvious).

ZickZack@fedia.io · 6 months ago

train one with all the Nintendo leaks

This is fine

generate some Zelda art and a new Mario title

This is copyright infringement.

The ruling in Japan (and, as I predict, also in other countries) is that the act of training a model (which is just a statistical estimator) is not copyrightable, so cannot be copyright infringement. This is already standard practice for everything else: You cannot copyright a mathematical function, regardless of how much data you use to fit it (that is sensible: CERN has fit physics models to petabytes’ worth of data, that doesn’t mean they hold a copyright on laws of nature, they just hold the copyright on the data itself). However, if you generate something that is copyrighted, that item is still copyrighted: It doesn’t matter whether you used an AI image generator, Photoshop, or a tattoo gun.

ZickZack@fedia.io · edit-2 6 months ago

First, I don’t think that’s the right comparison. You need to compare them to taxis.

It’s not just that, you generally have a significant distribution shift when comparing the self-drivers/driving assistants to normal humans. This is because people only use self-driving in situations where it has a chance of working, which is especially true with stuff like Tesla’s self-driving, where ultimately people are not even going to start the autopilot when it gets tricky (never mind intervening dynamically: they won’t start it in the first place!).

For instance, one of the most common confounding factors is the ratio of highway driving vs non-highway driving: Highways are inherently less accident-prone since you don’t have to deal with intersections, oncoming traffic, people merging in from every random house, or children chasing a ball into the street. Self-drivers tend to report a lot more highway traffic than ordinary drivers, because the availability of the technology dictates where you end up measuring. You can correct for that by e.g. explicitly computing the likelihood p(accident|highway) and using a common p(highway) derived from the entire population of car traffic.
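The correction described above can be sketched in a few lines. This is only a minimal illustration: the function names, the two road categories and all numbers are made up for the example, and real data would need more categories than just highway vs. everything else.

```python
def raw_rate(rate_by_road, own_mix):
    """Rate as naively reported: weighted by where this group actually drove."""
    return sum(rate_by_road[road] * own_mix[road] for road in own_mix)


def standardized_rate(rate_by_road, common_mix):
    """Rate reweighted to a common road-type mix.

    rate_by_road: p(accident | road type) measured for one group of drivers
    common_mix:   p(road type) taken from all car traffic, summing to 1
    """
    return sum(rate_by_road[road] * common_mix[road] for road in common_mix)


# Made-up numbers (accidents per million km), purely for illustration.
human_rates = {"highway": 1.0, "other": 4.0}
assist_rates = {"highway": 1.0, "other": 4.0}  # identical safety per road type

human_mix = {"highway": 0.3, "other": 0.7}
assist_mix = {"highway": 0.9, "other": 0.1}    # mostly switched on on highways
common_mix = human_mix                         # assumed population-wide p(highway)

print("naive human rate:     ", raw_rate(human_rates, human_mix))
print("naive assisted rate:  ", raw_rate(assist_rates, assist_mix))
print("reweighted assisted:  ", standardized_rate(assist_rates, common_mix))
```

In this toy setup both groups are equally safe on every road type, yet the naive assisted rate looks much lower simply because of where the system was switched on; after reweighting the difference disappears.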

ZickZack@fedia.io · 6 months ago

Not necessarily: there have been recent works indicating that the filtering effects of fine-tuned LLMs greatly improve data efficiency (e.g. phi-1). Further, if you have e.g. human selection on top of LLM-generated content, you can get great results, as the LLM generation can be used as a soft curriculum, with the human selection biasing towards higher quality.

ZickZack@fedia.io · 7 months ago

Honestly, I recommend that everyone without existing Linux experience use Fedora: it’s reasonably modern (nice for, e.g., gaming), while also not being a full rolling-release model like Arch (which needs expertise to fix in case something breaks). It’s also reasonably popular, meaning you will find enough guidance in case something does break.

ZickZack@fedia.io · 7 months ago

Basically, the stuff they need to detect whether ads are actually shown requires information about the device state that is generally not available according to Article 5(3) ePR.

ZickZack@fedia.io · 7 months ago

The problem is that the model is actually doing exactly what it’s supposed to, it’s just not what OpenAI wants it to do. The prompt extraction method works because the underlying statistical model gets shifted far outside the domain of “real” language. In that case the correct maximizing posterior becomes a sample from the prior (here that would be a sample from the dataset; this is combined with things like repetition penalties).

This is the correct way a statistical estimator is supposed to work, but not the way you want it to work. That’s also why they can’t really fix this: there’s nothing broken to begin with (and “unbreaking” it would almost surely blow something else up).

ZickZack@fedia.io · 7 months ago

Could be none of them and the complaint comes from one of the academy teams

ZickZack@fedia.io · 7 months ago

You cannot run Signal without “Signal - the company” existing. All of their systems are designed to be attached to one specific backend, namely the signal-run backend, meaning without re-engineering the existing infrastructure you cannot simply swap over.

As @kpw already mentioned, “Signal - the company” dying would involve a functional reset of everything: No contacts, no servers, no infrastructure. COULD you fork the thing and build your own system? Sure, but it would be functionally unusable since no one else would be using it, since everything relies on specifically the Signal servers to function. A post-Signal system could re-use some of their code (if it runs outside Signal corp - “works on my machine” could be present in this project as well), but would need to rebuild the actual network.

This is in contrast to something like the matrix protocol: If a specific matrix instance goes kaput, you still have the overall network working. This means that even if an instance implodes, you would have an easy migration path since the matrix network itself persists.

ZickZack@fedia.io · 7 months ago

Why?
It’s part of the track that everybody has to deal with. If the car/driver is incapable of working around the real-world constraints of the track, they crash or need to drive slower: The track not being smooth as glass is part of driving the car.

ZickZack@fedia.io · 7 months ago

Essentially the same argument: Because the HBO show was syndicated throughout the United States, he can file in the federal courts in e.g. Texas (usually the argument is something like “They damaged business relations/contracts in XYZ state, therefore we file in XYZ state”).

ZickZack@fedia.io · 7 months ago

I answered in a little more detail in a different comment (https://fedia.io/m/technology@lemmy.world/t/411563/-/comment/2556033) but to address the last point: They did file in federal court (specifically the federal district court in north Texas).

ZickZack@fedia.io · edit-2 7 months ago

The issue with the internet is that it did take place in Texas as well: The news article was available in Texas, so the news corp can be sued there. Basically the argument is: “Media Matters harmed X’s brand in Texas using misleading information” (you can read their arguments for filing in Texas under the “Jurisdiction and Venue” section of their filing).

Also remember that this is currently X’s wish list: Media Matters can file for a change in venue.

Edit: Quick update.

Looking at their filing, the case will probably fail under a motion for summary judgment: They basically agree with Media Matters that they did show ads under extremists’ posts. They simply argue that you need to push the Twitter algorithm to its limits by doomscrolling for a long time until the algorithm fails. However, this doesn’t make any of the facts provided by Media Matters (https://www.mediamatters.org/twitter/musk-endorses-antisemitic-conspiracy-theory-x-has-been-placing-ads-apple-bravo-ibm-oracle) wrong.

ZickZack@fedia.io · 7 months ago

Surely a company should be governed by the laws of the state in which they are based

This is not true and wouldn’t make any sense: let’s say you are a delivery company and one of your drivers runs over a dog in Texas. The lawsuit can be filed in Texas, regardless of whether your company is in Texas, California, or even outside the United States. The place you are incorporated in doesn’t change the damages or laws you violated when running over the dog. Of course you can also move the venue to the state the company is based in.

You cannot (generally) move it to another state, since that state doesn’t even have jurisdiction over any part of the incident.

The internet is just special in the sense that something that happened on the internet really happened everywhere on earth at the same time, meaning any venue is a place where potential damages were accrued.

ZickZack@fedia.io · 8 months ago

You are vastly overestimating the amount of storage you need, since you are looking at some download which itself has to choose the encoding (which is independent of whatever YouTube does: YouTube absolutely crushes the quality).
Most estimates assume that YouTube has 1 exabyte of storage. Let’s say we buy this in bulk at retail (which we wouldn’t do: you wait as long as possible since storage prices are going down, and retail stores would give you the finger if you ordered an exabyte’s worth).
Let’s take that number and run with it:
Buying retail, you can get Seagate Exos X20 20TB drives for 280€. 1 exabyte is 1 million terabytes, meaning we have 1_000_000/20 * 280 = 14 million € (you’d need machines to put those into, but you also wouldn’t buy the entire thing upfront, nor at retail prices).
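Spelled out as a quick back-of-the-envelope script (using only the numbers quoted above; the `replicas` parameter is a hypothetical knob added for illustration, since the estimate itself does not account for redundancy):

```python
def storage_cost_eur(total_tb, drive_tb, drive_price_eur, replicas=1):
    """Return (number of drives, raw drive cost in EUR) for a storage target.

    replicas is an illustrative extra: 1 means no redundancy at all.
    """
    drives = -(-total_tb * replicas // drive_tb)  # ceiling division
    return drives, drives * drive_price_eur


# Decimal units assumed: 1 exabyte = 1_000_000 terabytes.
drives, cost = storage_cost_eur(total_tb=1_000_000, drive_tb=20, drive_price_eur=280)
print(f"{drives:,} drives, {cost:,} EUR")  # 50,000 drives, 14,000,000 EUR
```

Setting `replicas=3` would simply triple both figures, which still leaves the raw drive cost in the tens of millions.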

Compute also isn’t that big of a deal if you do it correctly: the expensive part in video hosting is usually video encoding since to get small video sizes you need to spend compute beforehand to compress it.
However, you can shift a significant part of this to the user by implementing the transcoding in WASM and running it client-side (see e.g. https://www.w3.org/2021/03/media-production-workshop/talks/qiang-fu-video-transcoding.html). In that case users would compress locally in the browser before uploading (this presumably wouldn’t even take longer than normal uploads for most people, since you trade off transcoding time against upload time).
There are still other compute expenses but those are much more limited.
These mechanisms don’t (at least to my knowledge) exist in PeerTube yet, but would be possible.

The actually expensive part is always the actual networking: Networking is one of the few things that actually get more expensive at scale due to the complexity explosion, rather than cheaper (e.g. having dedicated transcoding hardware drops in price per user since you have higher utilization).
Networking quickly runs into bottlenecks where you have to account for all the covariances between datasets in the network.
Basically, to increase the amount of e.g. storage available, everything in the network needs to be scaled up (from the local machines’ connections, over the cables and switches, up to routers and outgoing connections): because you increase the density at one point, you have to increase the network everywhere.
That’s why networking dwarfs everything: you just get crushed by networking being the bottleneck between your increasingly dense devices.

The trick behind PeerTube is that this effect is not as extreme, due to

federation (certain connections just aren’t dense due to the overall network topology being distributed)
torrents

The latter is the important part: instead of having network cost rise (super)linearly with the number of users, you have it rise linearly with the number of simultaneous unique videos.
This is a much smaller number, which means you do not need to compete in that space, which is the dominant cost factor. (If you have a method where one user can retain the video and share it without actively watching that same video, you can probably get real-world sublinear scaling.)
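A toy model of that difference, under heavily simplified assumptions: every stream costs the same, and in the swarm case the host only has to seed each distinct video once while viewers pass the data on to each other. The function names and the session data are invented for the example; a real swarm falls somewhere between these two extremes.

```python
def client_server_streams(sessions):
    """Classic hosting: the host sends one outgoing stream per viewer."""
    return len(sessions)


def swarm_seed_streams(sessions):
    """Idealized swarm: the host seeds each distinct video once."""
    return len(set(sessions))


# Each entry is the video one viewer is currently watching (made-up data).
sessions = ["cat"] * 9_000 + ["talk"] * 900 + ["tutorial"] * 100

print("client/server streams:", client_server_streams(sessions))
print("swarm seed streams:   ", swarm_seed_streams(sessions))
```

Doubling the audience of the same three videos doubles the first number and leaves the second unchanged, which is the scaling argument made above.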

Mind you, the costs involved here are still large, but not insurmountably large, especially considering that no single organisation would have to pay for the entire thing and it’s not an upfront expense. Fundamentally though, the system is built such that it won’t be crushed as users flood into the network.

ZickZack@fedia.io · 8 months ago

There are certain things you are allowed to use cookies for even without asking for permission (i.e. they wouldn’t even need to tell you about them). These are effectively the kinds of things that are necessary for your website to work in the first place: for instance, a dark and a light mode that people can switch between even without logging in; another example is language settings (this is why sites like e.g. duckduckgo can have a “settings” tab despite the fact you are not logged into anything).

The rule-of-thumb is that everything that is directly related to the functionality of your website is fair even without asking (they are “essential”).
Of course the specifics are a little more tricky: For instance you could have a shop in which you can put things into your “shopping basket” without being logged in. This is fine since it’s core functionality. However, if you use that same cookie to also inform your recommendation algorithm, you could get into trouble. Another aspect is 3rd party cookies: These, while not theoretically always requiring permission, in practice do need explicit permission since you, as the website host, cannot guarantee what happens with these cookies (and 3rd party cookies are, in general, an easy way to track users, which isn’t core functionality for most websites).
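As a rough sketch, the rule of thumb could be encoded like this. The purpose names and the function are hypothetical and only mirror the examples above; what actually counts as “essential” depends on the specific site.

```python
# Purposes treated as "essential" here are an illustrative assumption,
# taken from the examples in the comment above.
ESSENTIAL_PURPOSES = {"theme", "language", "shopping_basket"}


def may_use_cookie(purpose, third_party, has_consent):
    """Decide whether a cookie may be used for a given purpose."""
    if has_consent:
        return True
    # Without consent: first-party cookies for core functionality only.
    return purpose in ESSENTIAL_PURPOSES and not third_party


print(may_use_cookie("shopping_basket", third_party=False, has_consent=False))
print(may_use_cookie("recommendations", third_party=False, has_consent=False))
```

Note that the check is per purpose, not per cookie: the basket cookie is fine for the basket itself, but feeding the same cookie into recommendations is a separate purpose that falls through to the consent check.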