
# ValueError: assignment destination is read-only [Solved]

The NumPy "ValueError: assignment destination is read-only" occurs when you try to assign a value to a read-only array.

To solve the error, create a copy of the read-only array and modify the copy.

You can use the flags attribute to check whether the array is WRITEABLE.
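A minimal sketch of that check (the file name and variable names are assumptions, not taken from the original code sample):

```python
from PIL import Image
import numpy as np

img = Image.open('example.jpg')  # hypothetical image file
arr = np.asarray(img)            # may return a read-only view of the pixel data

print(arr.flags)
# For a read-only array the output includes lines such as:
#   WRITEABLE : False
#   OWNDATA : False
```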


In your case, WRITEABLE will likely be set to False .

In older NumPy versions, you used to be able to set the flag to true by calling the setflags() method.

However, setting the WRITEABLE flag to True ( 1 ) will likely fail if the OWNDATA flag is set to False .

You will likely get the following error:

  • "ValueError: cannot set WRITEABLE flag to True of this array"

To solve the error, create a copy of the array when converting it from a Pillow Image to a NumPy array.

Passing the Pillow Image to the numpy.array() method creates a copy of the array.

You can also explicitly call the copy() method.
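A hedged sketch of the conversion (the file name is an assumption):

```python
from PIL import Image
import numpy as np

img = Image.open('example.jpg')   # hypothetical image file

arr = np.array(img)               # numpy.array() copies the pixel data
# or copy explicitly:
arr = np.asarray(img).copy()

print(arr.flags['WRITEABLE'])     # True
arr[0, 0] = 255                   # assignment now works
```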


You can also call the copy() method on the Pillow Image and modify the copy.

The image copy isn't read-only and allows assignment.

You can change the img_copy variable without getting the "assignment destination is read-only" error.
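One possible reading of the article's img_copy variable, sketched under the assumption that it holds the pixel data of the copied image:

```python
from PIL import Image
import numpy as np

img = Image.open('example.jpg')   # hypothetical image file
img_copy = np.array(img.copy())   # copy the Pillow Image, then convert it

img_copy[0, 0] = 0                # no "assignment destination is read-only" error
```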

You can also use the numpy.copy() method.

The numpy.copy() method returns an array copy of the given object.

The only argument we passed to the method is the image.

You can safely modify the img_copy variable without running into issues.
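A small sketch of the numpy.copy() approach (again with an assumed file name):

```python
from PIL import Image
import numpy as np

arr = np.asarray(Image.open('example.jpg'))  # possibly read-only
img_copy = np.copy(arr)                      # the image data is the only argument
img_copy[0, 0] = 0                           # safe to modify
```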

You most likely don't want to make changes to the original image.

Creating a copy and modifying the copy should be your preferred approach.

If you got the error when using the np.asarray() method, try changing it to np.array() .

Change the asarray() call to an array() call, as in the sketch below.
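A sketch, where img is assumed to be a Pillow Image opened as in the earlier snippets:

```python
from PIL import Image
import numpy as np

img = Image.open('example.jpg')   # hypothetical image file

# Before: np.asarray() can hand back a read-only view of the image buffer
arr = np.asarray(img)

# After: np.array() copies the data, so the result is writable
arr = np.array(img)

print(arr.flags['WRITEABLE'])     # True
```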

As long as the WRITEABLE flag is set to True , you will be able to modify the array.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

  • TypeError: Object of type ndarray is not JSON serializable
  • ValueError: numpy.ndarray size changed, may indicate binary incompatibility
  • NumPy RuntimeWarning: divide by zero encountered in log10
  • ValueError: x and y must have same first dimension, but have shapes


Embarrassingly parallel for loops ¶

Common usage ¶

Joblib provides a simple helper class to write parallel for loops using multiprocessing. The core idea is to write the code to be executed as a generator expression, and convert it to parallel computing:

can be spread over 2 CPUs using the following:
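A sketch of the pattern being described, using the classic square-root example (the function and inputs are illustrative):

```python
from math import sqrt
from joblib import Parallel, delayed

# Sequential version, written as a list comprehension:
[sqrt(i ** 2) for i in range(10)]

# The same computation spread over 2 CPUs:
Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
# [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```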

The output can be a generator that yields the results as soon as they’re available, even if the subsequent tasks aren’t completed yet. The order of the outputs always matches the order the inputs have been submitted with:
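For instance, a sketch assuming joblib 1.3 or later, where return_as="generator" is available:

```python
from math import sqrt
from joblib import Parallel, delayed

parallel = Parallel(n_jobs=2, return_as="generator")
output_generator = parallel(delayed(sqrt)(i ** 2) for i in range(10))

print(next(output_generator))    # 0.0, available before later tasks finish
print(next(output_generator))    # 1.0
print(list(output_generator))    # the remaining results, in submission order
```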

This generator enables reducing the memory footprint of joblib.Parallel calls in case the results can benefit from on-the-fly aggregation, as illustrated in Returning a generator in joblib.Parallel .

Future releases are planned to also support returning a generator that yields the results in the order of completion rather than the order of submission, by using return_as="generator_unordered" instead of return_as="generator" . In this case the order of the outputs will depend on the concurrency of workers and will not be guaranteed to be deterministic, meaning the results can be yielded with a different order every time the code is executed.

Thread-based parallelism vs process-based parallelism ¶

By default joblib.Parallel uses the 'loky' backend module to start separate Python worker processes to execute tasks concurrently on separate CPUs. This is a reasonable default for generic Python programs but can induce a significant overhead as the input and output data need to be serialized in a queue for communication with the worker processes (see Serialization & Processes ).

When you know that the function you are calling is based on a compiled extension that releases the Python Global Interpreter Lock (GIL) during most of its computation then it is more efficient to use threads instead of Python processes as concurrent workers. For instance this is the case if you write the CPU intensive part of your code inside a with nogil block of a Cython function.

To hint that your code can efficiently use threads, just pass prefer="threads" as parameter of the joblib.Parallel constructor. In this case joblib will automatically use the "threading" backend instead of the default "loky" backend:
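A minimal sketch of the hint (sqrt merely stands in for a call into a GIL-releasing compiled extension):

```python
from math import sqrt
from joblib import Parallel, delayed

# prefer="threads" makes joblib pick the "threading" backend here.
Parallel(n_jobs=2, prefer="threads")(delayed(sqrt)(i ** 2) for i in range(10))
```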

The parallel_config() context manager helps selecting a specific backend implementation or setting the default number of jobs:
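For example, a sketch assuming joblib 1.3+, where parallel_config() is available (older releases expose a similar parallel_backend() manager):

```python
from math import sqrt
from joblib import Parallel, delayed, parallel_config

# Force the threading backend and a default of 2 jobs for everything executed
# inside the context, including Parallel calls made internally by libraries.
with parallel_config(backend="threading", n_jobs=2):
    Parallel()(delayed(sqrt)(i ** 2) for i in range(10))
```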

The latter is especially useful when calling a library that uses joblib.Parallel internally without exposing backend selection as part of its public API.

Note that the prefer="threads" option was introduced in joblib 0.12. In prior versions, the same effect could be achieved by hardcoding a specific backend implementation such as backend="threading" in the call to joblib.Parallel but this is now considered a bad pattern (when done in a library) as it does not make it possible to override that choice with the parallel_config() context manager.

The loky backend may not always be available

Some rare systems do not support multiprocessing (for instance Pyodide). In this case the loky backend is not available and the default backend falls back to threading.

In addition to the builtin joblib backends, there are several cluster-specific backends you can use:

Dask backend for Dask clusters (see Using Dask for single-machine parallel computing for an example),

Ray backend for Ray clusters,

Joblib Apache Spark Backend to distribute joblib tasks on a Spark cluster.

Serialization & Processes ¶

To share function definition across multiple python processes, it is necessary to rely on a serialization protocol. The standard protocol in python is pickle but its default implementation in the standard library has several limitations. For instance, it cannot serialize functions which are defined interactively or in the __main__ module.

To avoid this limitation, the loky backend now relies on cloudpickle to serialize python objects. cloudpickle is an alternative implementation of the pickle protocol which allows the serialization of a greater number of objects, in particular interactively defined functions. So for most usages, the loky backend should work seamlessly.

The main drawback of cloudpickle is that it can be slower than the pickle module in the standard library. This is particularly noticeable for large python dictionaries or lists, where the serialization time can be up to 100 times slower. There are two ways to alter the serialization process in joblib to temper this issue:

If you are on a UNIX system, you can switch back to the old multiprocessing backend. With this backend, interactively defined functions can be shared with the worker processes using the fast pickle. The main issue with this solution is that using fork to start the processes breaks the POSIX standard and can interact badly with third-party libraries such as numpy and openblas.

If you wish to use the loky backend with a different serialization library, you can set the LOKY_PICKLER=mod_pickle environment variable to use mod_pickle as the serialization library for loky. The module mod_pickle passed as an argument should be importable as import mod_pickle and should contain a Pickler object, which will be used to serialize the objects. It can be set to LOKY_PICKLER=pickle to use the pickling module from the stdlib. The main drawback with LOKY_PICKLER=pickle is that interactively defined functions will not be serializable anymore. To cope with this, you can use this solution together with the joblib.wrap_non_picklable_objects() wrapper, which can be used as a decorator to locally enable using cloudpickle for specific objects. This way, you can have fast pickling of all python objects and locally enable slow pickling for interactive functions. An example is given in loky_wrapper.

Shared-memory semantics ¶

The default backend of joblib will run each function call in isolated Python processes, therefore they cannot mutate a common Python object defined in the main program.

However if the parallel function really needs to rely on the shared memory semantics of threads, it should be made explicit with require='sharedmem' , for instance:
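A small sketch of the shared-memory requirement:

```python
from joblib import Parallel, delayed

shared_set = set()

def collect(x):
    # Mutates an object owned by the main program, hence the need for threads.
    shared_set.add(x)

Parallel(n_jobs=2, require='sharedmem')(delayed(collect)(i) for i in range(5))
print(sorted(shared_set))   # [0, 1, 2, 3, 4]
```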

Keep in mind that relying on shared-memory semantics is probably suboptimal from a performance point of view as concurrent access to a shared Python object will suffer from lock contention.

Reusing a pool of workers ¶

Some algorithms require making several consecutive calls to a parallel function interleaved with processing of the intermediate results. Calling joblib.Parallel several times in a loop is sub-optimal because it will create and destroy a pool of workers (threads or processes) several times, which can cause a significant overhead.

For this case it is more efficient to use the context manager API of the joblib.Parallel class to re-use the same pool of workers for several calls to the joblib.Parallel object:
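A sketch of the context manager pattern (the accumulation loop is illustrative):

```python
from math import sqrt
from joblib import Parallel, delayed

with Parallel(n_jobs=2) as parallel:
    accumulator = 0.
    n_iter = 0
    while accumulator < 1000:
        # The same pool of workers is reused across iterations.
        results = parallel(delayed(sqrt)(accumulator + i ** 2) for i in range(5))
        accumulator += sum(results)
        n_iter += 1
```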

Note that the 'loky' backend now used by default for process-based parallelism automatically tries to maintain and reuse a pool of workers by itself, even for calls without the context manager.

Working with numerical data in shared memory (memmapping) ¶

By default the workers of the pool are real Python processes forked using the multiprocessing module of the Python standard library when n_jobs != 1 . The arguments passed as input to the Parallel call are serialized and reallocated in the memory of each worker process.

This can be problematic for large arguments as they will be reallocated n_jobs times by the workers.

As this problem can often occur in scientific computing with numpy based datastructures, joblib.Parallel provides a special handling for large arrays to automatically dump them on the filesystem and pass a reference to the worker to open them as memory map on that file using the numpy.memmap subclass of numpy.ndarray . This makes it possible to share a segment of data between all the worker processes.

The following only applies with the 'loky' and 'multiprocessing' process-based backends. If your code can release the GIL, then using a thread-based backend by passing prefer='threads' is even more efficient because it makes it possible to avoid the communication overhead of process-based parallelism.

Scientific Python libraries such as numpy, scipy, pandas and scikit-learn often release the GIL in performance critical code paths. It is therefore advised to always measure the speed of thread-based parallelism and use it when the scalability is not limited by the GIL.

Automated array to memmap conversion ¶

The automated array to memmap conversion is triggered by a configurable threshold on the size of the array:
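A sketch of the knob (the array sizes and the reduction function are illustrative):

```python
import numpy as np
from joblib import Parallel, delayed

data = np.random.random(int(1e7))

# Any input array larger than 1e6 bytes is dumped to disk and passed to the
# workers as a memory map instead of being copied into each process.
results = Parallel(n_jobs=2, max_nbytes=1e6)(
    delayed(np.mean)(data[i::10]) for i in range(10))
```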

By default the data is dumped to the /dev/shm shared-memory partition if it exists and is writable (typically the case under Linux). Otherwise the operating system’s temporary folder is used. The location of the temporary data files can be customized by passing a temp_folder argument to the Parallel constructor.

Passing max_nbytes=None makes it possible to disable the automated array to memmap conversion.

Manual management of memmapped input data ¶

For even finer tuning of the memory usage it is also possible to dump the array as a memmap directly from the parent process to free the memory before forking the worker processes. For instance let’s allocate a large array in the memory of the parent process:

Dump it to a local file for memmapping:

The large_memmap variable is pointing to a numpy.memmap instance:

The original array can be freed from the main process memory:

It is possible to slice large_memmap into a smaller memmap:

Finally a np.ndarray view backed on that same memory mapped file can be used:

All those three datastructures point to the same memory buffer and this same buffer will also be reused directly by the worker processes of a Parallel call:
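A condensed sketch of the whole sequence described above (the folder and file names are assumptions):

```python
import os
import numpy as np
from joblib import Parallel, delayed, dump, load

# Allocate a large array in the memory of the parent process.
data = np.ones(int(1e6))

# Dump it to a local file for memmapping.
folder = './joblib_memmap'            # hypothetical temporary folder
os.makedirs(folder, exist_ok=True)
filename = os.path.join(folder, 'data.mmap')
dump(data, filename)

# large_memmap is a numpy.memmap instance backed by that file.
large_memmap = load(filename, mmap_mode='r+')

# The original array can be freed from the main process memory.
del data

# Slice it into a smaller memmap, then take a plain ndarray view on it.
small_memmap = large_memmap[2:5]
small_array = np.asarray(small_memmap)

# The same buffer is reused directly by the worker processes; max_nbytes=None
# disables the auto-dumping discussed in the note below.
Parallel(n_jobs=2, max_nbytes=None)(
    delayed(print)(i, x) for i, x in enumerate(small_array))
```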

Note that here max_nbytes=None is used to disable the auto-dumping feature of Parallel . small_array is still in shared memory in the worker processes because it was already backed by shared memory in the parent process. The pickling machinery of the Parallel multiprocessing queues is able to detect this situation and optimize it on the fly to limit the number of memory copies.

Writing parallel computation results in shared memory ¶

If data are opened using the w+ or r+ mode in the main program, the worker will get r+ mode access. Thus the worker will be able to write its results directly to the original data, alleviating the need of the serialization to send back the results to the parent process.

Here is an example script on parallel processing with preallocated numpy.memmap datastructures NumPy memmap in joblib.Parallel .

Having concurrent workers write on overlapping shared memory data segments, for instance by using inplace operators and assignments on a numpy.memmap instance, can lead to data corruption as numpy does not offer atomic operations. The previous example does not risk that issue as each task is updating an exclusive segment of the shared result array.

Some C/C++ compilers offer lock-free atomic primitives such as add-and-fetch or compare-and-swap that could be exposed to Python via CFFI for instance. However providing numpy-aware atomic constructs is outside of the scope of the joblib project.

A final note: don’t forget to clean up any temporary folder when you are done with the computation:
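For instance, assuming the folder variable from the sketch above:

```python
import shutil

try:
    shutil.rmtree(folder)
except OSError:
    pass   # the files may still be open under Windows; ignore and retry later
```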

Avoiding over-subscription of CPU resources ¶

The computation parallelism relies on the usage of multiple CPUs to perform the operation simultaneously. When using more processes than the number of CPUs on a machine, the performance of each process is degraded as there is less computational power available for each process. Moreover, when many processes are running, the time taken by the OS scheduler to switch between them can further hinder the performance of the computation. It is generally better to avoid using significantly more processes or threads than the number of CPUs on a machine.

Some third-party libraries – e.g. the BLAS runtime used by numpy – internally manage a thread-pool to perform their computations. The default behavior is generally to use a number of threads equal to the number of CPUs available. When these libraries are used with joblib.Parallel , each worker will spawn its own thread-pool, resulting in a massive over-subscription of resources that can slow down the computation compared to a sequential one. To cope with this problem, joblib tells supported third-party libraries to use a limited number of threads in workers managed by the 'loky' backend: by default, each worker process will have environment variables set to allow a maximum of cpu_count() // n_jobs threads, so that the total number of threads used by all the workers does not exceed the number of CPUs of the host.

This behavior can be overridden by setting the proper environment variables to the desired number of threads. This override is supported for the following libraries:

  • OpenMP with the environment variable 'OMP_NUM_THREADS' ,
  • OpenBLAS with 'OPENBLAS_NUM_THREADS' ,
  • MKL with 'MKL_NUM_THREADS' ,
  • Accelerate with 'VECLIB_MAXIMUM_THREADS' ,
  • Numexpr with 'NUMEXPR_NUM_THREADS' .

Since joblib 0.14, it is also possible to programmatically override the default number of threads using the inner_max_num_threads argument of the parallel_config() function as follows:
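A sketch of that call (the inputs and the reduction function are placeholders):

```python
import numpy as np
from joblib import Parallel, delayed, parallel_config

data = [np.random.random((500, 500)) for _ in range(8)]   # illustrative inputs

with parallel_config(backend="loky", inner_max_num_threads=2):
    # 4 worker processes, each limited to 2 threads in its BLAS/OpenMP pools.
    results = Parallel(n_jobs=4)(delayed(np.linalg.norm)(x) for x in data)
```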

In this example, 4 Python worker processes will be allowed to use 2 threads each, meaning that this program will be able to use up to 8 CPUs concurrently.

Custom backend API ¶

New in version 0.10.

Users can provide their own implementation of a parallel processing backend in addition to the 'loky' , 'threading' , 'multiprocessing' backends provided by default. A backend is registered with the joblib.register_parallel_backend() function by passing a name and a backend factory.

The backend factory can be any callable that returns an instance of ParallelBackendBase . Please refer to the default backends source code as a reference if you want to implement your own custom backend.

Note that it is possible to register a backend class that has some mandatory constructor parameters such as the network address and connection credentials for a remote cluster computing service:

The connection parameters can then be passed to the parallel_config() context manager:
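A hedged sketch of the mechanics: the backend class, its constructor arguments and the connection values below are all made up for illustration (here a trivial subclass of joblib's own ThreadingBackend stands in for a real cluster backend), while register_parallel_backend() and parallel_config() are the real joblib API:

```python
from joblib import Parallel, delayed, parallel_config, register_parallel_backend
from joblib._parallel_backends import ThreadingBackend

class MyClusterBackend(ThreadingBackend):
    """Stand-in backend that accepts connection parameters in its constructor."""

    def __init__(self, address=None, api_key=None, **kwargs):
        self.address = address      # stored for later use by a real implementation
        self.api_key = api_key
        super().__init__(**kwargs)

register_parallel_backend('my_cluster', MyClusterBackend)

# Extra keyword arguments given to parallel_config() are forwarded to the
# backend constructor when the backend is selected by name.
with parallel_config(backend='my_cluster', address='tcp://10.0.0.1:8786',
                     api_key='secret'):
    results = Parallel(n_jobs=2)(delayed(pow)(i, 2) for i in range(5))
```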

Using the context manager can be helpful when using a third-party library that uses joblib.Parallel internally while not exposing the backend argument in its own API.

One known limitation is that external packages that register new parallel backends must be imported explicitly before their backends can be selected by joblib:

This can be confusing for users. To resolve this, external packages can safely register their backends directly within the joblib codebase by creating a small function that registers their backend, and including this function within the joblib.parallel.EXTERNAL_PACKAGES dictionary:

This is subject to community review, but can reduce the confusion for users when relying on side effects of external package imports.

Old multiprocessing backend ¶

Prior to version 0.12, joblib used the 'multiprocessing' backend as default backend instead of 'loky' .

This backend creates an instance of multiprocessing.Pool that forks the Python interpreter in multiple processes to execute each of the items of the list. The delayed function is a simple trick to be able to create a tuple (function, args, kwargs) with a function-call syntax.

Under Windows, the use of multiprocessing.Pool requires protecting the main loop of code to avoid recursive spawning of subprocesses when using joblib.Parallel . In other words, you should be writing code like this when using the 'multiprocessing' backend:
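A sketch of the protected entry point:

```python
from joblib import Parallel, delayed

def my_function(x):
    return x ** 2

if __name__ == '__main__':
    # Without this guard, each spawned child would re-execute the module and
    # try to spawn workers of its own under Windows.
    results = Parallel(n_jobs=2, backend='multiprocessing')(
        delayed(my_function)(i) for i in range(10))
    print(results)
```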

No code should run outside of the "if __name__ == '__main__'" blocks, only imports and definitions.

The 'loky' backend used by default in joblib 0.12 and later does not impose this anymore.

Bad interaction of multiprocessing and third-party libraries ¶

Using the 'multiprocessing' backend can cause a crash when using third party libraries that manage their own native thread-pool if the library is first used in the main process and subsequently called again in a worker process (inside the joblib.Parallel call).

Joblib version 0.12 and later are no longer subject to this problem thanks to the use of loky as the new default backend for process-based parallelism.

Prior to Python 3.4 the 'multiprocessing' backend of joblib can only use the fork strategy to create worker processes under non-Windows systems. This can cause some third-party libraries to crash or freeze. Such libraries include Apple vecLib / Accelerate (used by NumPy under OSX), some old version of OpenBLAS (prior to 0.2.10) or the OpenMP runtime implementation from GCC which is used internally by third-party libraries such as XGBoost, spaCy, OpenCV…

The best way to avoid this problem is to use the 'loky' backend instead of the multiprocessing backend. Prior to joblib 0.12, it is also possible to get joblib.Parallel configured to use the 'forkserver' start method on Python 3.4 and later. The start method has to be configured by setting the JOBLIB_START_METHOD environment variable to 'forkserver' instead of the default 'fork' start method. However, the user should be aware that using the 'forkserver' method prevents joblib.Parallel from calling functions interactively defined in a shell session.

You can read more on this topic in the multiprocessing documentation .

Under Windows the fork system call does not exist at all so this problem does not exist (but multiprocessing has more overhead).

Parallel reference documentation ¶

Helper class for readable parallel mapping.

Read more in the User Guide .

The maximum number of concurrently running jobs, such as the number of Python worker processes when backend="loky" or the size of the thread-pool when backend="threading" . This argument is converted to an integer, rounded below for float. If -1 is given, joblib tries to use all CPUs. The number of CPUs n_cpus is obtained with cpu_count() . For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. For instance, using n_jobs=-2 will result in all CPUs but one being used. This argument can also go above n_cpus , which will cause oversubscription. In some cases, slight oversubscription can be beneficial, e.g., for tasks with large I/O operations. If 1 is given, no parallel computing code is used at all, and the behavior amounts to a simple python for loop. This mode is not compatible with timeout . None is a marker for ‘unset’ that will be interpreted as n_jobs=1 unless the call is performed under a parallel_config() context manager that sets another value for n_jobs . If n_jobs = 0 then a ValueError is raised.

Specify the parallelization backend implementation. Supported backends are:

“loky” used by default, can induce some communication and memory overhead when exchanging input and output data with the worker Python processes. On some rare systems (such as Pyodide), the loky backend may not be available.

“multiprocessing” previous process-based backend based on multiprocessing.Pool . Less robust than loky .

“threading” is a very low-overhead backend but it suffers from the Python Global Interpreter Lock if the called function relies a lot on Python objects. “threading” is mostly useful when the execution bottleneck is a compiled extension that explicitly releases the GIL (for instance a Cython loop wrapped in a “with nogil” block or an expensive call to a library such as NumPy).

Finally, you can register backends by calling register_parallel_backend() . This will allow you to implement a backend of your liking.

It is not recommended to hard-code the backend name in a call to Parallel in a library. Instead it is recommended to set soft hints (prefer) or hard constraints (require) so as to make it possible for library users to change the backend from the outside using the parallel_config() context manager.

If ‘list’, calls to this instance will return a list, only when all results have been processed and retrieved. If ‘generator’, it will return a generator that yields the results as soon as they are available, in the order the tasks have been submitted with. If ‘generator_unordered’, the generator will immediately yield available results independently of the submission order. The output order is not deterministic in this case because it depends on the concurrency of the workers.

Soft hint to choose the default backend if no specific backend was selected with the parallel_config() context manager. The default process-based backend is ‘loky’ and the default thread-based backend is ‘threading’. Ignored if the backend parameter is specified.

Hard constraint to select the backend. If set to ‘sharedmem’, the selected backend will be single-host and thread-based even if the user asked for a non-thread based backend with parallel_config() .

The verbosity level: if non zero, progress messages are printed. Above 50, the output is sent to stdout. The frequency of the messages increases with the verbosity level. If it is more than 10, all iterations are reported.

Timeout limit for each task to complete. If any task takes longer, a TimeoutError will be raised. Only applied when n_jobs != 1 .

The number of batches (of tasks) to be pre-dispatched. Default is '2*n_jobs'. When batch_size="auto" this is a reasonable default and the workers should never starve. Note that only basic arithmetic is allowed here and no modules can be used in this expression.

The number of atomic tasks to dispatch at once to each worker. When individual evaluations are very fast, dispatching calls to workers can be slower than sequential computation because of the overhead. Batching fast computations together can mitigate this. The 'auto' strategy keeps track of the time it takes for a batch to complete, and dynamically adjusts the batch size to keep the time on the order of half a second, using a heuristic. The initial batch size is 1. batch_size="auto" with backend="threading" will dispatch batches of a single task at a time as the threading backend has very little overhead and using larger batch size has not proved to bring any gain in that case.

Folder to be used by the pool for memmapping large arrays for sharing memory with worker processes. If None, this will try in order:

a folder pointed by the JOBLIB_TEMP_FOLDER environment variable,

/dev/shm if the folder exists and is writable: this is a RAM disk filesystem available by default on modern Linux distributions,

the default system temporary folder that can be overridden with TMP, TMPDIR or TEMP environment variables, typically /tmp under Unix operating systems.

Only active when backend="loky" or "multiprocessing" .

Threshold on the size of arrays passed to the workers that triggers automated memory mapping in temp_folder. Can be an int in Bytes, or a human-readable string, e.g., ‘1M’ for 1 megabyte. Use None to disable memmapping of large arrays. Only active when backend="loky" or "multiprocessing" .

Memmapping mode for numpy arrays passed to workers. None will disable memmapping, other modes defined in the numpy.memmap doc: https://numpy.org/doc/stable/reference/generated/numpy.memmap.html Also, see ‘max_nbytes’ parameter documentation for more details.

This object uses workers to compute in parallel the application of a function to many different arguments. The main functionality it brings, in addition to using the raw multiprocessing or concurrent.futures API, is (see examples for details):

  • More readable code, in particular since it avoids constructing lists of arguments.
  • Informative tracebacks even when the error happens on the client side.
  • Using n_jobs=1 enables turning off parallel computing for debugging without changing the codepath.
  • Early capture of pickling errors.
  • An optional progress meter.
  • Interruption of multiprocess jobs with 'Ctrl-C'.
  • Flexible pickling control for the communication to and from the worker processes.
  • Ability to use shared memory efficiently with worker processes for large numpy-based datastructures.

Note that the intended usage is to run one call at a time. Multiple calls to the same Parallel object will result in a RuntimeError.

A simple example:
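Presumably along the lines of the classic example from the joblib docstring:

```python
from math import sqrt
from joblib import Parallel, delayed

Parallel(n_jobs=1)(delayed(sqrt)(i ** 2) for i in range(10))
# [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```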

Reshaping the output when the function has several return values:
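For instance with math.modf, whose two return values can be unzipped after the parallel call:

```python
from math import modf
from joblib import Parallel, delayed

r = Parallel(n_jobs=1)(delayed(modf)(i / 2.) for i in range(10))
fractional, integral = zip(*r)
print(fractional)   # (0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5)
print(integral)     # (0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0)
```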

The progress meter: the higher the value of verbose , the more messages:

Traceback example, note how the line of the error is indicated as well as the values of the parameter passed to the function that triggered the exception, even though the traceback happens in the child process:

Using pre_dispatch in a producer/consumer situation, where the data is generated on the fly. Note how the producer is first called 3 times before the parallel loop is initiated, and then called to generate new data on the fly:

Dispatch more data for parallel processing

This method is meant to be called concurrently by the multiprocessing callback. We rely on the thread-safety of dispatch_one_batch to protect against concurrent consumption of the unprotected iterator.

Prefetch the tasks for the next batch and dispatch them.

The effective size of the batch is computed here. If there are no more jobs to dispatch, return False, else return True.

The iterator consumption and dispatching is protected by the same lock so calling this function should be thread safe.

Return the formatted representation of the object.

Display the process of the parallel execution only a fraction of time, controlled by self.verbose.

Decorator used to capture the arguments of a function.

Set the default backend or configuration for Parallel .

This is an alternative to directly passing keyword arguments to the Parallel class constructor. It is particularly useful when calling into library code that uses joblib internally but does not expose the various parallel configuration arguments in its own API.

If backend is a string it must match a previously registered implementation using the register_parallel_backend() function.

By default the following backends are available:

‘loky’: single-host, process-based parallelism (used by default),

‘threading’: single-host, thread-based parallelism,

‘multiprocessing’: legacy single-host, process-based parallelism.

‘loky’ is recommended to run functions that manipulate Python objects. ‘threading’ is a low-overhead alternative that is most efficient for functions that release the Global Interpreter Lock: e.g. I/O-bound code or CPU-bound code in a few calls to native code that explicitly releases the GIL. Note that on some rare systems (such as pyodide), multiprocessing and loky may not be available, in which case joblib defaults to threading.

In addition, if the dask and distributed Python packages are installed, it is possible to use the ‘dask’ backend for better scheduling of nested parallel calls without over-subscription and potentially distribute parallel calls over a networked cluster of several hosts.

It is also possible to use the distributed ‘ray’ backend for distributing the workload to a cluster of nodes. See more details in the Examples section below.

Alternatively the backend can be passed directly as an instance.

the default system temporary folder that can be overridden with TMP , TMPDIR or TEMP environment variables, typically /tmp under Unix operating systems.

Threshold on the size of arrays passed to the workers that triggers automated memory mapping in temp_folder. Can be an int in Bytes, or a human-readable string, e.g., ‘1M’ for 1 megabyte. Use None to disable memmapping of large arrays.

Soft hint to choose the default backend. The default process-based backend is ‘loky’ and the default thread-based backend is ‘threading’. Ignored if the backend parameter is specified.

Hard constraint to select the backend. If set to ‘sharedmem’, the selected backend will be single-host and thread-based.

If not None, overwrites the limit set on the number of threads usable in some third-party library threadpools like OpenBLAS, MKL or OpenMP. This is only used with the loky backend.

Additional parameters to pass to the backend constructor when backend is a string.

Joblib tries to limit the oversubscription by limiting the number of threads usable in some third-party library threadpools like OpenBLAS, MKL or OpenMP. The default limit in each worker is set to max(cpu_count() // effective_n_jobs, 1) but this limit can be overwritten with the inner_max_num_threads argument which will be used to set this limit in the child processes.

New in version 1.3.

To use the ‘ray’ joblib backend add the following lines:
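A sketch assuming ray is installed and exposes its joblib integration in ray.util.joblib:

```python
import ray
from ray.util.joblib import register_ray
from joblib import Parallel, delayed, parallel_config

register_ray()                      # registers the 'ray' joblib backend
# ray.init(...) may be needed first, depending on how your cluster is set up.

with parallel_config(backend="ray"):
    results = Parallel()(delayed(pow)(i, 2) for i in range(10))
```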

Wrapper for non-picklable object to use cloudpickle to serialize them.

Note that this wrapper tends to slow down the serialization process as it is done with cloudpickle which is typically slower compared to pickle. The proper way to solve serialization issues is to avoid defining functions and objects in the main scripts and to implement __reduce__ functions for complex classes.

Register a new Parallel backend factory.

The new backend can then be selected by passing its name as the backend argument to the Parallel class. Moreover, the default backend can be overwritten globally by setting make_default=True.

The factory can be any callable that takes no argument and returns an instance of ParallelBackendBase .

Warning: this function is experimental and subject to change in a future version of joblib.

Helper ABC which defines all the methods a ParallelBackend must implement.

A helper class for automagically batching jobs.


System error: assignment destination is read-only

  • High: It blocks me from completing my task.

I would like to ask about an issue that I encountered when I try to distribute my work on multiple cpu nodes using ray.

My input file is a simulation file consisting of multiple time frames, so I would like to distribute the calculation of one frame to one task. It works fine when I just used pool from the multiprocessing python library, where only one node (128 tasks in total) can be used. Since I have more than 2,000 time frames, I would like to use multiple nodes in this calculation, and the multiprocessing python library isn’t the best choice.

I created my code using this template: ray/simple-trainer.py at master · ray-project/ray · GitHub . Here’s a brief summary of my code:

```python
import socket
import sys
import time
import ray

@ray.remote
def hydration_water_calculation2(t, u):  # in one frame
    return xyz

ray.init(address=os.environ["ip_head"])

print("Nodes in the Ray cluster:")
print(ray.nodes())

for i in frame_values:
    ip_addresses = ray.get([hydration_water_calculation2.remote(i, u0) for _ in range(1)])
    print(Counter(ip_addresses))
```

But I got the following error:

```
Traceback (most recent call last):
  File ".../hydration_whole_global2_ray.py", line 269
    ip_addresses = ray.get([hydration_water_calculation2.remote(i, u0) for _ in range(1)])
  File ".../site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File ".../site-packages/ray/worker.py", line 1809, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError: ray::hydration_water_calculation2() (pid=27283, ip=10.8.9.236)
At least one of the input arguments for this task could not be computed:
ray.exceptions.RaySystemError: System error: assignment destination is read-only
traceback: Traceback (most recent call last):
  File ".../site-packages/ray/serialization.py", line 332, in deserialize_objects
    obj = self._deserialize_object(data, metadata, object_ref)
  File ".../site-packages/ray/serialization.py", line 235, in _deserialize_object
    return self._deserialize_msgpack_data(data, metadata_fields)
  File ".../site-packages/ray/serialization.py", line 190, in _deserialize_msgpack_data
    python_objects = self._deserialize_pickle5_data(pickle5_data)
  File ".../site-packages/ray/serialization.py", line 178, in _deserialize_pickle5_data
    obj = pickle.loads(in_band, buffers=buffers)
  File ".../site-packages/MDAnalysis/coordinates/base.py", line 2106, in __setstate__
    self[self.ts.frame]
  File ".../site-packages/MDAnalysis/coordinates/base.py", line 1610, in __getitem__
    return self._read_frame_with_aux(frame)
  File ".../site-packages/MDAnalysis/coordinates/base.py", line 1642, in _read_frame_with_aux
    ts = self._read_frame(frame)  # pylint: disable=assignment-from-no-return
  File ".../site-packages/MDAnalysis/coordinates/XDR.py", line 255, in _read_frame
    timestep = self._read_next_timestep()
  File ".../site-packages/MDAnalysis/coordinates/XDR.py", line 273, in _read_next_timestep
    self._frame_to_ts(frame, ts)
  File ".../site-packages/MDAnalysis/coordinates/XTC.py", line 144, in _frame_to_ts
    ts.dimensions = triclinic_box(*frame.box)
  File ".../site-packages/MDAnalysis/coordinates/base.py", line 810, in dimensions
    self._unitcell[:] = box
ValueError: assignment destination is read-only

(hydration_water_calculation2 pid=27283) 2022-05-01 22:53:55,714 ERROR serialization.py:334 -- assignment destination is read-only
```

(The worker process then logs the same deserialization traceback a second time, again ending in "ValueError: assignment destination is read-only".)

Could anyone help me diagnose the issue? I’m new to ray and still learning why I was getting the “assignment destination is read-only” error. Many thanks in advance!

Hey @Chengeng-Yang , the read-only errors are happening because Ray stores arguments in the shared memory object store. This allows arguments to be shared with processes very efficiently with zero memory copies, but has a side-effect of rendering numpy arrays immutable.

In this case, it seems that during __setstate__ for your program an assignment is made that will update an existing array. Is it possible to modify the code around there to make a copy of the array prior to calling self._unitcell[:] = box ? I.e., self._unitcell = self._unitcell.copy(); self._unitcell[:] = box . That should fix the deserialization problem.

Reference stack line: File ".../MDAnalysis/coordinates/base.py", line 2106, in __setstate__

Hi @ericl , many thanks for your help! Your suggestion DID work. The deserialization issue has been fixed after I made a copy of self._unitcell.

Thanks again and have a wonderful day :))


ValueError: assignment destination is read-only

One of the errors that developers often come across is the ValueError: assignment destination is read-only .

This error typically occurs when you try to modify a read-only object or variable.

What does the ValueError: assignment destination is read-only error mean?

How the error occurs.

Example 1: Modifying an Immutable Tuple

However, tuples in Python are immutable and cannot be modified once defined.

Therefore, this code will raise an error (for tuples it is actually a TypeError; the read-only ValueError in the title comes from objects such as NumPy arrays whose WRITEABLE flag is set to False).

Example 4: Altering an Immutable Data Structure

The code then changes the value associated with 'key' to 'new_value' by assigning it directly using indexing.

Solutions for ValueError: assignment destination is read-only

Solution 1: Use Mutable Data Structures

Instead of using immutable data structures like tuples or strings, switch to mutable ones such as lists or dictionaries.

For example:
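A short sketch contrasting the two:

```python
# Immutable: a tuple cannot be changed after creation.
point = (1, 2)
# point[0] = 10              # would raise TypeError

# Mutable alternatives can be updated in place.
point = [1, 2]
point[0] = 10

settings = {'key': 'value'}
settings['key'] = 'new_value'
```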

Solution 2: Reassign Variables

Solution 3: Check Documentation and Restrictions

It's important to consult the documentation or source code to understand any restrictions that apply to the object you are modifying.

Solution 4: Use a Copy or Clone

If the object you’re working with is meant to be read-only, consider creating a copy or clone of it.

Solution 5: Identify Context-Specific Solutions

Understanding the possible cause will aid in finding a proper resolution.

Solution 6: Seek Help from the Community

Online forums, developer communities, and platforms like Stack Overflow can provide valuable insights and guidance from experienced programmers.

Frequently Asked Questions

To resolve this valueerror, you can apply different solutions such as using mutable data structures, reassigning variables, checking documentation and restrictions, using copies or clones, identifying context-specific solutions, or seeking help from the programming community.

Yes, there are similar errors that you may encounter in different programming languages. For example, in JavaScript, you might encounter the error TypeError: Assignment to a constant variable when trying to modify a constant.

By following the solutions provided in this article, such as using mutable data structures, reassigning variables, checking restrictions, making copies or clones, considering context-specific solutions, and seeking community help, you can effectively resolve this error.


Determine how a param is being set as readonly

I have a class similar to the example below

I'm using this basic example to test the usage pattern where I store a numpy.ndarray once but keep both a numpy recarray and a pandas DataFrame view of the same data, to allow for different access patterns and viewing with panel without duplicating the large array in memory.

This example works, as you can see from the example below.

However, I have this same basic pattern in a much larger class, but when I try to assign a value to the main numpy array I get a ValueError: assignment destination is read-only . I have not declared any of the parameters in the param.Parameterized class as readonly or constant but continue to get this simple ValueError.

Is there a way to track down where the readonly designation is being applied via some other more detailed stack trace? I'm beyond frustrated with trying to figure out why the main array is being assigned as readonly when I have made no such designation in my code.

I have found the source of my error. It appears this is an issue associated with reading numpy arrays from a buffer not with the parameters themselves.

Hopefully this helps someone avoid a couple of days of frustrating searching in the future.

If you read a numpy array in from a buffer and would like to manipulate the values across views add a .copy() to the end of the initial read so that the array is not set as read-only.
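A minimal sketch of that fix (the dtype and the buffer contents are assumptions):

```python
import numpy as np

raw = bytes(64)                                    # stand-in for data read from a buffer/file
arr = np.frombuffer(raw, dtype=np.float64)         # read-only: it shares the immutable buffer
arr = np.frombuffer(raw, dtype=np.float64).copy()  # writable, independent copy
arr[0] = 42.0                                      # assignment now works
```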

Just out of curiosity, did the error message ( ValueError: assignment ... ) have anything to do with Param itself?

No. I have been really impressed with param so I'm rewriting my old code into param.Parameterized classes to better document the purpose of the code and to take advantage of the visualization capabilities of panel. One of the features I was using in Jupyter as I get better at param and panel was the with param.exceptions_summarized(): context manager.

I mistakenly thought this context manager would only summarize the param related exceptions, since the code was previously working as non-param classes. The ValueError was the only thing being dumped out so I assumed it was a param related error. Once I removed the context manager and explored the full trace I found the numpy related issue. This is more a gap in my understanding than a real issue with any one library.

Thanks for all the help. My 2022 resolution is to get all my code into param and panel related classes so I’m sure I will be pestering everyone with issues that may end up being more me than the libraries themselves. Merry Christmas!

:wink:

I believe exceptions_summarized was added to param purely to write the documentation; it's useful to print only the error message of an otherwise long and boring exception traceback, an exception that would stop the notebook execution. But in your code you should definitely not use it since it will skip exceptions. This context manager would be better off in a separate module dedicated to the docs.

Scikit-learn: ValueError: assignment destination is read-only, when paralleling with n_jobs > 1

When I run SparseCoder with n_jobs > 1, there is a chance to raise the exception ValueError: assignment destination is read-only . The code is shown as follows:

The bigger data_dims is, the higher the chance of getting it. When data_dims is small (lower than 2000, I verified), everything works fine. Once data_dims is bigger than 2000, there is a chance of getting the exception. When data_dims is bigger than 5000, it is raised 100% of the time.

My version info:

OS: OS X 10.11.1 python: Python 2.7.10 |Anaconda 2.2.0 numpy: 1.10.1 sklearn: 0.17

The full error traceback ends with ValueError: assignment destination is read-only .

Most helpful comment

@coldfog @garciaev I don't know if it is still relevant for you, but I ran into the same problem using joblib without scikit-learn.

The reason is the max_nbytes parameter within the Parallel invocation of the Joblib-library when you set n_jobs>1, which is 1M by default. The definition of this parameter is: "Threshold on the size of arrays passed to the workers that triggers automated memory mapping in temp_folder". More details can be found here: https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html#

So, once the arrays pass the size of 1M, joblib will throw the error "ValueError: assignment destination is read-only". In order to overcome this, the parameter has to be set higher, e.g. max_nbytes='50M'.

If you want a quick-fix, you can add max_nbytes='50M' to the file "sklearn/decomposition/dict_learning.py" at line 297 in the Parallel class initiation to increase the allowed size of temporary files.
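For reference, the same workaround sketched on a plain joblib call (the data and function here are illustrative, not from the issue):

```python
import numpy as np
from joblib import Parallel, delayed

chunks = [np.random.random((1000, 500)) for _ in range(8)]   # illustrative data

# Raising the memmapping threshold keeps inputs below 50 MB from being handed
# to the workers as read-only memory maps.
results = Parallel(n_jobs=4, max_nbytes='50M')(
    delayed(np.mean)(chunk) for chunk in chunks)
```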


All 23 comments

I am taking a look at this

– vighneshbirodkar

Is it not related to #5481, which seems more generic?

– lesteve

It is, but SparseEncoder is not an estimator

Not that it matters but SparseCoder is an estimator:

I guess the error wasn't detected in #4807 as it is raised only when using algorithm='omp' . It should be raised when testing read only data on OrthogonalMatchingPursuit though.

– arthurmensch

Was there a resolution to this bug? I've run into something similar while doing n_jobs=-1 on RandomizedLogisticRegression, and didn't know whether I should open a new issue here. Here's the top of my stack:

Someone ran into the same exact problem on StackOverflow - ValueError: output array is read-only . Both provided solutions on SO are useless (the first one doesn't even bother solving the problem, and the second one is solving the problem by bypassing joblib completely).

– alichaudry

@alichaudry I just commented on a similar issue here .

I confirm that there is an error and it is floating in nature.

sklearn.decomposition.SparseCoder(D, transform_algorithm = 'omp', n_jobs=64).transform(X)

If X.shape[0] > 4000, it fails with ValueError: assignment destination is read-only . If X.shape[0] < 100, it is OK.

OS: Linux 3.10.0-327.13.1.el7.x86_64 python: Python 2.7.5 numpy: 1.10.1 sklearn: 0.17

– fkrasnov

Hi there, I'm running into the same problem, using MiniBatchDictionaryLearning with jobs>1. I see a lot of referencing to other issues, but was there ever a solution to this? Sorry in advance if a solution was mentioned and I missed it.

OS: OSX python: 3.5 numpy: 1.10.1 sklearn: 0.17

– zaino

The problem is in modifying arrays in-place. @lesteve close as duplicate of #5481?

– amueller

Currently I am still dealing with this issue, and it is nearly a year later. This is still an open issue.

– wderekjones

If you have a solution, please contribute it, @williamdjones

– jnothman

https://github.com/scikit-learn/scikit-learn/pull/4807 is probably the more advanced effort to address this.

– agramfort

@williamdjones I was not suggesting that it's solved, but that it's an issue that is reported at a different place, and having multiple issues related to the same problem makes keeping track of it harder.

Not sure where to report this, or if it's related, but I get the ValueError: output array is read-only when using n_jobs > 1 with RandomizedLasso and other functions.

– ghost

@JGH1000 NOT A SOLUTION, but I would try using a random forest for feature selection instead since it is stable and has working joblib functionality.

Thanks @williamdjones , I used several different methods but found that RandomizedLasso works best for couple of particular datasets. In any case, it works but a bit slow. Not a deal breaker.

@JGH1000 No problem. If you don't mind, I'm curious about the dimensionality of the datasets for which RLasso was useful versus those for which it was not.

@williamdjones it was a small sample size (40-50), high-dimension (40,000-50,000) dataset. I would not say that other methods were bad, but RLasso provided results/ranking that were much more consistent with several univariate tests + domain knowledge. I guess this might not be the 'right' features but I had more trust in this method. Shame to hear it will be removed from scikit.

The problem still seems to exist on 24 core Ubuntu processor for RLasso with n_jobs = -1 and sklearn 0.19.1

– garciaev

Just to complement @lvermue's answer. I did what he suggested, but note that inside sklearn/decomposition/dict_learning.py the Parallel class that is instantiated doesn't have max_nbytes set. What worked for me was to increase the default max_nbytes inside sklearn/externals/joblib/parallel.py from "1M" to "10M". I think you can put more if needed.

– chriys

And you can find max_nbytes at line 475 of file parallel.py

– rishabhgarg7


Q: What does the error "ValueError: assignment destination is read-only" mean?

When I open a jpg file with cv2.imread(), it sometimes fails, possibly because of the BGR format it uses. So I switched to plt to read the image as RGB instead.

However, when I convert the image to grayscale, I get the error "ValueError: assignment destination is read-only". How should I change the code here to avoid it?

Stack Overflow user

Posted on 2019-03-17 15:24:08

This line seems redundant and is what causes the error; remove it:

Posted on 2020-12-14 23:27:24

Not sure about the PIL library, but if it is based on numpy arrays, you can try this ( https://numpy.org/doc/stable/reference/generated/numpy.copy.html ):

This creates a complete copy by initializing an entirely separate instance, instead of referencing the original (which is what the '=' operator does).
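A small sketch of that suggestion; img is assumed to be the RGB array loaded earlier, with a dummy array standing in here:

```python
import numpy as np

img = np.zeros((4, 4, 3), dtype=np.uint8)   # stand-in for the loaded RGB image

img_copy = np.copy(img)    # a full, independent copy rather than a reference
img_copy[:, :, 0] = 255    # in-place edits now succeed
```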

https://pythonexamples.org/python-numpy-duplicate-copy-array/#:~:text=Following%20is%20the%20syntax%20to%20make%20a%20copy,of%20array1.%20Example%201%3A%20Copy%20Array%20using%20Numpy

https://stackoverflow.com/questions/55204505



Replace values of a numpy index array with values of a list

Suppose you have a numpy array and a list:

I'd like to replace values in an array, so that 1 is replaced by 0, and 2 by 10.
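A stand-in setup consistent with that description (the actual arrays in the question may differ):

```python
import numpy as np

a = np.array([1, 2, 2, 1, 2, 1])   # the index array
key = [0, 10]                      # 1 should become 0, 2 should become 10
```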

I found a similar problem here - http://mail.python.org/pipermail//tutor/2011-September/085392.html

But using this solution:

Throws me an error:

I guess that's because I can't really write into a numpy array.

P.S. The actual size of the numpy array is 514 by 504 and of the list is 8.


6 Answers

Well, I suppose what you need is
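Presumably a direct boolean-mask assignment along these lines (using a stand-in array like the one above):

```python
import numpy as np

a = np.array([1, 2, 2, 1, 2, 1])   # stand-in for the question's array
a[a == 1] = 0
a[a == 2] = 10
# This in-place approach is what fails with "assignment destination is
# read-only" when the array is not writable, as noted in the first comment.
```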


  • 7 When I do this, I get "assignment destination is read-only", do you know why this is? –  wolfsatthedoor Commented May 30, 2014 at 3:19
  • 1 This is significantly simpler than the other solutions, thank you –  reabow Commented Apr 17, 2015 at 13:07
  • What would we do, if we want to change the elements at indexes which are multiple of given n, simultaneously. Like simultaneously change a[2],a[4],a[6].... for n = 2., what should be done? –  lavee_singh Commented Oct 7, 2015 at 18:57

Instead of replacing the values one by one, it is possible to remap the entire array like this:
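A sketch of the remapping approach referenced in the comments below (np.digitize plus fancy indexing, with stand-in values):

```python
import numpy as np

a = np.array([1, 2, 2, 1, 2, 1])      # stand-in for the question's array
palette = np.array([1, 2])            # values that occur in a
key = np.array([0, 10])               # what each of them should become

index = np.digitize(a.ravel(), palette, right=True)
result = key[index].reshape(a.shape)  # a new array; a itself is left untouched
```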

Credit for the above idea goes to @JoshAdel . It is significantly faster than my original answer:

I benchmarked the two versions this way:


  • Thanks unutbu! I'll accept your answer as it's more versatile. Cheers. –  abudis Commented Nov 26, 2012 at 20:40
  • "index = np.digitize(a.reshape(-1,), palette)-1" could be replaced with "index = np.digitize(a.reshape(-1,), palette, right=True)", right? (=True?) –  Pietro Battiston Commented Mar 11, 2015 at 13:06
  • @PietroBattiston: Since every value in a is in palette , yes I think right=True returns the same result. Thanks for the improvement! –  unutbu Commented Mar 11, 2015 at 13:30
  • What would we do, if we wanted to change values at indexes which are multiple of given n, like a[2],a[4],a[6],a[8]..... for n=2? –  lavee_singh Commented Oct 7, 2015 at 19:01

Read-only array in numpy can be made writable:

This will then allow assignment operations like this one:

The real problem was not assignment itself but the writable flag.
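A sketch of both steps together, with a stand-in array in place of the question's:

```python
import numpy as np

a = np.array([1, 2, 2, 1, 2, 1])
a.flags.writeable = True   # only succeeds if the array (or its base) owns writable memory
a[a == 1] = 0
a[a == 2] = 10
```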


I found another solution with the numpy function place . (Documentation here )

Using it on your example:
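Presumably along these lines, again with the stand-in array:

```python
import numpy as np

a = np.array([1, 2, 2, 1, 2, 1])
np.place(a, a == 1, 0)     # replace every 1 with 0, in place
np.place(a, a == 2, 10)    # replace every 2 with 10, in place
```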


  • This is perfect and simple. Thank you @Linda! –  amc Commented Sep 1, 2021 at 14:21

You can also use np.choose(idx, vals) , where idx is an array of indices that indicate which value of vals should be put in their place. The indices must be 0-based, though. Also make sure that idx has an integer datatype. So you would only need to do:
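A sketch of that call, with the stand-in values 1 and 2 mapping to 0 and 10:

```python
import numpy as np

a = np.array([1, 2, 2, 1, 2, 1])
vals = np.array([0, 10])
idx = (a - 1).astype(int)      # 0-based integer indices into vals
result = np.choose(idx, vals)  # array([ 0, 10, 10,  0, 10,  0])
```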

I was unable to set the flags, or use a mask to modify the value. In the end I just made a copy of the array.


Hot Network Questions

  • Isn't it problematic to look at the data to decide to use a parametric vs. non-parametric test?
  • Collaborators write their departments for my (undergraduate) affiliation
  • What is the difference between a Blockchain link (Blink) and an Action?
  • Why do many philosophers consider a past-eternal universe to be self-explanatory but not a universe that began with no cause?
  • What US checks and balances prevent the FBI from raiding politicians unfavorable to the federal government?
  • Derivative of the Score Function in Fisher Information
  • Weird behavior by car insurance - is this legit?
  • Is it legal to discriminate on marital status for car insurance/pensions etc.?
  • What's the meaning of "nai gar"?
  • What rights does an employee retain, if any, who does not consent to being monitored on a work IT system?
  • How to turn a desert into a fertile farmland with engineering?
  • Vespertide affairs
  • What exactly is beef bone extract, beef extract, beef fat (all powdered form) and where can I find it?
  • Can a planet have a warm, tropical climate both at the poles and at the equator?
  • What does "acceptable" refer to in Romans 12:2?
  • Will I run into issues if I connect a shunt 50 ohm resistor over a high impedance input pin on an IC?
  • Why is ACAT not identifying the difference between 2 bridge sites on a surface slab?
  • Impact of high-power USB-C chargers on Li-ion battery longevity
  • How exactly does a seashell make the humming sound?
  • How to find your contract and employee handbook in the UK?
  • Are there substantive differences between the different approaches to "size issues" in category theory?
  • Why does c show up in Schwarzschild's equation for the horizon radius?
  • Exception handling: 'catch' without explicit 'try'
  • DIY Rack/Mount In Trailer

assignment destination is read only joblib

Navigation Menu

Search code, repositories, users, issues, pull requests..., provide feedback.

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly.

To see all available qualifiers, see our documentation .

  • Notifications You must be signed in to change notification settings

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement . We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception "assignment destination is read-only" when reading from a read-only array #3758


khinsen commented Sep 17, 2013

Some array methods fail when passed an immutable view of an array, although they should not try to write to it. Tested with 1.6.1 and 1.7.1.

A short demonstration:

import numpy as np

class ImmutableNDArray(np.ndarray):
    def __new__(cls, *args, **kwargs):
        return np.ndarray.__new__(cls, *args, **kwargs)

    def __array_finalize__(self, obj):
        self.setflags(write=False)

indices = np.array([2, 3], np.uint16).view(ImmutableNDArray)
data = np.array([1, 1, 5, 5, 4, 3])

# These two raise the exception
# ValueError: assignment destination is read-only
print data.take(indices)
print data[:2].repeat(indices)

# These two work fine
print data.take(np.array(indices))
print data[:2].repeat(np.array(indices))


njsmith commented Sep 17, 2013

The problem isn't that the input is read-only. These examples work fine when passed a read-only array.

In [1]: indices = np.array([2, 3], np.uint16)

In [2]: data = np.array([1, 1, 5, 5, 4, 3])

In [3]: indices.setflags(write=False)

In [4]: data.take(indices)
Out[4]: array([5, 5])

In [5]: data[:2].repeat(indices)
Out[5]: array([1, 1, 1, 1, 1])

The problem is that .take and .repeat are instantiating an instance of your subclass for their return value (because they try to preserve array subclasses), this array is set to read-only by your __array_finalize__, and then the methods try to fill in the output array and fail.


khinsen commented Sep 18, 2013

After a quick check, I confirm: a plain array with writeable=False works, but not an array of a subclass that enforces writeable=False.

It's not quite clear to me why in a.take(b), NumPy wants the result to be the type of b rather than the type of a. But then, I guess the subclass rules are rather complex and this may be a surprising side effect.

However, converting the result array to some subclass before setting the values seems like a bug to me. The array finalizer should be called after the array contents have been defined. The fact that take allocates an array and then writes to it should be an implementation detail.

njsmith commented Sep 18, 2013

I agree that 'a.take(b)' using b's type for the return is pretty weird. (If b has type 'list', should it return a list?) None of the subclass stuff makes much sense to me though.

And unfortunately if you're going to be messing about inside the array object initialization code then you're going to run into implementation details. You could make an argument (PR) for changing this particular implementation detail to something different if you want I guess.



charris commented Feb 23, 2014

I agree that the result of take should be the same type as a. This may be a bug, so marking as such until resolved.


COMMENTS

  1. Error in Joblib Parallelisation Python : assignment destination is read

Thanks for contributing an answer to Stack Overflow! Please be sure to answer the question. Provide details and share your research! But avoid asking for help, clarification, or responding to other answers.

  2. ValueError: assignment destination is read-only, when ...

    When I run SparseCoder with n_jobs > 1, there is a chance to raise exception ValueError: assignment destination is read-only. The code is shown as follow: from sklearn.decomposition import SparseCoder import numpy as np data_dims = 4103 ...

  3. ValueError: assignment destination is read-only [Solved]

    The NumPy "ValueError: assignment destination is read-only" occurs when you try to assign a value to a read-only array. To solve the error, create a copy of the read-only array and modify the copy. You can use the flags attribute to check if the array is WRITABLE. main.py. from PIL import Image.

  4. ValueError: assignment destination is read-only #48

    Describe the bug: When using joblib as the parallel distributor, if the number of processes / size of them ... ValueError: assignment destination is read-only. The above exception was the direct cause of the following exception: ...

  5. Embarrassingly parallel for loops

    Joblib version 0.12 and later are no longer subject to this problem thanks to the use of loky as the new default backend for process-based parallelism. Prior to Python 3.4 the 'multiprocessing' backend of joblib can only use the fork strategy to create worker processes under non-Windows systems. This can cause some third-party libraries to ...

  6. When using the Catboost model to call sklearn's permanence ...

    As an example, when I use the Catboost model to call sklearn's permanence importance to calculate the contribution rate, the joblib library prompts ValueError: assignment destination is read-only. This cannot be solved by modifying max nbytes, even if I increase it to 99999M.

  7. python

    joblib.Parallel(n_jobs=-1)(joblib.delayed(im_ll)(i) for i in range(0,2045)) ... ValueError: assignment destination is read-only. I was trying to run the specific function im_ll for 2000+ times, it will take 40+ minutes using for loop so I tried Parallelizing the for loop using Joblib. The Function runs smoothly when called individually but I ...

  8. Save and Load Machine Learning Models with joblib in Python ...

    joblib.dump to serialize an object hierarchy joblib.load to deserialize a data stream. Save the model. from sklearn.externals import joblib joblib.dump(knn, 'my_model_knn.pkl.pkl') Load the model ...

  9. System error: assignment destination is read-only

    High: It blocks me to complete my task. Hi, I would like to ask about an issue that I encountered when I try to distribute my work on multiple cpu nodes using ray. My input file is a simulation file consisting of multiple time frames, so I would like to distribute the calculation of one frame to one task. It works fine when I just used pool from the multiprocessing python library, where only ...

  10. issue with Parallel and pandas series backend interaction with large

    Hi there, I'm using Parallel on my function "make_masses". I'm only asking it to loop 100 times, but the data that goes into the function has about 1.5 million rows (and two columns). Parallel works perfectly fine on smaller slices of th...

  11. [SOLVED] Valueerror assignment destination is read-only

    Solutions for ValueError: assignment destination is read-only. Here are some solutions to solve the ValueError: assignment destination is read-only: Solution 1: Use Mutable Data Structures. Instead of using immutable data structures like tuples or strings, switch to mutable ones such as lists or dictionaries.

  12. Determine how a param is being set as readonly

    But in your code you should definitely not use it since it will skip exceptions. This context manager would be better off in a separate module dedicated to the docs. I have a class similar to the example below class P (param.Parameterized): a = param.Array () r = param.Array () d = param.DataFrame () def setup (self): axis_names = [f"Axis_ {i+1 ...

  13. ValueError: assignment destination is read-only #14972

    ValueError: assignment destination is read-only. The above exception was the direct cause of a ValueError traceback raised from logit_scores = cross_val_score(full_pipeline_with_predictor, x_train, y_train, scoring='roc_auc', cv=5, n_jobs=-1), followed by logit_score_train = logit_scores.mean().

  14. temp_folder and mmap_mode parameters in Parallel #1373

    By default it's using a shared memory folder (I think it's /run/shm on linux for instance). mmap_mode makes it possible to change the mode. You should probably never use mmap_mode="r+" because it means that you allow one worker to corrupt the input data of another worker processing the same argument concurrently.

  15. Scikit-learn: ValueError: assignment destination is read-only, when

    When I run SparseCoder with n_jobs > 1, there is a chance to raise exception ValueError: assignment destination is read-only.The code is shown as follow: from sklearn.decomposition import SparseCoder import numpy as np data_dims = 4103 init_dict = np.random.rand(500, 64) data = np.random.rand(data_dims, 64) c = SparseCoder(init_dict , transform_algorithm='omp', n_jobs=8).fit_transform(data)

  16. What is the error "ValueError: assignment destination is read-only"? - Tencent Cloud Developer Community - Tencent Cloud

    When I open a jpg file with cv2.imread(), it sometimes fails, which may be caused by the BGR format I was using. So I switched to plt to work in RGB. import matplotlib.pyplot as plt; import numpy as np; def rgb_to_gray(img): grayImage = np.zeros(img.shape); R = np.array(

  17. Numpy: assignment destination is read-only

    Numpy: assignment destination is read-only - broadcast. Ask Question Asked 6 years, 2 months ago. Modified 6 years, 2 months ago. Viewed 18k times 9 I start with a 2D array and want to broadcast it to a 3D array (eg. from greyscale image to rgb image). This is the code I use.

  18. Replace values of a numpy index array with values of a list

    ValueError: assignment destination is read-only. I guess that's because I can't really write into a numpy array. P.S. The actual size of the numpy array is 514 by 504 and of the list is 8.

  19. Exception "assignment destination is read-only" when reading from a

    After a quick check, I confirm: a plain array with writeable=False works, but not an array of a subclass that enforces writeable=False. It's not quite clear to me why in a.take(b), NumPy wants the result to be the type of b rather than the type of a. But then, I guess the subclass rules are rather complex and this may be a surprising side effect.